Metrics

Metric nodes represent an aggregation of a measure defined as a single expression in a query that selects from a single source, transform, or dimension node.

Attribute	Description	Type
name	Unique name used by other nodes to select from this node	string
display_name	A human readable name for the node	string
description	A human readable description of the node	string
mode	`published` or `draft` (see Node Mode)	string
query	A SQL query that selects a single expression from a single node	string

Creating Metric Nodes

curl

curl -X POST http://localhost:8000/nodes/metric/ \
-H 'Content-Type: application/json' \
-d '{
    "name": "default.num_repair_orders",
    "description": "Number of repair orders",
    "mode": "published",
    "query": "SELECT count(repair_order_id) as num_repair_orders FROM default.repair_orders"
}'

python

from datajunction import DJBuilder, NodeMode
dj = DJBuilder(DJ_URL)

metric = dj.create_metric(
    name="default.num_repair_orders",
    description="Number of repair orders",
    query="SELECT count(repair_order_id) FROM repair_orders",
    mode=NodeMode.PUBLISHED,  # for draft nodes, use `mode=NodeMode.DRAFT`
)
print(metric.name)

javascript

dj.metrics.create(
    {
        name: "default.num_repair_orders",
        description: "Number of repair orders",
        mode: "published",
        query: `
            SELECT
            count(repair_order_id) as num_repair_orders
            FROM default.repair_orders
        `
    }
).then(data => console.log(data))

Supported Aggregation Functions

DJ supports a wide range of SQL aggregation functions for defining metrics. These functions are categorized by their aggregability - whether they can be pre-aggregated for efficient cube materialization.

Fully Aggregatable Functions

These functions can be pre-aggregated at any grain level, enabling efficient cube materializations.

SUM

Calculates the sum of values.

SELECT SUM(total_repair_cost) FROM default.repair_orders

COUNT

Counts the number of rows.

SELECT COUNT(repair_order_id) FROM default.repair_orders

COUNT_IF

Counts rows that match a condition.

SELECT COUNT_IF(status = 'completed') FROM default.repair_orders

MIN

Returns the minimum value.

SELECT MIN(order_date) FROM default.repair_orders

MAX

Returns the maximum value.

SELECT MAX(total_repair_cost) FROM default.repair_orders

AVG

Calculates the average. Internally decomposed into SUM and COUNT for pre-aggregation.

SELECT AVG(total_repair_cost) FROM default.repair_orders

ANY_VALUE

Returns any value from the group.

SELECT ANY_VALUE(category) FROM default.repair_orders

APPROX_COUNT_DISTINCT

Approximate distinct count using HyperLogLog sketches. Useful for large datasets where exact distinct counts are expensive.

SELECT APPROX_COUNT_DISTINCT(customer_id) FROM default.repair_orders

Statistical Functions

These functions compute statistical measures and are fully decomposable for pre-aggregation.

VAR_POP / VAR_SAMP / VARIANCE

Population variance, sample variance (with Bessel’s correction), and VARIANCE (alias for VAR_SAMP).

SELECT VAR_POP(total_repair_cost) FROM default.repair_orders
SELECT VAR_SAMP(total_repair_cost) FROM default.repair_orders
SELECT VARIANCE(total_repair_cost) FROM default.repair_orders

STDDEV_POP / STDDEV_SAMP / STDDEV

Population standard deviation, sample standard deviation, and STDDEV (alias for STDDEV_SAMP).

SELECT STDDEV_POP(total_repair_cost) FROM default.repair_orders
SELECT STDDEV_SAMP(total_repair_cost) FROM default.repair_orders
SELECT STDDEV(total_repair_cost) FROM default.repair_orders

COVAR_POP / COVAR_SAMP

Population and sample covariance between two variables.

SELECT COVAR_POP(labor_cost, parts_cost) FROM default.repair_orders
SELECT COVAR_SAMP(labor_cost, parts_cost) FROM default.repair_orders

CORR

Pearson correlation coefficient between two variables.

SELECT CORR(labor_cost, parts_cost) FROM default.repair_orders

Non-Aggregatable Functions

These functions require access to the full dataset and cannot be pre-aggregated in cubes.

MAX_BY / MIN_BY

Returns a value from the row with the maximum/minimum value of another column.

SELECT MAX_BY(technician_name, repair_count) FROM default.repair_orders
SELECT MIN_BY(technician_name, repair_count) FROM default.repair_orders

⚠️

Metrics using MAX_BY or MIN_BY cannot be included in materialized cubes because they require access to individual rows.

DISTINCT Aggregations

Adding DISTINCT to any aggregation function makes it limited aggregatable - it can only be pre-aggregated at or below the grain of the distinct columns.

SELECT COUNT(DISTINCT customer_id) FROM default.repair_orders
SELECT SUM(DISTINCT order_amount) FROM default.repair_orders

👉

For large-scale approximate distinct counts, prefer APPROX_COUNT_DISTINCT which uses HyperLogLog sketches and is fully aggregatable.

Derived Metrics

Derived metrics reference other metrics, allowing you to build complex calculations from simpler components. They automatically inherit the components from their base metrics, enabling proper decomposition for cube materialization.

Simple Derived Metrics

The simplest derived metrics combine existing metrics with arithmetic operations.

-- Average order value from two base metrics
SELECT default.total_revenue / default.total_orders AS average_order_value

Cross-Fact Ratio Metrics

These metrics have numerator and denominator from different fact tables. This is useful for metrics like conversion rates or efficiency ratios.

Example: Order Conversion Rate

-- Metric 1: Count of completed orders (from orders fact)
SELECT COUNT(order_id) FROM default.completed_orders

-- Metric 2: Count of page visits (from visits fact)
SELECT COUNT(visit_id) FROM default.page_visits

-- Derived metric: Conversion rate
SELECT default.completed_order_count / default.page_visit_count AS conversion_rate

⚠️

Cross-fact ratio metrics require that both underlying facts share common dimensions for the metric to be meaningful. DJ will join the facts on their shared dimensional grain.

Period-Over-Period Metrics

Period-over-period metrics compare values across time periods, such as week-over-week or year-over-year growth. Since DJ metrics cannot contain WHERE clauses, use conditional aggregation or window functions.

Example: Week-Over-Week Revenue Growth (using conditional aggregation)

Create metrics that filter time periods within the SELECT clause:

-- Current week revenue
SELECT SUM(CASE WHEN order_date >= DATE_SUB(CURRENT_DATE, 7) THEN revenue ELSE 0 END)
FROM default.sales

-- Previous week revenue
SELECT SUM(CASE WHEN order_date >= DATE_SUB(CURRENT_DATE, 14)
                AND order_date < DATE_SUB(CURRENT_DATE, 7) THEN revenue ELSE 0 END)
FROM default.sales

Then create the growth metric:

-- Week-over-week growth rate
SELECT (default.current_week_revenue - default.previous_week_revenue)
       / NULLIF(default.previous_week_revenue, 0) AS wow_revenue_growth

Example: Year-Over-Year Using Window Functions

For YoY comparisons with a date dimension, use LAG:

SELECT (revenue - LAG(revenue, 1) OVER (ORDER BY year))
       / NULLIF(LAG(revenue, 1) OVER (ORDER BY year), 0) AS yoy_revenue_growth
FROM default.yearly_revenue

👉

Period-over-period metrics require careful handling. Use conditional aggregation (CASE WHEN) to filter time periods within the SELECT clause, or use window functions (LAG, LEAD) when working with time-series data.

Trailing N-Day Metrics

Trailing metrics aggregate data over a rolling window, such as trailing 7-day or 30-day averages. These are useful for smoothing out daily fluctuations.

Example: Trailing 7-Day Average Revenue

-- Daily revenue metric
SELECT SUM(revenue) FROM default.sales

-- Trailing 7-day average (as a derived metric with window logic)
SELECT AVG(default.daily_revenue) OVER (
  ORDER BY date_dim
  ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS trailing_7d_avg_revenue

Example: Trailing 30-Day Total Orders

SELECT SUM(default.daily_order_count) OVER (
  ORDER BY date_dim
  ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS trailing_30d_orders

👉

Trailing metrics typically require a time dimension in your cube and use SQL window functions. The window size (7 days, 30 days, etc.) determines how much historical data is included in each calculation.

Combining Derived Metric Patterns

You can combine these patterns to create sophisticated metrics:

-- Trailing 7-day conversion rate (combines trailing window + cross-fact ratio)
SELECT default.trailing_7d_orders / default.trailing_7d_visits AS trailing_7d_conversion_rate

-- YoY growth of category share (combines period-over-period + share of total)
SELECT (default.current_year_category_share - default.previous_year_category_share)
AS category_share_yoy_change

Share of total metrics calculate a proportion where the numerator is grouped by a dimension while the denominator represents the ungrouped total. This pattern requires special handling because DJ computes all metrics at the same grain.

⚠️

DJ doesn’t automatically compute the numerator and denominator at different grains from the same fact. The workarounds below may be needed.

👉

Native support for ratio metrics at different grains is planned. Track progress at GitHub Issue #1695.

Option 1: Conditional Aggregation with Fixed Filter

Create a metric for a specific category’s share:

-- Electronics share of total revenue (hardcoded category)
SELECT SUM(CASE WHEN category = 'Electronics' THEN revenue ELSE 0 END) / SUM(revenue)
FROM default.sales

This approach requires creating a separate metric for each category you want to track.

Option 2: Application Layer

Create separate metrics and compute the ratio at the application layer:

-- Metric 1: Revenue (will be grouped by category when queried)
SELECT SUM(revenue) FROM default.sales

-- Metric 2: Total revenue (query separately without category dimension)
SELECT SUM(revenue) FROM default.sales

Then divide in your application code after querying each metric at its appropriate grain.

Option 2: Pre-computed Totals

If you have a transform node that pre-computes totals, you can reference it:

-- Assuming default.sales_with_totals has both line-level and total columns
SELECT SUM(revenue) / MAX(total_revenue) FROM default.sales_with_totals

Conditional Aggregations

You can use CASE WHEN expressions inside aggregation functions to create metrics that only aggregate values meeting certain conditions.

Example: Revenue from Completed Orders Only

SELECT SUM(CASE WHEN status = 'completed' THEN revenue ELSE 0 END)
FROM default.orders

Example: Completion Rate

SELECT SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) / COUNT(*)
FROM default.orders

Example: Average for High-Value Orders

SELECT AVG(CASE WHEN order_value > 100 THEN order_value ELSE NULL END)
FROM default.orders

👉

For simple conditional counts, prefer COUNT_IF which is more readable: COUNT_IF(status = 'completed')

Metric Metadata

Beyond the SQL query, metrics support additional metadata to improve discoverability and usability.

Required Dimensions

Some metrics only make sense when grouped by specific dimensions. You can specify required dimensions that must be included when querying the metric.

{
  "name": "default.market_share",
  "query": "SELECT SUM(revenue) / SUM(total_market_revenue) FROM default.sales",
  "required_dimensions": ["default.product.category", "default.date.quarter"]
}

When a metric has required dimensions, DJ will enforce that these dimensions are included in any query using this metric.

Metric Direction

Indicate whether higher or lower values are “better” for the metric. This helps consumers understand how to interpret changes.

Direction	Description	Example Metrics
`higher_is_better`	Increasing values are positive	Revenue, Conversion Rate, NPS
`lower_is_better`	Decreasing values are positive	Churn Rate, Error Rate, Latency
`neutral`	Direction doesn’t indicate good/bad	Count of Users, Average Order Size

{
  "name": "default.customer_churn_rate",
  "query": "SELECT ...",
  "direction": "lower_is_better"
}

Metric Units

Specify the unit of measure for the metric to help consumers interpret values correctly.

Unit	Abbreviation	Description
`percentage`	%	Values from 0 to 100
`proportion`		Values from 0 to 1
`dollar`	$	Currency (USD)
`second`	s	Time duration
`minute`	m	Time duration
`hour`	h	Time duration
`day`	d	Time duration
`week`	w	Time duration
`unitless`		No specific unit

{
  "name": "default.avg_response_time",
  "query": "SELECT AVG(response_time_ms) / 1000 FROM default.requests",
  "unit": "second"
}

Metrics

Creating Metric Nodes #

Supported Aggregation Functions #

Fully Aggregatable Functions #

SUM #

COUNT #

COUNT_IF #

MIN #

MAX #

AVG #

ANY_VALUE #

APPROX_COUNT_DISTINCT #

Statistical Functions #

VAR_POP / VAR_SAMP / VARIANCE #

STDDEV_POP / STDDEV_SAMP / STDDEV #

COVAR_POP / COVAR_SAMP #

CORR #

Non-Aggregatable Functions #

MAX_BY / MIN_BY #

DISTINCT Aggregations #

Derived Metrics #

Simple Derived Metrics #

Cross-Fact Ratio Metrics #

Period-Over-Period Metrics #

Trailing N-Day Metrics #

Combining Derived Metric Patterns #

Share of Total Metrics (Limited Support) #

Conditional Aggregations #

Metric Metadata #

Required Dimensions #

Metric Direction #

Metric Units #