Promql3
–
PromQL Workshop – Part 3: Advanced Patterns and Use Cases
Welcome to Part 3 of the PromQL Workshop! This section covers advanced query patterns, real-world use cases, and tips for building powerful dashboards and alerts.
1. Math Between Series
PromQL allows you to perform mathematical operations between different time series. This is useful for calculating ratios, percentages, or combining metrics for deeper insights.
Supported Operators
Operator | Description | Example Usage |
---|---|---|
+ |
Addition | metric_a + metric_b |
- |
Subtraction | metric_a - metric_b |
* |
Multiplication | metric_a * metric_b |
/ |
Division | metric_a / metric_b |
% |
Modulo (remainder) | metric_a % metric_b |
^ |
Exponentiation | metric_a ^ 2 |
== |
Equal | metric_a == metric_b |
!= |
Not equal | metric_a != metric_b |
> |
Greater than | metric_a > metric_b |
< |
Less than | metric_a < metric_b |
>= |
Greater or equal | metric_a >= metric_b |
<= |
Less or equal | metric_a <= metric_b |
Examples
- Addition:
http_requests_total{job="api"} + http_requests_total{job="frontend"}
Sums requests from two jobs.
- Add all equally-labelled series from both sides:
metric_a + metric_b
Adds all series from
metric_a
andmetric_b
that have exactly the same set of labels. - Add series, matching only on the instance and job labels:
metric_a + on(instance, job) metric_b
Adds series from
metric_a
andmetric_b
where theinstance
andjob
labels match, ignoring other labels. - Add series, ignoring the instance and job labels for matching:
metric_a + ignoring(instance, job) metric_b
Adds series from
metric_a
andmetric_b
matching on all labels exceptinstance
andjob
. - Explicitly allow many-to-one matching:
metric_a + on(instance) group_left metric_b
Allows matching multiple
metric_a
series to a singlemetric_b
series based on theinstance
label. - Include the version label from “one” (right) side in the result:
metric_a + on(instance) group_left(version) metric_b
Adds series matching on
instance
and includes theversion
label frommetric_b
in the result. - Division (Ratio):
sum(rate(http_requests_total{status="500"}[5m])) / sum(rate(http_requests_total[5m]))
Calculates the error rate ratio.
- Multiplication (Scaling):
rate(cpu_seconds_total[5m]) * 100
Converts CPU usage to percentage.
- Comparison:
http_requests_total > 1000
Returns series where the value is greater than 1000.
Note:
When combining series, PromQL matches series by their labels. Use on()
or ignoring()
to control label matching if needed.
2. Set Operations
PromQL supports set operations that allow you to compare and filter time series based on their labels. These operations are useful for finding overlaps, differences, or intersections between sets of series.
Supported Set Operators
Operator | Description | Example Usage |
---|---|---|
and |
Intersection: series present in both sides | up and http_requests_total |
or |
Union: series present in either side | up or http_requests_total |
unless |
Exclusion: series on left, not present on right | up unless http_requests_total |
Examples
- Intersection (
and
):up and http_requests_total
Returns series that exist in both
up
andhttp_requests_total
with matching labels. - Union (
or
):up or http_requests_total
Returns all series from both
up
andhttp_requests_total
. - Exclusion (
unless
):up unless http_requests_total
Returns series from
up
that do not have a matching series inhttp_requests_total
.
Note:
Set operators match series by their labels. You can use on()
or ignoring()
to control which labels are used for matching, similar to arithmetic
3. Subqueries
Subqueries in PromQL allow you to run a query over the result of another query, enabling more advanced and flexible analysis. They are especially useful for calculating trends, aggregations, or transformations over dynamically generated data.
Syntax
<query>[<range>:<resolution>]
<range>
: The lookback window for each evaluation point. It defines how much historical data is considered for each calculation. For example,[5m]
means each data point is calculated using the last 5 minutes of data.<resolution>
: The step between each evaluation point within the subquery range. For example,:1m
means the subquery will return a value every 1 minute within the specified range.
Example:
rate(http_requests_total[5m])[30m:5m]
- Here,
rate(http_requests_total[5m])
is evaluated every 5 minutes (:5m
) over the last 30 minutes ([30m]
), using a 5-minute lookback window for each calculation.
This allows you to control both how much data is used for each calculation and how frequently those calculations are performed within the time window.
Examples
- Calculate the 5-minute rate of requests, then view its trend over the last 30 minutes at 5-minute intervals:
rate(http_requests_total[5m])[30m:5m]
Returns the 5-minute rate of
http_requests_total
at every 5-minute step, for the last 30 minutes. - Aggregate over a subquery:
sum(rate(http_requests_total[1m])[10m:1m])
For each minute in the last 10 minutes, calculates the per-minute rate and then sums across all series.
- Apply a function to a subquery:
max_over_time(rate(cpu_seconds_total[1m])[15m:1m])
For each minute in the last 15 minutes, finds the maximum rate of CPU usage.
- Nested subqueries for advanced analysis:
avg_over_time( sum(rate(http_requests_total[1m]))[30m:5m] [2h:30m])
Calculates the average of the sum of 1-minute request rates, sampled every 30 minutes, over the last 2 hours.
Note:
Subqueries are powerful for dashboarding and alerting, but can be computationally expensive. Use them thoughtfully to
4. Manipulating Labels
PromQL provides several operators and functions to manipulate labels in your queries. This is useful for renaming, grouping, or filtering series based on label values, and for shaping your query results for dashboards and alerts.
Label Modifiers in Binary Operations
- on()
Specifies which labels must match for the operation to be applied.metric_a + on(instance, job) metric_b
Only series with matching
instance
andjob
labels are combined. - ignoring()
Specifies which labels to ignore when matching series.metric_a + ignoring(instance) metric_b
Combines series, ignoring the
instance
label. - group_left / group_right
Allows many-to-one or one-to-many matching and can include extra labels from one side.metric_a + on(instance) group_left(version) metric_b
Combines series matching on
instance
and adds theversion
label frommetric_b
to the result.
Label Replacement with label_replace()
The label_replace()
function lets you create or modify labels using regular expressions.
label_replace(up, "new_label", "$1", "instance", "(.*):.*")
Extracts the value before the colon in the
instance
label and stores it in a new label callednew_label
.
Changing Label Sets with aggregation
Aggregation operators like sum
, avg
, min
, max
, etc., allow you to group or remove labels:
sum(rate(http_requests_total[5m])) by (job)
Sums the rate of requests, grouping by the
job
label.
sum(rate(http_requests_total[5m])) without (instance)
Sums the rate of requests, removing the
instance
label from the result.
Dropping or Keeping Labels
- by(): Keeps only the listed labels.
- without(): Drops the listed labels.
Examples:
avg by (job, datacenter) (rate(http_requests_total[5m]))
Averages the rate, keeping only
job
anddatacenter
labels.
sum without (instance) (rate(http_requests_total[5m]))
Sums the rate, dropping the
instance
label.
Tip:
Manipulating labels is essential for shaping your PromQL results for meaningful dashboards, alerts,
5. Dealing with Missing Data
In real-world scenarios, time series data may have gaps due to network issues, service downtime, or scraping failures. PromQL provides several strategies and functions to help you handle missing data gracefully.
Functions for Handling Missing Data
- absent()
Returns a series with value 1 if the input series does not exist, or no output if it does.absent(up{job="api"})
Returns 1 if there are no
up
series for theapi
job. - absent_over_time()
Checks for the absence of data over a range vector.absent_over_time(up{job="api"}[5m])
Returns 1 if there was no data for
up{job="api"}
in the last 5 minutes. - or operator
You can use theor
operator to provide fallback values when data is missing.(up == 1) or on(instance) vector(0)
Returns 1 if
up
is present, otherwise 0 for missing instances.
Filling Gaps and Smoothing Data
- increase() and rate()
These functions interpolate missing points within the range window, smoothing over short gaps.rate(http_requests_total[5m])
Handles short gaps by interpolating between available points.
- last_over_time()
Returns the last value in a range, which can be used to fill gaps in dashboards.last_over_time(up[10m])
Shows the last known value in the last 10 minutes.
Best Practices
- Use
absent()
andabsent_over_time()
for alerting on missing data. - Use the
or
operator to provide default or fallback values in dashboards. - For visualization, consider using functions like
last_over_time()
to avoid gaps in graphs.
Tip:
Always consider the impact of missing data on your alerts and dashboards, and use these tools to make your monitoring more
6. Recording Rules
Recording rules let you precompute frequently used or expensive queries and save them as new time series.
Example
groups:
- name: example.rules
rules:
- record: job:http_requests:rate5m
expr: sum(rate(http_requests_total[5m])) by (job)
This creates a new metric
job:http_requests:rate5m
for use in dashboards and alerts.
7. Alerting Rules
Alerting rules evaluate queries and trigger alerts when conditions are met.
Example
groups:
- name: example.rules
rules:
- alert: HighErrorRate
expr: rate(http_requests_total{status="500"}[5m]) > 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "High error rate detected"
Triggers an alert if the error rate exceeds 0.05 requests/sec for 5 minutes.
8. Combining Metrics
You can combine different metrics using arithmetic and logical operators.
Example: Error Rate Percentage
100 * sum(rate(http_requests_total{status="500"}[5m]))
/ sum(rate(http_requests_total[5m]))
Calculates the percentage of requests that resulted in errors.
9. Useful Dashboard Patterns
- Current value:
http_requests_total
- Rate over time:
rate(http_requests_total[5m])
- Rolling average:
avg_over_time(rate(http_requests_total[1m])[30m:1m])
- Percentile latency:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
10. Tips and Best Practices
- Use
rate()
for counters, notgauge
metrics. - Always specify a range for rate functions (e.g.,
[5m]
). - Use
by
clauses to group results for dashboards. - Precompute expensive queries with recording rules.
- Use subqueries for advanced time-based analysis.