Promql3
–
PromQL Workshop – Part 3: Advanced Patterns and Use Cases
Welcome to Part 3 of the PromQL Workshop! This section covers advanced query patterns, real-world use cases, and tips for building powerful dashboards and alerts.
1. Math Between Series
PromQL allows you to perform mathematical operations between different time series. This is useful for calculating ratios, percentages, or combining metrics for deeper insights.
Supported Operators
| Operator | Description | Example Usage |
|---|---|---|
+ |
Addition | metric_a + metric_b |
- |
Subtraction | metric_a - metric_b |
* |
Multiplication | metric_a * metric_b |
/ |
Division | metric_a / metric_b |
% |
Modulo (remainder) | metric_a % metric_b |
^ |
Exponentiation | metric_a ^ 2 |
== |
Equal | metric_a == metric_b |
!= |
Not equal | metric_a != metric_b |
> |
Greater than | metric_a > metric_b |
< |
Less than | metric_a < metric_b |
>= |
Greater or equal | metric_a >= metric_b |
<= |
Less or equal | metric_a <= metric_b |
Examples
- Addition:
http_requests_total{job="api"} + http_requests_total{job="frontend"}Sums requests from two jobs.
- Add all equally-labelled series from both sides:
metric_a + metric_bAdds all series from
metric_aandmetric_bthat have exactly the same set of labels. - Add series, matching only on the instance and job labels:
metric_a + on(instance, job) metric_bAdds series from
metric_aandmetric_bwhere theinstanceandjoblabels match, ignoring other labels. - Add series, ignoring the instance and job labels for matching:
metric_a + ignoring(instance, job) metric_bAdds series from
metric_aandmetric_bmatching on all labels exceptinstanceandjob. - Explicitly allow many-to-one matching:
metric_a + on(instance) group_left metric_bAllows matching multiple
metric_aseries to a singlemetric_bseries based on theinstancelabel. - Include the version label from “one” (right) side in the result:
metric_a + on(instance) group_left(version) metric_bAdds series matching on
instanceand includes theversionlabel frommetric_bin the result. - Division (Ratio):
sum(rate(http_requests_total{status="500"}[5m])) / sum(rate(http_requests_total[5m]))Calculates the error rate ratio.
- Multiplication (Scaling):
rate(cpu_seconds_total[5m]) * 100Converts CPU usage to percentage.
- Comparison:
http_requests_total > 1000Returns series where the value is greater than 1000.
Note:
When combining series, PromQL matches series by their labels. Use on() or ignoring() to control label matching if needed.
2. Set Operations
PromQL supports set operations that allow you to compare and filter time series based on their labels. These operations are useful for finding overlaps, differences, or intersections between sets of series.
Supported Set Operators
| Operator | Description | Example Usage |
|---|---|---|
and |
Intersection: series present in both sides | up and http_requests_total |
or |
Union: series present in either side | up or http_requests_total |
unless |
Exclusion: series on left, not present on right | up unless http_requests_total |
Examples
- Intersection (
and):up and http_requests_totalReturns series that exist in both
upandhttp_requests_totalwith matching labels. - Union (
or):up or http_requests_totalReturns all series from both
upandhttp_requests_total. - Exclusion (
unless):up unless http_requests_totalReturns series from
upthat do not have a matching series inhttp_requests_total.
Note:
Set operators match series by their labels. You can use on() or ignoring() to control which labels are used for matching, similar to arithmetic
3. Subqueries
Subqueries in PromQL allow you to run a query over the result of another query, enabling more advanced and flexible analysis. They are especially useful for calculating trends, aggregations, or transformations over dynamically generated data.
Syntax
<query>[<range>:<resolution>]
<range>: The lookback window for each evaluation point. It defines how much historical data is considered for each calculation. For example,[5m]means each data point is calculated using the last 5 minutes of data.<resolution>: The step between each evaluation point within the subquery range. For example,:1mmeans the subquery will return a value every 1 minute within the specified range.
Example:
rate(http_requests_total[5m])[30m:5m]
- Here,
rate(http_requests_total[5m])is evaluated every 5 minutes (:5m) over the last 30 minutes ([30m]), using a 5-minute lookback window for each calculation.
This allows you to control both how much data is used for each calculation and how frequently those calculations are performed within the time window.
Examples
- Calculate the 5-minute rate of requests, then view its trend over the last 30 minutes at 5-minute intervals:
rate(http_requests_total[5m])[30m:5m]Returns the 5-minute rate of
http_requests_totalat every 5-minute step, for the last 30 minutes. - Aggregate over a subquery:
sum(rate(http_requests_total[1m])[10m:1m])For each minute in the last 10 minutes, calculates the per-minute rate and then sums across all series.
- Apply a function to a subquery:
max_over_time(rate(cpu_seconds_total[1m])[15m:1m])For each minute in the last 15 minutes, finds the maximum rate of CPU usage.
- Nested subqueries for advanced analysis:
avg_over_time( sum(rate(http_requests_total[1m]))[30m:5m] [2h:30m])Calculates the average of the sum of 1-minute request rates, sampled every 30 minutes, over the last 2 hours.
Note:
Subqueries are powerful for dashboarding and alerting, but can be computationally expensive. Use them thoughtfully to
4. Manipulating Labels
PromQL provides several operators and functions to manipulate labels in your queries. This is useful for renaming, grouping, or filtering series based on label values, and for shaping your query results for dashboards and alerts.
Label Modifiers in Binary Operations
- on()
Specifies which labels must match for the operation to be applied.metric_a + on(instance, job) metric_bOnly series with matching
instanceandjoblabels are combined. - ignoring()
Specifies which labels to ignore when matching series.metric_a + ignoring(instance) metric_bCombines series, ignoring the
instancelabel. - group_left / group_right
Allows many-to-one or one-to-many matching and can include extra labels from one side.metric_a + on(instance) group_left(version) metric_bCombines series matching on
instanceand adds theversionlabel frommetric_bto the result.
Label Replacement with label_replace()
The label_replace() function lets you create or modify labels using regular expressions.
label_replace(up, "new_label", "$1", "instance", "(.*):.*")
Extracts the value before the colon in the
instancelabel and stores it in a new label callednew_label.
Changing Label Sets with aggregation
Aggregation operators like sum, avg, min, max, etc., allow you to group or remove labels:
sum(rate(http_requests_total[5m])) by (job)
Sums the rate of requests, grouping by the
joblabel.
sum(rate(http_requests_total[5m])) without (instance)
Sums the rate of requests, removing the
instancelabel from the result.
Dropping or Keeping Labels
- by(): Keeps only the listed labels.
- without(): Drops the listed labels.
Examples:
avg by (job, datacenter) (rate(http_requests_total[5m]))
Averages the rate, keeping only
jobanddatacenterlabels.
sum without (instance) (rate(http_requests_total[5m]))
Sums the rate, dropping the
instancelabel.
Tip:
Manipulating labels is essential for shaping your PromQL results for meaningful dashboards, alerts,
5. Dealing with Missing Data
In real-world scenarios, time series data may have gaps due to network issues, service downtime, or scraping failures. PromQL provides several strategies and functions to help you handle missing data gracefully.
Functions for Handling Missing Data
- absent()
Returns a series with value 1 if the input series does not exist, or no output if it does.absent(up{job="api"})Returns 1 if there are no
upseries for theapijob. - absent_over_time()
Checks for the absence of data over a range vector.absent_over_time(up{job="api"}[5m])Returns 1 if there was no data for
up{job="api"}in the last 5 minutes. - or operator
You can use theoroperator to provide fallback values when data is missing.(up == 1) or on(instance) vector(0)Returns 1 if
upis present, otherwise 0 for missing instances.
Filling Gaps and Smoothing Data
- increase() and rate()
These functions interpolate missing points within the range window, smoothing over short gaps.rate(http_requests_total[5m])Handles short gaps by interpolating between available points.
- last_over_time()
Returns the last value in a range, which can be used to fill gaps in dashboards.last_over_time(up[10m])Shows the last known value in the last 10 minutes.
Best Practices
- Use
absent()andabsent_over_time()for alerting on missing data. - Use the
oroperator to provide default or fallback values in dashboards. - For visualization, consider using functions like
last_over_time()to avoid gaps in graphs.
Tip:
Always consider the impact of missing data on your alerts and dashboards, and use these tools to make your monitoring more
6. Recording Rules
Recording rules let you precompute frequently used or expensive queries and save them as new time series.
Example
groups:
- name: example.rules
rules:
- record: job:http_requests:rate5m
expr: sum(rate(http_requests_total[5m])) by (job)
This creates a new metric
job:http_requests:rate5mfor use in dashboards and alerts.
7. Alerting Rules
Alerting rules evaluate queries and trigger alerts when conditions are met.
Example
groups:
- name: example.rules
rules:
- alert: HighErrorRate
expr: rate(http_requests_total{status="500"}[5m]) > 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "High error rate detected"
Triggers an alert if the error rate exceeds 0.05 requests/sec for 5 minutes.
8. Combining Metrics
You can combine different metrics using arithmetic and logical operators.
Example: Error Rate Percentage
100 * sum(rate(http_requests_total{status="500"}[5m]))
/ sum(rate(http_requests_total[5m]))
Calculates the percentage of requests that resulted in errors.
9. Useful Dashboard Patterns
- Current value:
http_requests_total - Rate over time:
rate(http_requests_total[5m]) - Rolling average:
avg_over_time(rate(http_requests_total[1m])[30m:1m]) - Percentile latency:
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
10. Tips and Best Practices
- Use
rate()for counters, notgaugemetrics. - Always specify a range for rate functions (e.g.,
[5m]). - Use
byclauses to group results for dashboards. - Precompute expensive queries with recording rules.
- Use subqueries for advanced time-based analysis.