Is there a templating feature for alerts? That is, I'd like to send an alert with additional information, such as the source that triggered it. I will be receiving metrics from sources/hosts, and each of these metrics will have the same name (e.g. number-of-open-files) but be tagged differently to distinguish the application (e.g. process-name). I have an alert query that lets me know if any application has too many open files, but I can't find a way to deliver other types of information in the alert.
Hi Salim,
The alert created will pass the source labels in the alert itself. For example, here is the JSON response schema of an alert, which provides information on the fired alert:
{
  "customerTagsWithCounts": {},
  "userTagsWithCounts": {},
  "created": 0,
  "name": "string",
  "conditionQBEnabled": false,
  "conditionQBSerialization": "string",
  "displayExpressionQBEnabled": false,
  "displayExpressionQBSerialization": "string",
  "condition": "string",
  "displayExpression": "string",
  "minutes": 0,
  "resolveAfterMinutes": 0,
  "target": "string",
  "snoozed": 0,
  "event": {
    "name": "string",
    "startTime": 0,
    "endTime": 0,
    "annotations": {},
    "hosts": [ "string" ],
    "summarizedEvents": 0,
    "tags": [ "string" ],
    "isUserEvent": false,
    "isEphemeral": false,
    "table": "string"
  },
  "failingHostLabelPairs": [
    { "host": "string", "label": "string", "tags": {}, "observed": 0, "firing": 0 }
  ],
  "updated": 0,
  "severity": "INFO",
  "queryFailing": false,
  "additionalInformation": "string",
  "activeMaintenanceWindows": [ "string" ],
  "inMaintenanceHostLabelPairs": [
    { "host": "string", "label": "string", "tags": {}, "observed": 0, "firing": 0 }
  ],
  "prefiringHostLabelPairs": [
    { "host": "string", "label": "string", "tags": {}, "observed": 0, "firing": 0 }
  ],
  "alertStates": [ "string" ],
  "lastFailedTime": 0,
  "lastErrorMessage": "string",
  "inTrash": false,
  "numMetricsUsed": 0,
  "numHostsUsed": 0
}
In your case, you will get the source information on the "number-of-open-files" metric when this alert fires.
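As a rough sketch of how that payload could be read, assuming it reaches you as a JSON document shaped like the schema above: the Python below pulls the host, label, and tags out of each failingHostLabelPairs entry. The function name failing_pairs and the sample values are invented for illustration; only the field names come from the schema.

```python
import json


def failing_pairs(payload: str) -> list[tuple[str, str, dict]]:
    """Return (host, label, tags) for every entry in failingHostLabelPairs."""
    alert = json.loads(payload)
    return [
        (pair.get("host", ""), pair.get("label", ""), pair.get("tags", {}))
        for pair in alert.get("failingHostLabelPairs", [])
    ]


# Hypothetical sample payload: the values are made up, the keys follow the schema.
sample = json.dumps({
    "name": "Too many open files by a single process",
    "severity": "INFO",
    "failingHostLabelPairs": [
        {
            "host": "server1",
            "label": "number-of-open-files",
            "tags": {"process_name": "process-1"},
            "observed": 0,
            "firing": 0,
        },
    ],
})

for host, label, tags in failing_pairs(sample):
    print(host, label, tags)
```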
Hope this helps and don't hesitate to reach out if you have any further questions.
Parag
When you say "response schema of an alert", does it mean after I receive the alert, I can call an API to get detailed information about the alert? If so, that's not what I am looking for.
For example, I have this Condition for an Alert:
avg(ts(number-of-open-files, source='*'), process_name) > 512
How do I let the Alert carry information on which source or sources triggered the alert (process_name having more than 512 open files)?
Do I need to set something in the Display Expression field? Or in the Additional Information field?
I apologize if I misunderstood.
Hi kuncoro_salim,
The default alert template already highlights the specific time series that triggered an alert. Your query only needs a small tweak: group by source in addition to grouping by process name:
avg(ts(number-of-open-files, source='*'), sources, process_name) > 512
And the email you're going to receive when your alert fires would look something like this:
OPENED
Alert: Too many open files by a single process
Condition: avg(ts(number-of-open-files, source='*'), sources, process_name) > 512
Created: <date/time>
URL: <url to a chart with alert details>
Affected since: <date/time>
Event started: <date/time>
Sources/Labels Affected:
[source=server1][process_name=process-1]
[source=server1][process_name=process-2]
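If you want to process those "Sources/Labels Affected" lines programmatically, here is a minimal Python sketch that turns each bracketed line into a dictionary. It assumes every line consists only of [key=value] groups exactly as in the sample above; the name parse_affected is hypothetical.

```python
import re

# One [key=value] group; the key stops at '=', the value stops at ']'.
PAIR = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_affected(line: str) -> dict[str, str]:
    """Turn '[source=a][process_name=b]' into {'source': 'a', 'process_name': 'b'}."""
    return dict(PAIR.findall(line))


lines = [
    "[source=server1][process_name=process-1]",
    "[source=server1][process_name=process-2]",
]
for line in lines:
    print(parse_affected(line))
```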
Hope this answers your question!
-Vasily
Yes, that did it. So I had to modify the alerting query itself to include the desired tag. Thanks!
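For anyone wondering why the extra group-by key matters, here is a small Python illustration of the idea. It is not how the query engine is implemented; the data points, the helper names grouped_avg and firing, and the tuple layout are all made up. Averaging by process name alone merges every source into one series, so the source is lost; adding the source as a key keeps one series per source and process.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data points: (source, process_name, open_files)
points = [
    ("server1", "process-1", 900),
    ("server2", "process-1", 100),
    ("server1", "process-2", 600),
]

# Position of each group-by key inside a point tuple.
INDEX = {"source": 0, "process_name": 1}


def grouped_avg(points, keys):
    """Average the values for each distinct combination of the given keys."""
    groups = defaultdict(list)
    for point in points:
        group = tuple(point[INDEX[key]] for key in keys)
        groups[group].append(point[2])
    return {group: mean(values) for group, values in groups.items()}


def firing(points, keys, threshold=512):
    """Return the groups whose average exceeds the threshold."""
    return sorted(g for g, avg in grouped_avg(points, keys).items() if avg > threshold)


# Grouping by process name only: sources are merged into one average.
print(firing(points, ["process_name"]))
# Grouping by source and process name: each firing series keeps its source.
print(firing(points, ["source", "process_name"]))
```

In this made-up data, grouping by process name alone also hides process-1 on server1, because its average across both servers falls below the threshold.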