Once again I had to set up metrics collection from Qrator. My previous note on this lived as an Issue in the StupidScience/qrator-exporter repository (that project uses deprecated methods), but it has disappeared from there, so I am writing it down here where it will not get lost.
The data is collected with telegraf, which also exposes it as metrics in Prometheus format.
First, an API token is needed to read data from Qrator: open the keys section of the personal dashboard and issue a token.
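Once a token is issued, one request made by hand is a quick way to check it before telegraf gets involved. Below is a minimal sketch in Python that assumes the same endpoint, header and method name that the telegraf configuration in this post uses. The helper name build_request is made up for illustration, 11111 is a placeholder domain ID, and nothing is assumed about the shape of the response, which is simply printed.

```python
import json
import os
import urllib.request

API_BASE = "https://api.qrator.net/request/domain"  # endpoint used in the telegraf config


def build_request(domain_id, method, api_key):
    """Build (but do not send) the same POST request that telegraf issues."""
    return urllib.request.Request(
        url=f"{API_BASE}/{domain_id}",
        data=json.dumps({"method": method}).encode("utf-8"),
        headers={"X-Qrator-Auth": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__" and os.environ.get("QRATOR_API_KEY"):
    # Runs only when a real key is exported; 11111 is a placeholder domain ID.
    request = build_request(11111, "statistics_current_http", os.environ["QRATOR_API_KEY"])
    with urllib.request.urlopen(request, timeout=30) as response:
        print(json.dumps(json.load(response), indent=2))
```

Reading the key from the environment mirrors how telegraf receives it, so the token never ends up in the script itself.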
Next, open the list of domains and save their identifiers; the API methods are called by these IDs:
Here 11111 and 11222 are those domain identifiers. Now describe the telegraf configuration:
[[inputs.http]]
name_prefix = "qrator_blocks_"
method = "POST"
urls = [
"https://api.qrator.net/request/domain/11111",
"https://api.qrator.net/request/domain/11222",
]
headers = {"X-Qrator-Auth" = "${QRATOR_API_KEY}", "Content-Type" = "application/json"}
body = '{"method":"statistics_current_blocks"}'
data_format = "json"
timeout = "30s"
[[inputs.http]]
name_prefix = "qrator_http_"
method = "POST"
urls = [
"https://api.qrator.net/request/domain/11111",
"https://api.qrator.net/request/domain/11222",
]
headers = {"X-Qrator-Auth" = "${QRATOR_API_KEY}", "Content-Type" = "application/json"}
body = '{"method":"statistics_current_http"}'
data_format = "json"
timeout = "30s"
[[inputs.http]]
name_prefix = "qrator_ip_"
method = "POST"
urls = [
"https://api.qrator.net/request/domain/11111",
"https://api.qrator.net/request/domain/11222",
]
headers = {"X-Qrator-Auth" = "${QRATOR_API_KEY}", "Content-Type" = "application/json"}
body = '{"method":"statistics_current_ip"}'
data_format = "json"
timeout = "30s"
[[inputs.http]]
name_prefix = "qrator_locations_"
method = "POST"
urls = [
"https://api.qrator.net/request/domain/11111",
"https://api.qrator.net/request/domain/11222",
]
headers = {"X-Qrator-Auth" = "${QRATOR_API_KEY}", "Content-Type" = "application/json"}
body = '{"method":"statistics_current_locations"}'
data_format = "json"
timeout = "30s"
[[outputs.prometheus_client]]
listen = ":9273"
The urls field takes an array of resource links (they include the domain identifiers), and the body field takes the method. The API key comes from the QRATOR_API_KEY environment variable, which has to be passed to telegraf separately so that the key is not stored directly in the configuration.
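The four inputs.http sections differ only in the name prefix and the API method, so with many domains it may be easier to generate the file than to edit it by hand. A minimal sketch in Python follows. The function name render_telegraf_config and the two-space indentation are my own choices for illustration, while the endpoint, header, methods and port are copied from the configuration above.

```python
"""Render telegraf.conf from a list of domain IDs (illustrative sketch)."""

API_BASE = "https://api.qrator.net/request/domain"  # same endpoint as in the config above

# name_prefix suffix -> Qrator API method: the four pairs used in this post
METHODS = {
    "blocks": "statistics_current_blocks",
    "http": "statistics_current_http",
    "ip": "statistics_current_ip",
    "locations": "statistics_current_locations",
}

# Doubled braces are literal braces for str.format
INPUT_TEMPLATE = """[[inputs.http]]
  name_prefix = "qrator_{prefix}_"
  method = "POST"
  urls = [
{urls}
  ]
  headers = {{"X-Qrator-Auth" = "${{QRATOR_API_KEY}}", "Content-Type" = "application/json"}}
  body = '{{"method":"{method}"}}'
  data_format = "json"
  timeout = "30s"
"""

OUTPUT_SECTION = """[[outputs.prometheus_client]]
  listen = ":9273"
"""


def render_telegraf_config(domain_ids):
    """Return the full telegraf.conf text for the given domain IDs."""
    urls = "\n".join(f'    "{API_BASE}/{domain_id}",' for domain_id in domain_ids)
    sections = [
        INPUT_TEMPLATE.format(prefix=prefix, method=method, urls=urls)
        for prefix, method in METHODS.items()
    ]
    sections.append(OUTPUT_SECTION)
    return "\n".join(sections)


if __name__ == "__main__":
    # 11111 and 11222 are the placeholder IDs from this post
    print(render_telegraf_config([11111, 11222]))
```

The API key placeholder is written out verbatim, so the generated file still relies on telegraf substituting QRATOR_API_KEY from the environment at start-up.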
All that is left is to run it. A minimal Deployment for kustomize might look like this:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qrator-exporter
spec:
  template:
    spec:
      containers:
        - name: telegraf
          image: telegraf:1.21.4
          ports:
            - name: metrics
              containerPort: 9273
          env:
            - name: QRATOR_API_KEY
              value: CHANGE_ME
          securityContext:
            runAsUser: 1001
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
            runAsNonRoot: true
          volumeMounts:
            - name: config
              mountPath: "/etc/telegraf"
              readOnly: true
            - name: cache
              mountPath: "/.cache"
      volumes:
        - name: config
          secret:
            secretName: qrator-exporter
        - name: cache
          emptyDir: {}
The qrator-exporter secret itself is described in kustomization.yaml, for example:
secretGenerator:
  - name: qrator-exporter
    files:
      - config/telegraf.conf
Do not forget to describe the Service and the ServiceMonitor:
---
apiVersion: v1
kind: Service
metadata:
  name: qrator-exporter
spec:
  type: ClusterIP
  ports:
    - name: metrics
      port: 9273
      targetPort: 9273
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: qrator-exporter
spec:
  endpoints:
    - interval: 30s
      path: /metrics
      port: metrics
  selector:
    matchLabels:
      app.kubernetes.io/name: qrator-exporter
      app.kubernetes.io/component: service
      app.kubernetes.io/part-of: monitoring
  namespaceSelector:
    any: true
The selector matches the labels set in kustomization.yaml:
---
commonLabels:
  app.kubernetes.io/name: qrator-exporter
  app.kubernetes.io/component: service
  app.kubernetes.io/part-of: monitoring
After this the metrics start flowing, but the url label in them holds an opaque Qrator resource address, so add relabeling:
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: qrator-exporter
spec:
  endpoints:
    - interval: 30s
      path: /metrics
      port: metrics
      metricRelabelings:
        - sourceLabels: ["url"]
          regex: https://api.qrator.net/request/domain/(.+)
          replacement: $1
          targetLabel: domain_id
          action: replace
        - sourceLabels: ["url"]
          regex: https://api.qrator.net/request/domain/11111
          replacement: domain.ru
          targetLabel: domain_name
          action: replace
        - sourceLabels: ["url"]
          regex: https://api.qrator.net/request/domain/11222
          replacement: super-domain.ru
          targetLabel: domain_name
          action: replace
  selector:
    matchLabels:
      app.kubernetes.io/name: qrator-exporter
      app.kubernetes.io/component: service
      app.kubernetes.io/part-of: monitoring
  namespaceSelector:
    any: true
Now domain_name holds a readable value that can be used in Grafana selectors or in alerts.
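To see what the three relabeling rules do to a single metric, here is a small model of action: replace in Python. It is a simplification rather than Prometheus code: it handles one source label, treats the regex as anchored on both ends (which is how Prometheus applies relabeling regexes), and uses Python's re module instead of RE2. The function name apply_replace_rules is made up for illustration.

```python
import re

# The three rules from the ServiceMonitor above: (regex, replacement, target label)
RULES = [
    (r"https://api.qrator.net/request/domain/(.+)", "$1", "domain_id"),
    (r"https://api.qrator.net/request/domain/11111", "domain.ru", "domain_name"),
    (r"https://api.qrator.net/request/domain/11222", "super-domain.ru", "domain_name"),
]


def apply_replace_rules(labels, rules=RULES, source_label="url"):
    """Mimic `action: replace`: a rule fires only if the regex matches the whole value."""
    result = dict(labels)
    for regex, replacement, target in rules:
        match = re.fullmatch(regex, result.get(source_label, ""))
        if match is None:
            continue  # no match: the labels stay untouched
        # Prometheus writes capture groups as $1; Python's equivalent is \g<1>
        template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
        result[target] = match.expand(template)
    return result


if __name__ == "__main__":
    print(apply_replace_rules({"url": "https://api.qrator.net/request/domain/11111"}))
```

Because the match is anchored, a URL ending in 111112 gets domain_id set but no domain_name: every new domain needs its own domain_name rule.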
Example alerts for Prometheus Operator:
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: qrator-exporter
  labels:
    app: prometheus-operator
    release: "monitoring"
spec:
  groups:
    - name: QratorExporter
      rules:
        - alert: QratorHighBandwidthInput
          expr: qrator_ip_http_result_bandwidth_input > 5000000
          for: 5m
          labels:
            severity: warning
            domain: "{{ $labels.domain_name }}"
          annotations:
            summary: High inbound traffic on {{ $labels.domain_name }}
            description: Domain {{ $labels.domain_name }} shows elevated inbound traffic in Qrator, above 5 Mbit/s
        - alert: QratorHighBandwidthOutput
          expr: qrator_ip_http_result_bandwidth_output > 5000000
          for: 5m
          labels:
            severity: warning
            domain: "{{ $labels.domain_name }}"
          annotations:
            summary: High outbound traffic on {{ $labels.domain_name }}
            description: Domain {{ $labels.domain_name }} shows elevated outbound traffic in Qrator, above 5 Mbit/s
        - alert: QratorHigh5xxRate
          expr: qrator_http_http_result_errors_total >= 0.1
          for: 5m
          labels:
            severity: critical
            domain: "{{ $labels.domain_name }}"
          annotations:
            summary: Qrator reports a growing number of errors on {{ $labels.domain_name }}
            description: Qrator has been reporting a growing number of 50x errors on domain {{ $labels.domain_name }} for 5 minutes
The standard recommendation before adding alerts: collect metrics for a while to work out suitable thresholds. Grafana is the most convenient place to watch them, so this dashboard can serve as a starting point:
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"gnetId": null,
"graphTooltip": 1,
"id": 106,
"iteration": 1647973063127,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": null,
"description": "Alerts:\n\n* QratorHighBandwidthInput\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "bits"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [
"max"
],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "multi"
}
},
"targets": [
{
"exemplar": true,
"expr": "sum(qrator_ip_http_result_bandwidth_input{domain_name=\"$domain\"})",
"interval": "",
"legendFormat": "input",
"refId": "A"
},
{
"exemplar": true,
"expr": "sum(qrator_ip_http_result_bandwidth_output{domain_name=\"$domain\"})",
"hide": false,
"interval": "",
"legendFormat": "output",
"refId": "B"
}
],
"title": "Traffic",
"type": "timeseries"
},
{
"datasource": null,
"description": "Alerts:\n\n* QratorHigh5xxRate",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"decimals": 2,
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 0
},
"id": 4,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "multi"
}
},
"targets": [
{
"exemplar": true,
"expr": "sum({__name__=~\"qrator_http_http_result_errors_.+\", domain_name=\"$domain\"})by(__name__)",
"interval": "",
"legendFormat": "{{ __name__ }}",
"refId": "A"
}
],
"title": "Errors",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_errors_(.*)",
"renamePattern": "$1"
}
}
],
"type": "timeseries"
},
{
"datasource": null,
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 9
},
"id": 7,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"exemplar": true,
"expr": "sum(qrator_http_http_result_requests{domain_name=\"$domain\"})",
"interval": "",
"legendFormat": "total",
"refId": "A"
}
],
"title": "Requests",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_0000_0(.*)",
"renamePattern": "Less $1 ms"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_0(.*)_0(.*)",
"renamePattern": "$1 - $2 ms"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_0(.*)_(.*)",
"renamePattern": "$1 - $2 ms"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_1000_1500",
"renamePattern": "1 - 1.5 s"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_1500_2000",
"renamePattern": "1.5 - 2 s"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_2000_5000",
"renamePattern": "2 - 5 s"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_5000_inf",
"renamePattern": "More 5 s"
}
}
],
"type": "timeseries"
},
{
"datasource": null,
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 9
},
"id": 10,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"exemplar": true,
"expr": "sum({__name__=~\"qrator_http_http_result_responses_.+\", domain_name=\"$domain\"})by(__name__)",
"interval": "",
"legendFormat": "{{ __name__ }}",
"refId": "A"
}
],
"title": "Requests by response time",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_0000_0(.*)",
"renamePattern": "Less $1 ms"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_0(.*)_0(.*)",
"renamePattern": "$1 - $2 ms"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_0(.*)_(.*)",
"renamePattern": "$1 - $2 ms"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_1000_1500",
"renamePattern": "1 - 1.5 s"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_1500_2000",
"renamePattern": "1.5 - 2 s"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_2000_5000",
"renamePattern": "2 - 5 s"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_5000_inf",
"renamePattern": "More 5 s"
}
}
],
"type": "timeseries"
},
{
"datasource": null,
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "pps"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 18
},
"id": 11,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "multi"
}
},
"targets": [
{
"exemplar": true,
"expr": "sum(qrator_ip_http_result_packets_input{domain_name=\"$domain\"})",
"interval": "",
"legendFormat": "input",
"refId": "A"
},
{
"exemplar": true,
"expr": "sum(qrator_ip_http_result_packets_output{domain_name=\"$domain\"})",
"hide": false,
"interval": "",
"legendFormat": "output",
"refId": "B"
}
],
"title": "Packets",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_0000_0(.*)",
"renamePattern": "Less $1 ms"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_0(.*)_0(.*)",
"renamePattern": "$1 - $2 ms"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_0(.*)_(.*)",
"renamePattern": "$1 - $2 ms"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_1000_1500",
"renamePattern": "1 - 1.5 s"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_1500_2000",
"renamePattern": "1.5 - 2 s"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_2000_5000",
"renamePattern": "2 - 5 s"
}
},
{
"id": "renameByRegex",
"options": {
"regex": "qrator_http_http_result_responses_5000_inf",
"renamePattern": "More 5 s"
}
}
],
"type": "timeseries"
},
{
"datasource": null,
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 18
},
"id": 5,
"options": {
"legend": {
"calcs": [
"max",
"last"
],
"displayMode": "table",
"placement": "right"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"exemplar": true,
"expr": "sum({__name__=~\"qrator_locations_http_result_locations_.+\", domain_name=\"$domain\"}>0)by(__name__)",
"interval": "",
"legendFormat": "{{ __name__ }}",
"refId": "A"
}
],
"title": "Black list",
"transformations": [
{
"id": "renameByRegex",
"options": {
"regex": "qrator_locations_http_result_locations_(.*)",
"renamePattern": "$1"
}
}
],
"type": "timeseries"
}
],
"schemaVersion": 32,
"style": "dark",
"tags": [
"WIP"
],
"templating": {
"list": [
{
"allValue": null,
"current": {
"selected": false,
"text": "qlean.ru",
"value": "qlean.ru"
},
"datasource": null,
"definition": "label_values(qrator_http_http_id, domain_name)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "Domain",
"multi": false,
"name": "domain",
"options": [],
"query": {
"query": "label_values(qrator_http_http_id, domain_name)",
"refId": "StandardVariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
}
]
},
"time": {
"from": "now-12h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Qrator",
"uid": "gM2arMHnk",
"version": 23
}
TODO: move the dashboard to https://grafana.com/grafana/dashboards/.
Source: https://levaminov.ru/-GTUP-QWYFV