SRE - Weekly report #1755
Open
Labels: Backend, Metrics, enhancement
Summarize usage to indicate how well the Dashboard is behaving.
Basically, take the data from https://mon.kernelci.org/public-dashboards/715f7faddb014b0e99fd025f4ae19a7a?from=now-1h&to=now&timezone=browser&var-summary_mode=range and format it as an email to the Working Group.
If possible we'd like to see at least "Requests Count" grouped by "Response Status Code", which should give us a rough idea of failed requests, including timeouts.
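As a rough illustration of the "Requests Count" by "Response Status Code" summary, here is a minimal Python sketch that turns already-collected counts into plain text for an email body. The function name `format_status_report` and the input shape (a mapping of status code to request count) are hypothetical, and treating every 5xx response as a failure (timeouts included) is an assumption, not something the dashboard is confirmed to do.

```python
def format_status_report(counts: dict[int, int]) -> str:
    """Render request counts per HTTP status code as plain text.

    `counts` maps a status code to the number of requests that returned it
    (hypothetical input shape; the real data would come from the dashboard).
    """
    total = sum(counts.values())
    lines = ["Requests Count by Response Status Code"]
    for code in sorted(counts):
        share = 100.0 * counts[code] / total if total else 0.0
        lines.append(f"  {code}: {counts[code]} ({share:.1f}%)")
    # Assumption: any 5xx response counts as a failed request.
    failed = sum(n for code, n in counts.items() if code >= 500)
    rate = 100.0 * failed / total if total else 0.0
    lines.append(f"Failed (5xx): {failed} of {total} ({rate:.1f}%)")
    return "\n".join(lines)
```

Calling `format_status_report({200: 90, 504: 10})` would list both codes with their share of traffic and end with a single failure line, which could be pasted into the weekly email as-is.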
Ideally, we should also report the uptime, to see how often the server crashed or was otherwise offline during release deployments. That might require a periodic Prometheus exporter that hits the "status" endpoint.
If any of this can be achieved directly in Grafana or Prometheus, that is an acceptable solution.
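For the uptime idea, a minimal Python sketch of how periodic probe results could be reduced to the two numbers mentioned above: an uptime percentage and a count of distinct outages. It assumes the probe results are already available as a list of booleans (one per check of the "status" endpoint, `True` meaning the server answered); the function names are hypothetical and no particular exporter is implied.

```python
def uptime_percent(samples: list[bool]) -> float:
    """Share of probes that found the server up, as a percentage."""
    if not samples:
        return 0.0
    return 100.0 * sum(samples) / len(samples)


def count_outages(samples: list[bool]) -> int:
    """Count up-to-down transitions, i.e. distinct offline periods.

    Assumption: the server is considered up before the first sample,
    so a window that starts with a failed probe counts as one outage.
    """
    outages = 0
    previous_up = True
    for up in samples:
        if previous_up and not up:
            outages += 1
        previous_up = up
    return outages
```

With evenly spaced probes, the percentage approximates time-based uptime; the shorter the probe interval, the less likely a brief restart during a deployment goes unnoticed.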