The Observability Stack for Kubernetes.

All bells and whistles included observability stack for Kubernetes.

Kube-Prometheus the Kubernetes monitoring stack
- Prometheus to collect metrics
- AlertManager to fire the alerts
- Grafana to visualize what's going on
- Node-Exporter to export metrics from the nodes
- Kube-State-Metrics to get metrics from kubernetes api-server
- Prometheus-Operator to manage the life-cycle of Prometheus and AlertManager custom resource definitions (CRDs)
Loki for the logs
Tempo distributed tracing

For a quick start, run helmfile apply. The helmfile.yaml is a template for the helm deployments. This can be modified to suit your needs.

Notes

Prerequisites

Install helmfile, ref, https://github.com/helmfile/helmfile
Install helm-diff by running helm plugin install https://github.com/databus23/helm-diff

If you encounter 'no matches for kind' errors when applying helmfile, use --skip-diff-on-install flag to bypass the initial diff when running apply:

helmfile apply --skip-diff-on-install

This should bring up Grafana, Prometheus, Loki and Tempo. The Grafana from kube-prometheus stack will by default have the Prometheus and Alert-manager data sources enabled. We are also adding the extra datasource for Loki and Tempo to Grafana from values/kube-prometheus-stack.yaml.gotmpl.

Ideally, if everything is good, we should see the below services,

k get svc
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                                                                                                   AGE
alertmanager-operated     ClusterIP      None            <none>         9093/TCP,9094/TCP,9094/UDP                                                                                3d16h
grafana                   LoadBalancer   10.43.86.20     10.96.194.35   80:30205/TCP                                                                                              3d16h
kube-state-metrics        ClusterIP      10.43.91.244    <none>         8080/TCP                                                                                                  3d16h
loki                      ClusterIP      10.43.204.28    <none>         3100/TCP                                                                                                  3d16h
loki-headless             ClusterIP      None            <none>         3100/TCP                                                                                                  3d16h
loki-memberlist           ClusterIP      None            <none>         7946/TCP                                                                                                  3d16h
node-exporter             ClusterIP      10.43.195.165   <none>         9100/TCP                                                                                                  3d16h
prometheus-alertmanager   ClusterIP      10.43.229.249   <none>         9093/TCP,8080/TCP                                                                                         3d16h
prometheus-operated       ClusterIP      None            <none>         9090/TCP                                                                                                  3d16h
prometheus-operator       ClusterIP      10.43.114.50    <none>         443/TCP                                                                                                   3d16h
prometheus-prometheus     ClusterIP      10.43.200.3     <none>         9090/TCP,8080/TCP                                                                                         3d16h
tempo                     ClusterIP      10.43.66.58     <none>         3100/TCP,6831/UDP,6832/UDP,14268/TCP,14250/TCP,9411/TCP,55680/TCP,55681/TCP,4317/TCP,4318/TCP,55678/TCP   3d16h

Grafana can be accessed with the loadbalancer ip, if your cluster doesn't support LoadBalancer, you can always port forward the grafana service/pod.

We will have to add the data sources for Loki and Tempo, take a look at

Note: if the alert manager and prometheus pods are going to restart the loop, it's most likely that you have prometheus already running in the cluster. You will have to delete those.

Tracing with Hot R.O.D

Hotrod is a popular demo app that uses OpenTracing API, you may find more details about the app here For the demo, let's deploy it to the same namespace,

k apply -f hotrod-sample-app.yml

# let's check the deployment and service
k get deploy,po,svc -l app=hotrod
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/hotrod   1/1     1            1           3d

NAME                         READY   STATUS    RESTARTS   AGE
pod/hotrod-fcd546f58-zsmcx   1/1     Running   0          3d

NAME             TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
service/hotrod   LoadBalancer   10.43.209.194   10.96.196.137   80:30743/TCP   3d

Access the Hotrod using loadbalancer ip or port-forward to your local. Click on a few rides from the UI and head over to Grafana. Go to explore view and select Loki as the data source, from the filer use app=hotrod

Distributed Tracing straight from the Loki logs to Tempo

Datasources in Grafana

Cleanup

To delete the observability stack,

helmfile delete

This will not clean up all the CRDs created, please refer on how to delete all the CRDs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

The Observability Stack for Kubernetes.

Notes

Prerequisites

Tracing with Hot R.O.D

Distributed Tracing straight from the Loki logs to Tempo

Datasources in Grafana

Cleanup

Files

README.md

Latest commit

History

README.md

File metadata and controls

The Observability Stack for Kubernetes.

Notes

Prerequisites

Tracing with Hot R.O.D

Distributed Tracing straight from the Loki logs to Tempo

Datasources in Grafana

Cleanup