Advanced Monitoring with NATS Surveyor
- Sujit Nalawade
In this article, we'll set up nats-surveyor for advanced monitoring of our NATS servers through Prometheus and Grafana.
What is NATS Surveyor?
NATS surveyor polls the NATS server for Statz messages to generate data for Prometheus. This allows a single exporter to connect to any NATS server and get an entire picture of a NATS deployment without requiring extra monitoring components or sidecars.
This is really powerful, as we can feed the data generated for Prometheus into observability platforms like Grafana and set up dashboards there.
Let's set up our local super cluster and start our surveyor service.
To set up our local super cluster, we can use the nats-local-supercluster repo.
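To get a feel for the data Surveyor generates for Prometheus, we can look at its metrics endpoint directly. The sketch below assumes that Surveyor is reachable on `localhost:7777` at `/metrics` and that its metric names start with `nats_`; both are assumptions to verify against your own setup, and `nats_series_only` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only sample lines whose metric name starts with
# "nats_", dropping "# HELP" / "# TYPE" comments and unrelated series.
nats_series_only() {
  grep -E '^nats_'
}

# Usage (not run here; assumes Surveyor is reachable on localhost:7777):
#   curl -s http://localhost:7777/metrics | nats_series_only | head -n 20
```

Prometheus scrapes this same text format, so anything visible here is what ends up on the dashboards.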
$ git clone https://github.com/ColinSullivan1/nats-local-supercluster.git
$ cd nats-local-supercluster
$ ./start_supercluster.sh
Now that our local super cluster is up and running, we can set up nats-surveyor.
For now, we'll do it with docker and docker-compose.
Note: We can also install `nats-surveyor` directly from the GitHub releases.
$ git clone https://github.com/nats-io/nats-surveyor.git
$ cd nats-surveyor/docker-compose
$ ./survey.sh "nats://$(ipconfig getifaddr en0):4000" 9 ../../nats-local-supercluster/auth/nkeys/creds/myoperator/SYS/SYS.creds
[+] Running 3/0
 ⠿ Container nats-surveyor  Created  0.0s
 ⠿ Container prometheus  Created  0.0s
 ⠿ Container grafana  Created  0.0s
Attaching to grafana, nats-surveyor, prometheus
...
Notice how we use `ipconfig getifaddr en0` (a macOS command) to get the current IP of the system, and how we pass `SYS.creds`, the system account credentials, to NATS Surveyor.
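We pass the host's IP rather than `127.0.0.1` because inside a container `localhost` refers to the container itself, not to the machine running the super cluster. Since `ipconfig getifaddr en0` only exists on macOS, here is a minimal sketch of a more portable variant. The helper names `host_ip` and `nats_url` are hypothetical, and the Linux branch assumes a `hostname` binary that supports `-I`, which is common but not universal.

```shell
# Hypothetical helper: print an IP address of this host.
host_ip() {
  case "$(uname -s)" in
    Darwin) ipconfig getifaddr en0 ;;          # macOS, as used in the article
    *)      hostname -I | awk '{print $1}' ;;  # assumption: hostname supports -I; takes the first address listed
  esac
}

# Build the NATS URL that survey.sh receives as its first argument.
nats_url() {
  printf 'nats://%s:%s\n' "$1" "$2"
}

# Usage (not run here):
#   ./survey.sh "$(nats_url "$(host_ip)" 4000)" 9 ../../nats-local-supercluster/auth/nkeys/creds/myoperator/SYS/SYS.creds
```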
Generating demo data
To generate traffic, we can use the `nats bench` command.
Note: Learn more about NATS CLI in the previous article.
$ nats bench -s 127.0.0.1:4000 --msgs 100000000 --pub 1 --sub 1 --creds ../../nats-local-supercluster/auth/nkeys/creds/myoperator/myaccount/myuser.creds subject
16:38:53 Starting pub/sub benchmark [subject=subject, msgs=100,000,000, msgsize=128 B, pubs=1, subs=1]
16:38:53 Starting subscriber, expecting 100,000,000 messages
16:38:53 Starting publisher, publishing 100,000,000 messages
Finished 40s [==========================================] 100%
Finished 40s [==========================================] 100%
NATS Pub/Sub stats: 4,924,665 msgs/sec ~ 601.16 MB/sec
 Pub stats: 2,462,354 msgs/sec ~ 300.58 MB/sec
 Sub stats: 2,462,346 msgs/sec ~ 300.58 MB/sec
Yes, we just published and received 100 million messages in about 40 seconds while running a super cluster on the same machine! NATS has amazing performance.
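As a quick sanity check on the benchmark output above, we can recompute the bandwidth figure from the reported publisher rate and the 128 B message size. This is just arithmetic, and `throughput_mib` is a hypothetical helper name.

```shell
# Hypothetical helper: convert a message rate (msgs/sec) and a payload size
# (bytes) into MiB/sec, where 1 MiB = 1048576 bytes.
throughput_mib() {
  awk -v rate="$1" -v size="$2" 'BEGIN { printf "%.2f\n", rate * size / 1048576 }'
}

# Publisher rate and msgsize taken from the benchmark output above.
throughput_mib 2462354 128
```

This prints `300.58`, which matches the reported `300.58 MB/sec` and suggests that the "MB" in the output is counted in units of 2^20 bytes.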
We can also use `nats bench` with the `--pubsleep` flag to simulate real-time traffic in the background while we look at the dashboards.
$ nats bench -s 127.0.0.1:4000 --msgs 100000000 --pubsleep 1ms --pub 1 --sub 1 --creds ../../nats-local-supercluster/auth/nkeys/creds/myoperator/myaccount/myuser.creds subject
14:24:20 Starting pub/sub benchmark [subject=subject, msgs=100,000,000, msgsize=128 B, pubs=1, subs=1, js=false, pubsleep=1ms, subsleep=0s]
14:24:20 Starting subscriber, expecting 100,000,000 messages
14:24:20 Starting publisher, publishing 100,000,000 messages
Receiving 18s [--------------------------------------------------------------] 0%
Publishing 18s [--------------------------------------------------------------] 0%
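The progress bars above are still at 0% after 18 seconds, and a rough estimate shows why this run makes a good long-lived background load. The sketch below assumes that the publisher sleeps once after every published message (an assumption about how `--pubsleep` is applied) and ignores the time spent actually publishing, so it gives a lower bound. `min_duration_hours` is a hypothetical helper name.

```shell
# Hypothetical helper: lower bound (in hours) on the run time of a single
# publisher that sleeps sleep_ms milliseconds after every message.
min_duration_hours() {
  awk -v msgs="$1" -v sleep_ms="$2" 'BEGIN { printf "%.1f\n", msgs * sleep_ms / 1000 / 3600 }'
}

# 100,000,000 messages with a 1ms sleep, as in the command above.
min_duration_hours 100000000 1
```

Under that assumption the run would take at least 27.8 hours, so we can simply stop it once we are done exploring the dashboards.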
Now we should be able to go to Grafana running on [localhost:3000/dashboards](http://localhost:3000/dashboards) and see all the available monitoring dashboards.
Note: You might be presented with a login screen, the default user is
admin and the password is
Here we can see we have different dashboards such as Clients, Clusters, NATS Overview, Network Usage, Super Cluster, etc. So let's explore these dashboards one by one!
In the client dashboard, we can monitor things like slow consumers, subscriptions, connections per second, and much more.
In the cluster dashboard, we can see how many clusters we are running, along with their bandwidth and messages per second.
The overview dashboard provides basic information about how many servers and clusters we are running, along with their route and gateway connections.
Check out that insane 300k messages/sec, and that's on a development machine!
The network dashboard is all about how much data is being sent or received in our clusters.
Node Resource Usage
This dashboard covers individual nodes and provides metrics like their CPU and memory usage.
The super cluster dashboard works at the super cluster level and provides metrics like super cluster bandwidth, connections, message rate, and much more.
This makes it really easy to monitor multiple super clusters.
In this article, we set up NATS Surveyor, an incredible tool that lets us set up monitoring for our NATS services with a single command. It's a must-have if you're running distributed systems with NATS at scale. Make sure to check out the docs for more info.
I hope this article was helpful. Feel free to reach out to me if you face any issues. Have a great day!