CPU Swimlanes

CPU swimlanes are a mashup of percentage plots and a swimlane diagram. The goal is to show graphically the usage of each processor and its temporal relation to the usage of other processors. Swimlane diagrams are often used in process flow diagrams to show functional or organizational processes and how they operate in parallel, when possible. The result is a visualization of the data available in command-line tools such as mpstat, commonly found in UNIX/Linux/MacOS distros.

Performance engineers and capacity planners often use CPU usage for systems analysis. For multiprocessor machines, the summary data often seen on CPU usage dashboards or in tools can hide system bottlenecks. For example, a 10-CPU system where one CPU is 100% busy running a single-threaded application will appear to be only 10% busy in aggregate. Tools like mpstat are useful for analyzing per-processor usage, but they quickly become unwieldy when there are many processors and they are poor at showing trends over time. Also, when a process is migrated to another CPU, mpstat makes it hard to correlate this movement with other temporal changes to the system. This is the perfect job for a nice dashboard.
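The arithmetic behind the 10-CPU example can be checked with a minimal sketch. The function name and the sample values are made up for illustration; they are not taken from any particular tool.

```python
def aggregate_busy(per_cpu_busy):
    """Return the mean busy percentage across all CPUs."""
    return sum(per_cpu_busy) / len(per_cpu_busy)


# Ten CPUs: one pinned at 100% by a single-threaded application, nine idle.
per_cpu_busy = [100.0] + [0.0] * 9

print(f"aggregate: {aggregate_busy(per_cpu_busy):.0f}% busy")  # aggregate: 10% busy
print(f"hottest CPU: {max(per_cpu_busy):.0f}% busy")           # hottest CPU: 100% busy
```

The aggregate figure looks healthy while one CPU is saturated, which is the bottleneck a per-CPU view exposes.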

Dashboard Design

The CPU percentage plots are designed to allow the observer to quickly differentiate between "user" usage by applications and "system" usage by the kernel. This is accomplished by layering the user, idle, and system usage metrics from bottom to top. This is visually effective because user usage often causes system usage, so the two layers tend to grow toward each other. As the CPUs become busier, the idle time in the middle gets squeezed and can disappear entirely. The balance of user to system time is readily discernible.
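The layering can be sketched as a small calculation of where each band starts and ends on a 0 to 100 scale. This is only an illustration: the function name is hypothetical, and it assumes idle is whatever remains after user and system time, which matches the three metrics used here but not systems that report more CPU states.

```python
def stack_layers(user, system):
    """Return (bottom, top) bands for the user, idle, and system layers.

    Inputs are percentages of one CPU's time. Idle is assumed to be
    whatever remains after user and system time.
    """
    idle = max(0.0, 100.0 - user - system)
    user_band = (0.0, user)                              # grows up from the floor
    idle_band = (user, user + idle)                      # squeezed in the middle
    system_band = (user + idle, user + idle + system)    # hangs from the ceiling
    return {"user": user_band, "idle": idle_band, "system": system_band}


light = stack_layers(user=20.0, system=5.0)
busy = stack_layers(user=70.0, system=30.0)

print(light["idle"])  # (20.0, 95.0)
print(busy["idle"])   # (70.0, 70.0)
```

In the busy case the idle band has zero height, so the user and system layers meet directly.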

For many systems, user usage is good and follows the tenets of good things go up and to the right and good things are green. Similarly, system usage is overhead and grows down from the ceiling, reinforcing the tenets that bad things go down and to the right and worrisome things are amber. Idleness is a wide open blue sky.

Another design element to the dashboard is that the per-CPU swimlanes are not encumbered by axes. This allows the dashboard to scale to dozens of CPUs without becoming cluttered by redundant text.


A test of good dashboard design is whether you can glance at the images and instantly conclude whether all is well or action is required. By choosing good color schemes and showing meaningful data consistently, a dashboard can speed systems analysis. CPU swimlanes can instantly show imbalances in CPU use in two dimensions: balance across CPUs and balance of user vs kernel resources.

Scaling

For large systems, it can be useful to modify this basic dashboard:
  • reorganize the per-CPU enumeration to reflect NUMA associations
  • shrink the row height so that dozens, or even hundreds, of CPUs fit on a relatively small screen
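The NUMA reordering can be sketched as a simple sort. The CPU-to-node mapping below is a hypothetical topology invented for illustration; a real mapping would have to come from the operating system.

```python
def order_by_numa(cpu_to_node):
    """Return CPU ids sorted by NUMA node first, then by CPU id."""
    return sorted(cpu_to_node, key=lambda cpu: (cpu_to_node[cpu], cpu))


# Hypothetical topology: even-numbered CPUs on node 0, odd-numbered on node 1.
cpu_to_node = {cpu: cpu % 2 for cpu in range(8)}

print(order_by_numa(cpu_to_node))  # [0, 2, 4, 6, 1, 3, 5, 7]
```

Drawing the swimlanes in this order keeps the CPUs that share a node in adjacent rows.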

Sharing

The CPU swimlane dashboard in the screenshot above was developed for MacOS systems using grafana v4.1, influxdb, and the telegraf metrics aggregator, where the only CPU usage details available are user, system, and idle.
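As a rough idea of the data behind one swimlane, the sketch below builds an InfluxQL query string for a single CPU and metric. It assumes telegraf's cpu input plugin, which writes a "cpu" measurement tagged by CPU name (such as cpu0) with fields usage_user, usage_system, and usage_idle. The function name, time window, and bucket size are illustrative assumptions; the queries in the actual dashboard may differ.

```python
def swimlane_query(cpu, field, window="1h", bucket="10s"):
    """Build an InfluxQL query for one CPU and one usage field.

    Assumes telegraf's "cpu" measurement and "cpu" tag; the window and
    bucket defaults are arbitrary choices for this sketch.
    """
    return (
        f'SELECT mean("{field}") FROM "cpu" '
        f'WHERE "cpu" = \'{cpu}\' AND time > now() - {window} '
        f'GROUP BY time({bucket})'
    )


# One query per layer of the swimlane for the first CPU.
for field in ("usage_user", "usage_idle", "usage_system"):
    print(swimlane_query("cpu0", field))
```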

You can get a copy of the grafana dashboard for the CPU swimlanes above from my git repo: https://github.com/richardelling/grafana-dashboards

Share and enjoy.
