Create graph for Linux Processes grouped by states using Grafana, InfluxDB and collectd

This article shows how to make a graph showing a Linux machine’s processes states. This plugin could gather the number of the processes grouped by their state or metadata per the selected process defined in the configuration (metadata includes process state, size of the resident segment size (RSS), system/user time used, and so on.). The purpose of this article is to make a graph with all the processes grouped by their state. Graphs per process data are not included here.

main menu
Processes states of a live web server.

The Linux machine is using collectd to gather the processes statistics and send them to the time series back-end – InfluxDB. Grafana is used to visualize the data stored in the time series back-end InfluxDB and organize the graphs in panels and dashboards. Check out the previous articles on the subject to install and configure such software to collect, store and visualize data – Monitor and analyze with Grafana, influxdb 1.8 and collectd under CentOS Stream 9, Monitor and analyze with Grafana, influxdb 1.8 and collectd under Ubuntu 22.04 LTS and Create graph for Linux CPU usage using Grafana, InfluxDB and collectd
The collectd daemon is used to gather data on the Linux system and to send it to the back-end InfluxDB.

Key knowledge for the Processes collectd plugin

  • The collectd plugin Processes official page – https://collectd.org/wiki/index.php/Plugin:Processes
  • The Processes plugin options – https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_processes
  • to enable the Processes plugin, load the plugin with the load directive in /etc/collectd.conf
    LoadPlugin processes
    
  • The Processes plugin collects data every 10 seconds.
  • processses_value – a single Gauge value – a metric, which value that can go up and down. It is used to count the number of processes in the different states (the state is saved in a tag value of one record). So there are multiple gauge values with different tags for the different process states at a given time.
    tag key tag value description
    host server hostname The name of the source this measurement was recorded.
    type cpu ps_state is the type, which will group the processes by states.
    type_instance processes’ states States are – blocked, paging, running, sleeping, stopped, zombies.
  • A Gauge value – a metric, which value that can go up and down. More on the topic – Data sources.

    A GAUGE value is simply stored as-is. This is the right choice for values which may increase as well as decrease, such as temperatures or the amount of memory used.

  • To cross check the value, the user can use the /proc/stat
    [root@srv ~]# cat /proc/stat 
    cpu  804 0 732 6240 198 106 25 0 0 0
    cpu0 444 0 345 3092 121 44 14 0 0 0
    cpu1 359 0 387 3147 76 62 11 0 0 0
    intr 72376 117 9 0 0 0 0 0 0 1 2 0 0 156 0 187 187 0 0 188 273 0 0 0 0 0 0 6574 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    ctxt 216350
    btime 1667997331
    processes 1359
    procs_running 2
    procs_blocked 0
    softirq 38704 2 5003 5 290 6565 0 74 5796 0 20969
    

    Some of the lines are pretty clear about what they mean by “procs_running“, “processes“, “procs_blocked” and so on.

Keep on reading!

Create graph for Linux CPU usage using Grafana, InfluxDB and collectd

This article shows how to make a graph showing a Linux machine’s CPU Usage.

main menu
example cpu usage

The Linux machine is using collectd to gather the load average and send it to the time series back-end – InfluxDB. Grafana is used to visualize the data stored in the time series back-end InfluxDB and organize the graphs in panels and dashboards. Check out the previous articles on the subject to install and configure such software to collect, store and visualize data – Monitor and analyze with Grafana, influxdb 1.8 and collectd under CentOS Stream 9 and Monitor and analyze with Grafana, influxdb 1.8 and collectd under Ubuntu 22.04 LTS.
The collectd daemon is used to gather data on the Linux system and to send it to the back-end InfluxDB.

Key knowledge for the cpu collectd plugin

  • The collectd plugin CPU official page – https://collectd.org/wiki/index.php/Plugin:CPU
  • The CPU plugin options – https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_cpu
  • to enable the CPU plugin, load the plugin with the load directive in /etc/collectd.conf
    LoadPlugin cpu
    
  • The CPU plugin collects data every 10 seconds.
  • cpu_value – 1 derive value is saved in the database. All values are in jiffies – the kernel unit of time. Showing just jiffers is not practical, that’s why all CPU graphs convert jiffers to CPU percentage usage.
    tag key tag value description
    host server hostname The name of the source this measurement was recorded.
    instance execution units number The execution unit this measurement was recorded. For example, systems with 8 cores will have 8 different execution units, so instances from 0 to 7. A graph representing the usage of a single CPU core is possible.
    type cpu The only type available is cpu.
    type_instance CPU usage metrics CPU metrics – idle, interrupt, nice, softirq, steal, system, user, wait.
  • DERIVE value – a metric, in which the change of the value is interesting. For example, it can go up indefinitely and it is important how fast it goes up, there are functions and queries, which will give the user the derivative value.

    These data sources assume that the change of the value is interesting, i.e. the derivative. Such data sources are very common with events that can be counted, for example, the number of emails that have been received per second by an MTA since it was started. The total number of emails is not interesting.

  • To cross check the value, the user can use the /proc/stat
    [root@srv ~]# cat /proc/stat 
    cpu  939 0 988 51486 200 261 56 0 0 0
    cpu0 483 0 473 25796 89 114 25 0 0 0
    cpu1 455 0 514 25690 110 147 31 0 0 0
    intr 123072 118 9 0 0 0 0 0 0 1 6 0 0 156 0 409 409 0 0 1184 501 0 0 0 0 0 0 6823 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    ctxt 279137
    btime 1666874114
    processes 1373
    procs_running 1
    procs_blocked 0
    softirq 64069 2 13685 7 544 6967 0 77 15801 0 26986
    

Keep on reading!

Create graph for Linux Load Average using Grafana, InfluxDB and collectd

This article shows how to make a graph showing a Linux machine’s load average.

main menu
A real load average graph of a web server

The Linux machine is using collectd to gather the load average and send it to the time series back-end – InfluxDB. Grafana is used to visualize the data stored in the time series back-end InfluxDB and organize the graphs in panels and dashboards. Check out the previous articles on the subject to install and configure such software to collect, store and visualize data – Monitor and analyze with Grafana, influxdb 1.8 and collectd under CentOS Stream 9 and Monitor and analyze with Grafana, influxdb 1.8 and collectd under Ubuntu 22.04 LTS.
The collectd daemon is used to gather data on the Linux system and to send it to the back-end InfluxDB.

Key knowledge for the load collectd plugin

  • The collectd plugin Load official page – https://collectd.org/wiki/index.php/Plugin:Load
  • The Load plugin options – https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_load
  • to enable the load plugin, load the plugin with the load directive in /etc/collectd.conf
    LoadPlugin load
    
  • The Load plugin collects data every 10 seconds.
  • load_longterm, load_midterm, load_shortterm – 3 gauge values are saved in the database.
  • Gauge value – a metric, which value that can go up and down.

    A GAUGE value is simply stored as-is. This is the right choice for values which may increase as well as decrease, such as temperatures or the amount of memory used.

  • To cross check the value, the user can use the uptime command under Linux or /proc/loadavg
    [root@srv ~]# uptime
     23:08:09 up 52 min,  2 users,  load average: 1.00, 0.77, 0.38
    [root@srv ~]# cat /proc/loadavg 
    1.00 0.77 0.38 2/176 1900
    

Keep on reading!