Degraded ext4 performance with millions files because of high system time in ext4_mb_good_group (100% kworker/u32+flush)

A strange behavior has been monitored in a storage server holding more than 20 millions of files, which is not a really big figure, because we have storage with more than 300 millions of files. The server has a RAID5 of 4 x 10T HGST Ultrastar He10 and storage partition formatted with ext4 filesystem with total of 27T size. Despite the inodes are only 7% and the free space is at 80%, the server began to experience high load averages with almost no IO utilization across the disks and the RAID device.
So the numbers are:

  • 20% free space.
  • 93% free inodes.
  • no IO disk utilization above 10-20% of any of the disks. In fact, the IO above 10% is very rear.
  • really high loads above the number of the server’s cores. The server has 8 virtual processors under Linux and the loads are constantly above 10-14.
  • no Disk Sleep blocked processes or Running to justify the high load.
  • the top command reports one or two kernel processes with 100% running – kworker/u32:1+flush-9:3 (so flushing the data takes so many time?)

If the problem were related to the disks or hardware, it would probably have been seen a high utilization across the disks, because the physical operations would need more time to execute. The high IO utilization across any disk, in deed, may result in a high system time (i.e. kernel time). Apparently, there is no high IO utilization across the disks and even no Disk Sleep blocked processes.
The FlameGraph tool may help to suggest the area where the problem related to.
First, install the git and perf tool (there are two interesting links about it with examples – link1 and link2):

dnf install -y perf git perl-open
git clone https://github.com/brendangregg/FlameGraph

Save a sample 60 second data with:

[root@srv ~]# perf record -F 99 -a -g -- sleep 60
[ perf record: Woken up 24 times to write data ]
[ perf record: Captured and wrote 7.132 MB perf.data (35687 samples) ]

And generate a SVG image of all the function, which are executed during this time frame of 60 seconds. It will give a good picture of what has been executed and how much time it took in the kernel and user space.

[root@srv ~]# cd FlameGraph
[root@srv FlameGraph]# ./stackcollapse-perf.pl ../out.perf > out.folded
[root@srv FlameGraph]# ./flamegraph.pl out.folded > kernel.svg

The SVG are shared bellow. It turned out most of the time during this sample period of 60 seconds, the Linux kernel was in ext4_mb_good_group, which is related to the EXT4 file system.
What fixed the problem (for now) – By removing files it was noticed the system time went down and the load went down, too. Then, after removing approximately a million files, the server performance returned to normal as the average other storage server (this one has unique hardware setup). So by removing files, i.e. INODES from the ext4 file system, the load average and the server ext4 performance were completely different from what it had been before.
The interesting part the inodes never went above 7% (around 24 million of 454 million available) and the free space never went down 20%, but when a good deal of files were removed, the bad ext4 performance disappeared. It still could be related to the physical disks or some strange ext4 bug, but this article is intended to report this behavior and to propose a some kind of a solution (even though a workaround).

SCREENSHOT 1) The kernel process running at 100% – kworker/u32:2-flush-9:3.

main menu
kernel process running at 100%

SCREENSHOT 2) Almost half of the time was spent at ext4 related calls.

main menu
ext4 and ext4_mb_good_group half of the time spent

Keep on reading!

Expand disk and the root partition of the QEMU virtual server

This article is to show how easy is to grow the size of QEMU virtual disk and its partitions (along with ext4 file system). Of course, you can use this article as an example of expanding the partitions of a physical disk.

Our setup is a QEMU virtual server using a raw image of 20G and the steps are as follow:

  1. Stop the virtual server
  2. Resize with qemu-img the raw image of the virtual server
  3. Start the virtual server
  4. Get a root ssh shell (probably by using openssh)
  5. Use parted to resize the partition (and fix the GPT of the disk – not the disk is larger, so the GPT table need fixing).
  6. Use resize2fs to resize the

STEP 1) Power off your virtual server.

The best way is to power it off within the server with the “poweroff” command. Be careful to check whether the host server killed the QEMU process. It is almost certain if the VNC port is released, the QEMU process has been exited.
If you use virsh (i.e. libvirt), you may execute:

virsh shutdown my-private-vm-01
virsh destroy my-private-vm-01

The destroy command ensures there is no QEMU process, which still operates over the image disk file. But it is dangerous for your data if you issue it on a running virtual server, because it may lose the unsaved data.
If you use QEMU manually wait for the process to exit or if you have enabled the management console connect to it using telnet and just quit – this will destory the QEMU virtual server process – again be careful with unsaved data.

[root@lsrv1 ~]# ps axuf|grep qemu
root     15575  2.3 50.1 13061032 8112212 ?    Sl   May08 1522:27 qemu-system-x86_64 -enable-kvm -smp 4,maxcpus=8 -daemonize -vnc :30 -cdrom /mnt/vm/isos/CentOS-7-x86_64-Minimal-1810.iso -drive file=/mnt/vm/images/templatesrv-wordpress.bin,cache=none,aio=threads,if=virtio -boot c -net nic,model=virtio,macaddr=00:00:00:00:00:30 -net tap,ifname=tap30,script=no,downscript=no -balloon virtio -m 8144 -monitor telnet:127.0.0.1:5830,server,nowait
[root@srv-host ~]# telnet 127.0.0.1 5830
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
QEMU 2.0.0 monitor - type 'help' for more information
(qemu) q
Connection closed by foreign host.

If you use a web interface (for example WebVirtMgr) check whether the virtual server is in power-off state.

STEP 2) Resize the image file of the virtual server.

Find where are located the virtual servers’ image files in your installation and use qemu-img. We want to increse the size with 174GB to 200GB.

qemu-img resize my-private-vm-01.img +174GB

STEP 3) Start your server.

Start your server by issuing a command with virsh or QEMU (qemu-system-x86_64) or from a web interface if use one (like WebVirtMgr).
*virsh and libvirt:

virsh start my-private-vm-01

*Manual start of QEMU emulator – qemu-system-x86_64:

qemu-system-x86_64 -enable-kvm -smp 4,maxcpus=8 -daemonize -vnc :30 -cdrom /mnt/vm/isos/CentOS-7-x86_64-Minimal-1810.iso -drive file=/mnt/vm/images/templatesrv-wordpress.bin,cache=none,aio=threads,if=virtio -boot c -net nic,model=virtio,macaddr=00:00:00:00:00:30 -net tap,ifname=tap30,script=no,downscript=no -balloon virtio -m 8144 -monitor telnet:127.0.0.1:5830,server,nowait

Or just use the web browser and start the virtual server from WebVirtMgr if it is what you use.

STEP 4) Open a shell to your server.

We use openssh client to connect to our server.

STEP 5) Use parted to resize the partition.

The program “parted” will report that the partition table does not use the whole available disk, which is perfectly normal because we’ve just increased the disk size. Just confirm to fix the GPT partition table:

parted /dev/vda
GNU Parted 3.2
Using /dev/vda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Warning: Not all of the space available to /dev/vda appears to be used, you can fix the GPT to use all of the space (an extra 367001600 blocks) or continue with the current
setting? 
Fix/Ignore? Fix                                                           
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 215GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  2097kB  1049kB                        bios_grub
 2      2097kB  4096MB  4094MB  linux-swap(v1)
 3      4096MB  24.0GB  19.9GB  ext4

(parted) resizepart 3 -1                                                  
Warning: Partition /dev/vda3 is being used. Are you sure you want to continue?
parted: invalid token: -1
Yes/No? Yes
End?  [24.0GB]? -1
(parted) p
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 215GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  2097kB  1049kB                        bios_grub
 2      2097kB  4096MB  4094MB  linux-swap(v1)
 3      4096MB  215GB   211GB   ext4

(parted) q                                                                
Information: You may need to update /etc/fstab.

There is a warning the partition is in use but it is perfectly OK to continue.
*parted: reports “invalid token: -1”, but it is accepted for the “End” parameter.

STEP 6) Resize the ext4 file system online

Use the tool resize2fs to resize EXT4.

resize2fs /dev/vda3
resize2fs 1.42.13 (17-May-2015)
Filesystem at /dev/vda3 is mounted on /; on-line resizing required
old_desc_blocks = 2, new_desc_blocks = 13
The filesystem on /dev/vda3 is now 51428620 (4k) blocks long.

To check the resize operation:

srv1-vm ~ # dmesg|grep EXT4
[  449.330140] EXT4-fs (vda3): resizing filesystem from 4859392 to 51428620 blocks
[  449.936044] EXT4-fs (vda3): resized filesystem to 51428620
srv1-vm ~ # df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           798M  3.6M  795M   1% /run
/dev/vda3       193G  5.7G  180G   4% /
tmpfs           3.9G  196K  3.9G   1% /dev/shm

Output log of the whole resize operation

srv-vm1 ~ # parted /dev/vda
GNU Parted 3.2
Using /dev/vda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 26.8GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  2097kB  1049kB                        bios_grub
 2      2097kB  4096MB  4094MB  linux-swap(v1)
 3      4096MB  24.0GB  19.9GB  ext4

(parted) q                                                                
srv-vm1 ~ # poweroff
Connection to srv-vm1 closed by remote host.
Connection to srv-vm1 closed.
myuser@gw1:~$ sshh srv1-host
srv1-host ~ # cd /mnt/vm/images
srv1-host images # qemu-img resize srv-vm1.img +174GB
Image resized.
srv1-host images # logout
Connection to srv1-host closed.
myuser@gw1:~$ sshh srv-vm1
srv-vm1 ~ # df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           798M  3.5M  795M   1% /run
/dev/vda3        19G  5.7G   12G  33% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs           798M     0  798M   0% /run/user/0
srv-vm1 ~ # parted /dev/vda
GNU Parted 3.2
Using /dev/vda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Warning: Not all of the space available to /dev/vda appears to be used, you can fix the GPT to use all of the space (an extra 367001600 blocks) or continue with the current
setting? 
Fix/Ignore? Fix                                                           
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 215GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  2097kB  1049kB                        bios_grub
 2      2097kB  4096MB  4094MB  linux-swap(v1)
 3      4096MB  24.0GB  19.9GB  ext4

(parted) q
Information: You may need to update /etc/fstab.

srv-vm1 ~ # parted /dev/vda                                           
GNU Parted 3.2
Using /dev/vda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) resizepart 3 -1                                                  
Warning: Partition /dev/vda3 is being used. Are you sure you want to continue?
parted: invalid token: -1                                                 
Yes/No? Yes                                                               
End?  [24.0GB]? -1                                                        
(parted) p                                                                
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 215GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  2097kB  1049kB                        bios_grub
 2      2097kB  4096MB  4094MB  linux-swap(v1)
 3      4096MB  215GB   211GB   ext4

(parted) q                                                                
Information: You may need to update /etc/fstab.

srv-vm1 ~ # resize2fs /dev/vda3
resize2fs 1.42.13 (17-May-2015)
Filesystem at /dev/vda3 is mounted on /; on-line resizing required
old_desc_blocks = 2, new_desc_blocks = 13
The filesystem on /dev/vda3 is now 51428620 (4k) blocks long.

srv-vm1 ~ # dmesg|grep EXT4
[  449.330140] EXT4-fs (vda3): resizing filesystem from 4859392 to 51428620 blocks
[  449.936044] EXT4-fs (vda3): resized filesystem to 51428620

srv-vm1 ~ # touch /forcefsck
srv-vm1 ~ # reboot

We rebooted the virtual machine with force check for precaution, but it is not reqiured.

Bonus – physical disks setup

Probably you would need an additional first step of copying your old disk to the new disk – basically, there are two ways to do it:

  • blind copy everything with a hardware or the Linux “dd” command.
  • use gparted to copy the GPT table and the partitions to the new disk

Online resize of a root ext4 file system – increase the space

Here you can see how to online resize your root ext4 file system. The free space of your partition will be increased after the operation. The size of the root file system will grow not shrink. Of course, this could have been any other partition, not exactly the root one, but in most cases, such operations on the root are the more complex and dangerous – SO ALWAYS do backups before such operations!

All services work properly and no shut down of services, no reboot, or umount is required during the resize operation.

Still, we rebooted the server once to force check the file system as a precaution, because it was possible and this server was not in production. The reboot of the server after this kind of resizing is not mandatory.
The following method is tested on CentOS 7, Ubuntu 16 LTS, and Gentoo with kernel 4.15 kernel. So we can assume you may have no problems if your system is newer than ours.

Summary

  1. Partition resize – Use resizepart in parted command. All Linux distributions have this package with the same name as the needed command “parted”
  2. File system resize – Use resize2fs from the E2fsprogs package. All Linux distributions include this package mostly with the same name of the package.

STEP 1) Expand the partition, which holds the root partition.

Let’s assume you have changed your disk and now there is more unallocated space to be used or somehow the space of the disk is increased. Look below for a real-world example with one of our virtual servers.

root@srv1 ~ # parted /dev/sda
GNU Parted 3.2
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Model: Model: ATA Samsung SSD 850 (scsi)
Disk /dev/sda: 215GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  2097kB  1049kB                        bios_grub
 2      2097kB  4096MB  4094MB  linux-swap(v1)
 3      4096MB  24.0GB  19.9GB  ext4
(parted) resizepart 3 -1                                                  
Warning: Partition /dev/sda3 is being used. Are you sure you want to continue?
parted: invalid token: -1                                                 
Yes/No? Yes                                                               
End?  [24.0GB]? -1                                                        
(parted) p                                                                
Model: Model: ATA Samsung SSD 850 (scsi)
Disk /dev/sda: 215GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  2097kB  1049kB                        bios_grub
 2      2097kB  4096MB  4094MB  linux-swap(v1)
 3      4096MB  215GB   211GB   ext4

(parted) q                                                                
Information: You may need to update /etc/fstab.

As you can see from the first print command the partition number 3 is 19.9GB and after the resize command with “-1” is 211GB. There is a warning about the partition is used, but it is normal and not critical.

STEP 2) Resize the file system, on which we expanded the partition.

You need to install E2fsprogs. All Linux distributions have this package, here are some of them:

  • CentOS 7 – e2fsprogs
  • Ubuntu – e2fsprogs
  • Gentoo – sys-fs/e2fsprogs

After installing the e2fsprogs package you will have the online ext4 resizing tool – resize2fs.

root@srv ~ # resize2fs /dev/sda3
resize2fs 1.42.13 (17-May-2015)
Filesystem at /dev/sda3 is mounted on /; on-line resizing required
old_desc_blocks = 2, new_desc_blocks = 13
The filesystem on /dev/sda3 is now 51428620 (4k) blocks long.

Check if everything is OK with

root@srv ~ # dmesg|grep EXT4
[  449.330140] EXT4-fs (vda3): resizing filesystem from 4859392 to 51428620 blocks
[  449.936044] EXT4-fs (vda3): resized filesystem to 51428620
root@srv ~ # df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           798M  3.5M  795M   1% /run
/dev/sda3       193G  3.4G  182G   2% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs           798M     0  798M   0% /run/user/0

Bonus – you can force check the file system on the next reboot

Probably it is a good idea to force check the file system integrity on the next boot. This step is not mandatory and you may skip it.
For Ubuntu you can do:

root@srv ~ # touch /forcefsck
root@srv ~ # reboot

Bonus 2

Fixing the GPT. Newer versions may display warning the GPT table is not using the whole disk space and to fix it. Just type fix to add the new unallocated disk space:

root@srv ~ # parted /dev/sda
GNU Parted 3.2
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Warning: Not all of the space available to /dev/sda appears to be used, you can fix the GPT to use all of the space (an extra 188743680 blocks) or continue with the current setting? 
Fix/Ignore? Fix                                                           
Model: Virtio Block Device (virtblk)
Disk /dev/sda: 118GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  2097kB  1049kB                        bios_grub
 2      2097kB  17.2GB  17.2GB  ext4
 3      17.2GB  21.5GB  4293MB  linux-swap(v1)

(parted)