Zswap is an interesting way to extend your swap space with a compressed write-back cache in memory. Here is our simple explanation:
When this feature is enabled, a portion of the machine’s RAM is set aside. When the system needs to swap, it first writes to this area and uses the disk only when the area is full. The kernel compresses the data on the fly when it is saved in the memory allocated for the zswap pool; the data is not compressed when saved on the disk. So your disk may not be touched at all if the data fits in the compressed memory pool. If the pool is full, or has reached its maximum allowed size and cannot grow further, pages are evicted to the disk swap space using a least recently used (LRU) algorithm.
Of course, it is a little more complex. Zswap compresses only pages, and there are two allocators: one stores up to 2 compressed pages in 1 page and the other stores up to 3 compressed pages in 1 page. Even if the data compresses well enough that, say, 5 pages could fit in 1, that will not happen: the allocator packs only as many pages as it is designed for.
The most important piece of information is:
zswap uses RAM to make a compressed pool, which is used first when a swap-out request is made. Nothing is written to the disk while the pool still has room.
You can effectively increase the amount of RAM using this feature, because it is as if part of your RAM were compressed, and the current algorithms show a 2x to 3x compression ratio. For example, if you set aside 20% of the 2G RAM of your virtual server for the zswap pool, you end up with 1.6G of RAM plus 400M of zswap; with an average compression ratio of 2x you can hold about 2.4G before the swap process touches your disks.
There are multiple cases where this feature is very handy, such as:
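The arithmetic above can be sketched as a small shell function. This is only an illustration: effective_mb is a hypothetical helper, not part of any zswap tooling, and the compression ratio you pass in is an assumption that depends entirely on your workload.

```shell
#!/bin/sh
# effective_mb TOTAL_MB POOL_PERCENT RATIO   (hypothetical helper)
# Prints roughly how many MB of data fit in RAM plus the compressed pool
# before swap has to touch the disk. RATIO is an assumed average.
effective_mb() {
    awk -v total="$1" -v pct="$2" -v ratio="$3" 'BEGIN {
        pool = total * pct / 100      # RAM set aside for the zswap pool
        rest = total - pool           # RAM left for uncompressed pages
        printf "%d\n", rest + pool * ratio
    }'
}

# The example from the text: 2G RAM, 20% pool, 2x ratio -> 1600 + 400 * 2
effective_mb 2000 20 2   # prints 2400
```

With a 3x ratio the same call (effective_mb 2000 20 3) gives 2800, which is the upper end of what the text describes.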
- virtualization – virtual servers – increase your RAM
- reduce IO to the slow disks such as hard drives
- reduce IO to the flash-based storage, which may increase their life
- database or DNS servers could have great benefits because the compression ratio could be around 3x (i.e. 3 compressed pages stored in 1 real page)
If you do not know what a page is in computer terminology: it is the smallest unit of data for memory management. In most cases it is 4K, but there are other sizes such as 8K, 16K and more. You can read more here – https://en.wikipedia.org/wiki/Page_(computer_memory)
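If you are not sure which page size your machine uses, you can ask the system instead of assuming 4K. A minimal sketch, assuming a Linux system where getconf is available:

```shell
#!/bin/sh
# Query the page size of the running system, in bytes.
page_size=$(getconf PAGESIZE)
echo "Page size: ${page_size} bytes"
```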
Enable zswap
To enable zswap you must do one of the following:
Boot your kernel with this kernel parameter (a reboot is required; on some old 3.x kernels this is the only option):
zswap.enabled=1
Or just enable it through the /sys file system (runtime enable, not possible in old kernels):
echo 1 > /sys/module/zswap/parameters/enabled
When you disable it by setting the parameter to 0, it will not immediately decompress all pages and remove the pool. The pages in the pool must be invalidated or faulted back into memory. You can force the removal of all compressed pages and of the pool by deactivating the swap devices:
swapoff -a
This turns off all swap devices and returns all swapped-out pages to memory, including the ones in the zswap compressed memory pool. The pool is then removed.
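To confirm that a change took effect, you can read the parameters back from the /sys file system. The following is a minimal sketch; show_params is a hypothetical helper name, and the optional directory argument exists only so the function can be tried on a test directory. On the kernels we have seen, the enabled parameter reads back as Y or N rather than 1 or 0.

```shell
#!/bin/sh
# show_params [DIR]   (hypothetical helper)
# Prints "name = value" for every file in the zswap parameter directory.
show_params() {
    dir=${1:-/sys/module/zswap/parameters}
    if [ ! -d "$dir" ]; then
        echo "no zswap parameters in $dir" >&2
        return 1
    fi
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            printf '%s = %s\n' "$(basename "$f")" "$(cat "$f")"
        fi
    done
}

# Harmless on machines without zswap: only a message on stderr.
show_params || true
```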
Memory pool size
By default, the pool may use up to 20% of the RAM. You can increase or decrease this limit with a kernel boot parameter:
zswap.max_pool_percent=15
Or set it through the /sys file system (at runtime, not possible in old kernels):
echo 15 > /sys/module/zswap/parameters/max_pool_percent
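To see what the percentage means in absolute terms on a given machine, the limit can be estimated from MemTotal in /proc/meminfo. This is a rough sketch: pool_limit_kb is a hypothetical helper, and the kernel's own calculation is based on its internal page count, so the real limit may differ slightly.

```shell
#!/bin/sh
# pool_limit_kb MEMTOTAL_KB PERCENT   (hypothetical helper)
# Approximate upper bound of the zswap pool in kB.
pool_limit_kb() {
    echo $(( $1 * $2 / 100 ))
}

mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null)
param=/sys/module/zswap/parameters/max_pool_percent
# Only report when both values can actually be read on this machine.
if [ -n "$mem_kb" ] && [ -r "$param" ]; then
    echo "zswap pool limit: $(pool_limit_kb "$mem_kb" "$(cat "$param")") kB"
fi
```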
Compressed memory pool allocators
Currently, there are two:
- zbud – 2 compressed pages stored in 1 page. Of course, this limits the compression ratio to a maximum of 2x. It is the default and has been supported from the beginning of this kernel feature.
zswap.zpool=zbud
Or select it through the /sys file system (at runtime, not possible in old kernels):
echo zbud > /sys/module/zswap/parameters/zpool
Again, this is the default allocator, so you do not need to do any of the above if you have never changed it and you want to use it. This parameter can be changed later on the fly (i.e. at runtime, but not in some old kernels); the change creates a second pool, and the old one is removed once all of its pages are gone.
- z3fold – 3 compressed pages stored in 1 page. Of course, this limits the compression ratio to a maximum of 3x. Supported from kernel version.
zswap.zpool=z3fold
Or select it through the /sys file system (at runtime, not possible in old kernels):
echo z3fold > /sys/module/zswap/parameters/zpool
On some systems the z3fold module may have to be loaded manually before you use it in the echo command to the /sys file system:
modprobe z3fold
echo z3fold > /sys/module/zswap/parameters/zpool
Compression algorithms
The default algorithm is “lzo”, and it can be changed at boot time or at runtime. A better, but slightly slower, algorithm is “lz4”. There are two more algorithms: lz4hc and deflate. This parameter can be changed later on the fly (i.e. at runtime, but not in some old kernels); the change creates a second pool, and the old one is removed once all of its pages are gone.
Boot parameters:
zswap.compressor=lz4
Or select it through the /sys file system (at runtime, not possible in old kernels):
echo lz4 > /sys/module/zswap/parameters/compressor
Example 1 – Kernel boot parameters
In this case you must reboot the machine, and on some Linux distributions (like Ubuntu) you may have to enable early module loading and regenerate the initramfs file!
- Kernel boot parameters in /etc/default/grub. Add the following to the GRUB_CMDLINE_LINUX parameter:
zswap.enabled=1 zswap.max_pool_percent=20 zswap.zpool=z3fold zswap.compressor=lz4
- Add the z3fold and lz4 modules to the modules loaded in the initramfs and regenerate it (it is better to load them with modprobe first and then regenerate the initramfs). The steps differ between Linux distributions – coming soon.
- Regenerate the grub2 configuration file grub.cfg in the /boot directory. The steps differ between Linux distributions – coming soon.
Runtime loading. Put the following lines in a startup init or systemd script:
echo z3fold > /sys/module/zswap/parameters/zpool
echo lz4 > /sys/module/zswap/parameters/compressor
echo 10 > /sys/module/zswap/parameters/max_pool_percent
echo 1 > /sys/module/zswap/parameters/enabled
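In a startup script it can help to notice when one of these writes fails, for example because a module is missing. The following is a minimal sketch under that assumption; set_param and apply_zswap_settings are hypothetical helper names, and the optional directory argument exists only so the functions can be tried on a test directory instead of the real /sys tree.

```shell
#!/bin/sh
# set_param NAME VALUE [DIR]   (hypothetical helper)
# Writes VALUE to a zswap parameter and reports a failure on stderr.
set_param() {
    file="${3:-/sys/module/zswap/parameters}/$1"
    if [ ! -w "$file" ]; then
        echo "cannot write $file (zswap missing, or not root?)" >&2
        return 1
    fi
    if ! echo "$2" > "$file"; then
        echo "write of $1=$2 failed" >&2
        return 1
    fi
}

# Apply the same four settings as above; stop at the first failure
# and enable zswap last.
apply_zswap_settings() {
    set_param zpool z3fold "$1" &&
    set_param compressor lz4 "$1" &&
    set_param max_pool_percent 10 "$1" &&
    set_param enabled 1 "$1"
}
```

In a real startup script you would call apply_zswap_settings with no argument, as root.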
Statistics and debug information
Various papers claim the compression ratio for z3fold averages 2.7 and for zbud is 1.7.
You can see some statistics with the following commands (run them as the root user):
srv1 ~ # cd /sys/kernel/debug/zswap
srv1 zswap # perl -E "say $(cat stored_pages) * 4096 / $(cat pool_total_size)"
2.67377031535895
srv1 zswap # for i in `ls`;do echo -n "$i = "; cat $i;done
duplicate_entry = 0
pool_limit_hit = 0
pool_total_size = 76111872
reject_alloc_fail = 0
reject_compress_poor = 12
reject_kmemcache_fail = 0
reject_reclaim_fail = 0
stored_pages = 49685
written_back_pages = 0
srv1 zswap # uptime
07:08:48 up 75 days, 1:18, 1 user, load average: 0.40, 0.34, 0.40
As you can see in our DNS server the compression ratio is 2.67x. We are using “zswap.enabled=1 zswap.max_pool_percent=20 zswap.zpool=z3fold”.
My daily laptop uses z3fold, lz4 and a 10% memory pool (1.6G RAM), and the compression ratio is 2.84x after an uptime of 41 days.
mypc ~ # cd /sys/kernel/debug/zswap
mypc zswap # perl -E "say $(cat stored_pages) * 4096 / $(cat pool_total_size)"
2.84956333565191
mypc zswap # for i in `ls`;do echo -n "$i = "; cat $i;done
duplicate_entry = 0
pool_limit_hit = 0
pool_total_size = 105996288
reject_alloc_fail = 0
reject_compress_poor = 56
reject_kmemcache_fail = 0
reject_reclaim_fail = 0
same_filled_pages = 8408
stored_pages = 73741
written_back_pages = 0
mypc zswap # uptime
17:07:25 up 41 days, 23:14, 19 users, load average: 0.46, 0.61, 0.64
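The Perl one-liner above multiplies the number of stored pages by the page size and divides by the pool size. The same calculation can be sketched without Perl; zswap_ratio is a hypothetical helper, and the page size defaults to 4096 only when no third argument is given.

```shell
#!/bin/sh
# zswap_ratio STORED_PAGES POOL_BYTES [PAGE_SIZE]   (hypothetical helper)
# Prints the compression ratio: uncompressed bytes / bytes used by the pool.
zswap_ratio() {
    awk -v pages="$1" -v pool="$2" -v psize="${3:-4096}" 'BEGIN {
        if (pool == 0) { print "n/a"; exit }   # empty pool, nothing to divide
        printf "%.2f\n", pages * psize / pool
    }'
}

# Read the live counters when debugfs is mounted and readable (root only).
dbg=/sys/kernel/debug/zswap
if [ -r "$dbg/stored_pages" ] && [ -r "$dbg/pool_total_size" ]; then
    zswap_ratio "$(cat "$dbg/stored_pages")" "$(cat "$dbg/pool_total_size")" \
        "$(getconf PAGESIZE)"
fi
```

Called with the numbers from the DNS server above (zswap_ratio 49685 76111872) it prints 2.67.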
All zswap-related files that you can see under /sys in kernel version 5.2:
/sys/kernel/slab/zswap_entry/
/sys/kernel/slab/zswap_entry/remote_node_defrag_ratio
/sys/kernel/slab/zswap_entry/total_objects
/sys/kernel/slab/zswap_entry/alloc_calls
/sys/kernel/slab/zswap_entry/cpu_slabs
/sys/kernel/slab/zswap_entry/objects
/sys/kernel/slab/zswap_entry/objects_partial
/sys/kernel/slab/zswap_entry/cpu_partial
/sys/kernel/slab/zswap_entry/validate
/sys/kernel/slab/zswap_entry/free_calls
/sys/kernel/slab/zswap_entry/min_partial
/sys/kernel/slab/zswap_entry/poison
/sys/kernel/slab/zswap_entry/red_zone
/sys/kernel/slab/zswap_entry/slabs
/sys/kernel/slab/zswap_entry/destroy_by_rcu
/sys/kernel/slab/zswap_entry/usersize
/sys/kernel/slab/zswap_entry/sanity_checks
/sys/kernel/slab/zswap_entry/align
/sys/kernel/slab/zswap_entry/aliases
/sys/kernel/slab/zswap_entry/store_user
/sys/kernel/slab/zswap_entry/trace
/sys/kernel/slab/zswap_entry/reclaim_account
/sys/kernel/slab/zswap_entry/order
/sys/kernel/slab/zswap_entry/object_size
/sys/kernel/slab/zswap_entry/shrink
/sys/kernel/slab/zswap_entry/hwcache_align
/sys/kernel/slab/zswap_entry/objs_per_slab
/sys/kernel/slab/zswap_entry/partial
/sys/kernel/slab/zswap_entry/slabs_cpu_partial
/sys/kernel/slab/zswap_entry/ctor
/sys/kernel/slab/zswap_entry/slab_size
/sys/kernel/slab/zswap_entry/cache_dma
/sys/kernel/debug/zswap
/sys/kernel/debug/zswap/same_filled_pages
/sys/kernel/debug/zswap/stored_pages
/sys/kernel/debug/zswap/pool_total_size
/sys/kernel/debug/zswap/duplicate_entry
/sys/kernel/debug/zswap/written_back_pages
/sys/kernel/debug/zswap/reject_compress_poor
/sys/kernel/debug/zswap/reject_kmemcache_fail
/sys/kernel/debug/zswap/reject_alloc_fail
/sys/kernel/debug/zswap/reject_reclaim_fail
/sys/kernel/debug/zswap/pool_limit_hit
/sys/module/zswap
/sys/module/zswap/uevent
/sys/module/zswap/parameters
/sys/module/zswap/parameters/same_filled_pages_enabled
/sys/module/zswap/parameters/enabled
/sys/module/zswap/parameters/max_pool_percent
/sys/module/zswap/parameters/compressor
/sys/module/zswap/parameters/zpool
Troubleshooting
[649321.290732] zswap: zpool z3fold not available
The above error shows up on some old kernels, where the z3fold module must be loaded explicitly with modprobe:
modprobe z3fold
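And during boot you must load the module early, from the initramfs. Here is how to do it on Ubuntu (the initramfs has to be regenerated after the module list changes):
modprobe z3fold
echo z3fold >> /etc/initramfs-tools/modules
update-initramfs -u
update-grub2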