Degraded ext4 performance with millions files because of high system time in ext4_mb_good_group (100% kworker/u32+flush)

A strange behavior has been monitored in a storage server holding more than 20 millions of files, which is not a really big figure, because we have storage with more than 300 millions of files. The server has a RAID5 of 4 x 10T HGST Ultrastar He10 and storage partition formatted with ext4 filesystem with total of 27T size. Despite the inodes are only 7% and the free space is at 80%, the server began to experience high load averages with almost no IO utilization across the disks and the RAID device.
So the numbers are:

  • 20% free space.
  • 93% free inodes.
  • no IO disk utilization above 10-20% of any of the disks. In fact, the IO above 10% is very rear.
  • really high loads above the number of the server’s cores. The server has 8 virtual processors under Linux and the loads are constantly above 10-14.
  • no Disk Sleep blocked processes or Running to justify the high load.
  • the top command reports one or two kernel processes with 100% running – kworker/u32:1+flush-9:3 (so flushing the data takes so many time?)

If the problem were related to the disks or hardware, it would probably have been seen a high utilization across the disks, because the physical operations would need more time to execute. The high IO utilization across any disk, in deed, may result in a high system time (i.e. kernel time). Apparently, there is no high IO utilization across the disks and even no Disk Sleep blocked processes.
The FlameGraph tool may help to suggest the area where the problem related to.
First, install the git and perf tool (there are two interesting links about it with examples – link1 and link2):

dnf install -y perf git perl-open
git clone https://github.com/brendangregg/FlameGraph

Save a sample 60 second data with:

[root@srv ~]# perf record -F 99 -a -g -- sleep 60
[ perf record: Woken up 24 times to write data ]
[ perf record: Captured and wrote 7.132 MB perf.data (35687 samples) ]

And generate a SVG image of all the function, which are executed during this time frame of 60 seconds. It will give a good picture of what has been executed and how much time it took in the kernel and user space.

[root@srv ~]# cd FlameGraph
[root@srv FlameGraph]# ./stackcollapse-perf.pl ../out.perf > out.folded
[root@srv FlameGraph]# ./flamegraph.pl out.folded > kernel.svg

The SVG are shared bellow. It turned out most of the time during this sample period of 60 seconds, the Linux kernel was in ext4_mb_good_group, which is related to the EXT4 file system.
What fixed the problem (for now) – By removing files it was noticed the system time went down and the load went down, too. Then, after removing approximately a million files, the server performance returned to normal as the average other storage server (this one has unique hardware setup). So by removing files, i.e. INODES from the ext4 file system, the load average and the server ext4 performance were completely different from what it had been before.
The interesting part the inodes never went above 7% (around 24 million of 454 million available) and the free space never went down 20%, but when a good deal of files were removed, the bad ext4 performance disappeared. It still could be related to the physical disks or some strange ext4 bug, but this article is intended to report this behavior and to propose a some kind of a solution (even though a workaround).

SCREENSHOT 1) The kernel process running at 100% – kworker/u32:2-flush-9:3.

main menu
kernel process running at 100%

SCREENSHOT 2) Almost half of the time was spent at ext4 related calls.

main menu
ext4 and ext4_mb_good_group half of the time spent

Keep on reading!

Change found sources for kernel version when packages need the kernel sources to compile

Multiple Gentoo packages may need kernel sources to compile. There are packages, which are external modules such as virtualbox-modules or video drivers or wifi drivers or more. All these packages expect the current loaded kernel sources are present and to use them when compiling the external kernel module. But sometimes the proper kernel sources are missing, those needed to compile the kernel module in such a way to load it in the currently loaded kernel.

This article is valid not only for Gentoo Linux distribution but any Linux and kernel sources. So, if the user needs to have properly configured kernel sources for the currently loaded kernel, this is one way to do it right.

Here is an example: Updated kernel, but no sources are kept and then the VirtualBox needs to update to a newer version, but with the missing kernel sources of the currently loaded kernel updating the VirtualBox will cause the VirtualBox to stop working!

root@srv ~ # uname -a
Linux srv 5.15.5-gentoo #2 SMP Tue Nov 30 16:08:49 EET 2021 x86_64 Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz GenuineIntel GNU/Linux
root@srv ~ # emerge -va app-emulation/virtualbox-modules

These are the packages that would be merged, in order:

[ebuild     U  ] app-emulation/virtualbox-modules-6.1.32:0/6.1::gentoo [6.1.26:0/6.1::gentoo] USE="dist-kernel -pax-kernel" 660 KiB

Total: 1 packages (1 upgrades), Size of downloads: 0 KiB

Would you like to merge these packages? [Yes/No] yes

>>> Verifying ebuild manifests

>>> Running pre-merge checks for app-emulation/virtualbox-6.1.32-r1

>>> Emerging (1 of 1) app-emulation/virtualbox-modules-6.1.32::gentoo
 * Fetching files in the background.
 * To view fetch progress, run in another terminal:
 * tail -f /var/log/emerge-fetch.log
 * vbox-kernel-module-src-6.1.32.tar.xz BLAKE2B SHA512 size ;-) ...                                                                                                                   [ ok ]
 * Determining the location of the kernel source code
 * Found kernel source directory:
 *     /usr/src/linux
 * Found sources for kernel version:
 *     5.14.2-gentoo-x86_64-genkernel-NEW2
 * Checking for suitable kernel configuration options...                                                                                                                              [ ok ]
>>> Unpacking source...
>>> Unpacking vbox-kernel-module-src-6.1.32.tar.xz to /var/tmp/portage/app-emulation/virtualbox-modules-6.1.32/work
>>> Source unpacked in /var/tmp/portage/app-emulation/virtualbox-modules-6.1.32/work
>>> Preparing source in /var/tmp/portage/app-emulation/virtualbox-modules-6.1.32/work ...
>>> Source prepared.
>>> Configuring source in /var/tmp/portage/app-emulation/virtualbox-modules-6.1.32/work ...
>>> Source configured.
>>> Compiling source in /var/tmp/portage/app-emulation/virtualbox-modules-6.1.32/work ...

Here is the problem: the currently loaded kernel is version 5.15.5-gentoo and the emerge system finds only sources for 5.14.2-gentoo-x86_64-genkernel-NEW2, which will use to produce modules for the 5.14.2-gentoo-x86_64-genkernel-NEW2. It is obvious enough the modules compiled against kernel sources of 5.14.2-gentoo-x86_64-genkernel-NEW2 version won’t be possible to be loaded under the currently load kernel with version 5.15.5-gentoo.
Here is how to fix this:

  1. Get the kernel sources for 5.15.5-gentoo in /usr/src/linux
  2. Save the currently loaded kernel config in /usr/src/linux/.config
  3. Load the configuration and prepare the kernel sources. No need to compile the kernel sources.

STEP 1) Get the kernel sources for 5.15.5-gentoo

emerge -v =gentoo-sources-5.15.5
rm -f /usr/src/linux
ln -s /usr/src/linux-5.15.5-gentoo /usr/src/linux

These commands will install the needed kernel version and a link to the kernel sources will be created.
Of course, change the kernel version to the proper version if needed.

STEP 2) Save the currently loaded kernel config in /usr/src/linux/.config

zcat /proc/config.gz > /usr/src/linux/.config

If the /proc/config.gz is missing, copy the configuration from the /boot for the currently loaded kernel:

cat /boot/config-5.15.5-gentoo > /usr/src/linux-5.15.5-gentoo/.config

STEP 3) Load the configuration and prepare the kernel sources.

No need to compile the whole kernel source tree. Just to commands to configure and prepare the kernel sources:

cd /usr/src/linux-5.15.5-gentoo
make oldconfig
make prepare

The commands take Here is the output of the last commands:

root@srv1 ~ # cd /usr/src/linux-5.15.5-gentoo
root@srv1 linux # make oldconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/confdata.o
  HOSTCC  scripts/kconfig/expr.o
  HOSTCC  scripts/kconfig/lexer.lex.o
  HOSTCC  scripts/kconfig/menu.o
  HOSTCC  scripts/kconfig/parser.tab.o
  HOSTCC  scripts/kconfig/preprocess.o
  HOSTCC  scripts/kconfig/symbol.o
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
root@srv1 linux # make prepare
  SYNC    include/config/auto.conf.cmd
  HOSTCC  arch/x86/tools/relocs_32.o
  HOSTCC  arch/x86/tools/relocs_64.o
  HOSTCC  arch/x86/tools/relocs_common.o
  HOSTLD  arch/x86/tools/relocs
  HOSTCC  scripts/selinux/genheaders/genheaders
  HOSTCC  scripts/selinux/mdp/mdp
  HOSTCC  scripts/bin2c
  HOSTCC  scripts/kallsyms
  HOSTCC  scripts/sorttable
  HOSTCC  scripts/asn1_compiler
  HOSTCC  scripts/extract-cert
  UPD     include/config/kernel.release
  UPD     include/generated/utsrelease.h
  CC      scripts/mod/empty.o
  HOSTCC  scripts/mod/mk_elfconfig
  MKELF   scripts/mod/elfconfig.h
  HOSTCC  scripts/mod/modpost.o
  CC      scripts/mod/devicetable-offsets.s
  HOSTCC  scripts/mod/file2alias.o
  HOSTCC  scripts/mod/sumversion.o
  HOSTLD  scripts/mod/modpost
  CC      kernel/bounds.s
  UPD     include/generated/bounds.h
  CC      arch/x86/kernel/asm-offsets.s
  UPD     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  DESCEND objtool
  HOSTCC  /usr/src/linux-5.15.5-gentoo/tools/objtool/fixdep.o
  HOSTLD  /usr/src/linux-5.15.5-gentoo/tools/objtool/fixdep-in.o
  LINK    /usr/src/linux-5.15.5-gentoo/tools/objtool/fixdep
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/exec-cmd.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/help.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/pager.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/parse-options.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/run-command.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/sigchain.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/subcmd-config.o
  LD      /usr/src/linux-5.15.5-gentoo/tools/objtool/libsubcmd-in.o
  AR      /usr/src/linux-5.15.5-gentoo/tools/objtool/libsubcmd.a
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/arch/x86/special.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/arch/x86/decode.o
  LD      /usr/src/linux-5.15.5-gentoo/tools/objtool/arch/x86/objtool-in.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/weak.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/check.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/special.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/orc_gen.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/orc_dump.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/builtin-check.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/builtin-orc.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/elf.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/objtool.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/libstring.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/libctype.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/str_error_r.o
  CC      /usr/src/linux-5.15.5-gentoo/tools/objtool/librbtree.o
  LD      /usr/src/linux-5.15.5-gentoo/tools/objtool/objtool-in.o
  LINK    /usr/src/linux-5.15.5-gentoo/tools/objtool/objtool

And from now on whenever the kernel sources are needed to compile modules or libraries against, the proper kernel sources will be used of the currently loaded kernel.
The Gentoo emerge command from the begging of this article, but this time with the properly configured kernel sources. The VirtualBox modules are compiled against the loaded kernel, so loading them is not an issue anymore!

root@srv ~ # uname -a
Linux srv 5.15.5-gentoo #2 SMP Tue Nov 30 16:08:49 EET 2021 x86_64 Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz GenuineIntel GNU/Linux
root@srv ~ # emerge -va app-emulation/virtualbox-modules

These are the packages that would be merged, in order:

[ebuild     U  ] app-emulation/virtualbox-modules-6.1.32:0/6.1::gentoo [6.1.26:0/6.1::gentoo] USE="dist-kernel -pax-kernel" 660 KiB

Total: 1 packages (1 upgrades), Size of downloads: 0 KiB

Would you like to merge these packages? [Yes/No] yes

>>> Verifying ebuild manifests

>>> Running pre-merge checks for app-emulation/virtualbox-6.1.32-r1

>>> Emerging (1 of 1) app-emulation/virtualbox-modules-6.1.32::gentoo
 * Fetching files in the background.
 * To view fetch progress, run in another terminal:
 * tail -f /var/log/emerge-fetch.log
 * vbox-kernel-module-src-6.1.32.tar.xz BLAKE2B SHA512 size ;-) ...                                                                                                                   [ ok ]
 * Determining the location of the kernel source code
 * Found kernel source directory:
 *     /usr/src/linux
 * Found sources for kernel version:
 *     5.15.5-gentoo-gentoo
 * Checking for suitable kernel configuration options...                                                                                                                              [ ok ]
>>> Unpacking source...
>>> Unpacking vbox-kernel-module-src-6.1.32.tar.xz to /var/tmp/portage/app-emulation/virtualbox-modules-6.1.32/work
>>> Source unpacked in /var/tmp/portage/app-emulation/virtualbox-modules-6.1.32/work
>>> Preparing source in /var/tmp/portage/app-emulation/virtualbox-modules-6.1.32/work ...
>>> Source prepared.
>>> Configuring source in /var/tmp/portage/app-emulation/virtualbox-modules-6.1.32/work ...
>>> Source configured.
>>> Compiling source in /var/tmp/portage/app-emulation/virtualbox-modules-6.1.32/work ...
....
....
root@srv ~ # modprobe vboxdrv
Feb 15 14:15:10 www kernel: vboxdrv: loading out-of-tree module taints kernel.
Feb 15 14:15:10 www kernel: vboxdrv: Found 4 processor cores
Feb 15 14:15:10 www kernel: vboxdrv: TSC mode is Invariant, tentative frequency 2394461773 Hz
Feb 15 14:15:10 www kernel: vboxdrv: Successfully loaded version 6.1.32 r149290 (interface 0x00320000)
Feb 15 14:15:10 www kernel: VBoxNetFlt: Successfully started.

Kernel loads only single processor on multi-processor system – ACPI: x2apic entry ignored

There multiple reports on this issue with different processors

kernel, which worked perfectly on multiple systems, loads on our new server and only one processor is shown

Just for the record, the SMP is enabled in the kernel (and in the BIOS – Hyperthreading and multicores are enabled, too):

root@srv ~ # zcat /proc/config.gz | grep 'CONFIG_SMP'
CONFIG_SMP=y

The problem is x2APIC Support in the BIOS of your server.

Apparently, our kernel (version 4.18.12) missed the kernel feature:

root@srv ~ # zcat /proc/config.gz | grep -i 'x2apic'
root@srv

You can see no kernel configuration entry “CONFIG_X86_X2APIC=y” is shown from the above command.

And if your BIOS enables the support of x2APIC you may end up with just one processor under Linux.

This was the case in our server. The x2APIC support is enabled in the BIOS and our kernel (version 4.18.12) does not have CONFIG_X86_X2APIC enabled.
To fix this issue you first might disable the feature in the BIOS and you are going to have all your processors shown and they could be used to compile fast your new kernel (of course, in the case you use custom kernel) after you enable the feature in the kernel CONFIG_X86_X2APIC, which is under
Kernel Configuration —> [*] Support x2apic. The asterisk means it is enabled, so build your kernel. Check and enable this “Device Drivers –> IOMMU Hardware Support –> Support for Interrupt Remapping”, too.
Here you can see how to enable and disable processor x2APIC support in HP ProLiant DL160 Gen9 (2 processors Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz) – Enable or Disable the processor x2APIC support in HP ProLiant DL160 Gen9.
Keep on reading!

Rebuild the official Ubuntu kernel – Ubuntu 16.04 LTS

There are multiple reasons to rebuild the official kernel of a Linux distro but this is not the purpose of this article

just cannot miss the chance to write that all the kernels are built therefore optimized for the very first 64bit Intel/AMD processor! But come on who wants the most important piece of software to be optimized not for his new and expensive processor but for one released 15 years ago???

. Here we are going to show you how to recompile the latest official Ubuntu kernel of Ubuntu 16.04 LTS – the one, which comes with the apt packages system, because this kernel comes with the latest and greatest patches of the Ubuntu team. You should not confuse this howto with the one, which compile a vanilla kernel or the mainline kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/. So if you want a new kernel or the latest from kernel.org this is the right tutorial for Ubuntu – Build your own kernel under Ubuntu using mainline (latest) kernel. The official latest kernel in the Ubuntu repository is not always the latest one from kernel.org, but you can be sure it is probably most secure one, because there are additional modifications, configurations and tests by the team. Here you can see what versions of the kernel are the officials in Ubuntu: https://wiki.ubuntu.com/Kernel/Support

Keep on reading!

Ubuntu 16/18 LTS – load a new kernel without rebooting the server

Here are the commands needed to load a new kernel without rebooting your server or desktop computer. Why you need this? As said in our first article for CentOS 7 – sometime rebooting a server could take 5 to 10 minutes and loading a new kernel is just up to a minute. In fact in most cases loading the new kernel and starting the system then is just under 20-30 seconds, so upgrading your server even with new kernel is super easy lately. We tested it on Ubuntu 16 and Ubuntu 18 servers and it was successful. The system uses systemd and the process is really easy and safe for the systems.

When the processes is initiated the system shutdowns normally (shutting down all running service with systemd) and then load the system immediately with the new kernel and starts the services as usual!

So no need to worry about unflushed data or not proper shutdown of a service! It’s like a normal reboot but without a hardware reboot and is a lot faster!
Here is what is required to load a kernel without hardware rebooting your computer box:

  1. kexec-tools
  2. Load the new kernel, initram file and the command line arguments with “kexec”
  3. Start a systemd target – kexec.target

Ubuntu 16/18 LTS using kexec to load a new kernel

The real commands only for Ubuntu 16/18 LTS:

sudo apt -y install kexec-tools
sudo kexec -l /boot/vmlinuz-4.15.0-33-generic --initrd=/boot/initrd.img-4.15.0-33-generic --command-line="root=UUID=061b2936-34bf-4da3-b7d2-b8bde0899f03 ro  quiet splash"
sudo systemctl start kexec.target

Here is a real world example with all the output:
And again update your system to see if there is a new kernel and install “kexec-tools”. In our case indeed there is a new kernel – vmlinuz-4.15.0-33-generic

myuser@srv:~$ sudo apt -y upgrade
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Calculating upgrade... Done
The following package was automatically installed and is no longer required:
  libllvm5.0
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  amd64-microcode autotools-dev dblatex debhelper dh-strip-nondeterminism docbook-dsssl docbook-utils docbook-xml docbook-xsl fonts-lato fonts-lmodern fonts-texgyre
  intel-microcode iucode-tool jadetex javascript-common kernel-common kernel-package kernel-patch-scripts kernel-wedge kerneloops kerneloops-applet kernelshark kerneltop
  libfile-homedir-perl libfile-stripnondeterminism-perl libfile-which-perl libjs-jquery libllvm6.0 libmail-sendmail-perl libosp5 libostyle1c2 libpotrace0 libptexenc1
  libqpdf21 libruby2.3 libsgmls-perl libsp1c2 libsynctex1 libsys-hostname-long-perl libtexlua52 libtexluajit2 libwebpdemux1 libxml2-utils libzzip-0-13
  linux-headers-4.15.0-33 linux-headers-4.15.0-33-generic linux-image-4.15.0-33-generic linux-modules-4.15.0-33-generic linux-modules-extra-4.15.0-33-generic lmodern lynx
  lynx-common openjade po-debconf preview-latex-style prosper ps2eps python-apt rake ruby ruby-did-you-mean ruby-minitest ruby-net-telnet ruby-power-assert ruby-test-unit
  ruby2.3 rubygems-integration sgml-data sgmlspl sp tex-common tex-gyre texlive texlive-base texlive-bibtex-extra texlive-binaries texlive-extra-utils texlive-font-utils
  texlive-fonts-recommended texlive-fonts-recommended-doc texlive-generic-recommended texlive-latex-base texlive-latex-base-doc texlive-latex-extra
  texlive-latex-extra-doc texlive-latex-recommended texlive-latex-recommended-doc texlive-luatex texlive-math-extra texlive-pictures texlive-pictures-doc texlive-pstricks
  texlive-pstricks-doc tipa trace-cmd xmlto xsltproc
The following packages will be upgraded:
.....
.....
Running mktexlsr /var/lib/texmf ... done.
Building format(s) --all.
        This may take some time... done.
Processing triggers for linux-image-4.15.0-33-generic (4.15.0-33.36~16.04.1) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.15.0-33-generic
/etc/kernel/postinst.d/vboxadd:
VirtualBox Guest Additions: Building the VirtualBox Guest Additions kernel modules.
/etc/kernel/postinst.d/zz-update-grub:
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-4.15.0-33-generic
Found initrd image: /boot/initrd.img-4.15.0-33-generic
Found linux image: /boot/vmlinuz-4.13.0-36-generic
Found initrd image: /boot/initrd.img-4.13.0-36-generic
Found memtest86+ image: /boot/memtest86+.elf
Found memtest86+ image: /boot/memtest86+.bin
done

myuser@srv:~$ sudo apt -y install kexec-tools
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libllvm5.0
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  kexec-tools
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 77,4 kB of archives.
After this operation, 276 kB of additional disk space will be used.
Get:1 http://bg.archive.ubuntu.com/ubuntu xenial-updates/main amd64 kexec-tools amd64 1:2.0.16-1ubuntu1~16.04.1 [77,4 kB]
Fetched 77,4 kB in 0s (707 kB/s)      
Preconfiguring packages ...
Selecting previously unselected package kexec-tools.
(Reading database ... 253895 files and directories currently installed.)
Preparing to unpack .../kexec-tools_1%3a2.0.16-1ubuntu1~16.04.1_amd64.deb ...
Unpacking kexec-tools (1:2.0.16-1ubuntu1~16.04.1) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for systemd (229-4ubuntu21.4) ...
Processing triggers for ureadahead (0.100.0-19) ...
Setting up kexec-tools (1:2.0.16-1ubuntu1~16.04.1) ...
Generating /etc/default/kexec...
Processing triggers for systemd (229-4ubuntu21.4) ...
Processing triggers for ureadahead (0.100.0-19) ...

     ┌──────────────────────────────────────────────────────────────────┤ Configuring kexec-tools ├───────────────────────────────────────────────────────────────────┐
     │                                                                                                                                                                │ 
     │ If you choose this option, a system reboot will trigger a restart into a kernel loaded by kexec instead of going through the full system boot loader process.  │ 
     │                                                                                                                                                                │ 
     │ Should kexec-tools handle reboots (sysvinit only)?                                                                                                             │ 
     │                                                                                                                                                                │ 
     │                                                 <Yes>                                                    <No>                                                  │ 
     │                                                                                                                                                                │ 
     └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ 

On the above configuration question mark “” and press Enter.

So we performed an update and there was a new kernel vmlinuz-4.15.0-33-generic, which we would like to load without hardware reboot.
Here is our new kernel in “/boot”

myuser@srv:~$ ls -altr /boot/
total 130420
-rw-r--r--  1 root root   184840 Jan 28  2016 memtest86+_multiboot.bin
-rw-r--r--  1 root root   184380 Jan 28  2016 memtest86+.elf
-rw-r--r--  1 root root   182704 Jan 28  2016 memtest86+.bin
-rw-------  1 root root  3879946 Feb 17  2018 System.map-4.13.0-36-generic
-rw-r--r--  1 root root     2850 Feb 17  2018 retpoline-4.13.0-36-generic
-rw-r--r--  1 root root   213220 Feb 17  2018 config-4.13.0-36-generic
-rw-r--r--  1 root root  1501359 Feb 17  2018 abi-4.13.0-36-generic
-rw-r--r--  1 root root  7710912 May 17 16:50 vmlinuz-4.13.0-36-generic
-rw-------  1 root root  4041375 Aug 16 00:00 System.map-4.15.0-33-generic
-rw-r--r--  1 root root        0 Aug 16 00:00 retpoline-4.15.0-33-generic
-rw-r--r--  1 root root   216913 Aug 16 00:00 config-4.15.0-33-generic
-rw-r--r--  1 root root  1537455 Aug 16 00:00 abi-4.15.0-33-generic
-rw-------  1 root root  8108600 Aug 16 21:58 vmlinuz-4.15.0-33-generic
drwxr-xr-x 24 root root     4096 Sep  7 14:15 ..
-rw-r--r--  1 root root 51290506 Sep  7 14:15 initrd.img-4.13.0-36-generic
-rw-r--r--  1 root root 54451680 Sep  7 14:15 initrd.img-4.15.0-33-generic
drwxr-xr-x  3 root root     4096 Sep  7 14:15 .
drwxr-xr-x  5 root root     4096 Sep  7 14:16 grub

Now we know the kernel and initram file names we just check the kernel arguments in the kernel, load them with kexec and start an systemd target to load the new kernel:

myuser@srv:~$ grep vmlinuz-4.15.0-33-generic /boot/grub/grub.cfg 
        linux   /boot/vmlinuz-4.15.0-33-generic root=UUID=061b2936-34bf-4da3-b7d2-b8bde0899f03 ro  quiet splash $vt_handoff
                linux   /boot/vmlinuz-4.15.0-33-generic root=UUID=061b2936-34bf-4da3-b7d2-b8bde0899f03 ro  quiet splash $vt_handoff
                linux   /boot/vmlinuz-4.15.0-33-generic root=UUID=061b2936-34bf-4da3-b7d2-b8bde0899f03 ro  quiet splash $vt_handoff init=/sbin/upstart
                linux   /boot/vmlinuz-4.15.0-33-generic root=UUID=061b2936-34bf-4da3-b7d2-b8bde0899f03 ro recovery nomodeset 
myuser@srv:~$ sudo kexec -l /boot/vmlinuz-4.15.0-33-generic --initrd=/boot/initrd.img-4.15.0-33-generic --command-line="root=UUID=061b2936-34bf-4da3-b7d2-b8bde0899f03 ro  quiet splash"
myuser@srv:~$ sudo systemctl start kexec.target
Connection to srv closed by remote host.
Connection to srv closed.

Use the first line of the grep output above (or you can cat the file and see what is in it if you have any doubts) to take the proper kernel boot arguments and do not include anything starting with “$”.

As you can see systemd performs a normal shutdown of all services and targets.

main menu
Normal shutdown

The ssh connection is immediately closed because the reboot is initiated.
After 10-15 seconds our host is alive and the new kernel is loaded successfully:

root@test ~ $ ssh root@srv
root@srv's password: 
Last login: Wed Sep  5 17:15:08 2018 from test
[root@srv ~]# uname -a
Linux srv.local 4.15.0-33-generic #36~16.04.1-Ubuntu SMP Wed Aug 15 17:21:05 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@srv ~]# 

Because we do not wanted to mess up the two output in one article we decided to split it in two separate ones, so here is the previous one for CentOS 7 – “CentOS 7 – load a new kernel without rebooting the CentOS 7 server”

CentOS 7 – load a new kernel without rebooting the server

Here are the commands needed to load a new kernel without rebooting your server or desktop computer. Why you need this? Sometime rebooting a server could take 5 to 10 minutes and loading a new kernel is just up to a minute. In fact in most cases loading the new kernel and starting the system then is just under 20-30 seconds, so upgrading your server even with new kernel is super easy lately. We tested it on CentOS 7 server and it was successful. The system uses systemd and the process is really easy and safe for the systems.

When the processes is initiated the system shutdowns normally (shutting down all running service with systemd) and then load the system immediately with the new kernel and starts the services as usual!

So no need to worry about unflushed data or not proper shutdown of a service! It’s like a normal reboot but without a hardware reboot and is a lot faster!
Here is what is required to load a kernel without hardware rebooting your computer box:

  1. kexec-tools
  2. Load the new kernel, initram file and the command line arguments with “kexec”
  3. Start a systemd target – kexec.target

CentOS 7 using kexec to load a new kernel

The real commands only for CentOS 7:

yum install -y kexec-tools
kexec -l /boot/vmlinuz-3.10.0-862.11.6.el7.x86_64 --initrd=/boot/initramfs-3.10.0-862.11.6.el7.x86_64.img --command-line="root=/dev/mapper/centos_srv-root ro crashkernel=auto rd.lvm.lv=centos_srv/root rd.lvm.lv=centos_srv/swap rhgb quiet LANG=en_US.UTF-8"
systemctl start kexec.target

Here is a real world example with all the output:
As you can see it is important to load the initram file and the exact arguments to the kernel. You should take them from the grub 2 configuration – /boot/grub2/grub.cfg
Here is the real output:

[root@srv ~]# yum -y update
Loaded plugins: fastestmirror
Determining fastest mirrors
 * base: mirrors.neterra.net
 * extras: mirrors.neterra.net
 * updates: mirrors.neterra.net
base                                                                                                                                                 | 3.6 kB  00:00:00     
centos-sclo-rh                                                                                                                                       | 3.0 kB  00:00:00     
centos-sclo-sclo                                                                                                                                     | 2.9 kB  00:00:00     
extras                                                                                                                                               | 3.4 kB  00:00:00     
updates                                                                                                                                              | 3.4 kB  00:00:00     
(1/4): extras/7/x86_64/primary_db                                                                                                                    | 187 kB  00:00:00     
(2/4): updates/7/x86_64/primary_db                                                                                                                   | 5.2 MB  00:00:01     
(3/4): centos-sclo-sclo/x86_64/primary_db                                                                                                            | 292 kB  00:00:01     
(4/4): centos-sclo-rh/x86_64/primary_db                                                                                                              | 3.7 MB  00:00:02     
Resolving Dependencies
--> Running transaction check
---> Package kernel.x86_64 0:3.10.0-862.11.6.el7 will be installed
---> Package kernel-headers.x86_64 0:3.10.0-862.3.2.el7 will be updated
---> Package kernel-headers.x86_64 0:3.10.0-862.11.6.el7 will be an update
---> Package kernel-tools.x86_64 0:3.10.0-862.3.2.el7 will be updated
---> Package kernel-tools.x86_64 0:3.10.0-862.11.6.el7 will be an update
---> Package kernel-tools-libs.x86_64 0:3.10.0-862.3.2.el7 will be updated
---> Package kernel-tools-libs.x86_64 0:3.10.0-862.11.6.el7 will be an update
....
....
[root@srv ~]# yum install -y kexec-tools
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.neterra.net
 * extras: mirrors.neterra.net
 * updates: mirrors.neterra.net
Package kexec-tools-2.0.15-13.el7.x86_64 already installed and latest version
Nothing to do

So we performed an update and there was a new kernel kernel.x86_64 0:3.10.0-862.11.6.el7, which we would like to load without hardware reboot.
Here is our new kernel in “/boot”

[root@srv ~]# ls -altr /boot/
total 171812
-rw-------.  1 root root  3228420 22 Aug  2017 System.map-3.10.0-693.el7.x86_64
-rw-r--r--.  1 root root   140894 22 Aug  2017 config-3.10.0-693.el7.x86_64
-rw-r--r--.  1 root root      166 22 Aug  2017 .vmlinuz-3.10.0-693.el7.x86_64.hmac
-rwxr-xr-x.  1 root root  5877760 22 Aug  2017 vmlinuz-3.10.0-693.el7.x86_64
-rw-r--r--.  1 root root   293027 22 Aug  2017 symvers-3.10.0-693.el7.x86_64.gz
drwxr-xr-x.  3 root root       17 20 Feb  2018 efi
drwxr-xr-x.  2 root root       27 20 Feb  2018 grub
-rw-r--r--.  1 root root   611343 20 Feb  2018 initrd-plymouth.img
-rw-------.  1 root root 51315067 20 Feb  2018 initramfs-0-rescue-cc7889764e86441b8d1eb54e29e81a91.img
-rwxr-xr-x.  1 root root  5877760 20 Feb  2018 vmlinuz-0-rescue-cc7889764e86441b8d1eb54e29e81a91
-rw-------.  1 root root  3409912 21 May 23,50 System.map-3.10.0-862.3.2.el7.x86_64
-rw-r--r--.  1 root root   147823 21 May 23,50 config-3.10.0-862.3.2.el7.x86_64
-rw-r--r--.  1 root root      170 21 May 23,50 .vmlinuz-3.10.0-862.3.2.el7.x86_64.hmac
-rwxr-xr-x.  1 root root  6228832 21 May 23,50 vmlinuz-3.10.0-862.3.2.el7.x86_64
-rw-r--r--.  1 root root   304943 21 May 23,52 symvers-3.10.0-862.3.2.el7.x86_64.gz
dr-xr-xr-x. 17 root root      224 14 Jun  7,18 ..
-rw-------.  1 root root 12997841 14 Jun  7,19 initramfs-3.10.0-693.el7.x86_64kdump.img
-rw-------.  1 root root 20771492 14 Jun  7,20 initramfs-3.10.0-693.el7.x86_64.img
-rw-------.  1 root root 13007444 14 Jun  7,23 initramfs-3.10.0-862.3.2.el7.x86_64kdump.img
-rw-r--r--.  1 root root   147859 14 Aug 22,02 config-3.10.0-862.11.6.el7.x86_64
-rw-------.  1 root root  3414344 14 Aug 22,02 System.map-3.10.0-862.11.6.el7.x86_64
-rw-r--r--.  1 root root      171 14 Aug 22,02 .vmlinuz-3.10.0-862.11.6.el7.x86_64.hmac
-rwxr-xr-x.  1 root root  6242208 14 Aug 22,02 vmlinuz-3.10.0-862.11.6.el7.x86_64
-rw-r--r--.  1 root root   305158 14 Aug 22,05 symvers-3.10.0-862.11.6.el7.x86_64.gz
-rw-------.  1 root root 20774393  5 Sep 15,20 initramfs-3.10.0-862.11.6.el7.x86_64.img
drwx------.  5 root root       97  5 Sep 15,20 grub2
dr-xr-xr-x.  5 root root     4096  5 Sep 15,21 .
-rw-------.  1 root root 20782404  5 Sep 15,21 initramfs-3.10.0-862.3.2.el7.x86_64.img

Now we know the kernel and initram file names we just check the kernel arguments in the kernel, load them with kexec and start an systemd target to load the new kernel:

[root@srv ~]# grep vmlinuz-3.10.0-862.11.6.el7.x86_64 /boot/grub2/grub.cfg 
        linux16 /vmlinuz-3.10.0-862.11.6.el7.x86_64 root=/dev/mapper/centos_srv-root ro crashkernel=auto rd.lvm.lv=centos_srv/root rd.lvm.lv=centos_srv/swap rhgb quiet LANG=en_US.UTF-8
[root@srv ~]# kexec -l /boot/vmlinuz-3.10.0-862.11.6.el7.x86_64 --initrd=/boot/initramfs-3.10.0-862.11.6.el7.x86_64.img --command-line="root=/dev/mapper/centos_srv-root ro crashkernel=auto rd.lvm.lv=centos_srv/root rd.lvm.lv=centos_srv/swap rhgb quiet LANG=en_US.UTF-8"
[root@srv ~]# systemctl start kexec.target
Connection to srv closed by remote host.
Connection to srv closed.

As you can see systemd performs a normal shutdown of all services and targets.

main menu
Normal Shutdown

The ssh connection is immediately closed because the reboot is initiated.
After 10-15 seconds our host is alive and the new kernel is loaded successfully:

root@test ~ $ ssh root@srv
root@srv's password: 
Last login: Wed Sep  5 15:16:08 2018 from test
[root@srv ~]# uname -a
Linux srv.local 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@srv ~]# 

Because we do not wanted to mess up the two output from different linux distros in one article we decided to split it in two separate ones, so here is the one for Ubuntu 16/18 LTS – “Ubuntu 16/18 LTS – load a new kernel without rebooting the server

Missing network interface 10G Intel X520 with error failed to load because of unsupported SFP+

If you have server with 10G Intel X520 network card

05:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

and you wonder why your system did not have network and even no interface is shown with the commands

ip addr and ifconfig -a

, it is probably because you used unsupported SFP+ module.
In the “dmesg” you’ll see two line regarding the problem with the interface:

[ 3142.439304 ] ixgbe 0000:05:00.0: failed to load because an unsupported SFP+ or QSFP module type was detected.
[ 3142.439306 ] ixgbe 0000:05:00.0: Reload the driver after installing a supported module.

Only tested modules are supported by default and it’s probably normal there are a great deal of different SFP+ devices and the creators of the kernel driver would not be able to test all the different SFP+ modules in the world, so all tested SFP+ modules work by default and all other not tested need to enable an option in the kernel to instruct the driver to use them!

Enable the unsupported SF+ module of Intel X520. The kernel module name is “ixgbe”:

  • Built-in kernel module, when the ixgbe is build in the kernel, you must pass a configuration parameters to the kernel boot line
    ixgbe.allow_unsupported_sfp=1
    

    Under CentOS 7 and Ubuntu 16/17/18 you can do the following:
    Edit

    /etc/default/grub

    and add at the end of the GRUB_CMDLINE_LINUX the above line. So it should look like:

    GRUB_CMDLINE_LINUX=" ixgbe.allow_unsupported_sfp=1"
    

    And then for CentOS 7 execute:

    grub2-mkconfig -o /boot/grub2/grub.cfg
    

    And for Ubuntu/Debian:

    grub-mkconfig -o /boot/grub/grub.cfg
    

    DO NOT forget to execute a grub2 make configuration command, because the grub2 configuration file must be regenerated to include the option you’ve set.

  • Most linux distros use ixgbe as a kernel module, so they load the module with the initrd image after booting the kernel. We must instruct how the kernel module will be loaded from the initrd image, so the initrd image must be changed. Here is how to do it properly for CentOS 7 and Ubuntu/Debian:
    For the both distros execute the command:

    echo "options ixgbe allow_unsupported_sfp=1" > /etc/modprobe.d/ixgbe.conf
    

    It will create a configuration file for the ixgbe kernel module with the configuration you see between the double quotes. And then reinstall the kernel!!! Because the initrd image will be regenerated. If you do not want to reinstall the current kernel, you must manually regenerate the initrd image.
    Regenerate all the initrd images for the all installed kernels.
    Under CentOS 7:

    dracut --regenerate-all --force
    

    Under Ubuntu/Debian:

    update-initramfs -u -k all
    

How to install the latest linux kernel (mainline) in CentOS 7 distro

Here you can see the steps to install the latest (mainline stable) kernel under CentOS 7, whether we need the latest driver, because we bought a new laptop released for the first time last month with the latest hardware or there is a hot fix of some nasty bug it is of no matter.
Here are the steps about installing the latest kernel to our CentOS 7, you must have root access:

  • STEP 1 Import the public key of the repository, which offer us the packages of the CentOS 7 mainline stable kernel (and some other kernels, like the Red Hat Enterprise Linux (RHEL) 6 and 7 and more, you can check out the site). The site of the repository is

    https://elrepo.org/

    Import the public key

    #change to root user (skip the first line if you are root)
    sudo su
    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    
  • STEP 2 Check the latest version of the rpm install package of the repository, the command is under

    To install ELRepo for RHEL-7, SL-7 or CentOS-7:

    So at present the latest version is “7.0-3” and execute to wget to download the package for the repository elrepo:

    wget http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
    
  • STEP 3 Install the elrepo package with yum (you can do it with rpm command, but let yum manage all your packages and metadata for them!)
    yum -y install ./elrepo-release-7.0-3.el7.elrepo.noarch.rpm
    

    STEP 4 List the available kernels from the elrepo to choose the one you like

    yum list available --disablerepo='*' --enablerepo=elrepo-kernel
    

    To install the latest mainline kernel you must use package starting with

    kernel-ml-*

    at moment of writing the latest mainline kernel is

    4.15.4-1.el7.elrepo

    So execute

    yum install -y --enablerepo=elrepo-kernel kernel-ml
    

    And it will pull the

    kernel-ml-4.15.4-1.el7.elrepo.x86_64

    and install it

  • STEP 5 Check if you are going to boot the new kernel, you’ve installed and set the right one to boot

    cat /boot/grub2/grubenv |grep saved
    # GRUB Environment Block
    saved_entry=CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)
    

    And as you see, no it’ll not boot to the new kernel, so you must configure grub2 to boot your newly just installed kernel. First check all the installed kernels, set the right kernel and then it is mandatory to call

    grub2-mkconfig

    to update the grub2 configuration:

    [root@srv ~]# awk -F\' /^menuentry/{print\$2} /boot/grub2/grub.cfg 
    CentOS Linux (4.15.4-1.el7.elrepo.x86_64) 7 (Core)
    CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)
    CentOS Linux (0-rescue-0a26fe4b81d845209fb8958c8e29d600) 7 (Core)
    

    The position 0 (YES, it starts from ZERO!) is “CentOS Linux (4.15.4-1.el7.elrepo.x86_64) 7 (Core)”, so you have two options to set it:

    grub2-set-default 0
    grub2-mkconfig -o /boot/grub2/grub.cfg
    

    Check to see if everything is OK with

    [root@srv ~]# cat /boot/grub2/grubenv |grep saved
    saved_entry=0
    

    or you can set the name of the kernel to boot with

    grub2-set-default "CentOS Linux (4.15.4-1.el7.elrepo.x86_64) 7 (Core)"
    grub2-mkconfig -o /boot/grub2/grub.cfg
    

    Check to see if everything is OK with

    [root@srv ~]# cat /boot/grub2/grubenv |grep saved
    saved_entry=CentOS Linux (4.15.4-1.el7.elrepo.x86_64) 7 (Core)
    
  • STEP 6 is just to reboot
    reboot
    

    STEP 4 Verification of running the latest kernel

    root@srv:~# uname -a
    Linux srv.local 4.15.4-1.el7.elrepo.x86_64 #1 SMP Sat Feb 17 13:35:20 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
    

Install log of the procedure (your output may vary depending on your hardware installed in your system):

[root@srv ~]# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
[root@srv ~]# wget http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
--2018-02-19 10:20:42--  http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
Resolving www.elrepo.org (www.elrepo.org)... 69.195.83.87
Connecting to www.elrepo.org (www.elrepo.org)|69.195.83.87|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8656 (8.5K) [application/x-rpm]
Saving to: ‘elrepo-release-7.0-3.el7.elrepo.noarch.rpm’

100%[=======================================================================>] 8,656       --.-K/s   in 0.001s  

2018-02-19 10:20:42 (6.97 MB/s) - ‘elrepo-release-7.0-3.el7.elrepo.noarch.rpm’ saved [8656/8656]
[root@srv ~]# yum -y install ./elrepo-release-7.0-3.el7.elrepo.noarch.rpm 
Loaded plugins: fastestmirror
Examining ./elrepo-release-7.0-3.el7.elrepo.noarch.rpm: elrepo-release-7.0-3.el7.elrepo.noarch
Marking ./elrepo-release-7.0-3.el7.elrepo.noarch.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package elrepo-release.noarch 0:7.0-3.el7.elrepo will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=================================================================================================================
 Package              Arch         Version                   Repository                                     Size
=================================================================================================================
Installing:
 elrepo-release       noarch       7.0-3.el7.elrepo          /elrepo-release-7.0-3.el7.elrepo.noarch       5.2 k

Transaction Summary
=================================================================================================================
Install  1 Package

Total size: 5.2 k
Installed size: 5.2 k
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : elrepo-release-7.0-3.el7.elrepo.noarch                                                        1/1 
  Verifying  : elrepo-release-7.0-3.el7.elrepo.noarch                                                        1/1 

Installed:
  elrepo-release.noarch 0:7.0-3.el7.elrepo                                                                       

Complete!
[root@srv ~]# yum list available --disablerepo='*' --enablerepo=elrepo-kernel
Loaded plugins: fastestmirror                                                                                                                                               
Loading mirror speeds from cached hostfile                                                                                                                                  
 * elrepo-kernel: mirrors.netix.net                                                                                                                                         
Available Packages                                                                                                                                                          
kernel-lt.x86_64                                                                      4.4.116-1.el7.elrepo                                                     elrepo-kernel
kernel-lt-devel.x86_64                                                                4.4.116-1.el7.elrepo                                                     elrepo-kernel
kernel-lt-doc.noarch                                                                  4.4.116-1.el7.elrepo                                                     elrepo-kernel
kernel-lt-headers.x86_64                                                              4.4.116-1.el7.elrepo                                                     elrepo-kernel
kernel-lt-tools.x86_64                                                                4.4.116-1.el7.elrepo                                                     elrepo-kernel
kernel-lt-tools-libs.x86_64                                                           4.4.116-1.el7.elrepo                                                     elrepo-kernel
kernel-lt-tools-libs-devel.x86_64                                                     4.4.116-1.el7.elrepo                                                     elrepo-kernel
kernel-ml.x86_64                                                                      4.15.4-1.el7.elrepo                                                      elrepo-kernel
kernel-ml-devel.x86_64                                                                4.15.4-1.el7.elrepo                                                      elrepo-kernel
kernel-ml-doc.noarch                                                                  4.15.4-1.el7.elrepo                                                      elrepo-kernel
kernel-ml-headers.x86_64                                                              4.15.4-1.el7.elrepo                                                      elrepo-kernel
kernel-ml-tools.x86_64                                                                4.15.4-1.el7.elrepo                                                      elrepo-kernel
kernel-ml-tools-libs.x86_64                                                           4.15.4-1.el7.elrepo                                                      elrepo-kernel
kernel-ml-tools-libs-devel.x86_64                                                     4.15.4-1.el7.elrepo                                                      elrepo-kernel
perf.x86_64                                                                           4.15.4-1.el7.elrepo                                                      elrepo-kernel
python-perf.x86_64                                                                    4.15.4-1.el7.elrepo                                                      elrepo-kernel
[root@srv ~]# yum install -y --enablerepo=elrepo-kernel kernel-ml
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: centos.uni-sofia.bg
 * elrepo: mirrors.netix.net
 * elrepo-kernel: mirrors.netix.net
 * extras: centos.uni-sofia.bg
 * updates: centos.uni-sofia.bg
Resolving Dependencies
--> Running transaction check
---> Package kernel-ml.x86_64 0:4.15.4-1.el7.elrepo will be installed
--> Finished Dependency Resolution

Dependencies Resolved

============================================================================================================================================================================
 Package                               Arch                               Version                                           Repository                                 Size
============================================================================================================================================================================
Installing:
 kernel-ml                             x86_64                             4.15.4-1.el7.elrepo                               elrepo-kernel                              44 M

Transaction Summary
============================================================================================================================================================================
Install  1 Package

Total download size: 44 M
Installed size: 195 M
Downloading packages:
kernel-ml-4.15.4-1.el7.elrepo.x86_64.rpm                                                                                                             |  44 MB  00:00:10     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : kernel-ml-4.15.4-1.el7.elrepo.x86_64                                                                                                                     1/1 
  Verifying  : kernel-ml-4.15.4-1.el7.elrepo.x86_64                                                                                                                     1/1 

Installed:
  kernel-ml.x86_64 0:4.15.4-1.el7.elrepo                                                                                                                                    

Complete!
[root@srv ~]# cat /boot/grub2/grubenv |grep saved
# GRUB Environment Block
saved_entry=CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)
[root@srv ~]# awk -F\' /^menuentry/{print\$2} /etc/grub2.cfg
CentOS Linux (4.15.4-1.el7.elrepo.x86_64) 7 (Core)
CentOS Linux (3.10.0-693.el7.x86_64) 7 (Core)
CentOS Linux (0-rescue-0a26fe4b81d845209fb8958c8e29d600) 7 (Core)
[root@srv ~]# grub2-set-default "CentOS Linux (4.15.4-1.el7.elrepo.x86_64) 7 (Core)"
[root@srv ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.4-1.el7.elrepo.x86_64
Found initrd image: /boot/initramfs-4.15.4-1.el7.elrepo.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-693.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-693.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-0a26fe4b81d845209fb8958c8e29d600
Found initrd image: /boot/initramfs-0-rescue-0a26fe4b81d845209fb8958c8e29d600.img
done
[root@srv ~]# reboot
Connection to 192.168.0.18 closed by remote host.
Connection to 192.168.0.18 closed.
...
...
...
[root@srv ~]# uname -a
Linux srv.local 4.15.4-1.el7.elrepo.x86_64 #1 SMP Sat Feb 17 13:35:20 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@srv ~]#

How to install the latest linux kernel (mainline) in Ubuntu (17.10) distro

Sometimes we need to install the latest kernel version of our linux distro (btw it is called “mainline” if you need to google something about it)! Whether we need the latest driver, because we bought a new laptop released for the first time last month with the latest hardware or there is a hot fix of some nasty bug it is of no matter. Here are the steps (and some troubleshooting) about installing the latest kernel to our Ubuntu install 17.10 (with old versions like 16, 15 and 14 should be the same):

  • STEP 1 is to choose our desired kernel from

    http://kernel.ubuntu.com/~kernel-ppa/mainline/

    Open this address in your favorite browser and choose the latest. In our example the latest kernel, which is no release candidate, 4.14.15. Enter the directory

    http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15.1/

    and choose your build, for example amd64 for your 64bit setup (and for most cases choose -generic- one)

    mkdir /root/latestkernel
    cd /root/latestkernel
    wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15.1/linux-headers-4.15.1-041501_4.15.1-041501.201802031831_all.deb
    wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15.1/linux-headers-4.15.1-041501-generic_4.15.1-041501.201802031831_amd64.deb
    wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15.1/linux-image-4.15.1-041501-generic_4.15.1-041501.201802031831_amd64.deb
    
  • STEP 2 after successfully downloading the 3 files just install them with “dpkg” (the package manager tool)
    sudo dpkg -i linux-headers-4.15.1*.deb linux-image-4.15.1*.deb
    
  • STEP 3 is just to reboot your machine, the installation setup did everything you needed to load this kernel
    reboot
    

    STEP 4 Verification of running the latest kernel

    root@srv:~# uname -a
    Linux srv.local 4.15.1-041501-generic #201802031831 SMP Sat Feb 3 18:32:13 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    

Install log of the procedure (your output may vary depending on your hardware installed in your system):

root@srv:~/latestkernel# sudo dpkg -i linux-headers-4.15.1*.deb linux-image-4.15.1*.deb
Selecting previously unselected package linux-headers-4.15.1-041501.
(Reading database ... 200430 files and directories currently installed.)
Preparing to unpack linux-headers-4.15.1-041501_4.15.1-041501.201802031831_all.deb ...
Unpacking linux-headers-4.15.1-041501 (4.15.1-041501.201802031831) ...
Selecting previously unselected package linux-headers-4.15.1-041501-generic.
Preparing to unpack linux-headers-4.15.1-041501-generic_4.15.1-041501.201802031831_amd64.deb ...
Unpacking linux-headers-4.15.1-041501-generic (4.15.1-041501.201802031831) ...
Selecting previously unselected package linux-image-4.15.1-041501-generic.
Preparing to unpack linux-image-4.15.1-041501-generic_4.15.1-041501.201802031831_amd64.deb ...
Examining /etc/kernel/preinst.d/
run-parts: executing /etc/kernel/preinst.d/intel-microcode 4.15.1-041501-generic /boot/vmlinuz-4.15.1-041501-generic
Done.
Unpacking linux-image-4.15.1-041501-generic (4.15.1-041501.201802031831) ...
Setting up linux-headers-4.15.1-041501 (4.15.1-041501.201802031831) ...
Setting up linux-headers-4.15.1-041501-generic (4.15.1-041501.201802031831) ...                                                                                             
Setting up linux-image-4.15.1-041501-generic (4.15.1-041501.201802031831) ...                                                                                               
Running depmod.                                                                                                                                                             
update-initramfs: deferring update (hook will be called later)                                                                                                              
Examining /etc/kernel/postinst.d.                                                                                                                                           
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.15.1-041501-generic /boot/vmlinuz-4.15.1-041501-generic                                                      
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.15.1-041501-generic /boot/vmlinuz-4.15.1-041501-generic                                                       
update-initramfs: Generating /boot/initrd.img-4.15.1-041501-generic                                                                                                         
run-parts: executing /etc/kernel/postinst.d/unattended-upgrades 4.15.1-041501-generic /boot/vmlinuz-4.15.1-041501-generic                                                   
run-parts: executing /etc/kernel/postinst.d/update-notifier 4.15.1-041501-generic /boot/vmlinuz-4.15.1-041501-generic                                                       
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 4.15.1-041501-generic /boot/vmlinuz-4.15.1-041501-generic                                                        
Generating grub configuration file ...                                                                                                                                      
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.                                                                   
Found linux image: /boot/vmlinuz-4.15.1-041501-generic                                                                                                                      
Found initrd image: /boot/initrd.img-4.15.1-041501-generic                                                                                                                  
Found linux image: /boot/vmlinuz-4.14.15-041415-generic                                                                                                                     
Found initrd image: /boot/initrd.img-4.14.15-041415-generic                                                                                                                 
Found linux image: /boot/vmlinuz-4.13.0-32-generic
Found initrd image: /boot/initrd.img-4.13.0-32-generic
Found linux image: /boot/vmlinuz-4.13.0-21-generic
Found initrd image: /boot/initrd.img-4.13.0-21-generic
Found memtest86+ image: /boot/memtest86+.elf
Found memtest86+ image: /boot/memtest86+.bin
done
root@srv:~/latestkernel#

Ryzen Threadripper kernel problems with PCIe – wifi, network, video problems

If you have Ryzen Threadripper (in my case 1950X) and Ubuntu 17.10 (because the others versions do not boot at all!) with the latest kernel 4.15.x and the default vmlinuz-4.13.0-21-generic you might experience frequently wifi, network cards and video cards problems like wifi disconnects and network cards lost connections and some kind of freezes for a second or two of the video! If you have Ryzen Threadripper 1950X and or ASUS ROG ZENITH EXTREME (or any other motherboard with the chipset AMD X399) and ubuntu 17.10 or probably any linux distro with any kernel you might experience really unstable system. Look at your dmesg and if you see something like

[   18.669950] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
[   18.669966] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
[   18.669970] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
[   18.669973] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
[   18.669975] pcieport 0000:00:01.1:    [ 6] Bad TLP               
[   23.805640] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
[   23.805664] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
[   23.805669] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
[   23.805674] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
[   23.805678] pcieport 0000:00:01.1:    [ 6] Bad TLP               
[   23.838698] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
[   23.838716] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
[   23.838720] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
[   23.838725] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
[   23.838728] pcieport 0000:00:01.1:    [ 6] Bad TLP               
[   36.124101] atlantic: link change old 0 new 100
[   36.124222] IPv6: ADDRCONF(NETDEV_CHANGE): enp65s0: link becomes ready
[   52.294413] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
[   52.294436] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
[   52.294441] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
[   52.294446] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
[   52.294449] pcieport 0000:00:01.1:    [ 6] Bad TLP               
[   53.275256] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
[   53.275289] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
[   53.275293] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
[   53.275298] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
[   53.275301] pcieport 0000:00:01.1:    [ 6] Bad TLP               
[   53.418493] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
[   53.418517] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
[   53.418522] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
[   53.418527] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
[   53.418530] pcieport 0000:00:01.1:    [ 6] Bad TLP               
[   53.649938] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
[   53.649962] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
[   53.649967] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
[   53.649972] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
[   53.649975] pcieport 0000:00:01.1:    [ 6] Bad TLP               
[  113.294411] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
[  113.294434] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
[  113.294439] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
[  113.294444] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000040/00006000
[  113.294450] pcieport 0000:00:01.1:    [ 6] Bad TLP               
[  265.469397] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
[  265.469419] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
[  265.469424] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Transmitter ID)
[  265.469429] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00001000/00006000
[  265.469438] pcieport 0000:00:01.1:    [12] Replay Timer Timeout  
[  533.031982] dpc 0000:00:01.1:pcie010: DPC containment event, status:0x1f00 source:0x0000
[  533.032004] pcieport 0000:00:01.1: AER: Corrected error received: id=0000
[  533.032009] pcieport 0000:00:01.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0009(Receiver ID)
[  533.032014] pcieport 0000:00:01.1:   device [1022:1453] error status/mask=00000080/00006000
[  533.032022] pcieport 0000:00:01.1:    [ 7] Bad DLLP      

Apparently everything on the pci express could be affected! So till the kernel team fixes the issues, you can use the following workaround: add to the kernel boot parameters

pcie_aspm=off

Under Ubuntu you can add in the file

/etc/default/grub

the following line

GRUB_CMDLINE_LINUX="pcie_aspm=off"

And then execute

update-grub

the changes to take effect in the grub configuration. Then reboot the machine!

Under CentOS 7 the configuration file and the line to put are the same just the update of the grub configuration is different:

  • For UEFI-based systems execute:
    grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
    
  • For legacy-BIOS systems execute:
    grub2-mkconfig -o /boot/grub2/grub.cfg
    

*You should know what is your setup, probably UEFI-based, you could check in your boot directory, if there is /boot/efi/EFI/centos/grub.cfg you probably should use the option for UEFI-based, if you do not have any “efi” directory under “/boot” use the second option for legacy-BIOS.
Then reboot the machine!