Copy files with read errors successfully – skipping only errors (i.e. bad sectors)

Sometimes disks have errors or an SSD disk has a bad NAND cell. Saving the whole hard disk data may not be needed and when only a specific file or two are important and which cannot be copied by cp or rsync because of “Unrecovered read error”.
Furthermore, the SSD reallocates the bad cells, when there are writes to the cell(s), which may not occur years, but reading may be each day. Reading from a sector with bad NAND cells will result in slow IO (multiple read commands are executed before giving up). Copying the file to a new place without only 512 bytes may not harm the data, but it is difficult to be done with the generic tool for copying.
This article is to save single files from a mounted ext4 file system with bad sectors using the ddrescue tool – https://www.gnu.org/software/ddrescue/ In fact, the ddrescue could save files or whole devices.

STEP 1) Install ddrescue.

Installing ddrescue is pretty easy. The tool is included in almost all Linux distributions and it doesn’t have many dependencies. Apparently, there is another dd_rescue tool, which is different than this one, just follow the link above for the tool used here.
CentOS 7/8 or Fedora:

yum install -y ddrescue

Ubuntu last 10 years versions:

apt install -y gddrescue

Gentoo:

emerge -v ddrescue

STEP 2) Rescuing a single file with read errors because of bad sectors in a mounted file system.

[root@srv Snapshots]# ddrescue -v \{9f02ae0a-6dae-4729-b6a6-ec3f0550f294\}.vdi test2.vdi
GNU ddrescue 1.25
About to copy 15724 MBytes from '{9f02ae0a-6dae-4729-b6a6-ec3f0550f294}.vdi' to 'test2.vdi'
    Starting positions: infile = 0 B,  outfile = 0 B
    Copy block size: 128 sectors       Initial skip size: 384 sectors
Sector size: 512 Bytes

Press Ctrl-C to interrupt
     ipos:   13495 MB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:   13495 MB, non-scraped:        0 B,  average rate:    162 MB/s
non-tried:        0 B,  bad-sector:     8192 B,    error rate:    4608 B/s
  rescued:   15724 MB,   bad areas:        2,        run time:      1m 36s
pct rescued:   99.99%, read errors:       18,  remaining time:          0s
                              time since last successful read:          0s
Finished                                      
[root@srv Snapshots]# ls -al
total 52602944
drwx------. 2 root root        4096 Jun  2 02:22 .
drwxr-xr-x. 4 root root        4096 Jun  1 14:16 ..
-rw-------. 1 root root   459981735 Nov  8  2018 2018-11-08T15-19-17-776317000Z.sav
-rw-------. 1 root root   566704069 Jun  1 14:16 2020-06-01T11-16-05-735318000Z.sav
-rw-------. 1 root root  8329887744 Jun  1 12:53 {3d30ebea-2e2f-4e33-8088-d3d66f315e2c}.vdi
-rw-------. 1 root root 15724445696 Nov  8  2018 {9f02ae0a-6dae-4729-b6a6-ec3f0550f294}.vdi
-rw-------. 1 root root  4012900352 Jun  1 14:16 {f7e72510-7dce-48fd-b62c-630664ad984f}.vdi
-rw-r--r--. 1 root root 15724445696 Jun  2 02:24 test2.vdi
-rw-------. 1 root root  9051041792 Jun  2 02:19 test.vdi

Here is an animated gif of the ddrescue procedure:

main menu
ddrescue – copy files with bad sectors

Keep on reading!

Change the location of container storage in podman (with SELinux enabled)

There two main options to change the location of all the containers’ storages:

  • “mount bind” the new location to the default storage directory (look Note 1)
  • Change the path of the location in the configuration file /etc/containers/storage.conf

You should stop all your containers though it is not mandatory.

You should stop the containers (if any) and copy the directory, because when reconfigured the storage path podman won’t access the ones in the old path – containers and images!

STEP 1) Change the storage path in the podman configuration file.

If the SELinux has been disabled, which should not be done, it is just a matter of changing a path option in the configuration file /etc/containers/storage.conf

# Primary Read/Write location of container storage
graphroot = "/var/lib/containers/storage"

Change it to whatever path you like. Mostly, it should point to the big storage device. In our case, the big storage is mounted under “/mnt/mystorage/virtual/storage”. Change the options to:

# Primary Read/Write location of container storage
graphroot = "/mnt/mystorage/virtual/storage"

Check the running configuration with:

[root@lsrv1 mystorage]# podman info
host:
  BuildahVersion: 1.12.0-dev
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.8-1.el7.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.8, commit: f85c8b1ce77b73bcd48b2d802396321217008762'
  Distribution:
    distribution: '"centos"'
    version: "7"
  MemFree: 191963136
  MemTotal: 16563531776
  OCIRuntime:
    name: runc
    package: runc-1.0.0-67.rc10.el7_8.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 7857680384
  SwapTotal: 8581541888
  arch: amd64
  cpus: 8
  eventlogger: journald
  hostname: lsrv1
  kernel: 3.10.0-1062.9.1.el7.x86_64
  os: linux
  rootless: false
  uptime: 607h 10m 53.36s (Approximately 25.29 days)
registries:
  blocked: null
  insecure: null
  search:
  - registry.access.redhat.com
  - registry.fedoraproject.org
  - registry.centos.org
  - docker.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 0
  GraphDriverName: overlay
  GraphOptions: {}
  GraphRoot: /mnt/mystorage/virtual/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 0
  RunRoot: /var/run/containers/storage
  VolumePath: /mnt/mystorage/virtual/storage/volumes

Keep on reading!

Software and technical overview of Ubuntu 20.04 LTS server edition

Ubuntu 20.04 LTS server edition offers the following software and versions:

Software

  • linux kernel – 5.4.0 – 5.4.0-29-generic
  • System
    • linux-firmware – 1.187
    • libc – 2.31 – 2.31-0ubuntu9
    • GNU GCC – multiple versions available – 7.5.0, 8.4.0, 9.3.0 and 10-20200411. The exact versions – 7.5.0-6ubuntu2, 8.4.0-3ubuntu2, 9.3.0-10ubuntu2 and 10-20200411-0ubuntu1
    • OpenSSL – 1.1.1f – 1.1.1f-1ubuntu2
    • coreutils – 8.308.30-3ubuntu2
    • apt – 2.0.2ubuntu0.1
    • rsyslog – 8.2001.0 – 8.2001.0-1ubuntu1
  • Servers
    • Apache – 2.4.41 – 2.4.41-4ubuntu3
    • Nginx – 1.17.10 – 1.17.10-0ubuntu1
    • MySQL server – 8.0.208.0.20-0ubuntu0.20.04.1
    • MariaDB server – 10.3.22 – 10.3.22-1ubuntu1
    • PostgreSQL – 12.2-4
  • Programming
      LTS the user may install

    • PHP – 7.4 – 7.4.3-4ubuntu1.1
    • python – 3.8.2 (3.8.2-0ubuntu2) and also includes 2.7.17 (2.7.17-2ubuntu4)
    • perl – 5.30.0 and also includes perl 6 6.d-2
    • ruby – 2.7 – 2.7+1
    • OpenJDK – includes multiple versions – 8, 11, 13 and 14. The exact versions are 8u252-b09-1ubuntu1, 11.0.7+10-3ubuntu1, 13.0.3+3-1ubuntu2 and 14.0.1+7-1ubuntu1
    • Go lang – multiple versions – 1.13.8 and 1.14.2. The exact versions – 1.13.8-1ubuntu1 and 1.14.2-1ubuntu1
    • Rust – 1.41.0 – 1.41.0+dfsg1+llvm-0ubuntu2
    • Subversion – 1.13.0 – 1.13.0-3
    • Git – 2.25.1 – 2.25.1-1ubuntu3
    • llvm – multiple versions – 6, 7, 8, 9 and 10. The exact versions – 6.0.1-14, 7.0.1-12, 8.0.1-9, 9.0.1-12, 10.0-50~exp1
  • Graphical User Interface
    • Xorg X server – 1.20.8 – 1.20.8-2ubuntu2
    • GNOME (the GUI) – 3.36.x – Gnome Shell – 3.36.1

Note: Not all of the above software comes installed by default. The versions above are valid for the intial release so in fact, these are the minimal versions you get with Ubuntu 20 LTS and installing and updating it after the initial date may update some of the above packages with new versions. Installed packages are 582 occupying 11G space.

During the installation wizard you may want to install the following snap software environments. Of course, this software is available after the installation setup, too.

The test server is equipped with “Threadripper 1950X AMD“, which is 16 cores CPU.
Check out Minimal installation of Ubuntu server 20.04 LTS, too.

Keep on reading!

Minimal installation of Ubuntu server 20.04 LTS

This tutorial will show you the simple steps of installing a modern Linux Distribution – Ubuntu server 20.04 LTS edition. Following most of the default options during the setup configuration for simplicity.

Here are some basic data from the default installation setup settings:

  1. Installed packages – ~582 occupying 11G of space.
  2. 3 partitions when using automatic patition layout – boot efi, swap and root.
  3. ext4 used for the root parition.

We used the following ISO for the installation process – Ubuntu 20.04 LTS (Focal Fossa):

http://releases.ubuntu.com/focal/ubuntu-20.04-live-server-amd64.iso

It is a LIVE image so you can try it before installing it. The easiest way is just to download the image and burn it to a DVD disk and then follow the installation below:

SCREENSHOT 1) Boot from the disk or USB – whatever you made after downloading the ISO file from Ubuntu official source.

On the image here the DVD is used to boot in UEFI mode installation.

main menu
boot uefi dvd

Keep on reading!

Data too large, data for [] would be [] which is larger than the limit of

Rsyslog writing to Elasticsearch could lead to an error for some of the records and missing to save them in the backend:

{ ... { "error": { "root_cause": [ { "type": "circuit_breaking_exception", 
"reason": "[parent] Data too large, data for [<http_request>] would be [1008813778\/962mb], which is larger than the limit of [986061209\/940.3mb], 
real usage: [1008812248\/962mb], new bytes reserved: [1530\/1.4kb], usages [request=0\/0b, fielddata=317\/317b, in_flight_requests=1530\/1.4kb, accounting=178301893\/170mb]",
"bytes_wanted": 1008813778, "bytes_limit": 986061209, "durability": "PERMANENT" }], 
"type": "circuit_breaking_exception", "reason": "[parent] Data too large, data for [<http_request>] would be [1008813778\/962mb], which is larger than the limit of [986061209\/940.3mb], 
real usage: [1008812248\/962mb], new bytes reserved: [1530\/1.4kb], usages [request=0\/0b, fielddata=317\/317b, in_flight_requests=1530\/1.4kb, accounting=178301893\/170mb]",
"bytes_wanted": 1008813778, "bytes_limit": 986061209, "durability": "PERMANENT" }, "status": 429 } }

Unfortunately, such writes are not saved in the Elasticsearch and the data has been lost.

The problem here is that the Java VM has reached the maximum allowed memory and more memory should be allowed to be used by the Java Virtual Machine.

Find the Java VM option for the Elasticsearchjvm.options. In CentOS 7 the file is located in /etc/elasticsearch/jvm.options and set more memory with the variables “-Xms[SIZE]g -Xmx[SIZE]g”, such as:

.....
-Xms4g
-Xmx4g
.....

|grep -v grep
This will allow 4G “maximum size of total heap space” to be used by the Java Virtual Machine. By default, it is 1G (-Xms1g -Xmx1g). It is a good idea to set it half of the server’s memory. Save and restart the Elasticsearch service as usual:

systemctl restart elasticsearch

You should see the variable in the command line with ps command:

[root@loganalyzer ~]# ps axuf|grep elasticsearch
elastic+   592 10.8 34.4 168638848 5493156 ?   Ssl  00:56   4:23 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 
-Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false 
-Dlog4j2.disable.jmx=true -Djava.locale.providers=COMPAT 
-Xms4g -Xmx4g 
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-Djava.io.tmpdir=/tmp/elasticsearch-16851535740012150929 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch 
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log elasticsearch
-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m 
-XX:MaxDirectMemorySize=2147483648 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch 
-Des.distribution.flavor=default -Des.distribution.type=rpm -Des.bundled_jdk=true 
-cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet
elastic+   690  0.0  0.0  70448  4516 ?        Sl   00:56   0:00  \_ /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

The environment variable ES_JAVA_OPTS could be used, too.

ES_JAVA_OPTS="-Xms4g -Xmx4g" ./bin/elasticsearch 

collectd nginx plugin: curl_easy_perform failed because of selinux

Enabling the Nginx plugin for collectd under CentOS (or any other system using SELinux) might be confusing for a newbie. Most sources on the Internet would just install collectd-nginx:

yum install -y collectd-nginx

and configure it in the nginx.conf and collectd.conf. Still, the statistics might not work as expected, the collectd may not be able to gather statistics from the Nginx.

SELinux may prevent collectd (plugin) daemon to connect to Nginx and gather statistics from the Nginx stats page.

Checking the collectd log and it reports a problem:
Keep on reading!

aptly remove a package from a repository using the cli

Here is a fast tip – how to remove a package from our local aptly repository:

  1. Remove the package from the local repository.
  2. Create a new snapshot form the local repository.
  3. Publish the snapshot by switching to the newly created snapshot from the above step.

The commands executing over repository with name xenial-apps to remove package with name example-app and version 10.5.1.22-ubuntu20. The snapshot name xenial-apps1588149526 is just a temporary name used for the snapshot (the ID is unix timestamp of the current time).

aptly repo remove  xenial-apps 'example-app (= 10.5.1.22-ubuntu20)'
aptly snapshot create xenial-apps1588149526 from repo xenial-apps
aptly publish switch xenial-apps ubuntu xenial-apps1588149526

Real world example.

This is the log from our system with just changed names:
Keep on reading!

Kibana server is not ready yet – and Waiting for that migration to complete in the logs

Now, living in the cloud and big data there is a time when the admin may need to save all their logs in a central place! Elasticsearch and Kibana look good for the job! And after months of hassle-free work of the Elasticsearch and Kibana, Elasticsearch just stopped working and after a restart and upgrade (Elasticsearch and Kibana) Kibana showed an error message:

Kibana server is not ready yet

And if you have tried the STOP/START of Kibana and Elasticsearch and Kibana would still show the above message here is what you should do:

  1. Check if the two services are running! Kibana and Elasticsearch, if some of them is missing start it.
  2. Search for the logs and especially Elasticsearch logs. The first place to check is systemd logs with journalctl program (systemctl status also will point out the problem showing last lines of the logs).
  3. Look for the last lines and if they include

    Another Kibana instance appears to be migrating the index

    This article is probably the right place to solve the issue and start your setup successful.

STEP 1) Running services and analyzing the logs.

If kibana and elastisearch use systemd system to operate it is easy to access the logs with systemctl and journalctl
Check whether the kibana and elasticsearch are running with:

[root@loganalyzer ~]# ps ax|grep elasticsearch|grep -v grep
  258 ?        Ssl  836:31 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=COMPAT -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/tmp/elasticsearch-13303119363353782625 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m -XX:MaxDirectMemorySize=536870912 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch -Des.distribution.flavor=default -Des.distribution.type=rpm -Des.bundled_jdk=true -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet
  360 ?        Sl     0:00 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller
[root@loganalyzer ~]# ps ax|grep kibana|grep -v grep
 1284 ?        Ssl    4:32 /usr/share/kibana/bin/../node/bin/node /usr/share/kibana/bin/../src/cli -c /etc/kibana/kibana.yml

If one of the two services are missing you must start it! And second, each service should have only one instance (i.e. process)!
And check the kibana logs with journalctl

[root@loganalyzer ~]# journalctl -u kibana.service
.....
.....
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["info","plugins","bfetch"],"pid":1219,"message":"Setting up plugin"}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["info","savedobjects-service"],"pid":1219,"message":"Waiting until all Elasticsearch nodes
 are compatible with Kibana before starting saved objects migrations..."}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["info","savedobjects-service"],"pid":1219,"message":"Starting saved objects migrations"}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["info","savedobjects-service"],"pid":1219,"message":"Creating index .kibana_task_manager_2
."}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["warning","savedobjects-service"],"pid":1219,"message":"Unable to connect to Elasticsearch
. Error: [resource_already_exists_exception] index [.kibana_task_manager_2/O070AunfSyG6hwd6_pqqRA] already exists, with { index_uuid=\"O070AunfSyG6hwd6_pqqRA\" & index=\".kibana_task_manager
_2\" }"}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["warning","savedobjects-service"],"pid":1219,"message":"Another Kibana instance appears to
 be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_task_manager_2 
and restarting Kibana."}

The systemctl status may be used, too. The error and the index are shown in the last lines of the status output – look below.

The problem here is there was a migration of index .kibana_task_manager_2, but it was abandoned because of unknown reason and now we should delete it to be able to use our kibana service. The index name might be with another name but it is the same problem.

STEP 2) Delete kibana index

Delete the kibana index in elasticsearch backend using curl and HTTP/HTTPS request such as:

[root@loganalyzer kibana]# curl -XDELETE http://192.168.0.2:9200/.kibana_task_manager_2
{"acknowledged":true}

Keep on reading!

Cron missing path – executing docker/podman – adding network: failed to locate iptables

If you have ever happened to execute some complex scripts using the cron system you were inevitable to discover the Linux environment was different than the login or ssh shell. The different environment tends to lead to a missing or different PATH environment! Here is what happens with podman starting a container from a cron script:

time="2020-04-19T20:45:20Z" level=error msg="Error adding network: failed to locate iptables: exec: \"iptables\": executable file not found in $PATH"
time="2020-04-19T20:45:20Z" level=error msg="Error while adding pod to CNI network \"podman\": failed to locate iptables: exec: \"iptables\": executable file not found in $PATH"
Error: unable to start container "onedrive-cli": error configuring network namespace for container d297cf80db20441d4258a1acc7d810444795d1ca8730ab242d9fe8a13eaa697d: failed to locate iptables: exec: "iptables": executable file not found in $PATH

The iptables executable is missing because the PATH variable is different than the login or ssh shell one. Executing the commands or the script under ssh or login will result in no error and a proper podman (docker) execution!

A similar problem could have happened with another software trying to execute iptables or another tool, which is not found in the cron’s PATH environment because cron’s environment is very limited and

To ensure the PATH is like the user’s (root) environment just source the “profile” or “.bashrc” file of the current user before the execution of the script or in the first lines of it.
This would do the trick.

. /etc/profile

Or user’s custom

. ~/.bashrc

Or the default OS bashrc

. /etc/bashrc

The dot may be replaced by “source”:

source /etc/bashrc

All (environment) variables will be available after the source command.

Here is the difference:
The environment without the sourcing profile/bashrc file:

 
LANG=en_US.UTF-8
XDG_SESSION_ID=19118
USER=root
PWD=/root
HOME=/root
SHELL=/bin/sh
SHLVL=1
LOGNAME=root
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/0/bus
XDG_RUNTIME_DIR=/run/user/0
PATH=/usr/bin:/bin
_=/usr/bin/env

Sourcing the “/etc/profile” file:

LANG=en_US.UTF-8
HISTCONTROL=ignoredups
HOSTNAME=srv.example.com
XDG_SESSION_ID=19165
USER=root
PWD=/root
HOME=/root
MAIL=/var/spool/mail/root
SHELL=/bin/bash
SHLVL=1
LOGNAME=root
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/0/bus
XDG_RUNTIME_DIR=/run/user/0
PATH=/usr/local/sbin:/usr/sbin:/usr/bin:/bin
HISTSIZE=1000
LESSOPEN=||/usr/bin/lesspipe.sh %s
_=/usr/bin/env

Multiple additional envrinment varibles, which could be important for user’s scripts executed by the cron.

And in CentOS 8 the iptables happens to be in “/usr/sbin/iptables” – a path /usr/sbin not included in the default cron environment PATH variable!
Of course, the PATH environment may be edited in the cron scheduler with crontab (by just setting the PATH with a path) till the next path missing in it and included in the user’s path! It’s just better to ensure the two environments are the same every time by sourcing the environment configuration file such as /etc/profile or user’s bashrc (or the default on in /etc/bashrc?).

lvm2: remove unknown physical volume device in a logical volume with a caching using vgreduce

Following the article SSD cache device to a hard disk drive using LVM the initial setup is:

  1. Single slow 2T harddisk. So no redundancy. If it fails all data is gone.
  2. SSD cache device for the slow harddisk above.

And here is how to handle a slow device failure! The data is all gone because there is no redundancy option when using a single device for data. The data is not valuable, because it is a cache server. This article will show what to expect when the slow device fails and how to replace it.
The original slow device is missing and replaced by a new one and the partitions are as follow:

  1. /dev/sda4 – the slow device. New device.
  2. /dev/sdb5 – the SSD device. Still in the LVM2 Volume Group and the Logical Volume.

Physical Volume is missing marked as “[unknown]”. The pvdisplay shows metadata information for the missing device. The vgdisplay and lvdisplay show information for the group and the logical volume, but the logical volume is in not available status. So the logical volume cannot be used, which is kind of normal when only the cache device is present.

Slow drive failure and the LVM2 status

[root@srv ~]# ssm list
------------------------------------------------------
Device        Free        Used      Total  Pool       
------------------------------------------------------
/dev/sda                          2.00 TB             
/dev/sda1  0.00 KB    50.00 GB   50.03 GB  md         
/dev/sda2  0.00 KB    15.75 GB   15.76 GB  md         
/dev/sda3  0.00 KB  1023.00 MB    1.00 GB  md         
/dev/sda4                         1.93 TB             
/dev/sdb                        894.25 GB             
/dev/sdb1  0.00 KB    50.00 GB   50.03 GB  md         
/dev/sdb2  0.00 KB    15.75 GB   15.76 GB  md         
/dev/sdb3  0.00 KB  1023.00 MB    1.00 GB  md         
/dev/sdb4                         1.00 KB             
/dev/sdb5  0.00 KB   675.00 GB  675.00 GB  VG_storage1
/dev/sdb6                       152.46 GB             
[unknown]  0.00 KB     1.93 TB    1.93 TB  VG_storage1
------------------------------------------------------
-----------------------------------------------------
Pool         Type  Devices     Free     Used    Total  
-----------------------------------------------------
VG_storage1  lvm   2        0.00 KB  2.59 TB  2.59 TB  
-----------------------------------------------------
------------------------------------------------------------------------------
Volume      Pool  Volume size  FS       FS size       Free  Type   Mount point
------------------------------------------------------------------------------
/dev/md125  md       50.00 GB  ext4    50.00 GB   44.41 GB  raid1  /          
/dev/md126  md     1023.00 MB  ext4  1023.00 MB  788.84 MB  raid1  /boot      
/dev/md127  md       15.75 GB                               raid1             
/dev/sdb6           152.46 GB  ext4   152.46 GB  145.77 GB                    
------------------------------------------------------------------------------
----------------------------------------------------------------------------------
Snapshot                      Origin               Pool         Volume size  Type   
----------------------------------------------------------------------------------
/dev/VG_storage1/lv_storage1  [lv_storage1_corig]  VG_storage1      1.93 TB  cache  
----------------------------------------------------------------------------------
[root@srv ~]# pvdisplay 
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  --- Physical volume ---
  PV Name               /dev/sdb5
  VG Name               VG_storage1
  PV Size               675.00 GiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              172799
  Free PE               0
  Allocated PE          172799
  PV UUID               oLn3hh-ROFU-WSW8-0m8P-YLWY-Akoz-nCxh96
   
  --- Physical volume ---
  PV Name               [unknown]
  VG Name               VG_storage1
  PV Size               1.93 TiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              507188
  Free PE               0
  Allocated PE          507188
  PV UUID               IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd
   
[root@srv ~]# pvscan 
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  PV /dev/sdb5   VG VG_storage1     lvm2 [<675.00 GiB / 0    free]
  PV [unknown]   VG VG_storage1     lvm2 [1.93 TiB / 0    free]
  Total: 2 [2.59 TiB] / in use: 2 [2.59 TiB] / in no VG: 0 [0   ]
[root@srv ~]# vgdisplay 
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  --- Volume group ---
  VG Name               VG_storage1
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  10
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                1
  VG Size               2.59 TiB
  PE Size               4.00 MiB
  Total PE              679987
  Alloc PE / Size       679987 / 2.59 TiB
  Free  PE / Size       0 / 0   
  VG UUID               eZ2ZIb-jcDl-kPLj-oFwJ-LLuN-VxLD-rVJclP
   
[root@srv ~]# vgscan 
  Reading volume groups from cache.
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  Found volume group "VG_storage1" using metadata type lvm2
[root@srv ~]# lvdisplay 
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  --- Logical volume ---
  LV Path                /dev/VG_storage1/lv_storage1
  LV Name                lv_storage1
  VG Name                VG_storage1
  LV UUID                NFWlWF-VmSO-HVr4-72RW-YY82-1ax2-92cI6P
  LV Write Access        read/write
  LV Creation host, time srv.example.com, 2019-10-25 17:18:41 +0000
  LV Cache pool name     lv_cache
  LV Cache origin name   lv_storage1_corig
  LV Status              NOT available
  LV Size                1.93 TiB
  Current LE             507188
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
   
[root@srv ~]# lvscan 
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  inactive          '/dev/VG_storage1/lv_storage1' [1.93 TiB] inherit
[root@srv ~]# lvs
lvs     lvscan  
[root@srv ~]# lvs -a
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  LV                  VG          Attr       LSize   Pool       Origin              Data%  Meta%  Move Log Cpy%Sync Convert
  [lv_cache]          VG_storage1 Cwi---C--- 674.90g                                                                       
  [lv_cache_cdata]    VG_storage1 Cwi------- 674.90g                                                                       
  [lv_cache_cmeta]    VG_storage1 ewi-------  48.00m                                                                       
  lv_storage1         VG_storage1 Cwi---C-p-   1.93t [lv_cache] [lv_storage1_corig]                                        
  [lv_storage1_corig] VG_storage1 owi---C-p-   1.93t                                                                       
  [lvol0_pmspare]     VG_storage1 ewi-------  48.00m

Keep on reading!