Copy files with read errors successfully – skipping only errors (i.e. bad sectors)

Sometimes disks develop errors or an SSD has a bad NAND cell. Saving the whole hard disk may not be needed when only a specific file or two are important, and those files cannot be copied by cp or rsync because of an “Unrecovered read error”.
Furthermore, an SSD reallocates bad cells only when the affected cells are written to, which may not happen for years, while reads may hit them every day. Reading from a sector with bad NAND cells results in slow IO (multiple read commands are executed before the drive gives up). Copying the file to a new place while losing only 512 bytes may not harm the data, but it is difficult to do with the generic copy tools.
This article shows how to save single files from a mounted ext4 file system with bad sectors using the ddrescue tool – https://www.gnu.org/software/ddrescue/ In fact, ddrescue can rescue single files or whole devices.

STEP 1) Install ddrescue.

Installing ddrescue is pretty easy. The tool is included in almost all Linux distributions and it doesn’t have many dependencies. Note there is another tool called dd_rescue, which is different from this one; just follow the link above for the tool used here.
CentOS 7/8 or Fedora:

yum install -y ddrescue

Ubuntu (any release from the last 10 years):

apt install -y gddrescue

Gentoo:

emerge -v ddrescue

STEP 2) Rescuing a single file with read errors because of bad sectors in a mounted file system.

[root@srv Snapshots]# ddrescue -v \{9f02ae0a-6dae-4729-b6a6-ec3f0550f294\}.vdi test2.vdi
GNU ddrescue 1.25
About to copy 15724 MBytes from '{9f02ae0a-6dae-4729-b6a6-ec3f0550f294}.vdi' to 'test2.vdi'
    Starting positions: infile = 0 B,  outfile = 0 B
    Copy block size: 128 sectors       Initial skip size: 384 sectors
Sector size: 512 Bytes

Press Ctrl-C to interrupt
     ipos:   13495 MB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:   13495 MB, non-scraped:        0 B,  average rate:    162 MB/s
non-tried:        0 B,  bad-sector:     8192 B,    error rate:    4608 B/s
  rescued:   15724 MB,   bad areas:        2,        run time:      1m 36s
pct rescued:   99.99%, read errors:       18,  remaining time:          0s
                              time since last successful read:          0s
Finished                                      
[root@srv Snapshots]# ls -al
total 52602944
drwx------. 2 root root        4096 Jun  2 02:22 .
drwxr-xr-x. 4 root root        4096 Jun  1 14:16 ..
-rw-------. 1 root root   459981735 Nov  8  2018 2018-11-08T15-19-17-776317000Z.sav
-rw-------. 1 root root   566704069 Jun  1 14:16 2020-06-01T11-16-05-735318000Z.sav
-rw-------. 1 root root  8329887744 Jun  1 12:53 {3d30ebea-2e2f-4e33-8088-d3d66f315e2c}.vdi
-rw-------. 1 root root 15724445696 Nov  8  2018 {9f02ae0a-6dae-4729-b6a6-ec3f0550f294}.vdi
-rw-------. 1 root root  4012900352 Jun  1 14:16 {f7e72510-7dce-48fd-b62c-630664ad984f}.vdi
-rw-r--r--. 1 root root 15724445696 Jun  2 02:24 test2.vdi
-rw-------. 1 root root  9051041792 Jun  2 02:19 test.vdi
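
ddrescue can also record its progress in an optional mapfile (third argument), which makes it possible to interrupt the copy and resume, or to retry only the bad areas later. A minimal sketch, reusing the file names from above (the mapfile name test2.map is just an example):

# first pass: copy everything readable, skip scraping the bad areas, record progress in the mapfile
ddrescue -n -v '{9f02ae0a-6dae-4729-b6a6-ec3f0550f294}.vdi' test2.vdi test2.map
# optional second pass: retry the remaining bad sectors up to 3 times, reusing the same mapfile
ddrescue -r3 -v '{9f02ae0a-6dae-4729-b6a6-ec3f0550f294}.vdi' test2.vdi test2.map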

Here is an animated gif of the ddrescue procedure:

ddrescue – copy files with bad sectors

Keep on reading!

Data too large, data for [] would be [] which is larger than the limit of

Rsyslog writing to Elasticsearch could lead to an error for some of the records and missing to save them in the backend:

{ ... { "error": { "root_cause": [ { "type": "circuit_breaking_exception", 
"reason": "[parent] Data too large, data for [<http_request>] would be [1008813778\/962mb], which is larger than the limit of [986061209\/940.3mb], 
real usage: [1008812248\/962mb], new bytes reserved: [1530\/1.4kb], usages [request=0\/0b, fielddata=317\/317b, in_flight_requests=1530\/1.4kb, accounting=178301893\/170mb]",
"bytes_wanted": 1008813778, "bytes_limit": 986061209, "durability": "PERMANENT" }], 
"type": "circuit_breaking_exception", "reason": "[parent] Data too large, data for [<http_request>] would be [1008813778\/962mb], which is larger than the limit of [986061209\/940.3mb], 
real usage: [1008812248\/962mb], new bytes reserved: [1530\/1.4kb], usages [request=0\/0b, fielddata=317\/317b, in_flight_requests=1530\/1.4kb, accounting=178301893\/170mb]",
"bytes_wanted": 1008813778, "bytes_limit": 986061209, "durability": "PERMANENT" }, "status": 429 } }

Unfortunately, such writes are not saved in Elasticsearch and the data is lost.

The problem here is that the Java VM has reached its maximum allowed memory, so the Java Virtual Machine should be allowed to use more memory.

Find the Java VM options in the Elasticsearch jvm.options file. In CentOS 7 the file is located at /etc/elasticsearch/jvm.options; set more memory with the variables “-Xms[SIZE]g -Xmx[SIZE]g”, such as:

.....
-Xms4g
-Xmx4g
.....

This will allow a 4G “maximum size of total heap space” to be used by the Java Virtual Machine. By default, it is 1G (-Xms1g -Xmx1g). It is a good idea to set it to half of the server’s memory. Save and restart the Elasticsearch service as usual:

systemctl restart elasticsearch

You should see the variables on the command line with the ps command:

[root@loganalyzer ~]# ps axuf|grep elasticsearch
elastic+   592 10.8 34.4 168638848 5493156 ?   Ssl  00:56   4:23 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 
-Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false 
-Dlog4j2.disable.jmx=true -Djava.locale.providers=COMPAT 
-Xms4g -Xmx4g 
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-Djava.io.tmpdir=/tmp/elasticsearch-16851535740012150929 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch 
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log 
-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m 
-XX:MaxDirectMemorySize=2147483648 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch 
-Des.distribution.flavor=default -Des.distribution.type=rpm -Des.bundled_jdk=true 
-cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet
elastic+   690  0.0  0.0  70448  4516 ?        Sl   00:56   0:00  \_ /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

The environment variable ES_JAVA_OPTS could be used, too.

ES_JAVA_OPTS="-Xms4g -Xmx4g" ./bin/elasticsearch 
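
To verify the new heap limit is in effect after the restart, the _cat nodes API may be queried (assuming Elasticsearch listens on the default localhost:9200):

# the heap.max column should report the new maximum heap (about 4gb)
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.max'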

collectd nginx plugin: curl_easy_perform failed because of selinux

Enabling the Nginx plugin for collectd under CentOS (or any other system using SELinux) might be confusing for a newbie. Most sources on the Internet would just install collectd-nginx:

yum install -y collectd-nginx

and configure it in nginx.conf and collectd.conf. Still, the statistics might not work as expected; collectd may not be able to gather statistics from Nginx.

SELinux may prevent the collectd daemon (plugin) from connecting to Nginx and gathering statistics from the Nginx stats page.
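
If that is the case, a possible way to confirm and allow it is to check the audit log for AVC denials and toggle the relevant SELinux boolean. This is a sketch only; the boolean name collectd_tcp_network_connect is an assumption and may differ between policy versions:

# look for recent SELinux denials mentioning collectd
ausearch -m avc -ts recent | grep collectd
# if denials show up, allow the collectd plugin to make outbound TCP connections (persistent)
setsebool -P collectd_tcp_network_connect on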

Checking the collectd log reveals the problem:
Keep on reading!

Kibana server is not ready yet – and Waiting for that migration to complete in the logs

Now, living in the era of the cloud and big data, there comes a time when the admin may need to save all the logs in a central place! Elasticsearch and Kibana look good for the job! And after months of hassle-free work, Elasticsearch just stopped working, and after a restart and an upgrade (of both Elasticsearch and Kibana) Kibana showed an error message:

Kibana server is not ready yet

And if you have tried stopping and starting Kibana and Elasticsearch and Kibana still shows the above message, here is what you should do:

  1. Check whether the two services are running, Kibana and Elasticsearch; if one of them is not running, start it.
  2. Search the logs, especially the Elasticsearch logs. The first place to check is the systemd journal with the journalctl program (systemctl status will also point out the problem by showing the last lines of the logs).
  3. Look at the last lines and check whether they include

    Another Kibana instance appears to be migrating the index

    If so, this article is probably the right place to solve the issue and get your setup running again.

STEP 1) Running services and analyzing the logs.

If Kibana and Elasticsearch run under systemd, it is easy to access the logs with systemctl and journalctl.
Check whether Kibana and Elasticsearch are running with:

[root@loganalyzer ~]# ps ax|grep elasticsearch|grep -v grep
  258 ?        Ssl  836:31 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=COMPAT -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/tmp/elasticsearch-13303119363353782625 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m -XX:MaxDirectMemorySize=536870912 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch -Des.distribution.flavor=default -Des.distribution.type=rpm -Des.bundled_jdk=true -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet
  360 ?        Sl     0:00 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller
[root@loganalyzer ~]# ps ax|grep kibana|grep -v grep
 1284 ?        Ssl    4:32 /usr/share/kibana/bin/../node/bin/node /usr/share/kibana/bin/../src/cli -c /etc/kibana/kibana.yml

If one of the two services is missing, you must start it! And second, each service should have only one instance (i.e. process)!
Then check the Kibana logs with journalctl:

[root@loganalyzer ~]# journalctl -u kibana.service
.....
.....
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["info","plugins","bfetch"],"pid":1219,"message":"Setting up plugin"}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["info","savedobjects-service"],"pid":1219,"message":"Waiting until all Elasticsearch nodes
 are compatible with Kibana before starting saved objects migrations..."}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["info","savedobjects-service"],"pid":1219,"message":"Starting saved objects migrations"}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["info","savedobjects-service"],"pid":1219,"message":"Creating index .kibana_task_manager_2
."}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["warning","savedobjects-service"],"pid":1219,"message":"Unable to connect to Elasticsearch
. Error: [resource_already_exists_exception] index [.kibana_task_manager_2/O070AunfSyG6hwd6_pqqRA] already exists, with { index_uuid=\"O070AunfSyG6hwd6_pqqRA\" & index=\".kibana_task_manager
_2\" }"}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["warning","savedobjects-service"],"pid":1219,"message":"Another Kibana instance appears to
 be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_task_manager_2 
and restarting Kibana."}

The systemctl status may be used, too. The error and the index are shown in the last lines of the status output – look below.

The problem here is that there was a migration of the index .kibana_task_manager_2, but it was abandoned for an unknown reason, and now it should be deleted so the Kibana service can be used again. The index in your logs may have a different name, but it is the same problem.

STEP 2) Delete kibana index

Delete the Kibana index in the Elasticsearch backend using curl and an HTTP/HTTPS request such as:

[root@loganalyzer kibana]# curl -XDELETE http://192.168.0.2:9200/.kibana_task_manager_2
{"acknowledged":true}

Keep on reading!

Cron missing path – executing docker/podman – adding network: failed to locate iptables

If you have ever executed complex scripts with the cron system, you have inevitably discovered that the Linux environment is different from the login or ssh shell. The different environment tends to lead to a missing or different PATH variable! Here is what happens when podman starts a container from a cron script:

time="2020-04-19T20:45:20Z" level=error msg="Error adding network: failed to locate iptables: exec: \"iptables\": executable file not found in $PATH"
time="2020-04-19T20:45:20Z" level=error msg="Error while adding pod to CNI network \"podman\": failed to locate iptables: exec: \"iptables\": executable file not found in $PATH"
Error: unable to start container "onedrive-cli": error configuring network namespace for container d297cf80db20441d4258a1acc7d810444795d1ca8730ab242d9fe8a13eaa697d: failed to locate iptables: exec: "iptables": executable file not found in $PATH

The iptables executable is not found because the PATH variable is different from the one in the login or ssh shell. Executing the commands or the script under ssh or a login shell results in no error and a proper podman (docker) execution!

A similar problem could happen with any other software trying to execute iptables or another tool that is not found in cron’s PATH, because cron’s environment is very limited and differs from a login shell.

To ensure the PATH is like the user’s (root) environment, just source the “profile” or “.bashrc” file of the current user before executing the script, or in its first lines.
This would do the trick.

. /etc/profile

Or user’s custom

. ~/.bashrc

Or the default OS bashrc

. /etc/bashrc

The dot may be replaced by “source”:

source /etc/bashrc

All (environment) variables will be available after the source command.
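
For example, a crontab entry may source the profile inline before running the script (the script path below is only an illustration):

# run a nightly script at 03:15 with a login-like environment
15 3 * * * . /etc/profile; /usr/local/bin/nightly-backup.sh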

Here is the difference.
The environment without sourcing a profile/bashrc file:

 
LANG=en_US.UTF-8
XDG_SESSION_ID=19118
USER=root
PWD=/root
HOME=/root
SHELL=/bin/sh
SHLVL=1
LOGNAME=root
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/0/bus
XDG_RUNTIME_DIR=/run/user/0
PATH=/usr/bin:/bin
_=/usr/bin/env

Sourcing the “/etc/profile” file:

LANG=en_US.UTF-8
HISTCONTROL=ignoredups
HOSTNAME=srv.example.com
XDG_SESSION_ID=19165
USER=root
PWD=/root
HOME=/root
MAIL=/var/spool/mail/root
SHELL=/bin/bash
SHLVL=1
LOGNAME=root
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/0/bus
XDG_RUNTIME_DIR=/run/user/0
PATH=/usr/local/sbin:/usr/sbin:/usr/bin:/bin
HISTSIZE=1000
LESSOPEN=||/usr/bin/lesspipe.sh %s
_=/usr/bin/env

There are multiple additional environment variables, which could be important for the user’s scripts executed by cron.

And in CentOS 8, iptables happens to be in /usr/sbin/iptables – and /usr/sbin is not included in the default cron PATH variable!
Of course, the PATH may also be set in the cron scheduler with crontab, until the next path turns out to be missing from it while included in the user’s path! It is just better to ensure the two environments are the same every time by sourcing an environment configuration file such as /etc/profile, the user’s bashrc, or the default one in /etc/bashrc.
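
As mentioned, the PATH may instead be set once at the top of the crontab, which cron then applies to every job; the directories below are only an example and should mirror the user's login PATH:

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# hourly job that now finds iptables under /usr/sbin
0 * * * * /usr/local/bin/start-container.sh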

Overwrite Return-Path with postfix because of “550-Sender verification is required but failed”

Sending emails from web applications like PHP may result in the emails being rejected by some servers. Fighting spam leads to overly strict filters and rules, which reject the mail even before the anti-spam service of the accepting server. Here is an error:

Apr  1 04:10:18 srv-mail postfix/pickup[26902]: AB13578FAB3: uid=1015 from=<www-data>
Apr  1 04:10:18 srv-mail postfix/cleanup[21182]: AB13578FAB3: message-id=<20200401041018.AB13578FAB3@www.mydomain.com>
Apr  1 04:10:18 srv-mail postfix/qmgr[6485]: AB13578FAB3: from=<www-data@www.mydomain.com>, size=7923, nrcpt=1 (queue active)
Apr  1 04:10:19 srv-mail postfix/smtp[45689]: AB13578FAB3: to=<mailbox@example.com>, relay=mx.example.com[1.1.1.1]:25, delay=11, delays=0.02/0.01/0.65/10, dsn=5.0.0, status=bounced (host mx.example.com[1.1.1.1] said: 550-Sender verification is required but failed. (ID:550:0:5 550 (smtp1.mx.example.com)): www-data@mydomain.com (in reply to MAIL FROM command))

The receiving server has too strict rules!

It just expects the “From” and the “Return-Path” headers to contain the same string – the sender’s email box.

As you can see from the example above, the application sends all emails (from, let’s say, web forms) from www-data@mydomain.com, and www-data is probably the username of the OS user under which the application runs.
Or you may want to overwrite the Return-Path because it uses the username of the application that sent the email, like “web”, “apache”, “www-data” and so on.
Here is how to overwrite the Return-Path with the postfix mail system.

STEP 1) Edit postfix configuration

Add a line in /etc/postfix/main.cf (it is perfectly fine to be on the last line):

smtp_generic_maps = hash:/etc/postfix/generic

And create the file /etc/postfix/generic with mapping “old@mailbox.com new@mail.com”:

www-data@mydomain.com no-reply@domain.com

The domains of the two addresses may be different or the same; it doesn’t matter. If you do not know what your “www-data@mydomain.com” address is, the mail logs in /var/log/messages or /var/log/mail might help you find the mailbox, or just send yourself an email and look at the Return-Path.
And a real-world example for /etc/postfix/generic:

www-data@www.mydomain.com no-reply@ahelpme.com

STEP 2) Generate the hash file, which postfix will use. Reload the postfix.

Postfix will use the hash file added in the configuration. Just execute:

postmap /etc/postfix/generic

The above command creates a binary file /etc/postfix/generic.db, which will be used by the postfix mail system. Do not edit this file directly. To add an entry, just use a text editor on /etc/postfix/generic (without the “.db” suffix) and then reload/restart postfix to enable the new configuration.
And reload (or restart) postfix with:

systemctl reload postfix

or for init systems:

/etc/init.d/postfix restart
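
To check that a given sender address will be rewritten as intended, the map may be queried directly with postmap, using the example addresses from above:

# prints the rewritten address (no-reply@ahelpme.com) if the mapping is in place
postmap -q www-data@www.mydomain.com hash:/etc/postfix/generic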

Dracut boot failed with missing device – exit and continue normal booting!

This issue deserves a much longer article, but in fact it boils down to a straightforward tip:

You may be able to continue a normal boot only by typing “exit” and hitting enter in the “Dracut” console.

Most of the time, ending up in the Dracut console happens because the system administrator of the server/machine added, replaced or deleted a RAID or similar device and forgot to update the configuration (grub2, probably). And in most of these cases, the RAID is not critical for a normal boot from the root partition, but it may be critical for the services later. Booting in normal mode, even without some devices, is the main goal, because in normal mode it is easier to repair the system.
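
Once the system boots normally, the root cause still has to be fixed. A minimal sketch of what to check, assuming the missing device is referenced in /etc/fstab or baked into the initramfs (the grep pattern is just an example for an md RAID device):

# look for the stale device in fstab and comment it out or add the nofail option
grep -n md /etc/fstab
# regenerate the initramfs for the running kernel so it stops waiting for the missing device
dracut -f
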
Check out the two articles on the topic (especially the first one):

SCREENSHOT 1) Just type “exit” and hit enter.

It’s worth noting that if you have executed some commands in the console and/or mounted devices to test whether their file systems are healthy (or for whatever other reason), the boot process may not continue after typing exit, and a reboot is probably required. The server will enter this mode once more, and then just typing exit will work.

type exit

Keep on reading!

podman – Error adding network: failed to allocate for range 0: 10.88.0.46 has been allocated after server reboot

We’ve just stumbled on the following error with one of our podman CentOS 8 servers after restart:

[root@srv ~]# podman start mysql-slave
ERRO[0000] Error adding network: failed to allocate for range 0: 10.88.0.46 has been allocated to c97823be46832ddebbce29f3f51e3091620188710cb7ace246e173a7a981baed, duplicate allocation is not allowed 
ERRO[0000] Error while adding pod to CNI network "podman": failed to allocate for range 0: 10.88.0.46 has been allocated to c97823be46832ddebbce29f3f51e3091620188710cb7ace246e173a7a981baed, duplicate allocation is not allowed 
Error: unable to start container "mysql-slave": error configuring network namespace for container c97823be46832ddebbce29f3f51e3091620188710cb7ace246e173a7a981baed: failed to allocate for range 0: 10.88.0.46 has been allocated to c97823be46832ddebbce29f3f51e3091620188710cb7ace246e173a7a981baed, duplicate allocation is not allowed

Apparently, something went wrong, because the two containers were fine before the restart and they had been stopped, started and restarted multiple times.

The solution is to remove IP-named files in /var/lib/cni/networks/podman and start the podman containers again.

It resembles the bug https://github.com/containers/libpod/issues/3759, which should have already been fixed by the newer minor CentOS 8 releases.

The interesting part is that the container we are trying to start, mysql-slave, has ID c97823be46832ddebbce29f3f51e3091620188710cb7ace246e173a7a981baed, but podman reports that the IP cannot be allocated because it has already been allocated to a container with the same ID. That’s the problem:

The IP-named files in /var/lib/cni/networks/podman were not removed when the podman container had stopped.

Typically, when a podman container is stopped, the process should remove its files in /var/lib/cni/networks/podman. Before restarting the CentOS 8 server you may want to stop the podman containers first, for now.
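
Here is a minimal cleanup sketch, assuming all podman containers on the host can be stopped first so no running container still owns an address; the console session below shows the same procedure done manually:

# stop every container, remove the leftover IP reservation files, then start again
podman stop --all
rm -f /var/lib/cni/networks/podman/10.88.0.*
podman start mysql-slave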

[root@srv ~]# cd /var/lib/cni/networks/podman
[root@srv podman]# ls -altr
total 24
-rwxr-x---. 1 root root    0  3 Dec  0,43 lock
drwxr-xr-x. 3 root root 4096  3 Dec  0,43 ..
-rw-r--r--. 1 root root   64  9 Dec 18,34 10.88.0.46
-rw-r--r--. 1 root root   64 16 Dec 12,01 10.88.0.47
-rw-r--r--. 1 root root   10  1 Mar  9,28 last_reserved_ip.0
-rw-r--r--. 1 root root   70  1 Mar  9,28 10.88.0.49
drwxr-xr-x. 2 root root 4096  1 Mar  9,28 .
[root@srv podman]# rm 10.88.0.46
rm: remove regular file '10.88.0.46'? y
[root@srv podman]# rm 10.88.0.47
rm: remove regular file '10.88.0.47'? y
[root@srv podman]# podman start mysql-slave
mysql-slave
[root@srv podman]# podman ps
CONTAINER ID  IMAGE                           COMMAND               CREATED       STATUS            PORTS  NAMES
c97823be4683  localhost/centos-mysql-5.6:0.9  /entrypoint.sh my...  2 months ago  Up 2 minutes ago         mysql-slave
e96134b31894  docker.io/example/client:latest   start-boinc.sh        2 months ago  Up 6 minutes ago         example-client
[root@srv podman]# ls -altr
total 20
-rwxr-x---. 1 root root    0  3 Dec  0,43 lock
drwxr-xr-x. 3 root root 4096  3 Dec  0,43 ..
-rw-r--r--. 1 root root   70  1 Mar  9,28 10.88.0.49
-rw-r--r--. 1 root root   10  1 Mar  9,32 last_reserved_ip.0
-rw-r--r--. 1 root root   70  1 Mar  9,32 10.88.0.50
drwxr-xr-x. 2 root root 4096  1 Mar  9,32 .
[root@srv podman]#

We’ve deleted the old IPs (old by date!) 10.88.0.46 and 10.88.0.47 and the mysql-slave container started successfully.

firewalld and podman (or docker) – no internet in the container and could not resolve host

If you happen to use CentOS 8, you have already discovered that Red Hat (i.e. CentOS) switched to podman, a daemonless alternative to docker with a compatible command line. So the following fix might help even someone who does not use CentOS 8 or podman; for now, podman and docker are 99.99% the same.
Creating and starting a container is easy, in most cases one command only, but you may stumble on an error where your container cannot resolve names or cannot connect to an IP even though there is a ping to that IP!
The service in the container may live a happy life without Internet access, serving only the mapped ports to the outside world. Still, it may need Internet access at some point, let’s say when an update should be performed.
Here is how to fix the missing Internet access in a podman (docker) container:

  • No ping to the outside world. Chances are you are missing
    sysctl -w net.ipv4.ip_forward=1
    

    And do not forget to make it permanent by adding “net.ipv4.ip_forward=1” to /etc/sysctl.conf (or a “.conf” file in /etc/sysctl.d/).

  • Ping to an outside IP from the container works, but no connection to any service is possible! Probably NAT is not enabled in your podman/docker configuration. In the case of firewalld, at least, you must enable the masquerade option of the public zone
    firewall-cmd --zone=public --add-masquerade
    firewall-cmd --permanent --zone=public --add-masquerade
    

    The second command, with “--permanent”, makes the option persist over reboots (a quick check of both settings is sketched below).
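
A quick way to verify both settings on the host (assuming the default “public” zone is used):

# should print net.ipv4.ip_forward = 1
sysctl net.ipv4.ip_forward
# should print "yes" when masquerading is enabled
firewall-cmd --zone=public --query-masquerade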

The error – Could not resolve host (Name or service not known) despite having nameservers in /etc/resolv.conf and ping to them!

One may think that having IPs in /etc/resolv.conf and ping to them from the container should give the container access to the Internet. But the following error occurs:

[root@srv /]# yum install telnet
Loaded plugins: fastestmirror, ovl
Determining fastest mirrors
 * base: artfiles.org
 * extras: centos.mirror.net-d-sign.de
 * updates: centos.bio.lmu.de
http://mirror.fra10.de.leaseweb.net/centos/7.7.1908/os/x86_64/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: mirror.fra10.de.leaseweb.net; Unknown error"
Trying other mirror.
http://artfiles.org/centos.org/7.7.1908/os/x86_64/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: artfiles.org; Unknown error"
Trying other mirror.
^C

Exiting on user cancel
[root@srv /]# ^C
[root@srv /]# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=56 time=5.05 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=56 time=5.06 ms
^C
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 5.050/5.055/5.061/0.071 ms
[root@srv ~]# cat /etc/resolv.conf 
nameserver 8.8.8.8
nameserver 8.8.4.4
[root@srv /]# ping google.com
ping: google.com: Name or service not known

The error 2 – Can’t connect to a service despite having ping to its IP!

[root@srv /]# ping 2.2.2.2
PING 2.2.2.2 (2.2.2.2) 56(84) bytes of data.
64 bytes from 2.2.2.2: icmp_seq=1 ttl=56 time=9.15 ms
64 bytes from 2.2.2.2: icmp_seq=2 ttl=56 time=9.16 ms
^C
[root@srv2 /]# mysql -h2.2.2.2 -uroot -p
Enter password: 
ERROR 2003 (HY000): Can't connect to MySQL server on '2.2.2.2' (113)
[root@srv2 /]#

Despite having ping to the MySQL server on 2.2.2.2, and despite the firewall on 2.2.2.2 allowing outside connections, the container could not connect to it. Testing other services like HTTP, HTTPS, FTP and so on resulted in “unable to connect”, too. Simply because NAT (aka masquerade) is not enabled in the firewall.

Q_WEBENGINECORE_EXPORT QWebEngineFindTextResult has initializer but incomplete type

In addition to one of the cstdlib fixes here – Gentoo building qtgui error – g++-v8/cstdlib:75:15: fatal error: stdlib.h: No such file or directory – we were unable to build “dev-qt/qtwebengine-5.14.1”.

It appeared the problem was a wrong order of header includes, just like the cstdlib one, and if you have used the fix with QMAKE_CFLAGS_ISYSTEM in /usr/lib64/qt5/mkspecs/common/gcc-base.conf you will encounter this error. Revert to the original value in /usr/lib64/qt5/mkspecs/common/gcc-base.conf (and it is probably a good idea to rebuild the entire system with “emerge -e”):

QMAKE_CFLAGS_ISYSTEM = -isystem

and continue the build from the point where it stopped with something like:

ebuild /usr/portage/dev-qt/qtwebengine/qtwebengine-5.14.1.ebuild compile
ebuild /usr/portage/dev-qt/qtwebengine/qtwebengine-5.14.1.ebuild install
ebuild /usr/portage/dev-qt/qtwebengine/qtwebengine-5.14.1.ebuild qmerge
rm -Rf /var/tmp/portage/dev-qt/qtwebengine-5.14.1/

Keep on reading!