git status and bus error on SSD – fix READ errors by recovering part of the file

Combining an SSD with Linux disk encryption may not be the best idea, especially without the TRIM (allow-discards) option (or if fstrim is never executed). Nevertheless, this error may occur not only on an SSD device, but anywhere there is a corrupted file system or device.
In our case, the SSD has some read errors and, apparently, some files or parts of files could not be read by the git command:

[myuser@dekstop kernel]# git status -v
Bus error 84/115708)

In the case of SSD bad reads, the only working solution is to find and overwrite the problem file(s), or to remove the file(s) and recreate them. A more sophisticated solution is to dump the file with dd with its skip-errors option enabled to another location and then overwrite the old file with the new one. This way only the corrupted area of the file is lost, which in most cases is just one or two sectors, i.e. one or two 512-byte pieces of data.

STEP 1) Find the bad files with the find command.

Use the find Linux command and read all the files with the cat Linux command, so a bad sector will produce an input/output error on READ. On write, errors won’t be generated; instead, the sector will automatically be remapped to a healthy one (the bad sector is marked and never used again).

[myuser@dekstop kernel]#  find -type f -exec cat {} > /dev/null \;
cat: ./servers/logo_description.txt: Input/output error

If multiple files are found, repeat the procedure for each file.

STEP 2) Copy the healthy portion of the file.

The easiest way to remove the error is just to delete the file (or overwrite it), but if the healthy portion of the file is worth saving, the dd utility may be used to recover it:
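As a sketch of the dd approach, demonstrated on a throw-away temp file (there are no real bad sectors here; with a genuinely damaged file, dd will print read errors for the unreadable sectors but keep going):

```shell
# Throw-away demo file standing in for the damaged one (e.g. ./servers/logo_description.txt)
f=$(mktemp)
printf 'some file contents\n' > "$f"
# conv=noerror keeps dd going past read errors; conv=sync pads each unreadable
# (or short) block with zeros so the byte offsets of the rest of the file are preserved.
dd if="$f" of="$f.recovered" bs=512 conv=noerror,sync 2>/dev/null
# Overwrite the original with the recovered copy; the writes make the SSD
# remap the bad sectors to healthy ones.
cp "$f.recovered" "$f"
ls -l "$f" "$f.recovered"
```

Note that conv=sync pads the final block up to the bs size, so the recovered file may grow to a multiple of 512 bytes; for text or config files this is usually harmless.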

Transfer only a list of files with rsync

Transferring a list of files from one server to another may not be as easy as it looks if the list contains files with symlinks in their paths. Consider, for example, a list of files under a path like:

/mnt/storage1/dir1/subdir3/

But what if subdir3 is a symlink to a sub-directory:

/mnt/storage1/dir1/subdir3 -> /mnt/storage2/dir1/subdir3

STEP 1) Generate a list of only files following the symlinks if they exist.

The best way is to use Linux command find:

find -L /mnt/storage1/ -type f &> find.files.log

The option “-L” instructs find to follow symbolic links, so the file test (-type f) always matches against the type of the file that the symbolic link points to, not the symlink itself.

STEP 2) rsync the list of the files

There are several options (--copy-links, --copy-dirlinks) which must be used with the rsync command to be able to transfer the list of files, which may include symlinked directories in the files’ paths:

rsync --copy-links --copy-dirlinks --partial --files-from=/root/find.files.log --verbose --progress --stats --times --perms --owner --group /

The command above uses the rsync daemon started on the source server with shared root under the name “root” in the rsync configuration. Here is a sample rsync daemon configuration (/etc/rsyncd.conf):

pid file = /run/
use chroot = yes
read only = yes
hosts allow =
hosts deny = *

[root]
        path = /
        comment = root partition
        exclude = /proc /sys

Of course, relative paths could be used in the rsync daemon configuration if the generated file list also contains relative paths. For simplicity, the whole real path is used here.

In addition, rsync could be used without an rsync daemon, in conjunction with the ssh daemon instead:

rsync --rsh='ssh -p 22 -l root' --copy-links --copy-dirlinks --partial --files-from=/root/find.files.log --verbose --progress --stats --times --perms --owner --group /

Again, because the files in find.files.log have absolute paths, the root / must be used as the source in the rsync command. Relative paths may be used, too.

List all your files (and directories) with file size over FTP without ls -R (recursive)

A great piece of software is

lftp – sophisticated file transfer program

This little console tool could ease your life significantly with many enhancements over the simple FTP protocol. This tip is for those who want to list all their files in a directory or the entire FTP account but do not have an ls command with recursive abilities. The only option then is to manually go through all the directories to fetch their listing information, but this can be done automatically by

lftp using its custom command “find”; if you add the “-l” argument the output is like “ls -al” – file or directory, file permissions, user and group, file size, date and file name are shown on a single line for each file.

Just execute the command with proper credentials and the starting directory of your choice. The command output could even be piped to another command.

Delete files in a directory with checking for the free space with find and stat command – effective and fast approach

If you have big storage for, let’s say, your cache proxy and you want to delete some files fast, you could use the

find

Linux command to delete files. There are plenty of examples on the Internet showing how to do it, but many of them use pipes with sort or other manipulation of the output, which may require the whole output to be generated before some of the piped commands run; if you have millions of files you could wait an hour or two for the command to finish, only to see it is not efficient for your case!
So we need a command (or commands) which begins to delete files immediately with some sort of simple selection. We can use the Linux command “find” with its most used options like

  • “-type f” – only files
  • “-mtime +/-N” – only files older or newer than N days. “-mtime +5” – files or directory older than 5 days and “-mtime -5” files or directory newer than 5 days
  • “-name ‘*.jpg'” – files or directories with patterns in the name *.jpg, for example “sample.jpg”

So we’ll use

  1. “find” to delete files
  2. and a while cycle to periodically check the free space
  3. and if the free space gets above the amount we would like to have, we kill the find command.

This approach is probably the most effective one, because the “find” command runs only once – with multiple runs of find in a directory with many sub-directories, every execution would re-check them in the same order; imagine what that is like with hundreds of thousands of sub-directories full of files!

The goal is to execute only one find command and to stop it when we reach the desired free space!

So here is an example:

echo "STARTING WITH /mnt/cache/"
find /mnt/cache/ -type f -name '*.jpg' -mtime +60 -delete &>/dev/null &
PID=$!
stime=60

while kill -0 "$PID" >/dev/null 2>&1; do
    FREESPACE=$(($(stat -f --format="%a*%S/1024" "/mnt/cache/")))
    if [[ $FREESPACE -gt 50000000 ]]; then
        kill "$PID"
    fi
    echo "SLEEPING FOR $stime"
    sleep ${stime}s
done
exit 0

The above piece of bash code will find in /mnt/cache/ only files with names ‘*.jpg’ older than 60 days and delete them. In parallel, we check whether the find command is still executing (because it could finish after finding nothing or just a small number of files) and sleep for 60 seconds between checks. When the free space exceeds 50000000 kilobytes (around 50 gigabytes), the find command is killed!
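The free-space expression deserves a note: stat -f prints filesystem (not file) statistics, with %a being the free blocks available to non-superusers and %S the block size, so the arithmetic expansion yields free kilobytes. It can be tried standalone:

```shell
# Free space in kilobytes on the filesystem holding /
# %a = free blocks available to non-superusers, %S = fundamental block size
FREESPACE=$(($(stat -f --format="%a*%S/1024" "/")))
echo "FREESPACE=${FREESPACE} KB"
```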
Someone could argue we could have used the

timeout command,

but timeout would kill “find” every time the interval passes, and on every new execution find would check the same files from the previous run all over again! If the first “find” run passed 10000 files and deleted 100 of them, the second run would check those 9900 not-deleted files again before continuing with new ones; executing it again and again you could end up in an endless loop, because the time is spent only re-checking files that were previously checked and not deleted.

Here is the bash script with two parameters for the command line:


#!/bin/bash

if [ "$1" == "" ]; then
        echo "USAGE: <script> <path> <time=60s>"
        exit 0
fi
if [ "$2" == "" ] || [ "$2" == 0 ]; then
        stime=60
else
        stime=$2
fi

find "$1" -type f -delete &>/dev/null &
PID=$!

while kill -0 "$PID" >/dev/null 2>&1; do
    FREESPACE=$(($(stat -f --format="%a*%S/1024" "$1")))
    if [[ $FREESPACE -gt 50000000 ]]; then
        kill "$PID"
    fi
    echo "SLEEPING FOR ${stime}"
    sleep ${stime}s
done
exit 0

The script checks for the two parameters. The second parameter is not mandatory and its default value is 60 seconds. You can probably tune the script for your needs:

  • the find command to look for specific files by name/mtime/type and so on.
  • a third parameter on the command line to set the minimum free space to check for.
  • the time of the second parameter to be passed with a unit suffix: s=seconds, m=minutes, h=hours and so on.

Replace an old IP with new one in all files of all sub-directories recursively

Here is a quick Linux tip for those who want to replace their old IP with new one for all files in a given directory and all its sub-directories recursively:

find [path-to-directory] -type f -print0 | xargs -0 sed -i 's/[old-IP-with-escaped-dots]/[new-IP]/g'

Quick example:

find /etc/nginx/ -type f -print0 | xargs -0 sed -i 's/192\.168\.10\.124/[new-IP]/g'

As you can see, the directory here is “/etc/nginx/” – replace it with the directory where your (configuration) files are. We are replacing the old IP 192.168.10.124 with the new one, so after executing the above line you’ll get files modified with the new IP.
You must escape the dots “.” in the IP!
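Here is the same pipeline demonstrated on a throw-away directory; the new IP 10.0.0.15 is just an example value:

```shell
# Create a sample config file in a temp directory
dir=$(mktemp -d)
printf 'listen 192.168.10.124:80;\n' > "$dir/app.conf"
# Replace the old IP with the new one in every file; dots are escaped in the pattern
find "$dir" -type f -print0 | xargs -0 sed -i 's/192\.168\.10\.124/10.0.0.15/g'
cat "$dir/app.conf"
```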

Delete millions of files slowly without loading the server

There are situations when we need to delete a great deal of files from our filesystem, and if we just execute

rm -Rf

the server will surely get loaded and the service it provides will degrade! What if you cannot reformat the filesystem, because the server uses it extensively, but you need to delete, let’s say, a couple of million files from it? We can use find and usleep (in most Linux distros this program is installed by an additional package). The idea is to delete files one by one, tuning the pause between every delete. You can execute this command in the background or in a screen:

find /mnt/storage/old/ -type f -exec echo {} \; -exec rm {} \; -exec usleep 200000 \;

usleep accepts microseconds, so 200000 microseconds are 0.2 seconds. You can tune it precisely with a step of just a microsecond. In the real world on the bash console we would probably use values of around 1/10 of a second, i.e. 100000 microseconds or more. Execute the command, then watch your server load and tune.

  • usleep in CentOS 7 is installed with package “initscripts”, which is installed by default
  • usleep in Ubuntu is missing and you probably won’t find any safe place to download a package to install it from, but it can be more or less replaced with “sleep <floating_point_number>s” – GNU sleep accepts a floating point number for the delay, and with “s” appended it sleeps for a fraction of a second. So the command for Ubuntu is slightly changed:
    find /mnt/storage/old/ -type f -exec echo {} \; -exec rm {} \; -exec sleep 0.2s \;
  • a non-GNU version of sleep requires an integer NUMBER, so the smallest sleep is 1 second, which is too big for the purpose. Check your man page to see whether your system has the GNU sleep command.

bash: find all files between a given time period

Here is the command to find all files between two given dates with the find Linux command:

find /home/ -newermt 20140901T0000 -not -newermt 20141001T0000 -type f

to use “find” with

“-newermt”

you must have find version 4.3.3 or above.
With an older version of the find utility, the modification times of two temporary files can be used instead:

  1. create two temporary files
  2. “touch” them with the right time
  3. execute find command with “-newer”
  4. delete the two temporary files

Here is the bash code:

dt1=$(date -d '2014-09-01' +'%Y%m%d%H%M.%S')
dt2=$(date -d '2014-10-01' +'%Y%m%d%H%M.%S')
touch -t $dt1 "$dt1"; touch -t $dt2 "$dt2"
find . -type f -newer "$dt1" -not -newer "$dt2"
rm "$dt1" "$dt2"
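The -newermt variant can be sanity-checked on throw-away files with artificial modification times:

```shell
# Two files: one inside the September 2014 period, one outside it
d=$(mktemp -d)
touch -d '2014-09-15 12:00' "$d/inside"
touch -d '2014-11-01 12:00' "$d/outside"
# Only the "inside" file should be printed
find "$d" -type f -newermt 20140901T0000 -not -newermt 20141001T0000
```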