Delete files in a directory while checking the free space with the find and stat commands – an effective and fast approach

If you have large storage for, let’s say, a caching proxy and you need to delete some files quickly, you can use the “find” Linux command. There are plenty of examples on the Internet of how to do it, but many of them use pipes, sorting, or other manipulation of the output. Some piped commands (sort, for example) need the whole output before they can start, so with millions of files you could wait an hour or two before anything is deleted, only to discover the approach is not efficient for your case!
So we need a command (or commands) that begins deleting files immediately, with some simple selection. We can use the Linux command “find” with its most commonly used options, such as

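  • “-type f” – only regular files
  • “-mtime +N” / “-mtime -N” – only entries modified more than or less than N days ago. “-mtime +5” matches files or directories older than 5 days and “-mtime -5” matches files or directories newer than 5 days (find counts whole 24-hour periods, so “+5” in practice means at least 6 full days)
  • “-name '*.jpg'” – files or directories whose names match the pattern *.jpg, for example “sample.jpg”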
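
Before adding “-delete”, it can help to preview what a combination of these options selects. Here is a minimal sketch, assuming GNU find and touch; the throwaway directory and the file names are made up for illustration:

```shell
#!/bin/bash
# Build a throwaway directory with one old .jpg, one old .txt and one new .jpg.
demo_dir=$(mktemp -d)
touch -d '10 days ago' "$demo_dir/old.jpg" "$demo_dir/old.txt"
touch "$demo_dir/new.jpg"

# -print instead of -delete: list what would be removed, delete nothing.
matches=$(find "$demo_dir" -type f -name '*.jpg' -mtime +5 -print)
echo "$matches"

# Clean up the demo directory.
rm -rf "$demo_dir"
```

Only old.jpg should be listed: new.jpg is too recent and old.txt does not match the name pattern. Once the list looks right, “-print” can be swapped for “-delete”.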

So we’ll use

  1. “find” to delete files
  2. a while loop to check the free space periodically
  3. and if the free space gets greater than the threshold we want, we will kill the find command.

This approach is probably the most effective one, because “find” runs only once. If you run “find” multiple times in a directory with many sub-directories, every run walks them again in the same order. Imagine what that is like with hundreds of thousands of sub-directories full of files!

The goal is to execute only one find command and to stop it when we reach the desired free space!

So here is an example:

echo "STARTING WITH /mnt/cache/"
find /mnt/cache/ -type f -name '*.jpg' -mtime +60 -delete &>/dev/null &
PID=$!
stime=60

while kill -0 "$PID" >/dev/null 2>&1; do
    FREESPACE=$(($(stat -f --format="%a*%S/1024" "/mnt/cache/")))
    if [[ $FREESPACE -gt 50000000 ]]
    then 
        kill "$PID"
        break
    fi
    echo "SLEEPING FOR $stime"
    sleep ${stime}s
done
echo "TERMINATING"
exit 0

The above piece of bash code finds only files in /mnt/cache/ with names matching ‘*.jpg’ that are older than 60 days and deletes them. In parallel, the loop checks whether the find command is still running (it could exit early after finding nothing or just a small number of files), checks the free space, and sleeps for 60 seconds. Once more than 50000000 kilobytes (roughly 50 GB) are free, it kills the find command!
Someone could argue we could have used the “timeout” command, but it would kill “find” every time the check interval passes, and every new run of find would have to re-check the same files as the previous run! If the first “find” run examined 10000 files and deleted 100, the second run checks those 9900 remaining files again before it reaches new ones. If you run it again and again you could end up in an endless loop, because all the time is spent on files that were already checked and not deleted.
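
The free-space calculation is the least obvious line of the loop, so here it is pulled out on its own. This is a minimal sketch assuming GNU stat; the function name free_kib is made up for illustration and is not part of the script above:

```shell
#!/bin/bash
# free_kib PATH: print the space available to unprivileged users on the
# filesystem holding PATH, in KiB (hypothetical helper name).
# %a = free blocks available to non-root users, %S = fundamental block size.
free_kib() {
    local blocks bsize
    # If stat fails (e.g. the path does not exist), read gets no input
    # and the function returns a non-zero status.
    read -r blocks bsize < <(stat -f --format='%a %S' "$1") || return 1
    echo $(( blocks * bsize / 1024 ))
}

free_kib /
```

The printed number should be close to what “df -k” reports in its Available column for the same filesystem.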

Here is the bash script with two parameters for the command line:

#!/bin/bash

if [ "$1" == "" ]
then 
        echo "USAGE: <script> <path> <time=60s>"
        exit 0
fi
if [ "$2" == "" ] || [ "$2" == 0 ]
then
        stime=60
else
        stime=$2
fi

sleep ${stime}s

echo "STARTING WITH $1"
find "$1" -type f -delete &>/dev/null &
PID=$!

while kill -0 "$PID" >/dev/null 2>&1; do
    FREESPACE=$(($(stat -f --format="%a*%S/1024" "$1")))
    if [[ $FREESPACE -gt 50000000 ]]
    then 
        kill "$PID"
        break
    fi
    echo "SLEEPING FOR $stime"
    sleep ${stime}s
done
echo "TERMINATING"
exit 0

The script checks for missing parameters: the first one (the path) is mandatory, while the second one is optional and defaults to 60 seconds. You can tune the script for your needs:

  • adjust the find command to look for specific files by name/mtime/type and so on.
  • add a third command-line parameter to set the minimum free space to check for.
  • allow the second parameter to be passed with a unit suffix: s=seconds, m=minutes, h=hours and so on.
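
The last two ideas could be sketched roughly as follows. This is only one possible take on the argument handling, not a finished script: the function name parse_args and the variable names target, interval and min_free_kib are made up for illustration, and the suffix list assumes GNU sleep, which accepts s, m, h and d:

```shell
#!/bin/bash
# parse_args PATH [INTERVAL] [MIN_FREE_KIB]
# Sets the global variables target, interval and min_free_kib
# (hypothetical names), or returns 1 on bad input.
parse_args() {
    if [[ -z "$1" ]]; then
        echo "USAGE: <script> <path> [interval=60s] [min_free_kib=50000000]" >&2
        return 1
    fi
    target=$1
    interval=${2:-60s}
    min_free_kib=${3:-50000000}

    # A bare number means seconds; otherwise require one of sleep's suffixes.
    [[ $interval =~ ^[0-9]+$ ]] && interval="${interval}s"
    if [[ ! $interval =~ ^[0-9]+[smhd]$ ]]; then
        echo "invalid interval: $interval" >&2
        return 1
    fi
    if [[ ! $min_free_kib =~ ^[0-9]+$ ]]; then
        echo "invalid minimum free space: $min_free_kib" >&2
        return 1
    fi
}

parse_args /mnt/cache/ 5m 100000000 && echo "$target $interval $min_free_kib"
```

Inside the loop, the script would then call sleep "$interval" and compare the free space against "$min_free_kib" instead of the hard-coded 50000000.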