more than the default 4 parallel processes using distributed compiling with distcc

Distributed compilation can greatly speed up building Gentoo packages (and not only on Gentoo, of course). If you run Gentoo on a laptop or on a relatively old CPU, you may want to distribute package builds across multiple hosts.
Different Linux distributions use different configuration and environment schemes, and it is sometimes difficult to work out which configuration applies to your setup. This is not a tutorial on how to enable parallel processing in Gentoo; it is just our client-side setup.

By default, there is a limit of 4 parallel processes per host, which is far too low, because nowadays most servers have more than 8 cores/logical compute units (and many have 16 or more).

The environment variable DISTCC_HOSTS controls which hosts receive compilation jobs and how many parallel processes each host may run.

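In Gentoo we set this variable in /etc/portage/make.conf. Here is what you may include in make.conf to run up to 16 parallel jobs on the remote host while limiting the local load (-l4 stops make from starting new jobs while the local load average is above 4, which matters when jobs fall back to the local machine because the remote host fails):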

MAKEOPTS="-j16 -l4"
FEATURES="distcc"
DISTCC_HOSTS="192.168.0.101/16"

We use the DISTCC_HOSTS variable (in Gentoo it goes in make.conf; on another Linux distribution, set an environment variable with this name) because it is easy to set up and to control globally for the Gentoo emerge system.
According to the distcc documentation:

In order, distcc looks in the $DISTCC_HOSTS environment variable, the user’s $DISTCC_DIR/hosts file, and the system-wide host file.

So when emerge builds packages, it relies on, in order: $DISTCC_HOSTS in make.conf (/etc/portage/make.conf, or /etc/make.conf if you still use the old path), the hosts file in “/var/tmp/portage/.distcc/” (the build process runs as the “portage” user and group, not root!), and “/etc/distcc/hosts”. The first source found in this order sets the hosts and the limits for distributed processing. So if you use $DISTCC_HOSTS in make.conf (or in the environment), you do not need to set up a “hosts” file.
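The lookup order described above can be sketched as a small shell function. This is only an illustration of the precedence, not how distcc itself is implemented: the function name distcc_hosts_source is made up for this example, and the system-wide path is taken as a parameter (defaulting to the /etc/distcc/hosts path named in this article) so it can be tried without touching the real files.

```shell
# Hypothetical helper: report which host list source would win,
# following the order $DISTCC_HOSTS, $DISTCC_DIR/hosts, system-wide file.
distcc_hosts_source() {
    # System-wide hosts file; the default is the path used in this article.
    sys_hosts="${1:-/etc/distcc/hosts}"
    # Per-user state directory; distcc falls back to ~/.distcc when unset.
    user_dir="${DISTCC_DIR:-${HOME:-}/.distcc}"

    if [ -n "${DISTCC_HOSTS:-}" ]; then
        echo "environment: DISTCC_HOSTS"
    elif [ -f "$user_dir/hosts" ]; then
        echo "file: $user_dir/hosts"
    elif [ -f "$sys_hosts" ]; then
        echo "file: $sys_hosts"
    else
        echo "none"
    fi
}

# Check the source for the current shell environment.
distcc_hosts_source
```

To check what emerge would see, run it with DISTCC_DIR="/var/tmp/portage/.distcc/" set, since the build runs as the “portage” user.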
Separate multiple hosts with white space and always use the “/LIMIT” notation for each host. The default is only 4 parallel processes (i.e. /4 is implicitly appended to every host in the configuration that has no explicit limit!).
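To see how the “/LIMIT” notation adds up, here is a minimal sketch that sums the slots of a host list. The function name distcc_total_slots and the hosts 192.168.0.102 and 192.168.0.103 are hypothetical, and the parsing is a simplification that only handles plain HOST/LIMIT entries (optionally followed by comma-separated options); it applies the default of 4 mentioned above to every entry without a limit.

```shell
# Hypothetical helper: sum the /LIMIT values of a DISTCC_HOSTS-style list.
# Simplification: only plain "HOST", "HOST/LIMIT" and "HOST/LIMIT,option"
# entries are understood.
distcc_total_slots() {
    total=0
    for spec in $1; do
        spec="${spec%%,*}"               # drop trailing options such as ",lzo"
        case "$spec" in
            */*) limit="${spec##*/}" ;;  # explicit limit after the slash
            *)   limit=4 ;;              # no limit given: default of 4
        esac
        total=$((total + limit))
    done
    echo "$total"
}

# Three hosts: 16 + 8 + the implicit 4 for the last one.
distcc_total_slots "192.168.0.101/16 192.168.0.102/8 192.168.0.103"
# prints 28
```

A total like this is a reasonable starting point for the -j value in MAKEOPTS. The distcc man page also documents a -j option that reports the concurrency distcc calculates from its own host list, which is the more reliable number if your distcc version supports it.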

Monitoring

Verify that you are using more than the default 4 parallel processes with the console monitoring program:

root@srv ~ # DISTCC_DIR="/var/tmp/portage/.distcc/" distccmon-text 1
 24877  Compile     gegl-region-generic.c                     192.168.0.101[3]
 24786  Compile     gegl-sampler.c                            192.168.0.101[4]
 24891  Compile     gegl-tile-source.c                        192.168.0.101[5]
 24659  Compile     gegl-buffer-load.c                        192.168.0.101[8]
 24660  Compile     gegl-buffer-save.c                        192.168.0.101[9]
 24664  Compile     gegl-buffer-linear.c                      192.168.0.101[10]
 24769  Compile     gegl-sampler-nearest.c                    192.168.0.101[11]
 24986  Connect     gegl-tile-handler.c                       192.168.0.101[14]
 24919  Preprocess                                                localhost[0]
 24774  Preprocess                                                localhost[1]
 24953  Preprocess                                                localhost[1]
 24898  Preprocess                                                localhost[2]
 24968  Preprocess                                                localhost[3]
 24975  Preprocess                                                localhost[4]
 24832  Preprocess                                                localhost[7]

 24832  Compile     gegl-sampler-lohalo.c                     192.168.0.101[0]
 24953  Compile     gegl-tile-storage.c                       192.168.0.101[1]
 24898  Compile     gegl-tile.c                               192.168.0.101[2]
 25045  Compile     gegl-tile-handler-chain.c                 192.168.0.101[3]
 24919  Compile     gegl-tile-backend.c                       192.168.0.101[6]
 24968  Compile     gegl-tile-backend-file.c                  192.168.0.101[7]
 24660  Compile     gegl-buffer-save.c                        192.168.0.101[9]
 24664  Compile     gegl-buffer-linear.c                      192.168.0.101[10]
 24769  Compile     gegl-sampler-nearest.c                    192.168.0.101[11]
 24774  Compile     gegl-sampler-cubic.c                      192.168.0.101[12]
 24975  Compile     gegl-tile-backend-ram.c                   192.168.0.101[13]
 25078  Compile     gegl-tile-handler-log.c                   192.168.0.101[14]
 25068  Preprocess                                                localhost[0]
 25009  Preprocess                                                localhost[0]

 24832  Compile     gegl-sampler-lohalo.c                     192.168.0.101[0]
 24898  Compile     gegl-tile.c                               192.168.0.101[2]
 25101  Compile     gegl-tile-handler-zoom.c                  192.168.0.101[4]
 25009  Compile     gegl-tile-handler-cache.c                 192.168.0.101[5]
 24968  Compile     gegl-tile-backend-file.c                  192.168.0.101[7]
 25068  Compile     gegl-tile-handler-empty.c                 192.168.0.101[8]
 24660  Compile     gegl-buffer-save.c                        192.168.0.101[9]

Note that we set DISTCC_DIR="/var/tmp/portage/.distcc/" because, as mentioned, the Gentoo emerge command builds packages as the “portage” user in the directory “/var/tmp/portage” (by default), so the distcc state directory is “/var/tmp/portage/.distcc/”. The trailing argument 1 makes the command refresh the list of parallel processes every second.
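If you only want a number rather than the full list, a single snapshot of the monitor output can be counted. The following is a rough sketch, assuming the output format shown above (host and slot in the last column) and assuming that distccmon-text prints one snapshot and exits when called without the interval argument; count_remote_jobs is a name made up for this example.

```shell
# Hypothetical helper: count jobs in one distccmon-text snapshot that are
# running on a remote host, i.e. whose last column is not localhost[N].
count_remote_jobs() {
    awk 'NF && $NF !~ /^localhost\[/ { n++ } END { print n + 0 }'
}

# Example with three lines in the format shown above.
count_remote_jobs <<'EOF'
 24877  Compile     gegl-region-generic.c                     192.168.0.101[3]
 24786  Compile     gegl-sampler.c                            192.168.0.101[4]
 24919  Preprocess                                                localhost[0]
EOF

# On a live system (one snapshot, no interval argument):
#   DISTCC_DIR="/var/tmp/portage/.distcc/" distccmon-text | count_remote_jobs
```

If the count stays at 4 or below during a large build, the “/LIMIT” setting is probably not being picked up.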

Bonus – client and server

A quick note on how to set up the client and the server. For details, check out the official Gentoo wiki: https://wiki.gentoo.org/wiki/Distcc Here we summarize the important parts.

  1. As stated in the official Gentoo wiki, the same GCC and binutils versions should be used on the server and the client. To be sure, just create the server hosts by rsyncing one of the clients!
  2. On both the Gentoo client and the Gentoo server, you only need to install sys-devel/distcc:
    emerge -v sys-devel/distcc
    
  3. On the server, configure and start the distcc daemon. Only the configuration file “/etc/conf.d/distccd” needs changing: add “-j 16” to DISTCCD_OPTS (16 is the maximum number of parallel processes allowed, so adjust this value to your server), “-N 15” to run the compile jobs at niceness 15, and “--allow” with the IP or network of the machines that will send compile jobs to this host (here the whole network 192.168.0.0/24 is allowed).
    DISTCCD_OPTS="${DISTCCD_OPTS} -N 15 -j 16"
    DISTCCD_OPTS="${DISTCCD_OPTS} --allow 192.168.0.0/24"
    

    And start the server:

    /etc/init.d/distccd start
    

    Or, if you use systemd, check out the official wiki.

  4. On the client, add the three lines shown at the top (MAKEOPTS, FEATURES and DISTCC_HOSTS) to “/etc/portage/make.conf” (or /etc/make.conf if you still use the old path).
  5. Start building a package with emerge on the client.
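Since step 1 depends on the client and the server having the same compiler version, a quick comparison before the first build may save some debugging. This is only a sketch: toolchain_matches is a hypothetical helper, and the commented usage line assumes you have SSH access to the server at 192.168.0.101, which this article does not otherwise require.

```shell
# Hypothetical helper: compare two toolchain version strings,
# for example the output of `gcc -dumpversion` on client and server.
toolchain_matches() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "MISMATCH: client=$1 server=$2"
        return 1
    fi
}

# Possible usage on the client (assumes SSH access to the server):
#   toolchain_matches "$(gcc -dumpversion)" "$(ssh 192.168.0.101 gcc -dumpversion)"
```

The same comparison can be repeated for binutils, for example with the first line of `ld --version`.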
