Starting up standalone ClickHouse server with basic configuration in docker

ClickHouse is a powerful column-oriented database written in C, which generates analytical and statistical reports in real-time using SQL statements!

It supports on-the-fly compression of the data, cluster setup of replicas and shards instances over thousands of servers, and multi-master cluster modes.

The ClickHouse is an ideal instrument for weblogs and easy real-time generating reports of the weblogs! Or for storing the data of user behaviour and interactions with web sites or applications.
The easiest way to run a CLickHouse instance is within a docker/podman container. The docker hub hosts official containers image maintained by the ClickHouse developers.
And this article will show how to run a ClickHouse standalone server, how to manage the ClickHouse configuration features, and what obstacles the user may encounter.

Here are some key points:

  • Main server configuration file is config.xml (in /etc/clickhouse-server/config.xml) – all server’s settings like listening port, ports, logger, remote access, cluster setup (shards and replicas), system settings (time zone, umask, and more), monitoring, query logs, dictionaries, compressions and so on. Check out the server settings: https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings/
  • The main user configuration file is users.xml (in /etc/clickhouse-server/users.xml), which specifies profiles, users, passwords, ACL, quotas, and so on. It also supports SQL-driven user configuration, check out the available settings and users’ options – https://clickhouse.com/docs/en/operations/settings/settings-users/
  • By default, there is a root user with administrative privileges without password, which could only connect to the server from the localhost.
  • Do not edit the main configuration file(s). Some options may get deprecated and removed and the modified configuration file to become incompatible with the new releases.
  • Every configuration setting could be overriden with configuration files in config.d/. A good practice is to have a configuration file per each setting, which overrides the default one in config.xml. For example:
    root@srv ~ # ls -al /etc/clickhouse-server/config.d/
    total 48
    drwxr-xr-x 2 root root 4096 Nov 22 04:40 .
    drwxr-xr-x 4 root root 4096 Nov 22 04:13 ..
    -rw-r--r-- 1 root root  343 Sep 16  2021 00-path.xml
    -rw-r--r-- 1 root root   58 Nov 22 04:40 01-listen.xml
    -rw-r--r-- 1 root root  145 Feb  3  2020 02-log_to_console.xml
    

    There are three configurations files, which override the default paths (00-path.xml), change the default listen setting (01-listen.xml), and log to console (02-log_to_console.xml). Here is what to expect in 00-path.xml

    <yandex>
        <path replace="replace">/mnt/storage/ClickHouse/var/</path>
        <tmp_path replace="replace">/mnt/storage/ClickHouse/tmp/</tmp_path>
        <user_files_path replace="replace">/mnt/storage/ClickHouse/var/user_files/</user_files_path>
        <format_schema_path replace="replace">/mnt/storage/ClickHouse/format_schemas/</format_schema_path>
    </yandex>
    

    So the default settings in config.xml path, tmp_path, user_files_path and format_schema_path will be replaced with the above values.
    To open the ClickHouse for the outer world, i.e. listen to 0.0.0.0 just include a configuration file like 01-listen.xml.

    <yandex>
        <listen_host>0.0.0.0</listen_host>
    </yandex>
    
  • When all additional (including user) configuration files are processed and the result is written in preprocessed_configs/ directory in var directory, for example /var/lib/clickhouse/preprocessed_configs/
  • The configuration directories are reloaded each 3600 seconds (by default, it could be changed) by the ClickHouse server and on a change in the configuration files new processed ones are generated and in most cases the changes are loaded on-the-fly. Still, there are settings, which require manual restart of the main process. Check out the manual for more details.
  • By default, the logger is in the trace log level, which may generate an enormous amount of logging data. So just change the settings to something more production meaningful like warning level (in config.d/04-part_log.xml).
    <yandex>
        <logger>
            <level>warning</level>
        </logger>
    </yandex>
    
  • ClickHouse default ports:
    • 8123 is the HTTP client port (8443 is the HTTPS). The client can connect with curl or wget or other command-line HTTP(S) clients to manage and insert data in databases and tables.
    • 9000 is the native TCP/IP client port (9440 is the TLS enabled port for this service) to manage and insert data in databases and tables.
    • 9004 is the MySQL protocol port. ClickHouse supports MySQL wire protocol and it can be enabled by the
      <yandex>
          <mysql_port>9004</mysql_port>
      </yandex>
      
    • 9009 is the port, which ClickHouse uses to exchange data between ClickHouse servers when using cluster setup and replicas/shards.
  • There is a flag directory, in which files with special names may instruct ClickHouse to process commands. For example, creating a blank file with the name: /var/lib/clickhouse/flags/force_restore_data will instruct the ClickHouse to begin a restore procedure for the server.
  • A good practice is to make backup of the whole configuration directory despite the main configuration file(s) are not changed and in original state.
  • The SQL commands, which are supported by CickHouse server: https://clickhouse.com/docs/en/sql-reference/ and https://clickhouse.com/docs/en/sql-reference/statements/
  • The basic and fundamental table type is MergeTree, which is designed for inserting a very large amount of data into a table – https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/
  • Bear in mind, ClickHouse supports SQL syntax and some of the SQL statements, but UPDATE and DELETE statements are not supported, just INSERTs! The main idea behind the ClickHouse is not to change the data, but to add only!
  • Batch INSERTs are the preferred way of inserting data! In fact, there is a recommendation of 1 INSERT per a second in the ClickHouse manual

Keep on reading!