ClickHouse is a powerful column-oriented database written in C++, which generates analytical and statistical reports in real time using SQL queries.
It supports on-the-fly data compression, cluster setups with replica and shard instances across thousands of servers, and multi-master cluster modes.
ClickHouse is an ideal instrument for storing web logs and generating real-time reports from them, or for storing data about user behaviour and interactions with websites or applications.
The easiest way to run a ClickHouse instance is within a docker/podman container. Docker Hub hosts official container images maintained by the ClickHouse developers.
This article shows how to run a standalone ClickHouse server, how to manage its configuration, and what obstacles the user may encounter.
Here are some key points:
- The main server configuration file is config.xml (/etc/clickhouse-server/config.xml). It holds all server settings: listening ports, logger, remote access, cluster setup (shards and replicas), system settings (time zone, umask, and more), monitoring, query logs, dictionaries, compression, and so on. Check out the server settings: https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings/
- The main user configuration file is users.xml (/etc/clickhouse-server/users.xml), which specifies profiles, users, passwords, ACLs, quotas, and so on. ClickHouse also supports SQL-driven user configuration; check out the available settings and user options: https://clickhouse.com/docs/en/operations/settings/settings-users/
- By default, there is a user named default with administrative privileges and no password; out of the box it can only be reached from localhost, because the server initially listens only on local interfaces.
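For example, a small override file in users.d/ can set a password for the default user. This is just a sketch: the file name 01-default-password.xml and the password value are illustrative, and the plaintext password element is used here for brevity (password_sha256_hex is preferable in production):

```xml
<!-- /etc/clickhouse-server/users.d/01-default-password.xml (hypothetical name) -->
<yandex>
    <users>
        <default>
            <!-- plaintext for the example only; prefer password_sha256_hex in production -->
            <password>my_secret_password</password>
        </default>
    </users>
</yandex>
```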
- Do not edit the main configuration file(s). Some options may get deprecated and removed over time, and a modified configuration file may become incompatible with new releases.
- Every configuration setting can be overridden with configuration files in config.d/. A good practice is to have one configuration file per setting, each overriding the default in config.xml. For example:
root@srv ~ # ls -al /etc/clickhouse-server/config.d/
total 48
drwxr-xr-x 2 root root 4096 Nov 22 04:40 .
drwxr-xr-x 4 root root 4096 Nov 22 04:13 ..
-rw-r--r-- 1 root root  343 Sep 16  2021 00-path.xml
-rw-r--r-- 1 root root   58 Nov 22 04:40 01-listen.xml
-rw-r--r-- 1 root root  145 Feb  3  2020 02-log_to_console.xml
There are three configuration files, which override the default paths (00-path.xml), change the default listen setting (01-listen.xml), and log to the console (02-log_to_console.xml). Here is what to expect in 00-path.xml:
<yandex>
    <path replace="replace">/mnt/storage/ClickHouse/var/</path>
    <tmp_path replace="replace">/mnt/storage/ClickHouse/tmp/</tmp_path>
    <user_files_path replace="replace">/mnt/storage/ClickHouse/var/user_files/</user_files_path>
    <format_schema_path replace="replace">/mnt/storage/ClickHouse/format_schemas/</format_schema_path>
</yandex>
So the default values of path, tmp_path, user_files_path, and format_schema_path in config.xml will be replaced with the above values.
To open ClickHouse to the outside world, i.e. to listen on 0.0.0.0, just include a configuration file like 01-listen.xml:
<yandex>
    <listen_host>0.0.0.0</listen_host>
</yandex>
- All additional (including user) configuration files are processed together with the main ones, and the result is written to the preprocessed_configs/ directory under the var directory, for example /var/lib/clickhouse/preprocessed_configs/.
- The configuration directories are reloaded by the ClickHouse server every 3600 seconds by default (the interval can be changed). On a change in the configuration files, new preprocessed ones are generated, and in most cases the changes are loaded on the fly. Still, there are settings that require a manual restart of the main process. Check out the manual for more details.
- By default, the logger uses the trace log level, which may generate an enormous amount of logging data. Change it to something more meaningful for production, like the warning level (for example, in config.d/02-logger.xml):
<yandex>
    <logger>
        <level>warning</level>
    </logger>
</yandex>
- ClickHouse default ports:
- 8123 is the HTTP client port (8443 is the HTTPS one). Clients can connect with curl, wget, or other command-line HTTP(S) clients to manage and insert data in databases and tables.
- 9000 is the native TCP/IP client port (9440 is the TLS-enabled port for this service) to manage and insert data in databases and tables.
- 9004 is the MySQL protocol port. ClickHouse supports the MySQL wire protocol, and it can be enabled with the following setting:
<yandex>
    <mysql_port>9004</mysql_port>
</yandex>
- 9009 is the port ClickHouse uses to exchange data between ClickHouse servers in cluster setups with replicas/shards.
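A quick way to check which of these ports are reachable is a small shell loop. This is only a sketch using bash's /dev/tcp pseudo-device, and it assumes the server runs on the local machine:

```shell
# Probe the default ClickHouse ports; bash's /dev/tcp opens a TCP connection
for port in 8123 9000 9004 9009; do
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "port $port open"
  else
    echo "port $port closed"
  fi
done
```

A closed port here usually means either the service is not enabled (e.g. mysql_port) or listen_host still restricts the server to localhost inside the container.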
- There is a flags directory, in which files with special names instruct ClickHouse to run special procedures. For example, creating a blank file named /var/lib/clickhouse/flags/force_restore_data will instruct ClickHouse to begin a data restore procedure for the server.
- A good practice is to back up the whole configuration directory, even if the main configuration file(s) are unchanged and in their original state.
- The SQL commands supported by the ClickHouse server: https://clickhouse.com/docs/en/sql-reference/ and https://clickhouse.com/docs/en/sql-reference/statements/
- The basic and fundamental table engine is MergeTree, which is designed for inserting very large amounts of data into a table: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/
- Bear in mind that ClickHouse supports SQL syntax and some of the standard SQL statements, but the classic UPDATE and DELETE statements are not supported, just INSERTs! (There are ALTER TABLE ... UPDATE/DELETE mutations, but they rewrite whole data parts and are meant for rare maintenance, not routine use.) The main idea behind ClickHouse is not to change data, but only to add it.
- Batch INSERTs are the preferred way of inserting data! In fact, the ClickHouse manual recommends no more than one INSERT query per second.
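The batching advice above can be sketched as: accumulate rows on the client side (here a generated TabSeparated file) and ship them all in a single INSERT. The table name mytest.xxx matches the example table created later in this article; the clickhouse-client line is commented out because it needs a running server:

```shell
# Build a 10,000-row TabSeparated batch: a UInt8-range value and a date per row
seq 1 10000 | awk '{ printf "%d\t2020-08-%02d\n", $1 % 256, ($1 % 28) + 1 }' > /tmp/batch.tsv

# One INSERT carries the whole batch instead of 10,000 single-row INSERTs:
# clickhouse-client --query "INSERT INTO mytest.xxx FORMAT TabSeparated" < /tmp/batch.tsv

wc -l < /tmp/batch.tsv
```

The same file can also be sent over the HTTP interface with curl's --data-binary, as shown at the end of this article.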
Docker/podman container hints
- Map the configuration directories (/etc/clickhouse-server/)config.d/ and (/etc/clickhouse-server/)users.d/ to directories on the host.
- Map all the ClickHouse server ports to the host network.
- Map the data (/var/lib/clickhouse) and log (/var/log/clickhouse-server) directories to host directories.
- Use a specific tag version from Docker Hub (https://hub.docker.com/r/yandex/clickhouse-server/tags) to run the container.
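Before starting the container, the host-side layout used in the next section can be created in one go. A minimal sketch; the /srv/clickhouse prefix is just the convention used below, kept in a variable so it is easy to change:

```shell
# Host directories that will be bind-mounted into the container
BASE="${BASE:-/srv/clickhouse}"
mkdir -p "$BASE/config/config.d" \
         "$BASE/config/users.d" \
         "$BASE/data" \
         "$BASE/log"
ls -R "$BASE"
```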
Start a ClickHouse docker/podman container
The structure of the host's mapped directories under /srv/clickhouse:
/srv/clickhouse/
/srv/clickhouse/config
/srv/clickhouse/config/config.d
/srv/clickhouse/config/users.d
/srv/clickhouse/log
/srv/clickhouse/data
The modified settings live in separate files under /srv/clickhouse/config/config.d/:
root@srv ~ # cd /srv/clickhouse/config/config.d
root@srv config.d # ls -al
total 20
drwxr-xr-x 2 root root 4096 29 Mar 14:23 .
drwxr-xr-x 4 root root 4096 29 Mar 14:08 ..
-rw-r--r-- 1 root root   58 29 Mar 14:22 01-listen.xml
-rw-r--r-- 1 root root   77 29 Mar 14:23 02-logger.xml
-rw-r--r-- 1 root root   53 29 Mar 14:23 03-mysql.xml
root@srv config.d # cat 01-listen.xml
<yandex>
    <listen_host>0.0.0.0</listen_host>
</yandex>
root@srv config.d # cat 02-logger.xml
<yandex>
    <logger>
        <level>warning</level>
    </logger>
</yandex>
root@srv config.d # cat 03-mysql.xml
<yandex>
    <mysql_port>9004</mysql_port>
</yandex>
Start the docker/podman container with mapped ports and directories:
docker run -d --ulimit nofile=262144:262144 \
 -v /srv/clickhouse/config/config.d:/etc/clickhouse-server/config.d \
 -v /srv/clickhouse/config/users.d:/etc/clickhouse-server/users.d \
 -v /srv/clickhouse/data:/var/lib/clickhouse \
 -v /srv/clickhouse/log:/var/log/clickhouse-server \
 -p 8123:8123 -p 8443:8443 -p 9000:9000 -p 9004:9004 -p 9009:9009 \
 --name=srv-clickhouse yandex/clickhouse-server:21.3.20.1
This command will run a docker container with the ClickHouse server in standalone mode with the ports and important directories mapped from the host.
The docker logs show no errors:
root@srv config.d # docker logs srv-clickhouse
Processing configuration file '/etc/clickhouse-server/config.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/01-listen.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/02-logger.xml'.
Merging configuration file '/etc/clickhouse-server/config.d/03-mysql.xml'.
Logging warning to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Docker will pull the ClickHouse image from the official repository if it is not found locally:
root@srv config.d # docker run -d --ulimit nofile=262144:262144 -v /srv/clickhouse/config/config.d:/etc/clickhouse-server/config.d -v /srv/clickhouse/config/users.d:/etc/clickhouse-server/users.d -v /srv/clickhouse/data:/var/lib/clickhouse -v /srv/clickhouse/log:/var/log/clickhouse-server -p 8123:8123 -p 8443:8443 -p 9000:9000 -p 9004:9004 -p 9009:9009 --name=srv-clickhouse yandex/clickhouse-server:21.3.20.1
Unable to find image 'yandex/clickhouse-server:21.3.20.1' locally
21.3.20.1: Pulling from yandex/clickhouse-server
ea362f368469: Pull complete
54739179c5e5: Pull complete
d0ce5a474780: Pull complete
28b2e29dbd36: Pull complete
c5c1469c5183: Pull complete
7d899235912c: Pull complete
cee82804555b: Pull complete
5d95cce6848d: Pull complete
795fb4371f9d: Pull complete
Digest: sha256:4eccfffb01d735ab7c1af9a97fbff0c532112a6871b2bb5fe5c478d86d247b7e
Status: Downloaded newer image for yandex/clickhouse-server:21.3.20.1
f6559f10646d4d33cc1fb9fbaf4085246490e33a67c1717aa257d4a340b027a0
On first start, the system databases are created automatically:
root@srv clickhouse # find .
./config
./config/config.d
./config/config.d/03-mysql.xml
./config/config.d/01-listen.xml
./config/config.d/02-logger.xml
./config/users.d
./log
./log/clickhouse-server.err.log
./log/clickhouse-server.log
./data
./data/metadata_dropped
./data/flags
./data/format_schemas
./data/access
./data/access/row_policies.list
./data/access/settings_profiles.list
./data/access/roles.list
./data/access/users.list
./data/access/quotas.list
./data/status
./data/metadata
./data/metadata/default
./data/metadata/default.sql
./data/metadata/system.sql
./data/metadata/system
./data/preprocessed_configs
./data/preprocessed_configs/config.xml
./data/preprocessed_configs/users.xml
./data/user_files
./data/store
./data/store/586
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_2_2_0
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_2_2_0/count.txt
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_2_2_0/primary.idx
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_2_2_0/data.mrk3
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_2_2_0/default_compression_codec.txt
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_2_2_0/checksums.txt
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_2_2_0/data.bin
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_2_2_0/partition.dat
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_2_2_0/minmax_event_date.idx
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_2_2_0/columns.txt
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_3_3_0
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_3_3_0/count.txt
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_3_3_0/primary.idx
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_3_3_0/data.mrk3
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_3_3_0/default_compression_codec.txt
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_3_3_0/checksums.txt
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_3_3_0/data.bin
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_3_3_0/partition.dat
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_3_3_0/minmax_event_date.idx
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_3_3_0/columns.txt
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/detached
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/format_version.txt
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_1_1_0
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_1_1_0/count.txt
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_1_1_0/primary.idx
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_1_1_0/data.mrk3
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_1_1_0/default_compression_codec.txt
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_1_1_0/checksums.txt
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_1_1_0/data.bin
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_1_1_0/partition.dat
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_1_1_0/minmax_event_date.idx
./data/store/586/5861ef9d-e712-4c1a-af7c-909a8ab09be1/202203_1_1_0/columns.txt
./data/store/f12
./data/store/f12/f121bb20-d3d5-4a34-8cde-3c0b2a8acb32
./data/store/f12/f121bb20-d3d5-4a34-8cde-3c0b2a8acb32/metric_log.sql
./data/store/f12/f121bb20-d3d5-4a34-8cde-3c0b2a8acb32/trace_log.sql
./data/store/5a4
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_2_2_0
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_2_2_0/count.txt
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_2_2_0/primary.idx
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_2_2_0/data.mrk3
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_2_2_0/default_compression_codec.txt
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_2_2_0/checksums.txt
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_2_2_0/data.bin
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_2_2_0/partition.dat
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_2_2_0/minmax_event_date.idx
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_2_2_0/columns.txt
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/detached
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/format_version.txt
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_1_1_0
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_1_1_0/count.txt
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_1_1_0/primary.idx
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_1_1_0/data.mrk3
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_1_1_0/default_compression_codec.txt
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_1_1_0/checksums.txt
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_1_1_0/data.bin
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_1_1_0/partition.dat
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_1_1_0/minmax_event_date.idx
./data/store/5a4/5a4b4e31-d1c5-491f-bac8-8caf8f1bab7b/202203_1_1_0/columns.txt
./data/store/951
./data/store/951/951b7d68-c474-4f58-ba58-91509bd0712c
./data/tmp
./data/data
./data/data/default
./data/data/system
./data/data/system/trace_log
./data/data/system/metric_log
./data/dictionaries_lib
Connect locally with the native client
Just attach to the running docker container and execute the clickhouse-client program to enter the ClickHouse console and execute SQL commands:
root@srv clickhouse # docker exec -it srv-clickhouse bash
root@78bda00c532c:~# clickhouse-client
ClickHouse client version 21.3.20.1 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 21.3.20 revision 54447.

78bda00c532c :) show databases;

SHOW DATABASES

Query id: 7ccebb4e-9aea-4a0a-8cc4-835aeafcba78

┌─name────┐
│ default │
│ system  │
└─────────┘

2 rows in set. Elapsed: 0.002 sec.

78bda00c532c :)
The ClickHouse settings can be viewed with an ordinary SELECT statement:
root@78bda00c532c:~# clickhouse-client
ClickHouse client version 21.3.20.1 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 21.3.20 revision 54447.

78bda00c532c :) SELECT * FROM system.settings ORDER BY name LIMIT 10;

SELECT *
FROM system.settings
ORDER BY name ASC
LIMIT 10

Query id: a5f54317-f535-4560-a054-f51453e99bdd

┌─name─────────────────────────────────────────────────┬─value─┬─changed─┬─description───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─min──┬─max──┬─readonly─┬─type───┐
│ add_http_cors_header                                 │ 0     │ 0       │ Write add http CORS header.                                                                                                                                               │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0        │ Bool   │
│ aggregate_functions_null_for_empty                   │ 0     │ 0       │ Rewrite all aggregate functions in a query, adding -OrNull suffix to them                                                                                                 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0        │ Bool   │
│ aggregation_memory_efficient_merge_threads           │ 0     │ 0       │ Number of threads to use for merge intermediate aggregation results in memory efficient mode. When bigger, then more memory is consumed. 0 means - same as 'max_threads'. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0        │ UInt64 │
│ allow_changing_replica_until_first_data_packet       │ 0     │ 0       │ Allow HedgedConnections to change replica until receiving first data packet                                                                                               │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0        │ Bool   │
│ allow_ddl                                            │ 1     │ 0       │ If it is set to true, then a user is allowed to executed DDL queries.                                                                                                     │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0        │ Bool   │
│ allow_distributed_ddl                                │ 1     │ 0       │ If it is set to true, then a user is allowed to executed distributed DDL queries.                                                                                         │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0        │ Bool   │
│ allow_drop_detached                                  │ 0     │ 0       │ Allow ALTER TABLE ... DROP DETACHED PART[ITION] ... queries                                                                                                               │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0        │ Bool   │
│ allow_experimental_alter_materialized_view_structure │ 0     │ 0       │ Allow atomic alter on Materialized views. Work in progress.                                                                                                               │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0        │ Bool   │
│ allow_experimental_bigint_types                      │ 0     │ 0       │ Allow Int128, Int256, UInt256 and Decimal256 types                                                                                                                        │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0        │ Bool   │
│ allow_experimental_database_atomic                   │ 1     │ 0       │ Obsolete setting, does nothing. Will be removed after 2021-02-12                                                                                                          │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0        │ Bool   │
└──────────────────────────────────────────────────────┴───────┴─────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────┴──────┴──────────┴────────┘

10 rows in set. Elapsed: 0.005 sec.

78bda00c532c :) Bye.
LIMIT is added to the query because otherwise it would return 390 rows, i.e. all the settings.
Use the native ClickHouse client
Create a database, create a simple table using the MergeTree engine (the basic and fundamental table engine in ClickHouse; more details at https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/), and insert a record into the newly created table:
root@78bda00c532c:~# clickhouse-client
ClickHouse client version 21.3.20.1 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 21.3.20 revision 54447.

78bda00c532c :) CREATE TABLE mytest.xxx (a UInt8, d Date) ENGINE = MergeTree() ORDER BY (a) PARTITION BY toYYYYMM(d);

CREATE TABLE mytest.xxx
(
    `a` UInt8,
    `d` Date
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(d)
ORDER BY a

Query id: 9e2f7360-a1e1-40ff-bf29-f843e467f0c2

Ok.

0 rows in set. Elapsed: 0.014 sec.

78bda00c532c :) INSERT INTO mytest.xxx values(8,'2020-08-13');

INSERT INTO mytest.xxx VALUES

Query id: 06773c45-17fd-4d38-b600-b4aa906a577e

Ok.

1 rows in set. Elapsed: 0.016 sec.

78bda00c532c :) SELECT * FROM mytest.xxx;

SELECT *
FROM mytest.xxx

Query id: a01fb45b-7d0d-47b1-ab40-01ed31888096

┌─a─┬──────────d─┐
│ 8 │ 2020-08-13 │
└───┴────────────┘

1 rows in set. Elapsed: 0.008 sec.

78bda00c532c :)
Use HTTP client to manage ClickHouse server
Using curl to execute SQL queries.
Using curl's --data-binary option:
root@srv clickhouse # echo "INSERT INTO mytest.xxx values(7,'2020-08-14');" | curl 'http://localhost:8123/' --data-binary @-
root@srv clickhouse # echo "SELECT * from mytest.xxx;" | curl 'http://localhost:8123/' --data-binary @-
8	2020-08-13
7	2020-08-14
Or pass the query to curl as a properly URL-encoded query parameter:
root@srv clickhouse # curl http://localhost:8123/?query=SHOW%20DATABASES
default
mytest
system
root@srv clickhouse # curl http://localhost:8123/?query=SHOW%20TABLES%20FROM%20mytest
xxx
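For anything beyond trivial statements, hand-writing %20 escapes gets tedious, so the URL encoding can be delegated to a small helper. A sketch that assumes python3 is available on the host; the final curl line is commented out because it needs the running server:

```shell
# URL-encode an arbitrary SQL statement for the ?query= parameter
SQL="SELECT count() FROM mytest.xxx"
ENCODED=$(python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1]))' "$SQL")
echo "$ENCODED"   # SELECT%20count%28%29%20FROM%20mytest.xxx
# curl "http://localhost:8123/?query=$ENCODED"
```

Alternatively, curl's --data-binary form shown above avoids the encoding problem entirely, since the query travels in the POST body.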