This post details my recent experience with ClickHouse backup and restore.

Context: Part of a task at work involves a decent volume of mutations on ClickHouse (both automated daily workflows and occasional manual ad-hoc mutations). As per standard practice, easy backup and restore procedures have been set up: for automated workflows, data backup is integrated within the workflow itself, while for manual ad-hoc mutations, backups of the required tables are taken on disk using the native ClickHouse backup command: BACKUP TABLE db_name.table_name TO Disk('backups', 'table_name.zip'). Additionally, we maintain the previous day’s AWS disk snapshots as a last resort.
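
For completeness, restoring one of these on-disk table backups is the mirror image of the backup statement; a minimal sketch, assuming the same disk and archive names, run via clickhouse-client:

# restore a single table from the local 'backups' disk (names are placeholders)
clickhouse-client --query "RESTORE TABLE db_name.table_name FROM Disk('backups', 'table_name.zip')"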

Recently, I had to revisit this backup and restore approach due to two major scenarios:

  1. Mutation volume has increased 2-3x
  2. Need to frequently perform manual data migration from production to development (particularly for specific tables) for testing new additions

Implementing automated remote storage backup (S3, etc.) was the ideal solution, as it provides independence from local disk backup issues and allows data to be easily pulled to development servers (for either the complete database or specific tables) using restore commands. This approach would be particularly efficient since mutations happen on less than 15% of the database tables (12 out of 90), making the restore process faster when targeting only the affected tables.

The key advantage of using ClickHouse’s backup and restore functionality over AWS disk snapshots is its granular control at the table level, which enables quick and hassle-free recovery for table-based scenarios with just a restore command.

There are two approaches to implement this:

  1. Use ClickHouse native backup and restore commands. However, this requires additional work to handle schema migrations, cluster management (syncing replicas before freezing), dropping inactive replicas, etc. (see the sketch after this list)

  2. Use the Altinity ClickHouse-backup tool, which provides these features out-of-the-box
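
To give a sense of the extra handling the native approach involves, here is a rough sketch of the kind of statements you end up scripting yourself, assuming a replicated table and an S3 destination (all names, endpoints, and credentials below are placeholders, not our actual setup):

# make sure the local replica has caught up before taking the backup
clickhouse-client --query "SYSTEM SYNC REPLICA db_name.table_name"

# back the table up directly to S3 with the native BACKUP statement
clickhouse-client --query "BACKUP TABLE db_name.table_name TO S3('https://bucket.s3.amazonaws.com/backups/table_name', 'ACCESS_KEY', 'SECRET_KEY')"

# restore it elsewhere from the same location
clickhouse-client --query "RESTORE TABLE db_name.table_name FROM S3('https://bucket.s3.amazonaws.com/backups/table_name', 'ACCESS_KEY', 'SECRET_KEY')"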

I opted for the ClickHouse-backup tool due to its comprehensive feature set. Here’s my experience using it:

Backup

Commands

For a complete backup:

clickhouse-backup create_remote app_name_complete_backup_13022025

For a specific database:

clickhouse-backup create_remote -t db_name.* db_name_complete_backup

For a specific table:

clickhouse-backup create_remote -t db_name.table_name table_name_backup

For subsequent backups, use incremental backups:

clickhouse-backup create_remote --diff-from-remote app_name_complete_backup_13022025 app_name_complete_backup_date

You can find more CLI commands in the official documentation.
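
If you want the incremental backups to run unattended, a small cron-driven wrapper around the command above is enough; a minimal sketch, with backup names and schedule purely illustrative:

#!/bin/bash
# daily_backup.sh - hypothetical wrapper around clickhouse-backup
set -euo pipefail

BASE_BACKUP="app_name_complete_backup_13022025"   # existing full remote backup
TODAY=$(date +%d%m%Y)

# take an incremental backup against the existing full remote backup
clickhouse-backup create_remote --diff-from-remote "${BASE_BACKUP}" "app_name_incremental_${TODAY}"

# example crontab entry (runs daily at 01:30):
# 30 1 * * * /opt/scripts/daily_backup.sh >> /var/log/clickhouse-backup-cron.log 2>&1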

Performance Analysis

I experimented with different compression and concurrency settings to find the optimal balance between backup size and performance with respect to remote storage (S3). Here’s the reference configuration (/etc/clickhouse-backup/config.yml):

general:
  remote_storage: s3
  max_file_size: 0
  backups_to_keep_local: -1
  backups_to_keep_remote: 2
  log_level: error
  ....
  download_concurrency: 8
  upload_concurrency: 4
  ....
  skip_tables:
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
    - _temporary_and_external_tables.*
  skip_table_engines: []
  ....
  check_replicas_before_attach: true
  ....
  max_connections: 4
  ....
s3:
  ....
  bucket: {bucket_name}
  ....
  region: {region}
  ....
  path: {path}
  ....
  compression_level: 1
  compression_format: zstd
  ....
  storage_class: STANDARD
  custom_storage_class_map: {}
  concurrency: 5
  ....
  max_parts_count: 4000
  allow_multipart_download: true
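
Before relying on this config, it's worth sanity-checking what clickhouse-backup actually sees and what already exists remotely; two read-only commands that help (exact output varies by version):

# list the tables clickhouse-backup will consider, per the skip_tables patterns above
clickhouse-backup tables

# list backups already present on the remote (S3) side
clickhouse-backup list remote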

Here are my findings:

Initial backup:

  1. Using zstd compression with level 3:
     Uncompressed data size : 743.62 GB
     CH db size (compressed) : 142.32 GB
     S3 remote backup size : 81.37 GB
     Time taken : ~10 minutes
     Peak CPU increase : ~40%
  2. Using zstd compression with level 1:
     Uncompressed data size : 743.62 GB
     CH db size (compressed) : 142.32 GB
     S3 remote backup size : 92.71 GB
     Time taken : ~9 minutes
     Peak CPU increase : ~30%

Incremental backups:

  1. Using zstd compression with level 1:
     Uncompressed data size : 746.72 GB
     Updated CH db size next day (compressed) : 143.61 GB
     S3 remote backup size : 11.55 GB (includes new and mutated parts from the latest mutation runs)
     Time taken : 70 seconds
     Peak CPU increase : ~20%

Restore

Commands

For a complete database restore with schema:

clickhouse-backup restore_remote app_name_complete_backup_13022025

For data-only restore (preserving existing schema):

clickhouse-backup restore_remote -d app_name_complete_backup_13022025
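
If you expect to restore the same backup repeatedly (e.g., while iterating on a dev server), you can also split the operation: download from S3 once, then restore locally as many times as needed. A sketch, assuming the same backup name:

# pull the remote backup into local backup storage once
clickhouse-backup download app_name_complete_backup_13022025

# restore (data only here) from the local copy; repeat without re-downloading
clickhouse-backup restore -d app_name_complete_backup_13022025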

Selective Restore

For restoring a specific database with schema:

clickhouse-backup restore_remote -t db_name.* app_name_complete_backup_13022025

For database migration with name mapping (e.g., prod to development):

clickhouse-backup restore_remote -m prod_db_name:development_db_name app_name_complete_backup_13022025
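
Putting the pieces together, the prod-to-dev migration for a handful of tables boils down to two commands, roughly as below (table patterns and the backup name are illustrative):

# on the production server: push only the tables under test to S3
clickhouse-backup create_remote -t prod_db_name.table_pattern* table_pattern_backup_for_dev

# on the development server: pull them into the development database
clickhouse-backup restore_remote -m prod_db_name:development_db_name table_pattern_backup_for_dev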

Performance Analysis

  1. Complete restore with schema (zstd level 3):
     clickhouse-backup restore_remote app_name_complete_backup_13022025

     Uncompressed data size : 743.62 GB
     CH db size (compressed) : 142.32 GB
     S3 remote backup size : 81.37 GB
     Time taken : ~25 minutes
     Peak CPU increase : ~30%
  2. Data-only restore (zstd level 3):
     clickhouse-backup restore_remote -d app_name_complete_backup_13022025

     Almost the same as above
  3. Complete restore (zstd level 1):
     clickhouse-backup restore_remote app_name_complete_backup_13022025

     Uncompressed data size : 743.62 GB
     CH db size (compressed) : 142.32 GB
     S3 remote backup size : 92.71 GB
     Time taken : ~15 minutes
     Peak CPU increase : ~40%
  4. Selective restore of 4 mutation-affected tables with schema (zstd level 1):
     clickhouse-backup restore_remote -t db_name.table_pattern* app_name_complete_backup_13022025

     Uncompressed data size : 213.1 GB
     CH db size (compressed) : 40.7 GB
     S3 remote backup size : 17.3 GB
     Time taken : ~3 minutes
     Peak CPU increase : ~40%

Trade-offs

  • Lower compression levels result in faster backups but larger S3 storage size
  • Higher concurrency speeds up backups but increases resource utilization

For my use case, zstd compression with level 1 provided the best balance for both backup and restore operations, considering that ClickHouse data is already compressed using LZ4 by default.

Limitations

There’s a known issue with backing up materialized views that don’t use the TO table clause. In such cases, ClickHouse stores the view’s data in a separate implicit table named .inner_id.XXX (where XXX is the UUID of the materialized view).

Solution: Always use the TO table_name clause when creating materialized views.
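
A sketch of that pattern, with a hypothetical events table rolled up into an explicit target table (all names and columns are made up for illustration):

# explicit target table that the materialized view writes into
clickhouse-client --query "
  CREATE TABLE db_name.daily_event_counts
  (
      event_date Date,
      event_type String,
      cnt UInt64
  )
  ENGINE = SummingMergeTree
  ORDER BY (event_date, event_type)"

# materialized view with the TO clause, so no hidden .inner_id.* table is created
clickhouse-client --query "
  CREATE MATERIALIZED VIEW db_name.daily_event_counts_mv
  TO db_name.daily_event_counts
  AS SELECT toDate(event_time) AS event_date, event_type, count() AS cnt
  FROM db_name.events
  GROUP BY event_date, event_type"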

Key takeaways

  • Table-level granular backup and restore capabilities enable efficient targeted operations
  • Incremental backups significantly reduce both backup duration and storage requirements
  • Simple data migration to development servers with a single command