This post details my recent experience with ClickHouse backup and restore.

Context: Part of a task at work involves a decent volume of mutations on ClickHouse (both automated daily workflows and occasional manual ad-hoc mutations). As per standard practice, easy backup and restore procedures have been set up: for automated workflows, data backup is integrated within the workflow itself, while for manual ad-hoc mutations, backups of the required tables are taken on disk using the native ClickHouse backup command: BACKUP TABLE db_name.table_name TO Disk('backups', 'table_name.zip'). Additionally, we maintain the previous day’s AWS disk snapshots as a last resort.
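
For completeness, restoring one of these on-disk table backups is the mirror image of the backup statement; a minimal sketch, assuming the same disk and archive names, run via clickhouse-client:

# restore a single table from the local 'backups' disk (names are placeholders)
clickhouse-client --query "RESTORE TABLE db_name.table_name FROM Disk('backups', 'table_name.zip')"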

Recently, I had to revisit this backup and restore approach due to two major scenarios:

  1. Mutation volume has increased 2-3x
  2. Need to frequently perform manual data migration from production to development (particularly for specific tables) for testing new additions

Implementing automated remote storage backup (S3, etc.) was the ideal solution, as it provides independence from local disk backup issues and allows data to be easily pulled to development servers (for either the complete database or specific tables) using restore commands. This approach would be particularly efficient since mutations happen on less than 15% of the database tables (12 out of 90), making the restore process faster when targeting only the affected tables.

The key advantage of using ClickHouse’s backup and restore functionality over AWS disk snapshots is its granular control at the table level, which enables quick and hassle-free recovery for table-based scenarios with just a restore command.

There are two approaches to implement this:

  1. Use ClickHouse native backup and restore commands. However, this requires additional work to handle schema migrations, cluster management (syncing replicas before freezing), dropping inactive replicas, etc. (see the sketch after this list)

  2. Use the Altinity ClickHouse-backup tool, which provides these features out-of-the-box
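
To give a sense of the extra handling the native approach involves, here is a rough sketch of the kind of statements you end up scripting yourself, assuming a replicated table and an S3 destination (all names, endpoints, and credentials below are placeholders, not our actual setup):

# make sure the local replica has caught up before taking the backup
clickhouse-client --query "SYSTEM SYNC REPLICA db_name.table_name"

# back the table up directly to S3 with the native BACKUP statement
clickhouse-client --query "BACKUP TABLE db_name.table_name TO S3('https://bucket.s3.amazonaws.com/backups/table_name', 'ACCESS_KEY', 'SECRET_KEY')"

# restore it elsewhere from the same location
clickhouse-client --query "RESTORE TABLE db_name.table_name FROM S3('https://bucket.s3.amazonaws.com/backups/table_name', 'ACCESS_KEY', 'SECRET_KEY')"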

I opted for the ClickHouse-backup tool due to its comprehensive feature set. Here’s my experience using it:

Backup

Commands

For a complete backup:

clickhouse-backup create_remote app_name_complete_backup_13022025

For a specific database:

clickhouse-backup create_remote -t db_name.* db_name_complete_backup

For a specific table:

clickhouse-backup create_remote -t db_name.table_name table_name_backup

For subsequent backups, use incremental backups:

clickhouse-backup create_remote --diff-from-remote app_name_complete_backup_13022025 app_name_complete_backup_date

You can find more CLI commands in the official documentation.
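
If you want the incremental backups to run unattended, a small cron-driven wrapper around the command above is enough; a minimal sketch, with backup names and schedule purely illustrative:

#!/bin/bash
# daily_backup.sh - hypothetical wrapper around clickhouse-backup
set -euo pipefail

BASE_BACKUP="app_name_complete_backup_13022025"   # existing full remote backup
TODAY=$(date +%d%m%Y)

# take an incremental backup against the existing full remote backup
clickhouse-backup create_remote --diff-from-remote "${BASE_BACKUP}" "app_name_incremental_${TODAY}"

# example crontab entry (runs daily at 01:30):
# 30 1 * * * /opt/scripts/daily_backup.sh >> /var/log/clickhouse-backup-cron.log 2>&1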

Performance Analysis

I experimented with different compression and concurrency settings to find the optimal balance between backup size and performance with respect to remote storage (S3). Here’s the reference configuration (/etc/clickhouse-backup/config.yml):

general:
  remote_storage: s3
  max_file_size: 0
  backups_to_keep_local: -1
  backups_to_keep_remote: 2
  log_level: error
  ....
  download_concurrency: 8
  upload_concurrency: 4
  ....
  skip_tables:
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
    - _temporary_and_external_tables.*
  skip_table_engines: []
  ....
  check_replicas_before_attach: true
  ....
  max_connections: 4
  ....
s3:
  ....
  bucket: {bucket_name}
  ....
  region: {region}
  ....
  path: {path}
  ....
  compression_level: 1
  compression_format: zstd
  ....
  storage_class: STANDARD
  custom_storage_class_map: {}
  concurrency: 5
  ....
  max_parts_count: 4000
  allow_multipart_download: true
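
Before relying on this config, it's worth sanity-checking what clickhouse-backup actually sees and what already exists remotely; two read-only commands that help (exact output varies by version):

# list the tables clickhouse-backup will consider, per the skip_tables patterns above
clickhouse-backup tables

# list backups already present on the remote (S3) side
clickhouse-backup list remote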

Here are my findings:

Initial backup:

  1. Using zstd compression with level 3:
     Uncompressed data size : 743.62 GB
     CH db size (compressed) : 142.32 GB
     S3 remote backup size : 81.37 GB
     Time taken : ~10 minutes
     Peak CPU increase : ~40%
  2. Using zstd compression with level 1:
     Uncompressed data size : 743.62 GB
     CH db size (compressed) : 142.32 GB
     S3 remote backup size : 92.71 GB
     Time taken : ~9 minutes
     Peak CPU increase : ~30%

Incremental backups:

  1. Using zstd compression with level 1:
     Uncompressed data size : 746.72 GB
     Updated CH db size next day (compressed) : 143.61 GB
     S3 remote backup size : 11.55 GB (includes new and mutated parts from the latest mutation runs)
     Time taken : 70 seconds
     Peak CPU increase : ~20%

Restore

Commands

For a complete database restore with schema:

clickhouse-backup restore_remote app_name_complete_backup_13022025

For data-only restore (preserving existing schema):

clickhouse-backup restore_remote -d app_name_complete_backup_13022025
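
If you expect to restore the same backup repeatedly (e.g., while iterating on a dev server), you can also split the operation: download from S3 once, then restore locally as many times as needed. A sketch, assuming the same backup name:

# pull the remote backup into local backup storage once
clickhouse-backup download app_name_complete_backup_13022025

# restore (data only here) from the local copy; repeat without re-downloading
clickhouse-backup restore -d app_name_complete_backup_13022025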

Selective Restore

For restoring a specific database with schema:

clickhouse-backup restore_remote -t db_name.* app_name_complete_backup_13022025

For database migration with name mapping (e.g., prod to development):

clickhouse-backup restore_remote -m prod_db_name:development_db_name app_name_complete_backup_13022025
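
Putting the pieces together, the prod-to-dev migration for a handful of tables boils down to two commands, roughly as below (table patterns and the backup name are illustrative):

# on the production server: push only the tables under test to S3
clickhouse-backup create_remote -t prod_db_name.table_pattern* table_pattern_backup_for_dev

# on the development server: pull them into the development database
clickhouse-backup restore_remote -m prod_db_name:development_db_name table_pattern_backup_for_dev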

Performance Analysis

  1. Complete restore with schema (zstd level 3):
     clickhouse-backup restore_remote app_name_complete_backup_13022025

     Uncompressed data size : 743.62 GB
     CH db size (compressed) : 142.32 GB
     S3 remote backup size : 81.37 GB
     Time taken : ~25 minutes
     Peak CPU increase : ~30%
  2. Data-only restore (zstd level 3):
     clickhouse-backup restore_remote -d app_name_complete_backup_13022025

     Almost the same as above
  3. Complete restore (zstd level 1):
     clickhouse-backup restore_remote app_name_complete_backup_13022025

     Uncompressed data size : 743.62 GB
     CH db size (compressed) : 142.32 GB
     S3 remote backup size : 92.71 GB
     Time taken : ~15 minutes
     Peak CPU increase : ~40%
  4. Selective restore of 4 mutation-affected tables with schema (zstd level 1):
     clickhouse-backup restore_remote -t db_name.table_pattern* app_name_complete_backup_13022025

     Uncompressed data size : 213.1 GB
     CH db size (compressed) : 40.7 GB
     S3 remote backup size : 17.3 GB
     Time taken : ~3 minutes
     Peak CPU increase : ~40%

Trade-offs

  • Lower compression levels result in faster backups but larger S3 storage size
  • Higher concurrency speeds up backups but increases resource utilization

For my use case, zstd compression with level 1 provided the best balance for both backup and restore operations, considering that ClickHouse data is already compressed using LZ4 by default.

Limitations

There’s a known issue with backing up materialized views that don’t use the TO table clause. In such cases, ClickHouse stores the view’s data in a separate implicit table named .inner_id.XXX (where XXX is the UUID of the materialized view).

Solution: Always use the TO table_name clause when creating materialized views.
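
A sketch of that pattern, with a hypothetical events table rolled up into an explicit target table (all names and columns are made up for illustration):

# explicit target table that the materialized view writes into
clickhouse-client --query "
  CREATE TABLE db_name.daily_event_counts
  (
      event_date Date,
      event_type String,
      cnt UInt64
  )
  ENGINE = SummingMergeTree
  ORDER BY (event_date, event_type)"

# materialized view with the TO clause, so no hidden .inner_id.* table is created
clickhouse-client --query "
  CREATE MATERIALIZED VIEW db_name.daily_event_counts_mv
  TO db_name.daily_event_counts
  AS SELECT toDate(event_time) AS event_date, event_type, count() AS cnt
  FROM db_name.events
  GROUP BY event_date, event_type"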

Key takeaways

  • Table-level granular backup and restore capabilities enable efficient targeted operations
  • Incremental backups significantly reduce both backup duration and storage requirements
  • Simple data migration to development servers with a single command