sld-filebackups-py

A lightweight, zero-dependency Python backup utility that archives files and folders defined in a JSON list, with automatic rotation of old backups. Designed to run as a daily cron job on Linux servers.



Features

  • Selective backup — define which paths to back up in a JSON file, each with its own enable/disable flag; no need to touch the code to add or remove entries
  • Folder backups — directories are archived as .tar.gz (only the folder name is preserved as the archive root, no absolute path leaking)
  • File backups — single files are compressed as .gz
  • Skip-if-exists logic — if a backup for today already exists, it is skipped automatically, making the script safe to call multiple times per day
  • Auto-rotation — after each backup run, old archives beyond a configurable retention count are automatically deleted per subfolder
  • Dry-run mode — preview exactly what rotation would delete, without removing anything
  • Structured logging — always outputs to console (useful for reading cron output); optionally writes to a persistent log file
  • Multi-environment support — switch between local, local2, and prod path configurations in a single file
  • Graceful error handling — malformed JSON entries, missing paths, empty folders, and permission errors are caught and logged without crashing the whole run

Project Structure

backups_script/
├── script.py           # Entry point and CLI argument parser
├── functions.py        # Core logic: backup, rotation, checks
├── constants.py        # Shared state: paths, loaded config, timestamps
├── logger.py           # Logging setup (console + optional file handler)
├── init.py             # Environment selector (local / local2 / prod)
├── config.json         # Runtime configuration
├── dir_backups.json    # Declarative list of paths to back up
└── LICENSE             # GNU GPL v3

Module Responsibilities

| File | Role |
|------|------|
| init.py | Defines ROOT_DIR_APP and ROOT_DIR_BACKUPS based on the selected environment. Imported first by everything else. |
| constants.py | Builds all derived paths (backup folder, config paths), loads config.json and dir_backups.json into memory, captures today's date and current time. |
| logger.py | Reads config.json directly and configures the root Python logger with a StreamHandler (always on) and an optional FileHandler. |
| functions.py | Contains all business logic: default_backup_dir(), check_existing_folders(), backups_now(), autorotate_backups(), show_enabled(). |
| script.py | Bootstraps logging, then parses CLI arguments and calls the appropriate function(s). With no flags, runs a full backup + rotation. |

How It Works

  1. script.py calls setup_logger(), which reads config.json and sets up logging.
  2. default_backup_dir() ensures the root backup folder and the host-named subfolder exist.
  3. check_existing_folders() reads dir_backups.json, filters for enabled entries (flag == 1), verifies each path exists, and classifies it as "folder" or "file". Empty or unreadable directories are excluded.
  4. backups_now() iterates the verified paths:
    • For folders: creates a <name>_YYYY-MM-DD.tar.gz archive using Python's tarfile module.
    • For files: creates a <name>_YYYY-MM-DD.gz compressed copy using gzip + shutil.copyfileobj.
    • If the target archive already exists today, the entry is skipped.
  5. autorotate_backups() scans each immediate subfolder of the host backup directory, sorts .gz files by modification time (newest first), and deletes any beyond the keep_backups threshold.
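
The archiving in step 4 can be sketched as follows. This is a minimal illustration and not the project's backups_now(): the function name backup_entry, its parameters, and its return value are hypothetical, and only the tarfile / gzip + shutil.copyfileobj approach and the naming scheme come from the description above.

```python
import gzip
import shutil
import tarfile
from datetime import date
from pathlib import Path
from typing import Optional


def backup_entry(source: Path, dest_root: Path, name: str,
                 today: Optional[date] = None) -> Optional[Path]:
    """Archive one path (hypothetical helper).

    Returns the archive path, or None if today's archive already exists.
    """
    stamp = (today or date.today()).isoformat()  # YYYY-MM-DD
    target_dir = dest_root / name
    target_dir.mkdir(parents=True, exist_ok=True)

    if source.is_dir():
        target = target_dir / f"{name}_{stamp}.tar.gz"
        if target.exists():
            return None  # skip-if-exists
        with tarfile.open(target, "w:gz") as tar:
            # arcname keeps only the folder name as the archive root
            tar.add(source, arcname=source.name)
    else:
        target = target_dir / f"{name}_{stamp}.gz"
        if target.exists():
            return None  # skip-if-exists
        with source.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target
```

Returning None for a skipped entry is a choice made for this sketch; it lets a caller tell "created" apart from "already there" without raising.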

Installation

No packages to install. The script uses Python's standard library only.

git clone https://gitea.sld-server.org/sld-admin/sld-filebackups-py.git
cd sld-filebackups-py

Then set your environment and paths in init.py and dir_backups.json.


Configuration

config.json

{
    "keep_backups": 7,
    "logs": false,
    "logs_path": "/home/backups/logs"
}

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| keep_backups | integer | 7 | How many recent backup archives to retain per subfolder. Older ones are deleted by the rotation step. |
| logs | boolean | false | If true, a backup.log file is written to logs_path in addition to console output. |
| logs_path | string | ~/backups/logs | Directory where backup.log will be created. Created automatically if it does not exist. |

Note: Even when logs is false, all output is still printed to stdout/stderr, which means cron will capture it via mail or redirection as usual.
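
As an illustration of how these keys and their documented defaults fit together, here is a minimal sketch of a loader. The helper name load_config and the merge-with-defaults behaviour are assumptions made for this example; the project's constants.py and logger.py may read the file differently.

```python
import json
from pathlib import Path

# Defaults as listed in the table above
DEFAULTS = {"keep_backups": 7, "logs": False, "logs_path": "~/backups/logs"}


def load_config(path):
    """Read config.json and fill in missing keys (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        loaded = json.load(fh)
    config = {**DEFAULTS, **loaded}
    # Expand "~" so the directory can be created and opened directly
    config["logs_path"] = Path(config["logs_path"]).expanduser()
    return config
```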


dir_backups.json

This is the declarative list of everything to back up. Each entry is a JSON array of exactly three values:

[
    [ "/absolute/path/to/folder",  1, "BackupName"   ],
    [ "/absolute/path/to/file",    1, "ConfigBackup" ],
    [ "/path/that/is/disabled",    0, "OldEntry"     ]
]

| Position | Field | Description |
|----------|-------|-------------|
| 0 | path | Absolute path to the file or folder to back up. |
| 1 | enabled | 1 = include in backup runs. 0 = skip entirely (the entry is parsed but never processed). |
| 2 | name | A short identifier used as the subfolder name inside the backup destination, and as the prefix of the archive filename. Must be unique across entries. |

Tips:

  • To temporarily disable an entry without deleting it, set the flag to 0.
  • The name field becomes a directory under <ROOT_DIR_BACKUPS>/<hostname>/, so avoid spaces and special characters.
  • Folders are only backed up if they are non-empty and readable.
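
The entry format and the non-empty/readable rule can be sketched like this. Both helpers (split_entries and classify) are hypothetical names for this example and are not the project's check_existing_folders(); how the real script reports malformed entries is not shown in this README.

```python
import json
from pathlib import Path


def split_entries(raw):
    """Sort [path, flag, name] triples into enabled, disabled and malformed."""
    enabled, disabled, malformed = [], [], []
    for entry in raw:
        if not (isinstance(entry, list) and len(entry) == 3):
            malformed.append(entry)
            continue
        path, flag, name = entry
        if not isinstance(path, str) or not isinstance(name, str) or flag not in (0, 1):
            malformed.append(entry)
            continue
        (enabled if flag == 1 else disabled).append((path, name))
    return enabled, disabled, malformed


def classify(path):
    """Return "file", "folder", or None if the path should not be backed up."""
    p = Path(path)
    if p.is_file():
        return "file"
    if p.is_dir():
        try:
            # any() stops at the first child, so large folders are not fully listed
            return "folder" if any(p.iterdir()) else None
        except PermissionError:
            return None  # unreadable directory
    return None  # missing path
```

Usage: pass the result of json.load() on dir_backups.json to split_entries, then call classify on each enabled path.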

Environment (init.py)

env = "local"   # Switch between: "local", "local2", "prod"

| Environment | ROOT_DIR_APP | ROOT_DIR_BACKUPS |
|-------------|--------------|------------------|
| local | /home/sld-admin/Scrivania/backups_script/ | <ROOT_DIR_APP>/backups/Daily_File_Backups/ |
| local2 | /home/simo-positive/Desktop/backups_script/ | <ROOT_DIR_APP>/backups/Daily_File_Backups/ |
| prod | /opt/sld-backups/ | /home/backups/backups_root/Daily_File_Backups/ |

If an unknown value is set, the script exits immediately with an error.
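
One way to express this selection is a lookup table, sketched below. The dictionary layout and the resolve_paths name are assumptions for illustration; only the paths and the exit-on-unknown behaviour come from this README, and the actual init.py may be written differently.

```python
import sys
from pathlib import Path

_LOCAL = Path("/home/sld-admin/Scrivania/backups_script")
_LOCAL2 = Path("/home/simo-positive/Desktop/backups_script")

# env name -> (ROOT_DIR_APP, ROOT_DIR_BACKUPS)
ENVIRONMENTS = {
    "local": (_LOCAL, _LOCAL / "backups" / "Daily_File_Backups"),
    "local2": (_LOCAL2, _LOCAL2 / "backups" / "Daily_File_Backups"),
    "prod": (Path("/opt/sld-backups"),
             Path("/home/backups/backups_root/Daily_File_Backups")),
}


def resolve_paths(env):
    """Return the path pair for env, or exit with an error message."""
    try:
        return ENVIRONMENTS[env]
    except KeyError:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit(f"Unknown environment {env!r}; expected one of {sorted(ENVIRONMENTS)}")
```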


Usage

# Full backup + auto-rotation (default, no flags needed)
python3 script.py

# Show which paths are enabled and which are disabled
python3 script.py --show

# Check whether declared paths exist on disk and print a status report
python3 script.py --check

# Run backup with verbose debug output
python3 script.py --debug

# Run only the rotation step (no new backups created)
python3 script.py --rotate

# Preview what rotation would delete, without actually deleting anything
python3 script.py --rotate --dry

CLI Reference

| Flag | Long form | Description |
|------|-----------|-------------|
| -s | --show | Print enabled and disabled paths from dir_backups.json. |
| -d | --debug | Run backup with debug="on", which enables verbose path-checking output. |
| -c | --check | Run check_existing_folders() and print a detailed status for each declared path. |
| -r | --rotate | Run autorotate_backups() only. Can be combined with --dry. |
| (none) | --dry | Dry-run mode for --rotate: logs candidates for deletion but deletes nothing. |
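
For readers who want to see how such flags map onto argparse, here is a minimal sketch. The build_parser name, the description, and the help strings are written for this example; the flag names match the table above, but the parser in script.py may be organised differently.

```python
import argparse


def build_parser():
    """Build a parser with the flags from the CLI reference (illustrative)."""
    parser = argparse.ArgumentParser(description="Archive paths listed in dir_backups.json")
    parser.add_argument("-s", "--show", action="store_true",
                        help="print enabled and disabled paths")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="run the backup with verbose path checks")
    parser.add_argument("-c", "--check", action="store_true",
                        help="report the status of each declared path")
    parser.add_argument("-r", "--rotate", action="store_true",
                        help="run only the rotation step")
    parser.add_argument("--dry", action="store_true",
                        help="with --rotate: log deletions without performing them")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # With no flags, every attribute is False, which corresponds to the
    # default "full backup + rotation" run.
    print(args)
```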

Backup Storage Layout

Backups are written under:

<ROOT_DIR_BACKUPS>/
└── <hostname>/
    ├── Documents/
    │   ├── Documents_2026-03-10.tar.gz
    │   ├── Documents_2026-03-11.tar.gz
    │   └── Documents_2026-03-12.tar.gz
    └── ConfigBackup/
        ├── ConfigBackup_2026-03-10.gz
        └── ConfigBackup_2026-03-11.gz
  • Each entry in dir_backups.json gets its own subfolder named after its name field.
  • Archives are named <name>_YYYY-MM-DD.tar.gz (folders) or <name>_YYYY-MM-DD.gz (files).
  • The host's hostname is used as a top-level grouping folder, which makes it easy to collect backups from multiple machines into the same root.
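
The layout above boils down to one path expression. The sketch below shows it with pathlib and socket.gethostname(); the archive_path helper and its optional hostname and day parameters are hypothetical and exist only to make the example reproducible.

```python
import socket
from datetime import date
from pathlib import Path


def archive_path(root, name, is_folder, hostname=None, day=None):
    """Compose <root>/<hostname>/<name>/<name>_YYYY-MM-DD(.tar).gz."""
    host = hostname or socket.gethostname()
    stamp = (day or date.today()).isoformat()
    suffix = ".tar.gz" if is_folder else ".gz"
    return Path(root) / host / name / f"{name}_{stamp}{suffix}"
```

Usage: archive_path("/home/backups/backups_root/Daily_File_Backups", "Documents", True) yields the target for today's Documents archive on the current host.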

Backup Rotation

The rotation step (autorotate_backups) runs automatically after every backup, or can be triggered manually with --rotate.

Logic:

  1. Scans each immediate subfolder of <ROOT_DIR_BACKUPS>/<hostname>/.
  2. Finds all *.gz files (this covers both .gz and .tar.gz).
  3. Sorts them by modification time, newest first.
  4. Keeps the first keep_backups (default: 7) and deletes the rest.

Dry-run (--rotate --dry) logs exactly which files would be deleted, with no filesystem changes. Useful for verifying the retention setting before applying it.
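
The four steps, including the dry-run switch, can be sketched as below. This is an illustration under assumptions and not the project's autorotate_backups(): the rotate name, its parameters, and the returned list are invented for this example.

```python
import logging
from pathlib import Path

log = logging.getLogger(__name__)


def rotate(host_dir, keep, dry_run=False):
    """Keep the `keep` newest *.gz files in each subfolder of host_dir.

    Returns the list of files that were (or, in dry-run mode, would be) deleted.
    """
    removed = []
    for sub in sorted(p for p in Path(host_dir).iterdir() if p.is_dir()):
        # *.gz matches both plain .gz and .tar.gz archives
        archives = sorted(sub.glob("*.gz"),
                          key=lambda p: p.stat().st_mtime, reverse=True)
        for old in archives[keep:]:
            if dry_run:
                log.info("Would delete %s", old)
            else:
                old.unlink()
                log.info("Deleted %s", old)
            removed.append(old)
    return removed
```

Because ordering is based on modification time rather than the date in the filename, copying or touching an old archive makes it look new to the rotation step.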


Logging

All functions use Python's standard logging module via a named logger (__name__). The root logger is configured by logger.py at startup.

  • Console output is always active (via StreamHandler), regardless of the logs setting.
  • File output is added when "logs": true is set in config.json. The log file is <logs_path>/backup.log and is appended to on each run.
  • Log format: YYYY-MM-DD HH:MM:SS [LEVEL] message
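
A root-logger setup matching these three points might look like the sketch below. The configure_logging name and its parameters are hypothetical; the project's setup_logger() reads config.json itself, and its log level and handler details are not stated in this README.

```python
import logging
from pathlib import Path


def configure_logging(logs=False, logs_path="~/backups/logs"):
    """Attach a console handler and, optionally, a file handler to the root logger."""
    root = logging.getLogger()
    root.setLevel(logging.INFO)  # level chosen for this sketch
    formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s",
                                  datefmt="%Y-%m-%d %H:%M:%S")

    console = logging.StreamHandler()  # writes to stderr by default
    console.setFormatter(formatter)
    root.addHandler(console)

    if logs:
        log_dir = Path(logs_path).expanduser()
        log_dir.mkdir(parents=True, exist_ok=True)
        file_handler = logging.FileHandler(log_dir / "backup.log", mode="a")
        file_handler.setFormatter(formatter)
        root.addHandler(file_handler)
    return root
```

Modules can then call logging.getLogger(__name__) and their records propagate to these handlers.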

Running as a Cron Job

To run a full backup every day at 2:00 AM:

crontab -e
0 2 * * * /usr/bin/python3 /opt/sld-backups/script.py >> /home/backups/logs/cron.log 2>&1

Because the script always logs to the console, and the redirection above captures both stdout and stderr, cron.log receives the full run log even if file logging is disabled in config.json.


Requirements

  • Python 3.6+
  • No third-party packages — uses only the standard library:
    • tarfile, gzip, shutil — archiving and compression
    • logging — structured output
    • argparse — CLI argument parsing
    • pathlib — path handling
    • socket — hostname detection
    • json — configuration loading

License

GNU General Public License v3.0 — see LICENSE for full terms.