# sld-filebackups-py

A lightweight, zero-dependency Python backup utility that archives files and folders defined in a JSON list, with automatic rotation of old backups. Designed to run as a daily cron job on Linux servers.

---

## Table of Contents

- [Features](#features)
- [Project Structure](#project-structure)
- [How It Works](#how-it-works)
- [Installation](#installation)
- [Configuration](#configuration)
  - [config.json](#configjson)
  - [dir_backups.json](#dir_backupsjson)
  - [Environment (init.py)](#environment-initpy)
- [Usage](#usage)
- [Backup Storage Layout](#backup-storage-layout)
- [Backup Rotation](#backup-rotation)
- [Logging](#logging)
- [Running as a Cron Job](#running-as-a-cron-job)
- [Requirements](#requirements)
- [License](#license)

---

## Features

- **Selective backup** — define which paths to back up in a JSON file, each with its own enable/disable flag; no need to touch the code to add or remove entries
- **Folder backups** — directories are archived as `.tar.gz` (only the folder name is preserved as the archive root, so no absolute paths leak)
- **File backups** — single files are compressed as `.gz`
- **Skip-if-exists logic** — if a backup for today already exists, it is skipped automatically, making the script safe to call multiple times per day
- **Auto-rotation** — after each backup run, old archives beyond a configurable retention count are automatically deleted per subfolder
- **Dry-run mode** — preview exactly what rotation would delete, without removing anything
- **Structured logging** — always outputs to the console (useful for reading cron output); optionally writes to a persistent log file
- **Multi-environment support** — switch between `local`, `local2`, and `prod` path configurations in a single file
- **Graceful error handling** — malformed JSON entries, missing paths, empty folders, and permission errors are caught and logged without crashing the whole run

---

## Project Structure

```
backups_script/
├── script.py          # Entry point and CLI argument parser
├── functions.py       # Core logic: backup, rotation, checks
├── constants.py       # Shared state: paths, loaded config, timestamps
├── logger.py          # Logging setup (console + optional file handler)
├── init.py            # Environment selector (local / prod)
├── config.json        # Runtime configuration
├── dir_backups.json   # Declarative list of paths to back up
└── LICENSE            # GNU GPL v3
```

### Module Responsibilities

| File | Role |
|---|---|
| `init.py` | Defines `ROOT_DIR_APP` and `ROOT_DIR_BACKUPS` based on the selected environment. Imported first by everything else. |
| `constants.py` | Builds all derived paths (backup folder, config paths), loads `config.json` and `dir_backups.json` into memory, and captures today's date and the current time. |
| `logger.py` | Reads `config.json` directly and configures the root Python logger with a `StreamHandler` (always on) and an optional `FileHandler`. |
| `functions.py` | Contains all business logic: `default_backup_dir()`, `check_existing_folders()`, `backups_now()`, `autorotate_backups()`, `show_enabled()`. |
| `script.py` | Bootstraps logging, then parses CLI arguments and calls the appropriate function(s). With no flags, runs a full backup + rotation. |

---

## How It Works

1. `script.py` calls `setup_logger()`, which reads `config.json` and sets up logging.
2. `default_backup_dir()` ensures the root backup folder and the host-named subfolder exist.
3. `check_existing_folders()` reads `dir_backups.json`, filters for enabled entries (`flag == 1`), verifies each path exists, and classifies it as `"folder"` or `"file"`. Empty or unreadable directories are excluded.
4. `backups_now()` iterates over the verified paths:
   - For **folders**: creates a `<name>_YYYY-MM-DD.tar.gz` archive using Python's `tarfile` module.
   - For **files**: creates a `<name>_YYYY-MM-DD.gz` compressed copy using `gzip` + `shutil.copyfileobj`.
   - If the target archive already exists for today, the entry is skipped.
5. `autorotate_backups()` scans each immediate subfolder of the host backup directory, sorts `.gz` files by modification time (newest first), and deletes any beyond the `keep_backups` threshold.

---

## Installation

No packages to install — the script uses Python's standard library only.

```bash
git clone https://gitea.sld-server.org/sld-admin/sld-filebackups-py.git
cd sld-filebackups-py
```

Then set your environment and paths in `init.py` and `dir_backups.json`.

---

## Configuration

### `config.json`

```json
{
  "keep_backups": 7,
  "logs": false,
  "logs_path": "/home/backups/logs"
}
```

| Key | Type | Default | Description |
|---|---|---|---|
| `keep_backups` | integer | `7` | How many recent backup archives to retain per subfolder. Older ones are deleted by the rotation step. |
| `logs` | boolean | `false` | If `true`, a `backup.log` file is written to `logs_path` in addition to console output. |
| `logs_path` | string | `/home/backups/logs` | Directory where `backup.log` will be created. Created automatically if it does not exist. |

> **Note:** Even when `logs` is `false`, all output is still printed to stdout/stderr, so cron will capture it via mail or redirection as usual.

---

### `dir_backups.json`

This is the declarative list of everything to back up. Each entry is a JSON array of exactly three values:

```json
[
  [ "/absolute/path/to/folder", 1, "BackupName" ],
  [ "/absolute/path/to/file", 1, "ConfigBackup" ],
  [ "/path/that/is/disabled", 0, "OldEntry" ]
]
```

| Position | Field | Description |
|---|---|---|
| 0 | `path` | Absolute path to the file or folder to back up. |
| 1 | `enabled` | `1` = include in backup runs. `0` = skip entirely (the entry is parsed but never processed). |
| 2 | `name` | A short identifier used as the subfolder name inside the backup destination and as the prefix of the archive filename. Must be unique across entries. |

**Tips:**

- To temporarily disable an entry without deleting it, set the flag to `0`.
- The `name` field becomes a directory under `<ROOT_DIR_BACKUPS>/<hostname>/`, so avoid spaces and special characters.
- Folders are only backed up if they are non-empty and readable.

---

### Environment (`init.py`)

```python
env = "local"  # Switch between: "local", "local2", "prod"
```

| Environment | `ROOT_DIR_APP` | `ROOT_DIR_BACKUPS` |
|---|---|---|
| `local` | `/home/sld-admin/Scrivania/backups_script/` | `/backups/Daily_File_Backups/` |
| `local2` | `/home/simo-positive/Desktop/backups_script/` | `/backups/Daily_File_Backups/` |
| `prod` | `/opt/sld-backups/` | `/home/backups/backups_root/Daily_File_Backups/` |

If an unknown value is set, the script exits immediately with an error.

---

## Usage

```bash
# Full backup + auto-rotation (default, no flags needed)
python3 script.py

# Show which paths are enabled and which are disabled
python3 script.py --show

# Check whether declared paths exist on disk and print a status report
python3 script.py --check

# Run backup with verbose debug output
python3 script.py --debug

# Run only the rotation step (no new backups created)
python3 script.py --rotate

# Preview what rotation would delete, without actually deleting anything
python3 script.py --rotate --dry
```

### CLI Reference

| Flag | Long form | Description |
|---|---|---|
| `-s` | `--show` | Print enabled and disabled paths from `dir_backups.json`. |
| `-d` | `--debug` | Run backup with `debug="on"`, which enables verbose path-checking output. |
| `-c` | `--check` | Run `check_existing_folders()` and print a detailed status for each declared path. |
| `-r` | `--rotate` | Run `autorotate_backups()` only. Can be combined with `--dry`. |
| | `--dry` | Dry-run mode for `--rotate`: logs candidates for deletion but deletes nothing. |
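For illustration, the flag set in the CLI reference could be declared with `argparse` roughly as follows. This is a minimal sketch: the parser description and help strings are assumptions, not the actual contents of `script.py`.

```python
import argparse

def build_parser():
    # Mirrors the CLI reference table (descriptions shortened).
    parser = argparse.ArgumentParser(description="Daily file backups with rotation")
    parser.add_argument("-s", "--show", action="store_true",
                        help="print enabled and disabled paths")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="run backup with verbose path-checking output")
    parser.add_argument("-c", "--check", action="store_true",
                        help="print a status report for each declared path")
    parser.add_argument("-r", "--rotate", action="store_true",
                        help="run the rotation step only")
    parser.add_argument("--dry", action="store_true",
                        help="with --rotate: preview deletions without deleting")
    return parser

args = build_parser().parse_args(["--rotate", "--dry"])
print(args.rotate, args.dry)  # True True
```

With no flags at all, every attribute is `False`, which is the cue for the default "full backup + rotation" path.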
---

## Backup Storage Layout

Backups are written under:

```
<ROOT_DIR_BACKUPS>/
└── <hostname>/
    ├── Documents/
    │   ├── Documents_2026-03-10.tar.gz
    │   ├── Documents_2026-03-11.tar.gz
    │   └── Documents_2026-03-12.tar.gz
    └── ConfigBackup/
        ├── ConfigBackup_2026-03-10.gz
        └── ConfigBackup_2026-03-11.gz
```

- Each entry in `dir_backups.json` gets its own subfolder named after its `name` field.
- Archives are named `<name>_YYYY-MM-DD.tar.gz` (folders) or `<name>_YYYY-MM-DD.gz` (files).
- The host's hostname is used as a top-level grouping folder, which makes it easy to collect backups from multiple machines into the same root.

---

## Backup Rotation

The rotation step (`autorotate_backups`) runs automatically after every backup, or can be triggered manually with `--rotate`.

**Logic:**

1. Scans each immediate subfolder of `<ROOT_DIR_BACKUPS>/<hostname>/`.
2. Finds all `*.gz` files (this covers both `.gz` and `.tar.gz`).
3. Sorts them by modification time, newest first.
4. Keeps the first `keep_backups` (default: 7) and deletes the rest.

**Dry-run** (`--rotate --dry`) logs exactly which files would be deleted, with no filesystem changes. Useful for verifying the retention setting before applying it.

---

## Logging

All functions use Python's standard `logging` module via a named logger (`__name__`). The root logger is configured by `logger.py` at startup.

- **Console output** is always active (via `StreamHandler`), regardless of the `logs` setting.
- **File output** is added when `"logs": true` is set in `config.json`. The log file is `<logs_path>/backup.log` and is appended to on each run.
- Log format: `YYYY-MM-DD HH:MM:SS [LEVEL] message`

---

## Running as a Cron Job

To run a full backup every day at 2:00 AM:

```bash
crontab -e
```

```
0 2 * * * /usr/bin/python3 /opt/sld-backups/script.py >> /home/backups/logs/cron.log 2>&1
```

Since the script always writes to stdout, cron output redirection captures the full run log even if file logging is disabled in `config.json`.
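As a reference for the behavior described above, the backup-then-rotate cycle that such a cron entry drives can be sketched with only the standard library. This is an illustrative sketch, not the script's actual code: the function names, signatures, and error handling here are simplified assumptions.

```python
import gzip
import shutil
import tarfile
from datetime import date
from pathlib import Path

def backup_entry(src, dest_dir):
    """Archive one enabled entry into dest_dir; skip if today's archive exists."""
    src, dest_dir = Path(src), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = date.today().isoformat()  # YYYY-MM-DD
    if src.is_dir():
        target = dest_dir / "{}_{}.tar.gz".format(dest_dir.name, stamp)
        if target.exists():
            return None  # skip-if-exists: today's backup is already there
        with tarfile.open(str(target), "w:gz") as tar:
            # arcname keeps only the folder name, so no absolute path leaks
            tar.add(str(src), arcname=src.name)
    else:
        target = dest_dir / "{}_{}.gz".format(dest_dir.name, stamp)
        if target.exists():
            return None
        with src.open("rb") as fin, gzip.open(str(target), "wb") as fout:
            shutil.copyfileobj(fin, fout)
    return target

def autorotate(host_dir, keep=7, dry=False):
    """Keep the `keep` newest *.gz archives per subfolder; delete or list the rest."""
    doomed = []
    for sub in (p for p in Path(host_dir).iterdir() if p.is_dir()):
        archives = sorted(sub.glob("*.gz"),
                          key=lambda p: p.stat().st_mtime, reverse=True)
        for old in archives[keep:]:  # everything beyond the newest `keep`
            doomed.append(old)
            if not dry:
                old.unlink()
    return doomed
```

Calling `backup_entry` twice on the same day returns `None` the second time, matching the skip-if-exists behavior, and `autorotate(..., dry=True)` only reports what a real run would remove.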
---

## Requirements

- Python **3.6+**
- **No third-party packages** — uses only the standard library:
  - `tarfile`, `gzip`, `shutil` — archiving and compression
  - `logging` — structured output
  - `argparse` — CLI argument parsing
  - `pathlib` — path handling
  - `socket` — hostname detection
  - `json` — configuration loading

---

## License

GNU General Public License v3.0 — see [LICENSE](LICENSE) for full terms.