# sld-filebackups-py

A lightweight, zero-dependency Python backup utility that archives files and folders defined in a JSON list, with automatic rotation of old backups. Designed to run as a daily cron job on Linux servers.

---

## Table of Contents

- [Features](#features)
- [Project Structure](#project-structure)
- [How It Works](#how-it-works)
- [Installation](#installation)
- [Configuration](#configuration)
  - [config.json](#configjson)
  - [dir_backups.json](#dir_backupsjson)
  - [Environment (init.py)](#environment-initpy)
- [Usage](#usage)
- [Backup Storage Layout](#backup-storage-layout)
- [Backup Rotation](#backup-rotation)
- [Logging](#logging)
- [Running as a Cron Job](#running-as-a-cron-job)
- [Requirements](#requirements)
- [License](#license)

---

## Features

- **Selective backup**: define which paths to back up in a JSON file, each with its own enable/disable flag; no code changes are needed to add or remove entries.
- **Folder backups**: directories are archived as `.tar.gz`, with only the folder name preserved as the archive root (no absolute paths leak into the archive).
- **File backups**: single files are compressed as `.gz`.
- **Skip-if-exists logic**: if today's backup already exists, it is skipped automatically, so the script is safe to call multiple times per day.
- **Auto-rotation**: after each backup run, archives beyond a configurable retention count are deleted, per subfolder.
- **Dry-run mode**: preview exactly what rotation would delete, without removing anything.
- **Structured logging**: output always goes to the console (useful for reading cron output); a persistent log file can optionally be written as well.
- **Multi-environment support**: switch between the `local`, `local2`, and `prod` path configurations in a single file.
- **Graceful error handling**: malformed JSON entries, missing paths, empty folders, and permission errors are caught and logged without crashing the whole run.

---

## Project Structure

```
backups_script/
├── script.py           # Entry point and CLI argument parser
├── functions.py        # Core logic: backup, rotation, checks
├── constants.py        # Shared state: paths, loaded config, timestamps
├── logger.py           # Logging setup (console + optional file handler)
├── init.py             # Environment selector (local / local2 / prod)
├── config.json         # Runtime configuration
├── dir_backups.json    # Declarative list of paths to back up
└── LICENSE             # GNU GPL v3
```

### Module Responsibilities

| File | Role |
|---|---|
| `init.py` | Defines `ROOT_DIR_APP` and `ROOT_DIR_BACKUPS` based on the selected environment. Imported first by everything else. |
| `constants.py` | Builds all derived paths (backup folder, config paths), loads `config.json` and `dir_backups.json` into memory, and captures today's date and the current time. |
| `logger.py` | Reads `config.json` directly and configures the root Python logger with a `StreamHandler` (always on) and an optional `FileHandler`. |
| `functions.py` | Contains all business logic: `default_backup_dir()`, `check_existing_folders()`, `backups_now()`, `autorotate_backups()`, `show_enabled()`. |
| `script.py` | Bootstraps logging, then parses CLI arguments and calls the appropriate function(s). With no flags, runs a full backup followed by rotation. |

---

## How It Works

1. `script.py` calls `setup_logger()`, which reads `config.json` and sets up logging.
2. `default_backup_dir()` ensures the root backup folder and the host-named subfolder exist.
3. `check_existing_folders()` reads `dir_backups.json`, keeps only enabled entries (`flag == 1`), verifies that each path exists, and classifies it as `"folder"` or `"file"`. Empty or unreadable directories are excluded.
4. `backups_now()` iterates over the verified paths:
   - For **folders**: creates a `<name>_YYYY-MM-DD.tar.gz` archive using Python's `tarfile` module.
   - For **files**: creates a `<name>_YYYY-MM-DD.gz` compressed copy using `gzip` + `shutil.copyfileobj`.
   - If the target archive for today already exists, the entry is skipped.
5. `autorotate_backups()` scans each immediate subfolder of the host backup directory, sorts its `.gz` files by modification time (newest first), and deletes any beyond the `keep_backups` threshold.

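The archive step in point 4 can be sketched as below. This is a minimal illustration under stated assumptions, not the project's `backups_now()`: the function name `archive_entry` and its parameters are hypothetical, and it decides folder vs. file with `Path.is_dir()` rather than reusing the classification produced by `check_existing_folders()`.

```python
import gzip
import shutil
import tarfile
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def archive_entry(src: Path, dest_dir: Path, name: str,
                  today: Optional[str] = None) -> Optional[Path]:
    """Archive one verified path (hypothetical helper); return the archive, or None if skipped."""
    today = today or date.today().isoformat()  # YYYY-MM-DD
    dest_dir.mkdir(parents=True, exist_ok=True)
    suffix = ".tar.gz" if src.is_dir() else ".gz"
    target = dest_dir / f"{name}_{today}{suffix}"
    if target.exists():
        return None  # today's archive already exists: skip
    if src.is_dir():
        # arcname keeps only the folder name as the archive root (no absolute path)
        with tarfile.open(str(target), "w:gz") as tar:
            tar.add(str(src), arcname=src.name)
    else:
        # single file: stream-compress it into a .gz copy
        with open(str(src), "rb") as f_in, gzip.open(str(target), "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Documents").mkdir()
        (root / "Documents" / "notes.txt").write_text("hello")
        dest = root / "out" / "Documents"
        print(archive_entry(root / "Documents", dest, "Documents"))  # new archive path
        print(archive_entry(root / "Documents", dest, "Documents"))  # None: skipped
```

Passing `arcname=src.name` to `TarFile.add()` is what keeps absolute source paths out of the archive; calling the function twice on the same day demonstrates the skip-if-exists behaviour.
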
---

## Installation

No packages to install: the script uses only Python's standard library.

```bash
git clone https://gitea.sld-server.org/sld-admin/sld-filebackups-py.git
cd sld-filebackups-py
```

Then select your environment in `init.py`, list the paths to back up in `dir_backups.json`, and adjust retention and logging in `config.json`.

---

## Configuration

### `config.json`

```json
{
  "keep_backups": 7,
  "logs": false,
  "logs_path": "/home/backups/logs"
}
```

| Key | Type | Default | Description |
|---|---|---|---|
| `keep_backups` | integer | `7` | How many recent backup archives to retain per subfolder. Older ones are deleted by the rotation step. |
| `logs` | boolean | `false` | If `true`, a `backup.log` file is written to `logs_path` in addition to console output. |
| `logs_path` | string | `~/backups/logs` | Directory where `backup.log` will be created. Created automatically if it does not exist. |

> **Note:** Even when `logs` is `false`, all output is still printed to stdout/stderr, so cron captures it via mail or redirection as usual.

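A sketch of how `config.json` might be read with the documented defaults filled in. The helper names `parse_config` and `load_config`, the merge-with-defaults behaviour, and the positive-integer check on `keep_backups` are assumptions made for illustration; the real `constants.py` and `logger.py` may handle missing or invalid keys differently.

```python
import json
from pathlib import Path

# Defaults copied from the table above; merging them in is an assumption of this sketch.
DEFAULTS = {"keep_backups": 7, "logs": False, "logs_path": "~/backups/logs"}


def parse_config(text: str) -> dict:
    """Parse config.json content and fill in any missing keys with the defaults."""
    config = dict(DEFAULTS)
    config.update(json.loads(text))
    keep = config["keep_backups"]
    # Hypothetical validation: rotation needs a positive integer retention count.
    if isinstance(keep, bool) or not isinstance(keep, int) or keep < 1:
        raise ValueError(f"keep_backups must be a positive integer, got {keep!r}")
    config["logs_path"] = str(Path(config["logs_path"]).expanduser())  # expand "~"
    return config


def load_config(path: str) -> dict:
    """Read and parse a config.json file from disk."""
    with open(path, encoding="utf-8") as fh:
        return parse_config(fh.read())
```

Splitting parsing from file I/O keeps the default-merging logic easy to check without touching the filesystem.
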
---

### `dir_backups.json`

This is the declarative list of everything to back up. Each entry is a JSON array of exactly three values:

```json
[
  [ "/absolute/path/to/folder", 1, "BackupName" ],
  [ "/absolute/path/to/file",   1, "ConfigBackup" ],
  [ "/path/that/is/disabled",   0, "OldEntry" ]
]
```

| Position | Field | Description |
|---|---|---|
| 0 | `path` | Absolute path to the file or folder to back up. |
| 1 | `enabled` | `1` = include in backup runs. `0` = skip entirely (the entry is parsed but never processed). |
| 2 | `name` | A short identifier used as the subfolder name inside the backup destination and as the prefix of the archive filename. Must be unique across entries. |

**Tips:**

- To temporarily disable an entry without deleting it, set the flag to `0`.
- The `name` field becomes a directory under `<ROOT_DIR_BACKUPS>/<hostname>/`, so avoid spaces and special characters.
- Folders are only backed up if they are non-empty and readable.

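The filtering rules above (three-element entries, flag `1`, non-empty and readable folders) can be sketched as follows. This is a hypothetical stand-in for `check_existing_folders()`: `load_entries`, `enabled_entries`, and `classify` are invented names, and logging malformed entries as warnings is an assumption based on the "graceful error handling" feature.

```python
import json
import logging
import os
from pathlib import Path
from typing import List, Optional, Tuple

log = logging.getLogger(__name__)


def load_entries(path: str) -> list:
    """Load the raw list of [path, flag, name] entries from dir_backups.json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def enabled_entries(entries: list) -> List[Tuple[str, str]]:
    """Return (path, name) pairs for well-formed entries whose flag is 1."""
    selected = []
    for i, entry in enumerate(entries):
        if not (isinstance(entry, list) and len(entry) == 3
                and isinstance(entry[0], str) and isinstance(entry[2], str)):
            log.warning("Skipping malformed entry #%d: %r", i, entry)
            continue
        path, flag, name = entry
        if flag == 1:  # 0 (or any other value) means disabled
            selected.append((path, name))
    return selected


def classify(path: str) -> Optional[str]:
    """Return "folder" or "file", or None if the path is missing, empty, or unreadable."""
    p = Path(path)
    try:
        if p.is_dir():
            # any() stops at the first child; an empty folder yields None
            return "folder" if any(p.iterdir()) else None
        if p.is_file() and os.access(str(p), os.R_OK):
            return "file"
    except PermissionError:
        log.warning("Permission denied while checking %s", path)
    return None
```
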
---

### Environment (`init.py`)

```python
env = "local"  # Switch between: "local", "local2", "prod"
```

| Environment | `ROOT_DIR_APP` | `ROOT_DIR_BACKUPS` |
|---|---|---|
| `local` | `/home/sld-admin/Scrivania/backups_script/` | `<ROOT_DIR_APP>/backups/Daily_File_Backups/` |
| `local2` | `/home/simo-positive/Desktop/backups_script/` | `<ROOT_DIR_APP>/backups/Daily_File_Backups/` |
| `prod` | `/opt/sld-backups/` | `/home/backups/backups_root/Daily_File_Backups/` |

If an unknown value is set, the script exits immediately with an error.

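A minimal sketch of the selector logic, assuming the paths from the table are kept in a dictionary. `ENVIRONMENTS`, `PROD_BACKUPS`, and `resolve_roots()` are hypothetical names; the real `init.py` may simply use an `if`/`elif` chain on `env`.

```python
import sys
from pathlib import Path

# Paths copied from the environment table above; the dict layout is an assumption.
ENVIRONMENTS = {
    "local": "/home/sld-admin/Scrivania/backups_script/",
    "local2": "/home/simo-positive/Desktop/backups_script/",
    "prod": "/opt/sld-backups/",
}
PROD_BACKUPS = "/home/backups/backups_root/Daily_File_Backups/"


def resolve_roots(env: str):
    """Return (ROOT_DIR_APP, ROOT_DIR_BACKUPS) for env; exit on an unknown value."""
    if env not in ENVIRONMENTS:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit(f"Unknown environment {env!r}; expected one of {sorted(ENVIRONMENTS)}")
    app = Path(ENVIRONMENTS[env])
    if env == "prod":
        backups = Path(PROD_BACKUPS)
    else:
        backups = app / "backups" / "Daily_File_Backups"
    return app, backups
```
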
---

## Usage

```bash
# Full backup + auto-rotation (default, no flags needed)
python3 script.py

# Show which paths are enabled and which are disabled
python3 script.py --show

# Check whether declared paths exist on disk and print a status report
python3 script.py --check

# Run backup with verbose debug output
python3 script.py --debug

# Run only the rotation step (no new backups created)
python3 script.py --rotate

# Preview what rotation would delete, without actually deleting anything
python3 script.py --rotate --dry
```

### CLI Reference

| Flag | Long form | Description |
|---|---|---|
| `-s` | `--show` | Print enabled and disabled paths from `dir_backups.json`. |
| `-d` | `--debug` | Run backup with `debug="on"`, which enables verbose path-checking output. |
| `-c` | `--check` | Run `check_existing_folders()` and print a detailed status for each declared path. |
| `-r` | `--rotate` | Run `autorotate_backups()` only. Can be combined with `--dry`. |
| | `--dry` | Dry-run mode for `--rotate`: logs candidates for deletion but deletes nothing. |

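The flags in the CLI reference map naturally onto `argparse`. The sketch below rebuilds that interface; `build_parser()` and `plan()` are hypothetical, and both the precedence between flags (for example `--show` winning over `--rotate`) and whether `--debug` is followed by rotation are assumptions, since the table does not specify them. `plan()` returns step names instead of calling the real functions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="Back up the paths listed in dir_backups.json.")
    parser.add_argument("-s", "--show", action="store_true",
                        help="print enabled and disabled paths")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="run backup with verbose path-checking output")
    parser.add_argument("-c", "--check", action="store_true",
                        help="print a status report for each declared path")
    parser.add_argument("-r", "--rotate", action="store_true",
                        help="run rotation only")
    parser.add_argument("--dry", action="store_true",
                        help="with --rotate: list deletion candidates, delete nothing")
    return parser


def plan(argv: Optional[List[str]] = None) -> List[str]:
    """Return the names of the steps a command line would run (hypothetical dispatch)."""
    args = build_parser().parse_args(argv)
    if args.show:
        return ["show_enabled"]
    if args.check:
        return ["check_existing_folders"]
    if args.rotate:
        return ["autorotate_backups(dry)" if args.dry else "autorotate_backups"]
    if args.debug:
        return ["backups_now(debug='on')"]  # assumption: no rotation after a debug run
    return ["backups_now", "autorotate_backups"]  # default: full backup + rotation
```

Keeping the dispatch separate from the parser makes the flag handling testable without touching the filesystem, e.g. `plan(["--rotate", "--dry"])`.
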
---

## Backup Storage Layout

Backups are written under:

```
<ROOT_DIR_BACKUPS>/
└── <hostname>/
    ├── Documents/
    │   ├── Documents_2026-03-10.tar.gz
    │   ├── Documents_2026-03-11.tar.gz
    │   └── Documents_2026-03-12.tar.gz
    └── ConfigBackup/
        ├── ConfigBackup_2026-03-10.gz
        └── ConfigBackup_2026-03-11.gz
```

- Each entry in `dir_backups.json` gets its own subfolder named after its `name` field.
- Archives are named `<name>_YYYY-MM-DD.tar.gz` (folders) or `<name>_YYYY-MM-DD.gz` (files).
- The machine's hostname is used as a top-level grouping folder, which makes it easy to collect backups from multiple machines under the same root.

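Building the destination path for this layout could look like the sketch below, assuming `socket.gethostname()` supplies the `<hostname>` level (the Requirements section lists `socket` for hostname detection). `archive_path()` is a hypothetical helper; the `host` and `day` parameters exist only so the result can be checked deterministically.

```python
import socket
from datetime import date
from pathlib import Path
from typing import Optional


def archive_path(root: Path, name: str, is_folder: bool,
                 day: Optional[date] = None, host: Optional[str] = None) -> Path:
    """Build <root>/<hostname>/<name>/<name>_YYYY-MM-DD.tar.gz (or .gz for single files)."""
    host = host or socket.gethostname()  # top-level grouping per machine
    day = day or date.today()
    suffix = ".tar.gz" if is_folder else ".gz"
    return root / host / name / f"{name}_{day.isoformat()}{suffix}"
```
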
---

## Backup Rotation

The rotation step (`autorotate_backups`) runs automatically after every backup, or can be triggered manually with `--rotate`.

**Logic:**

1. Scans each immediate subfolder of `<ROOT_DIR_BACKUPS>/<hostname>/`.
2. Finds all `*.gz` files (this covers both `.gz` and `.tar.gz`).
3. Sorts them by modification time, newest first.
4. Keeps the first `keep_backups` (default: 7) and deletes the rest.

**Dry-run** (`--rotate --dry`) logs exactly which files would be deleted, without making any filesystem changes. Use it to verify the retention setting before applying it.

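The four rotation steps plus dry-run translate into a short loop. This is a sketch, not the project's `autorotate_backups()`: the name `rotate`, its signature, and the log messages are hypothetical.

```python
import logging
import os
import tempfile
import time
from pathlib import Path
from typing import List

log = logging.getLogger(__name__)


def rotate(host_dir: Path, keep: int = 7, dry_run: bool = False) -> List[Path]:
    """Remove all but the newest `keep` *.gz files in each subfolder; return the candidates."""
    candidates = []
    for sub in sorted(p for p in host_dir.iterdir() if p.is_dir()):
        # "*.gz" matches both .gz and .tar.gz; newest modification time first
        archives = sorted(sub.glob("*.gz"), key=lambda p: p.stat().st_mtime, reverse=True)
        for old in archives[keep:]:
            candidates.append(old)
            if dry_run:
                log.info("[dry-run] would delete %s", old)
            else:
                log.info("Deleting %s", old)
                old.unlink()
    return candidates


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
    with tempfile.TemporaryDirectory() as tmp:
        sub = Path(tmp) / "Documents"
        sub.mkdir()
        now = time.time()
        for age in range(10):  # ten fake daily archives, one day apart
            f = sub / f"Documents_{age}.tar.gz"
            f.touch()
            os.utime(str(f), (now - age * 86400, now - age * 86400))
        rotate(Path(tmp), keep=7, dry_run=True)  # logs the 3 oldest, deletes nothing
```

Because ordering comes from modification time rather than from the date in the filename, copying or re-touching an old archive can change which files survive; parsing the `YYYY-MM-DD` suffix would be an alternative.
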
---

## Logging

All functions use Python's standard `logging` module via a named logger (`__name__`). The root logger is configured by `logger.py` at startup.

- **Console output** is always active (via `StreamHandler`), regardless of the `logs` setting.
- **File output** is added when `"logs": true` is set in `config.json`. The log file is `<logs_path>/backup.log` and is appended to on each run.
- Log format: `YYYY-MM-DD HH:MM:SS [LEVEL] message`

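A sketch of a `logger.py`-style setup that produces the documented format. `setup_logging()` and its parameters are hypothetical names; the INFO level and the removal of pre-existing handlers are assumptions of this example.

```python
import logging
from pathlib import Path
from typing import Optional

LOG_FORMAT = "%(asctime)s [%(levelname)s] %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def setup_logging(logs_enabled: bool, logs_path: Optional[str] = None) -> logging.Logger:
    """Configure the root logger: console always, <logs_path>/backup.log only if enabled."""
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    for handler in list(root.handlers):  # avoid duplicate lines if called twice
        root.removeHandler(handler)
    formatter = logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT)

    console = logging.StreamHandler()  # no argument: writes to sys.stderr
    console.setFormatter(formatter)
    root.addHandler(console)

    if logs_enabled and logs_path:
        log_dir = Path(logs_path).expanduser()
        log_dir.mkdir(parents=True, exist_ok=True)  # created if missing
        file_handler = logging.FileHandler(str(log_dir / "backup.log"), mode="a")
        file_handler.setFormatter(formatter)
        root.addHandler(file_handler)
    return root


if __name__ == "__main__":
    setup_logging(logs_enabled=False)
    logging.getLogger(__name__).info("Backup run started")
```

A bare `StreamHandler()` writes to stderr, which is why a cron redirection should include `2>&1` to capture the console log.
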
---

## Running as a Cron Job

To run a full backup every day at 2:00 AM, open your crontab:

```bash
crontab -e
```

and add:

```
0 2 * * * /usr/bin/python3 /opt/sld-backups/script.py >> /home/backups/logs/cron.log 2>&1
```

Because console logging is always on, and `2>&1` sends stderr to the same file as stdout, this redirection captures the full run log even when file logging is disabled in `config.json`.

---

## Requirements

- Python **3.6+**
- **No third-party packages**: only the standard library is used:
  - `tarfile`, `gzip`, `shutil`: archiving and compression
  - `logging`: structured output
  - `argparse`: CLI argument parsing
  - `pathlib`: path handling
  - `socket`: hostname detection
  - `json`: configuration loading

---

## License

GNU General Public License v3.0. See [LICENSE](LICENSE) for the full terms.