# sld-filebackups-py
A lightweight, zero-dependency Python backup utility that archives files and folders defined in a JSON list, with automatic rotation of old backups. Designed to run as a daily cron job on Linux servers.
---
## Table of Contents
- [Features](#features)
- [Project Structure](#project-structure)
- [How It Works](#how-it-works)
- [Installation](#installation)
- [Configuration](#configuration)
- [config.json](#configjson)
- [dir_backups.json](#dir_backupsjson)
- [Environment (init.py)](#environment-initpy)
- [Usage](#usage)
- [Backup Storage Layout](#backup-storage-layout)
- [Backup Rotation](#backup-rotation)
- [Logging](#logging)
- [Running as a Cron Job](#running-as-a-cron-job)
- [Requirements](#requirements)
- [License](#license)
---
## Features
- **Selective backup** — define which paths to back up in a JSON file, each with its own enable/disable flag; no need to touch the code to add or remove entries
- **Folder backups** — directories are archived as `.tar.gz`; only the folder name is used as the archive root, so absolute paths do not leak into the archive
- **File backups** — single files are compressed as `.gz`
- **Skip-if-exists logic** — if a backup for today already exists, it is skipped automatically, making the script safe to call multiple times per day
- **Auto-rotation** — after each backup run, old archives beyond a configurable retention count are automatically deleted per subfolder
- **Dry-run mode** — preview exactly what rotation would delete, without removing anything
- **Structured logging** — always outputs to console (useful for reading cron output); optionally writes to a persistent log file
- **Multi-environment support** — switch between `local`, `local2`, and `prod` path configurations in a single file
- **Graceful error handling** — malformed JSON entries, missing paths, empty folders, and permission errors are caught and logged without crashing the whole run
---
## Project Structure
```
backups_script/
├── script.py # Entry point and CLI argument parser
├── functions.py # Core logic: backup, rotation, checks
├── constants.py # Shared state: paths, loaded config, timestamps
├── logger.py # Logging setup (console + optional file handler)
├── init.py # Environment selector (local / local2 / prod)
├── config.json # Runtime configuration
├── dir_backups.json # Declarative list of paths to back up
└── LICENSE # GNU GPL v3
```
### Module Responsibilities
| File | Role |
|---|---|
| `init.py` | Defines `ROOT_DIR_APP` and `ROOT_DIR_BACKUPS` based on the selected environment. Imported first by everything else. |
| `constants.py` | Builds all derived paths (backup folder, config paths), loads `config.json` and `dir_backups.json` into memory, captures today's date and current time. |
| `logger.py` | Reads `config.json` directly and configures the root Python logger with a `StreamHandler` (always on) and an optional `FileHandler`. |
| `functions.py` | Contains all business logic: `default_backup_dir()`, `check_existing_folders()`, `backups_now()`, `autorotate_backups()`, `show_enabled()`. |
| `script.py` | Bootstraps logging, then parses CLI arguments and calls the appropriate function(s). With no flags, runs a full backup + rotation. |
---
## How It Works
1. `script.py` calls `setup_logger()`, which reads `config.json` and sets up logging.
2. `default_backup_dir()` ensures the root backup folder and the host-named subfolder exist.
3. `check_existing_folders()` reads `dir_backups.json`, filters for enabled entries (`flag == 1`), verifies each path exists, and classifies it as `"folder"` or `"file"`. Empty or unreadable directories are excluded.
4. `backups_now()` iterates the verified paths:
- For **folders**: creates a `<name>_YYYY-MM-DD.tar.gz` archive using Python's `tarfile` module.
- For **files**: creates a `<name>_YYYY-MM-DD.gz` compressed copy using `gzip` + `shutil.copyfileobj`.
- If the target archive already exists today, the entry is skipped.
5. `autorotate_backups()` scans each immediate subfolder of the host backup directory, sorts `.gz` files by modification time (newest first), and deletes any beyond the `keep_backups` threshold.
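The archiving step described above can be sketched as follows. `archive_entry` is a hypothetical helper name for illustration; the real signatures in `functions.py` may differ, but the mechanics (tarfile with `arcname`, gzip + `shutil.copyfileobj`, skip-if-exists) match the steps listed:

```python
import gzip
import shutil
import tarfile
from datetime import date
from pathlib import Path

def archive_entry(src: str, dest_dir: str, name: str) -> Path:
    """Archive a folder as .tar.gz or a file as .gz, skipping today's duplicate."""
    src_path = Path(src)
    stamp = date.today().isoformat()              # YYYY-MM-DD
    suffix = ".tar.gz" if src_path.is_dir() else ".gz"
    target = Path(dest_dir) / f"{name}_{stamp}{suffix}"

    if target.exists():                           # skip-if-exists: safe to re-run
        return target

    if src_path.is_dir():
        with tarfile.open(target, "w:gz") as tar:
            # arcname keeps only the folder name, not the absolute path
            tar.add(src_path, arcname=src_path.name)
    else:
        with src_path.open("rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    return target
```

Because the target filename embeds today's date, re-running the script on the same day is a no-op for already-archived entries.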
---
## Installation
No packages to install. The script uses Python's standard library only.
```bash
git clone https://gitea.sld-server.org/sld-admin/sld-filebackups-py.git
cd sld-filebackups-py
```
Then set your environment and paths in `init.py` and `dir_backups.json`.
---
## Configuration
### `config.json`
```json
{
  "keep_backups": 7,
  "logs": false,
  "logs_path": "/home/backups/logs"
}
```
| Key | Type | Default | Description |
|---|---|---|---|
| `keep_backups` | integer | `7` | How many recent backup archives to retain per subfolder. Older ones are deleted by the rotation step. |
| `logs` | boolean | `false` | If `true`, a `backup.log` file is written to `logs_path` in addition to console output. |
| `logs_path` | string | `/home/backups/logs` | Directory where `backup.log` will be created. Created automatically if it does not exist. |
> **Note:** Even when `logs` is `false`, all output is still printed to stdout/stderr, which means cron will capture it via mail or redirection as usual.
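Loading this file with the documented defaults can be sketched as below. `load_config` and `DEFAULTS` are illustrative names, not the script's actual API:

```python
import json
from pathlib import Path

# Defaults mirror the documented config.json keys
DEFAULTS = {"keep_backups": 7, "logs": False, "logs_path": "/home/backups/logs"}

def load_config(path: str) -> dict:
    """Merge config.json over the documented defaults."""
    cfg = dict(DEFAULTS)
    try:
        cfg.update(json.loads(Path(path).read_text()))
    except FileNotFoundError:
        pass  # fall back to defaults when config.json is absent
    return cfg
```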
---
### `dir_backups.json`
This is the declarative list of everything to back up. Each entry is a JSON array of exactly three values:
```json
[
  [ "/absolute/path/to/folder", 1, "BackupName" ],
  [ "/absolute/path/to/file",   1, "ConfigBackup" ],
  [ "/path/that/is/disabled",   0, "OldEntry" ]
]
```
| Position | Field | Description |
|---|---|---|
| 0 | `path` | Absolute path to the file or folder to back up. |
| 1 | `enabled` | `1` = include in backup runs. `0` = skip entirely (the entry is parsed but never processed). |
| 2 | `name` | A short identifier used as the subfolder name inside the backup destination, and as the prefix of the archive filename. Must be unique across entries. |
**Tips:**
- To temporarily disable an entry without deleting it, set the flag to `0`.
- The `name` field becomes a directory under `<ROOT_DIR_BACKUPS>/<hostname>/`, so avoid spaces and special characters.
- Folders are only backed up if they are non-empty and readable.
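The filtering behaviour described above (the `check_existing_folders()` style of validation) can be sketched like this; `classify_entries` is a hypothetical name for illustration:

```python
import json
from pathlib import Path

def classify_entries(json_path):
    """Return (path, kind, name) tuples for enabled entries that exist on disk."""
    results = []
    for entry in json.loads(Path(json_path).read_text()):
        try:
            path, flag, name = entry
        except (TypeError, ValueError):
            continue                      # malformed entry: skipped, not fatal
        if flag != 1:
            continue                      # disabled via the 0 flag
        p = Path(path)
        try:
            if p.is_dir() and any(p.iterdir()):
                results.append((path, "folder", name))
            elif p.is_file():
                results.append((path, "file", name))
        except PermissionError:
            continue                      # unreadable directory: excluded
    return results
```

Note that empty folders, missing paths, and unreadable directories are silently excluded, matching the graceful-error-handling feature.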
---
### Environment (`init.py`)
```python
env = "local" # Switch between: "local", "local2", "prod"
```
| Environment | `ROOT_DIR_APP` | `ROOT_DIR_BACKUPS` |
|---|---|---|
| `local` | `/home/sld-admin/Scrivania/backups_script/` | `<ROOT_DIR_APP>/backups/Daily_File_Backups/` |
| `local2` | `/home/simo-positive/Desktop/backups_script/` | `<ROOT_DIR_APP>/backups/Daily_File_Backups/` |
| `prod` | `/opt/sld-backups/` | `/home/backups/backups_root/Daily_File_Backups/` |
If an unknown value is set, the script exits immediately with an error.
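A possible shape for `init.py`, matching the table above. The `_PATHS` lookup table is an illustrative device; the actual file may use plain `if`/`elif` branches:

```python
import sys

env = "local"  # switch between: "local", "local2", "prod"

# None means: derive ROOT_DIR_BACKUPS from ROOT_DIR_APP
_PATHS = {
    "local":  ("/home/sld-admin/Scrivania/backups_script/", None),
    "local2": ("/home/simo-positive/Desktop/backups_script/", None),
    "prod":   ("/opt/sld-backups/", "/home/backups/backups_root/Daily_File_Backups/"),
}

if env not in _PATHS:
    sys.exit(f"Unknown environment: {env!r}")

ROOT_DIR_APP, ROOT_DIR_BACKUPS = _PATHS[env]
if ROOT_DIR_BACKUPS is None:
    # local environments nest backups under the app root
    ROOT_DIR_BACKUPS = ROOT_DIR_APP + "backups/Daily_File_Backups/"
```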
---
## Usage
```bash
# Full backup + auto-rotation (default, no flags needed)
python3 script.py

# Show which paths are enabled and which are disabled
python3 script.py --show

# Check whether declared paths exist on disk and print a status report
python3 script.py --check

# Run backup with verbose debug output
python3 script.py --debug

# Run only the rotation step (no new backups created)
python3 script.py --rotate

# Preview what rotation would delete, without actually deleting anything
python3 script.py --rotate --dry
```
### CLI Reference
| Flag | Long form | Description |
|---|---|---|
| `-s` | `--show` | Print enabled and disabled paths from `dir_backups.json`. |
| `-d` | `--debug` | Run backup with `debug="on"`, which enables verbose path-checking output. |
| `-c` | `--check` | Run `check_existing_folders()` and print a detailed status for each declared path. |
| `-r` | `--rotate` | Run `autorotate_backups()` only. Can be combined with `--dry`. |
| | `--dry` | Dry-run mode for `--rotate`: logs candidates for deletion but deletes nothing. |
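The flag table maps naturally onto `argparse`. A sketch of what `script.py` might build (`build_parser` is a hypothetical name):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Mirror the CLI table: short/long flags, --dry modifying --rotate."""
    parser = argparse.ArgumentParser(
        description="Daily file backups with rotation")
    parser.add_argument("-s", "--show", action="store_true",
                        help="print enabled and disabled paths")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="verbose path-checking output")
    parser.add_argument("-c", "--check", action="store_true",
                        help="status report for each declared path")
    parser.add_argument("-r", "--rotate", action="store_true",
                        help="run rotation only")
    parser.add_argument("--dry", action="store_true",
                        help="dry-run mode for --rotate")
    return parser
```

With no flags set, the entry point would fall through to the default full backup + rotation.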
---
## Backup Storage Layout
Backups are written under:
```
<ROOT_DIR_BACKUPS>/
└── <hostname>/
    ├── Documents/
    │   ├── Documents_2026-03-10.tar.gz
    │   ├── Documents_2026-03-11.tar.gz
    │   └── Documents_2026-03-12.tar.gz
    └── ConfigBackup/
        ├── ConfigBackup_2026-03-10.gz
        └── ConfigBackup_2026-03-11.gz
```
- Each entry in `dir_backups.json` gets its own subfolder named after its `name` field.
- Archives are named `<name>_YYYY-MM-DD.tar.gz` (folders) or `<name>_YYYY-MM-DD.gz` (files).
- The host's hostname is used as a top-level grouping folder, which makes it easy to collect backups from multiple machines into the same root.
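Constructing a target path from this layout might look like the following; `backup_target` is a hypothetical helper, not the script's actual API:

```python
import socket
from pathlib import Path

def backup_target(root_dir_backups, name, stamp, is_folder):
    """Build <root>/<hostname>/<name>/<name>_<stamp>.(tar.)gz."""
    suffix = ".tar.gz" if is_folder else ".gz"
    return (Path(root_dir_backups) / socket.gethostname()
            / name / f"{name}_{stamp}{suffix}")
```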
---
## Backup Rotation
The rotation step (`autorotate_backups`) runs automatically after every backup, or can be triggered manually with `--rotate`.
**Logic:**
1. Scans each immediate subfolder of `<ROOT_DIR_BACKUPS>/<hostname>/`.
2. Finds all `*.gz` files (this covers both `.gz` and `.tar.gz`).
3. Sorts them by modification time, newest first.
4. Keeps the first `keep_backups` (default: 7) and deletes the rest.
**Dry-run** (`--rotate --dry`) logs exactly which files would be deleted, with no filesystem changes. Useful for verifying the retention setting before applying it.
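The four steps above, including dry-run support, can be sketched as follows (hypothetical `rotate` helper; the real `autorotate_backups()` may differ in details such as logging):

```python
from pathlib import Path

def rotate(host_dir, keep=7, dry_run=False):
    """Delete all but the newest `keep` *.gz archives in each subfolder.

    Returns the list of deleted (or would-be-deleted) paths.
    """
    deleted = []
    for sub in Path(host_dir).iterdir():
        if not sub.is_dir():
            continue
        # *.gz matches both .gz and .tar.gz; newest first by mtime
        archives = sorted(sub.glob("*.gz"),
                          key=lambda p: p.stat().st_mtime, reverse=True)
        for old in archives[keep:]:       # everything past the newest `keep`
            deleted.append(old)
            if not dry_run:
                old.unlink()
    return deleted
```

Sorting by modification time rather than filename keeps the logic robust even if archive names change format.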
---
## Logging
All functions use Python's standard `logging` module via a named logger (`__name__`). The root logger is configured by `logger.py` at startup.
- **Console output** is always active (via `StreamHandler`), regardless of the `logs` setting.
- **File output** is added when `"logs": true` is set in `config.json`. The log file is `<logs_path>/backup.log` and is appended to on each run.
- Log format: `YYYY-MM-DD HH:MM:SS [LEVEL] message`
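A minimal version of what `logger.py` configures, per the bullets above (the `setup_logger` signature is an assumption for illustration):

```python
import logging
from pathlib import Path

def setup_logger(logs=False, logs_path="/home/backups/logs"):
    """Console handler always on; file handler only when logs=True."""
    fmt = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s",
                            datefmt="%Y-%m-%d %H:%M:%S")
    root = logging.getLogger()
    root.setLevel(logging.INFO)

    console = logging.StreamHandler()     # always active, captured by cron
    console.setFormatter(fmt)
    root.addHandler(console)

    if logs:
        Path(logs_path).mkdir(parents=True, exist_ok=True)
        file_handler = logging.FileHandler(Path(logs_path) / "backup.log")
        file_handler.setFormatter(fmt)
        root.addHandler(file_handler)
```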
---
## Running as a Cron Job
To run a full backup every day at 2:00 AM:
```bash
crontab -e
```
```
0 2 * * * /usr/bin/python3 /opt/sld-backups/script.py >> /home/backups/logs/cron.log 2>&1
```
Since the script always writes to stdout, cron output redirection captures the full run log even if file logging is disabled in `config.json`.
---
## Requirements
- Python **3.6+**
- **No third-party packages** — uses only the standard library:
- `tarfile`, `gzip`, `shutil` — archiving and compression
- `logging` — structured output
- `argparse` — CLI argument parsing
- `pathlib` — path handling
- `socket` — hostname detection
- `json` — configuration loading
---
## License
GNU General Public License v3.0 — see [LICENSE](LICENSE) for full terms.