diff --git a/docs/future/backups.md b/docs/future/backups.md
deleted file mode 100644
index 5a42892..0000000
--- a/docs/future/backups.md
+++ /dev/null
@@ -1,2529 +0,0 @@
-# Wild Cloud Backup System - Complete Implementation Guide
-
-**Date:** 2025-11-26
-**Status:** 📋 READY FOR IMPLEMENTATION
-**Estimated Effort:** Phase 1: 2-3 days | Phase 2: 5-7 days | Phase 3: 3-5 days
-
----
-
-## Table of Contents
-
-1. [Executive Summary](#executive-summary)
-2. [Background and Context](#background-and-context)
-3. [Problem Analysis](#problem-analysis)
-4. [Architecture Overview](#architecture-overview)
-5. [Configuration Design](#configuration-design)
-6. [Phase 1: Core Backup Fix](#phase-1-core-backup-fix)
-7. [Phase 2: Restic Integration](#phase-2-restic-integration)
-8. [Phase 3: Restore from Restic](#phase-3-restore-from-restic)
-9. [API Specifications](#api-specifications)
-10. [Web UI Design](#web-ui-design)
-11. [Testing Strategy](#testing-strategy)
-12. [Deployment Guide](#deployment-guide)
-13. [Task Breakdown](#task-breakdown)
-14. [Success Criteria](#success-criteria)
-
----
-
-## Executive Summary
-
-### Current State
-App backups are completely broken: they create only a metadata file (`backup.json`) without any actual backup data:
-- ❌ No database dump files (`.sql`, `.dump`)
-- ❌ No PVC archive files (`.tar.gz`)
-- ❌ Users cannot restore from these "backups"
-- ✅ Cluster backups work correctly (different code path)
-
-### Root Cause
-Database detection uses pod label-based discovery (`app=gitea` in the `postgres` namespace), but database pods are shared infrastructure labeled `app=postgres`. Detection always returns an empty result, so no backups are created.
-
-### Why This Matters
-- **Scale**: Applications like Immich may host terabyte-scale photo libraries
-- **Storage**: Wild Central devices may not have sufficient local storage
-- **Flexibility**: Need flexible destinations: local, NFS, S3, Backblaze B2, SFTP, etc.
-- **Deduplication**: Critical for TB-scale data (60-80% space savings)
-
-### Solution: Three-Phase Approach
-
-**Phase 1 (CRITICAL - 2-3 days)**: Fix broken app backups
-- Manifest-based database detection (declarative)
-- kubectl exec for database dumps
-- PVC discovery and backup
-- Store files locally in staging directory
-
-**Phase 2 (HIGH PRIORITY - 5-7 days)**: Restic integration
-- Upload staged files to restic repository
-- Configuration via config.yaml and web UI
-- Support multiple backends (local, S3, B2, SFTP)
-- Repository initialization and testing
-
-**Phase 3 (MEDIUM PRIORITY - 3-5 days)**: Restore from restic
-- List available snapshots
-- Restore from any snapshot
-- Database and PVC restoration
-- Web UI for restore operations
-
----
-
-## Background and Context
-
-### Project Philosophy
-
-Wild Cloud follows strict KISS/YAGNI principles:
-- **KISS**: Keep implementations as simple as possible
-- **YAGNI**: Build only what's needed now, not speculative features
-- **No future-proofing**: Let complexity emerge from actual requirements
-- **Trust in emergence**: Start simple, enhance when requirements proven
-
-### Key Design Decisions
-
-1. **Manifest-based detection**: Read app dependencies from `manifest.yaml` (declarative), not runtime pod discovery
-2. **kubectl exec approach**: Use standard Kubernetes operations for dumps and tar archives
-3. **Restic for scale**: Use the battle-tested restic tool for TB-scale data and flexible backends
-4. **Phased implementation**: Fix core bugs first, add features incrementally
-
-### Why Restic?
-
-**Justified by actual requirements** (not premature optimization):
-- **Scale**: Handle TB-scale data (Immich with terabytes of photos)
-- **Flexibility**: Multiple backends (local, S3, B2, SFTP, Azure, GCS)
-- **Efficiency**: 60-80% space savings via deduplication
-- **Security**: Built-in AES-256 encryption
-- **Reliability**: Battle-tested, widely adopted
-- **Incremental**: Only backup changed blocks
-
----
-
-## Problem Analysis
-
-### Critical Bug: App Backups Create No Files
-
-**Evidence** from `/home/payne/repos/wild-cloud-dev/.working/in-progress-fix.md`:
-
-```
-Backup structure:
-apps/
-└── gitea/
-    └── 20241124T143022Z/
-        └── backup.json      ← Only this file exists!
-
-Expected structure:
-apps/
-└── gitea/
-    └── 20241124T143022Z/
-        ├── backup.json
-        ├── postgres.sql     ← Missing!
-        └── data.tar.gz      ← Missing!
-```
-
-### Root Cause Analysis
-
-**File**: `wild-central-api/internal/backup/backup.go` (lines 544-569)
-
-```go
-func (m *Manager) detectDatabaseType(ctx context.Context, namespace, appLabel string) (string, error) {
-    // This looks for pods with label "app=gitea" in namespace "postgres"
-    // But database pods are labeled "app=postgres" in namespace "postgres"
-    // This ALWAYS returns empty result!
-
-    cmd := exec.CommandContext(ctx, "kubectl", "get", "pods",
-        "-n", namespace,
-        "-l", fmt.Sprintf("app=%s", appLabel), // ← Wrong label!
-        "-o", "jsonpath={.items[0].metadata.name}")
-
-    output, err := cmd.Output()
-    if err != nil || len(output) == 0 {
-        return "", nil // ← Returns empty, no backup created
-    }
-    // ...
-}
-```
-
-**Why It's Broken**:
-1. Gitea backup tries to find a pod with label `app=gitea` in namespace `postgres`
-2. But the PostgreSQL pod is labeled `app=postgres` in namespace `postgres`
-3. Detection always fails → no database dump created
-4. Same problem for PVC detection → no PVC archive created
-5. Only the `backup.json` metadata file is written
-
-### Why Cluster Backups Work
-
-Cluster backups don't use app-specific detection:
-- Directly use `kubectl get` to find etcd pods
-- Use hardcoded paths for config files
-- Don't rely on app-based pod labels
-- Actually create `.tar.gz` files with real data
-
----
-
-## Architecture Overview
-
-### System Components
-
-```
-┌─────────────────────────────────────────────────────────┐
-│              Wild Cloud Backup System                   │
-├─────────────────────────────────────────────────────────┤
-│                                                         │
-│  ┌─ Web UI (wild-web-app) ─────────────────────┐        │
-│  │  - Backup configuration form                │        │
-│  │  - Repository status display                │        │
-│  │  - Backup creation/restore UI               │        │
-│  └──────────────────┬──────────────────────────┘        │
-│                     │ REST API                          │
-│  ┌─ API Layer (wild-central-api) ──────────────┐        │
-│  │  - Backup configuration endpoints           │        │
-│  │  - Backup/restore operation handlers        │        │
-│  │  - Restic integration layer                 │        │
-│  └──────────────────┬──────────────────────────┘        │
-│                     │                                   │
-│  ┌─ Backup Engine ─────────────────────────────┐        │
-│  │  - Manifest parser                          │        │
-│  │  - Database backup (kubectl exec pg_dump)   │        │
-│  │  - PVC backup (kubectl exec tar)            │        │
-│  │  - Restic upload (Phase 2)                  │        │
-│  └──────────────────┬──────────────────────────┘        │
-│                     │                                   │
-│  ┌─ Storage Layer ─────────────────────────────┐        │
-│  │  Phase 1: Local staging directory           │        │
-│  │  Phase 2: Restic repository (local/remote)  │        │
-│  └─────────────────────────────────────────────┘        │
-└─────────────────────────────────────────────────────────┘
-```
-
-### Data Flow
-
-**Phase 1 (Local Staging)**:
-```
-User clicks "Backup" → API Handler
-        ↓
-Read manifest.yaml (detect databases)
-        ↓
-kubectl exec pg_dump → postgres.sql
-        ↓
-kubectl exec tar → pvc-data.tar.gz
-        ↓
-Save to /var/lib/wild-central/backup-staging/
-        ↓
-Write backup.json metadata
-```
-
-**Phase 2 (Restic Upload)**:
-```
-[Same as Phase 1] → Local staging files created
-        ↓
-restic backup
-        ↓
-Upload to repository (S3/B2/local/etc)
-        ↓
-Clean staging directory
-        ↓
-Update metadata with snapshot ID
-```
-
-**Phase 3 (Restore)**:
-```
-User selects snapshot → restic restore
-        ↓
-Download to staging directory
-        ↓
-kubectl exec psql < postgres.sql
-        ↓
-kubectl cp tar file → pod
-        ↓
-kubectl exec tar -xzf → restore PVC data
-```
-
----
-
-## Configuration Design
-
-### Schema: config.yaml
-
-```yaml
-cloud:
-  domain: "wildcloud.local"
-  dns:
-    ip: "192.168.8.50"
-
-  backup:
-    # Restic repository location (native restic URI format)
-    repository: "/mnt/backups/wild-cloud"  # or "s3:bucket" or "sftp:user@host:/path"
-
-    # Local staging directory (always on Wild Central filesystem)
-    staging: "/var/lib/wild-central/backup-staging"
-
-    # Retention policy (restic forget flags)
-    retention:
-      keepDaily: 7
-      keepWeekly: 4
-      keepMonthly: 6
-      keepYearly: 2
-
-    # Backend-specific configuration (optional, backend-dependent)
-    backend:
-      # For S3-compatible backends (B2, Wasabi, MinIO)
-      endpoint: "s3.us-west-002.backblazeb2.com"
-      region: "us-west-002"
-
-      # For SFTP
-      port: 22
-```
-
-### Schema: secrets.yaml
-
-```yaml
-cloud:
-  backup:
-    # Restic repository encryption password
-    password: "strong-encryption-password"
-
-    # Backend credentials (conditional on backend type)
-    credentials:
-      # For S3/B2/S3-compatible (auto-detected from repository prefix)
-      s3:
-        accessKeyId: "KEY_ID"
-        secretAccessKey: "SECRET_KEY"
-
-      # For SFTP
-      sftp:
-        password: "ssh-password"
-        # OR
-        privateKey: |
-          -----BEGIN OPENSSH PRIVATE KEY-----
-          ...
-          -----END OPENSSH PRIVATE KEY-----
-
-      # For Azure
-      azure:
-        accountName: "account"
-        accountKey: "key"
-
-      # For Google Cloud
-      gcs:
-        projectId: "project-id"
-        serviceAccountKey: |
-          { "type": "service_account", ... }
-```
-
-### Configuration Examples
-
-#### Example 1: Local Testing
-
-**config.yaml**:
-```yaml
-cloud:
-  backup:
-    repository: "/mnt/external-drive/wild-cloud-backups"
-    staging: "/var/lib/wild-central/backup-staging"
-    retention:
-      keepDaily: 7
-      keepWeekly: 4
-      keepMonthly: 6
-```
-
-**secrets.yaml**:
-```yaml
-cloud:
-  backup:
-    password: "test-backup-password-123"
-```
-
-#### Example 2: Backblaze B2
-
-**config.yaml**:
-```yaml
-cloud:
-  backup:
-    # B2 via its S3-compatible API; restic's s3 backend takes the endpoint
-    # as part of the repository URL
-    repository: "s3:https://s3.us-west-002.backblazeb2.com/wild-cloud-backups"
-    staging: "/var/lib/wild-central/backup-staging"
-    retention:
-      keepDaily: 7
-      keepWeekly: 4
-      keepMonthly: 6
-    backend:
-      endpoint: "s3.us-west-002.backblazeb2.com"
-      region: "us-west-002"
-```
-
-**secrets.yaml**:
-```yaml
-cloud:
-  backup:
-    password: "strong-encryption-password"
-    credentials:
-      s3:
-        accessKeyId: "0020123456789abcdef"
-        secretAccessKey: "K002abcdefghijklmnop"
-```
-
-#### Example 3: AWS S3
-
-**config.yaml**:
-```yaml
-cloud:
-  backup:
-    repository: "s3:s3.amazonaws.com/my-wild-cloud-backups"
-    staging: "/var/lib/wild-central/backup-staging"
-    retention:
-      keepDaily: 14
-      keepWeekly: 8
-      keepMonthly: 12
-    backend:
-      region: "us-east-1"
-```
-
-**secrets.yaml**:
-```yaml
-cloud:
-  backup:
-    password: "prod-encryption-password"
-    credentials:
-      s3:
-        accessKeyId: "AKIAIOSFODNN7EXAMPLE"
-        secretAccessKey: "wJalrXUtnFEMI/K7MDENG/bPxRfiCY"
-```
-
-#### Example 4: SFTP Remote Server
-
-**config.yaml**:
-```yaml
-cloud:
-  backup:
-    repository: "sftp:backup-user@backup.example.com:/wild-cloud-backups"
-    staging: "/var/lib/wild-central/backup-staging"
-    retention:
-      keepDaily: 7
-      keepWeekly: 4
-      keepMonthly: 6
-    backend:
-      port: 2222
-```
-
-**secrets.yaml**:
-```yaml
-cloud:
-  backup:
-    password: "restic-repo-password"
-    credentials:
-      sftp:
-        privateKey: |
-          -----BEGIN OPENSSH PRIVATE KEY-----
-          ...
-          -----END OPENSSH PRIVATE KEY-----
-```
-
-#### Example 5: NFS/SMB Mount (as Local Path)
-
-**config.yaml**:
-```yaml
-cloud:
-  backup:
-    repository: "/mnt/nas-backups/wild-cloud"  # NFS mounted via OS
-    staging: "/var/lib/wild-central/backup-staging"
-    retention:
-      keepDaily: 7
-      keepWeekly: 4
-      keepMonthly: 6
-```
-
-**secrets.yaml**:
-```yaml
-cloud:
-  backup:
-    password: "backup-encryption-password"
-```
-
-### Backend Detection Logic
-
-```go
-func DetectBackendType(repository string) string {
-    if strings.HasPrefix(repository, "/") {
-        return "local"
-    } else if strings.HasPrefix(repository, "sftp:") {
-        return "sftp"
-    } else if strings.HasPrefix(repository, "s3:") || strings.HasPrefix(repository, "b2:") {
-        // "b2:" is grouped with S3 because this design reaches B2 through its
-        // S3-compatible API; restic's native b2 backend would instead require
-        // B2_ACCOUNT_ID/B2_ACCOUNT_KEY credentials.
-        return "s3"
-    } else if strings.HasPrefix(repository, "azure:") {
-        return "azure"
-    } else if strings.HasPrefix(repository, "gs:") {
-        return "gcs"
-    } else if strings.HasPrefix(repository, "rclone:") {
-        return "rclone"
-    }
-    return "unknown"
-}
-```
-
-### Environment Variable Mapping
-
-```go
-func BuildResticEnv(config BackupConfig, secrets BackupSecrets) map[string]string {
-    env := map[string]string{
-        "RESTIC_REPOSITORY": config.Repository,
-        "RESTIC_PASSWORD":   secrets.Password,
-    }
-
-    backendType := DetectBackendType(config.Repository)
-
-    switch backendType {
-    case "s3":
-        env["AWS_ACCESS_KEY_ID"] = secrets.Credentials.S3.AccessKeyID
-        env["AWS_SECRET_ACCESS_KEY"] = secrets.Credentials.S3.SecretAccessKey
-
-        // Note: restic has no endpoint environment variable. A custom endpoint
-        // (B2, Wasabi, MinIO) must be embedded in the repository URL itself,
-        // e.g. "s3:https://s3.us-west-002.backblazeb2.com/bucket".
-        if config.Backend.Region != "" {
-            env["AWS_DEFAULT_REGION"] = config.Backend.Region
-        }
-
-    case "sftp":
-        // restic's sftp backend authenticates through SSH itself (keys/agent);
-        // it has no password environment variable, so SSH key handling is done
-        // via a temp file.
-
-    case "azure":
-        env["AZURE_ACCOUNT_NAME"] = secrets.Credentials.Azure.AccountName
-        env["AZURE_ACCOUNT_KEY"] = secrets.Credentials.Azure.AccountKey
-
-    case "gcs":
-        // Write service account key to temp file, set GOOGLE_APPLICATION_CREDENTIALS
-    }
-
-    return env
-}
-```
-
----
-
-## Phase 1: Core Backup Fix
-
-### Goal
-Fix critical bugs and create actual backup files (no restic yet).
-
-### Priority
-🔴 **CRITICAL** - Users cannot restore from current backups
-
-### Timeline
-2-3 days
-
-### Overview
-
-Replace the broken pod label-based detection with manifest-based detection. Use kubectl exec to create actual database dumps and PVC archives.
-
-### Task 1.1: Implement Manifest-Based Database Detection
-
-**File**: `wild-central-api/internal/backup/backup.go`
-
-**Add New Structures**:
-```go
-type AppDependencies struct {
-    HasPostgres bool
-    HasMySQL    bool
-    HasRedis    bool
-}
-```
-
-**Implement Detection Function**:
-```go
-func (m *Manager) getAppDependencies(appName string) (*AppDependencies, error) {
-    manifestPath := filepath.Join(m.directoryPath, appName, "manifest.yaml")
-
-    manifest, err := directory.LoadManifest(manifestPath)
-    if err != nil {
-        return nil, fmt.Errorf("failed to load manifest: %w", err)
-    }
-
-    deps := &AppDependencies{
-        HasPostgres: contains(manifest.Requires, "postgres"),
-        HasMySQL:    contains(manifest.Requires, "mysql"),
-        HasRedis:    contains(manifest.Requires, "redis"),
-    }
-
-    return deps, nil
-}
-
-func contains(slice []string, item string) bool {
-    for _, s := range slice {
-        if s == item {
-            return true
-        }
-    }
-    return false
-}
-```
-
-**Changes Required**:
-- Add import: `"github.com/wild-cloud/wild-central/daemon/internal/directory"`
-- Remove old `detectDatabaseType()` function (lines 544-569)
-
-**Acceptance Criteria**:
-- Reads manifest.yaml for app
-- Correctly identifies postgres dependency
-- Correctly identifies mysql dependency
-- Returns error if manifest not found
-- Unit test: parse manifest with postgres
-- Unit test: parse manifest without databases
-
-**Estimated Effort**: 2 hours
-
----
-
-### Task 1.2: Implement PostgreSQL Backup via kubectl exec
-
-**File**: `wild-central-api/internal/backup/backup.go`
-
-**Implementation**:
-```go
-func (m *Manager) backupPostgres(ctx context.Context, instanceName, appName, backupDir string) (string, error) {
-    dbName := appName // Database name convention
-
-    // Find postgres pod in postgres namespace
-    podName, err := m.findPodInNamespace(ctx, "postgres", "app=postgres")
-    if err != nil {
-        return "", fmt.Errorf("postgres pod not found: %w", err)
-    }
-
-    // Execute pg_dump
-    dumpFile := filepath.Join(backupDir, "postgres.sql")
-    cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", "postgres",
-        podName, "--", "pg_dump", "-U", "postgres", dbName)
-
-    output, err := cmd.Output()
-    if err != nil {
-        return "", fmt.Errorf("pg_dump failed: %w", err)
-    }
-
-    // Write dump to file
-    if err := os.WriteFile(dumpFile, output, 0600); err != nil {
-        return "", fmt.Errorf("failed to write dump: %w", err)
-    }
-
-    return dumpFile, nil
-}
-
-// Helper function to find pod by label
-func (m *Manager) findPodInNamespace(ctx context.Context, namespace, labelSelector string) (string, error) {
-    cmd := exec.CommandContext(ctx, "kubectl", "get", "pods",
-        "-n", namespace,
-        "-l", labelSelector,
-        "-o", "jsonpath={.items[0].metadata.name}")
-
-    output, err := cmd.Output()
-    if err != nil {
-        return "", fmt.Errorf("kubectl get pods failed: %w", err)
-    }
-
-    podName := strings.TrimSpace(string(output))
-    if podName == "" {
-        return "", fmt.Errorf("no pod found with label %s in namespace %s", labelSelector, namespace)
-    }
-
-    return podName, nil
-}
-```
-
-**Acceptance Criteria**:
-- Finds postgres pod correctly
-- Executes pg_dump successfully
-- Creates .sql file with actual data
-- Handles errors gracefully
-- Integration test: backup Gitea database
-
-**Estimated Effort**: 3 hours
-
----
-
-### Task 1.3: Implement MySQL Backup via kubectl exec
-
-**File**: `wild-central-api/internal/backup/backup.go`
-
-**Implementation**:
-```go
-func (m *Manager) backupMySQL(ctx context.Context, instanceName, appName, backupDir string) (string, error) {
-    dbName := appName
-
-    // Find mysql pod
-    podName, err := m.findPodInNamespace(ctx, "mysql", "app=mysql")
-    if err != nil {
-        return "", fmt.Errorf("mysql pod not found: %w", err)
-    }
-
-    // Get MySQL root password from secret
-    password, err := m.getMySQLPassword(ctx)
-    if err != nil {
-        return "", fmt.Errorf("failed to get mysql password: %w", err)
-    }
-
-    // Execute mysqldump
-    dumpFile := filepath.Join(backupDir, "mysql.sql")
-    cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", "mysql",
-        podName, "--", "mysqldump",
-        "-uroot",
-        fmt.Sprintf("-p%s", password),
-        "--single-transaction",
-        "--routines",
-        "--triggers",
-        dbName)
-
-    output, err := cmd.Output()
-    if err != nil {
-        return "", fmt.Errorf("mysqldump failed: %w", err)
-    }
-
-    if err := os.WriteFile(dumpFile, output, 0600); err != nil {
-        return "", fmt.Errorf("failed to write dump: %w", err)
-    }
-
-    return dumpFile, nil
-}
-
-func (m *Manager) getMySQLPassword(ctx context.Context) (string, error) {
-    cmd := exec.CommandContext(ctx, "kubectl", "get", "secret",
-        "-n", "mysql",
-        "mysql-root-password",
-        "-o", "jsonpath={.data.password}")
-
-    output, err := cmd.Output()
-    if err != nil {
-        return "", fmt.Errorf("failed to get secret: %w", err)
-    }
-
-    // Decode base64
-    decoded, err := base64.StdEncoding.DecodeString(string(output))
-    if err != nil {
-        return "", fmt.Errorf("failed to decode password: %w", err)
-    }
-
-    return string(decoded), nil
-}
-```
-
-**Acceptance Criteria**:
-- Finds mysql pod correctly
-- Retrieves password from secret
-- Executes mysqldump successfully
-- Creates .sql file with actual data
-- Handles errors gracefully
-
-**Estimated Effort**: 3 hours
-
----
-
-### Task 1.4: Implement PVC Discovery and Backup
-
-**File**: `wild-central-api/internal/backup/backup.go`
-
-**Implementation**:
-```go
-func (m *Manager) findAppPVCs(ctx context.Context, appName string) ([]string, error) {
-    // Get namespace for app (convention: app name)
-    namespace := appName
-
-    cmd := exec.CommandContext(ctx, "kubectl", "get", "pvc",
-        "-n", namespace,
-        "-o", "jsonpath={.items[*].metadata.name}")
-
-    output, err := cmd.Output()
-    if err != nil {
-        return nil, fmt.Errorf("kubectl get pvc failed: %w", err)
-    }
-
-    pvcNames := strings.Fields(string(output))
-    return pvcNames, nil
-}
-
-func (m *Manager) backupPVC(ctx context.Context, namespace, pvcName, backupDir string) (string, error) {
-    // Find pod using this PVC
-    podName, err := m.findPodUsingPVC(ctx, namespace, pvcName)
-    if err != nil {
-        return "", fmt.Errorf("no pod found using PVC %s: %w", pvcName, err)
-    }
-
-    // Get mount path for PVC
-    mountPath, err := m.getPVCMountPath(ctx, namespace, podName, pvcName)
-    if err != nil {
-        return "", fmt.Errorf("failed to get mount path: %w", err)
-    }
-
-    // Create tar archive of PVC data, streaming it straight to disk;
-    // buffering the archive in memory (cmd.Output) would not survive
-    // the TB-scale PVCs this system is designed for.
-    tarFile := filepath.Join(backupDir, fmt.Sprintf("%s.tar.gz", pvcName))
-    f, err := os.OpenFile(tarFile, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0600)
-    if err != nil {
-        return "", fmt.Errorf("failed to create tar file: %w", err)
-    }
-    defer f.Close()
-
-    cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", namespace,
-        podName, "--", "tar", "czf", "-", "-C", mountPath, ".")
-    cmd.Stdout = f
-
-    if err := cmd.Run(); err != nil {
-        return "", fmt.Errorf("tar command failed: %w", err)
-    }
-
-    return tarFile, nil
-}
-
-func (m *Manager) findPodUsingPVC(ctx context.Context, namespace, pvcName string) (string, error) {
-    cmd := exec.CommandContext(ctx, "kubectl", "get", "pods",
-        "-n", namespace,
-        "-o", "json")
-
-    output, err := cmd.Output()
-    if err != nil {
-        return "", fmt.Errorf("kubectl get pods failed: %w", err)
-    }
-
-    // Parse JSON to find pod using this PVC
-    var podList struct {
-        Items []struct {
-            Metadata struct {
-                Name string `json:"name"`
-            } `json:"metadata"`
-            Spec struct {
-                Volumes []struct {
-                    PersistentVolumeClaim *struct {
-                        ClaimName string `json:"claimName"`
-                    } `json:"persistentVolumeClaim"`
-                } `json:"volumes"`
-            } `json:"spec"`
-        } `json:"items"`
-    }
-
- if err := json.Unmarshal(output, &podList); err != nil { - return "", fmt.Errorf("failed to parse pod list: %w", err) - } - - for _, pod := range podList.Items { - for _, volume := range pod.Spec.Volumes { - if volume.PersistentVolumeClaim != nil && - volume.PersistentVolumeClaim.ClaimName == pvcName { - return pod.Metadata.Name, nil - } - } - } - - return "", fmt.Errorf("no pod found using PVC %s", pvcName) -} - -func (m *Manager) getPVCMountPath(ctx context.Context, namespace, podName, pvcName string) (string, error) { - cmd := exec.CommandContext(ctx, "kubectl", "get", "pod", - "-n", namespace, - podName, - "-o", "json") - - output, err := cmd.Output() - if err != nil { - return "", fmt.Errorf("kubectl get pod failed: %w", err) - } - - var pod struct { - Spec struct { - Volumes []struct { - Name string `json:"name"` - PersistentVolumeClaim *struct { - ClaimName string `json:"claimName"` - } `json:"persistentVolumeClaim"` - } `json:"volumes"` - Containers []struct { - VolumeMounts []struct { - Name string `json:"name"` - MountPath string `json:"mountPath"` - } `json:"volumeMounts"` - } `json:"containers"` - } `json:"spec"` - } - - if err := json.Unmarshal(output, &pod); err != nil { - return "", fmt.Errorf("failed to parse pod: %w", err) - } - - // Find volume name for PVC - var volumeName string - for _, volume := range pod.Spec.Volumes { - if volume.PersistentVolumeClaim != nil && - volume.PersistentVolumeClaim.ClaimName == pvcName { - volumeName = volume.Name - break - } - } - - if volumeName == "" { - return "", fmt.Errorf("PVC %s not found in pod volumes", pvcName) - } - - // Find mount path for volume - for _, container := range pod.Spec.Containers { - for _, mount := range container.VolumeMounts { - if mount.Name == volumeName { - return mount.MountPath, nil - } - } - } - - return "", fmt.Errorf("mount path not found for volume %s", volumeName) -} -``` - -**Acceptance Criteria**: -- Discovers PVCs in app namespace -- Finds pod using PVC -- Gets correct 
mount path -- Creates tar.gz with actual data -- Handles multiple PVCs -- Integration test: backup Immich PVCs - -**Estimated Effort**: 4 hours - ---- - -### Task 1.5: Update BackupApp Flow - -**File**: `wild-central-api/internal/backup/backup.go` - -**Replace BackupApp function** (complete rewrite): - -```go -func (m *Manager) BackupApp(instanceName, appName string) (*BackupInfo, error) { - ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute) - defer cancel() - - // Create timestamped backup directory - timestamp := time.Now().UTC().Format("20060102T150405Z") - stagingDir := filepath.Join(m.dataDir, "instances", instanceName, "backups", "staging") - backupDir := filepath.Join(stagingDir, "apps", appName, timestamp) - - if err := os.MkdirAll(backupDir, 0755); err != nil { - return nil, fmt.Errorf("failed to create backup directory: %w", err) - } - - // Initialize backup info with in_progress status - info := &BackupInfo{ - Type: "app", - AppName: appName, - Status: "in_progress", - CreatedAt: time.Now().UTC().Format(time.RFC3339), - Files: []string{}, - } - - // Save initial metadata - if err := m.saveBackupMetadata(backupDir, info); err != nil { - return nil, fmt.Errorf("failed to save initial metadata: %w", err) - } - - // Read app dependencies from manifest - deps, err := m.getAppDependencies(appName) - if err != nil { - info.Status = "failed" - info.Error = fmt.Sprintf("Failed to read manifest: %v", err) - m.saveBackupMetadata(backupDir, info) - return info, err - } - - var backupFiles []string - - // Backup PostgreSQL if required - if deps.HasPostgres { - file, err := m.backupPostgres(ctx, instanceName, appName, backupDir) - if err != nil { - info.Status = "failed" - info.Error = fmt.Sprintf("PostgreSQL backup failed: %v", err) - m.saveBackupMetadata(backupDir, info) - return info, err - } - backupFiles = append(backupFiles, file) - } - - // Backup MySQL if required - if deps.HasMySQL { - file, err := m.backupMySQL(ctx, instanceName, 
appName, backupDir) - if err != nil { - info.Status = "failed" - info.Error = fmt.Sprintf("MySQL backup failed: %v", err) - m.saveBackupMetadata(backupDir, info) - return info, err - } - backupFiles = append(backupFiles, file) - } - - // Discover and backup PVCs - pvcNames, err := m.findAppPVCs(ctx, appName) - if err != nil { - // Log warning but don't fail if no PVCs found - log.Printf("Warning: failed to find PVCs for %s: %v", appName, err) - } else { - for _, pvcName := range pvcNames { - file, err := m.backupPVC(ctx, appName, pvcName, backupDir) - if err != nil { - log.Printf("Warning: failed to backup PVC %s: %v", pvcName, err) - continue - } - backupFiles = append(backupFiles, file) - } - } - - // Calculate total backup size - var totalSize int64 - for _, file := range backupFiles { - stat, err := os.Stat(file) - if err == nil { - totalSize += stat.Size() - } - } - - // Update final metadata - info.Status = "completed" - info.Files = backupFiles - info.Size = totalSize - info.Error = "" - - if err := m.saveBackupMetadata(backupDir, info); err != nil { - return info, fmt.Errorf("failed to save final metadata: %w", err) - } - - return info, nil -} - -func (m *Manager) saveBackupMetadata(backupDir string, info *BackupInfo) error { - metadataFile := filepath.Join(backupDir, "backup.json") - data, err := json.MarshalIndent(info, "", " ") - if err != nil { - return fmt.Errorf("failed to marshal metadata: %w", err) - } - return os.WriteFile(metadataFile, data, 0644) -} -``` - -**Acceptance Criteria**: -- Creates timestamped backup directories -- Reads manifest to detect dependencies -- Backs up databases if present -- Backs up PVCs if present -- Calculates accurate backup size -- Saves complete metadata -- Handles errors gracefully -- Integration test: Full Gitea backup - -**Estimated Effort**: 4 hours - ---- - -### Task 1.6: Build and Test - -**Steps**: -1. Build wild-central-api -2. Deploy to test environment -3. Test Gitea backup (PostgreSQL + PVC) -4. 
Test Immich backup (PostgreSQL + multiple PVCs) -5. Verify backup files exist and have data -6. Verify metadata accuracy -7. Test manual restore - -**Acceptance Criteria**: -- All builds succeed -- App backups create actual files -- Metadata is accurate -- Manual restore works - -**Estimated Effort**: 4 hours - ---- - -## Phase 2: Restic Integration - -### Goal -Upload staged backups to restic repository with flexible backends. - -### Priority -🟡 **HIGH PRIORITY** (after Phase 1 complete) - -### Timeline -5-7 days - -### Prerequisites -- Phase 1 completed and tested -- Restic installed on Wild Central device -- Backup destination configured (S3, B2, local, etc.) - -### Task 2.1: Configuration Management - -**File**: `wild-central-api/internal/backup/config.go` (new file) - -**Implementation**: -```go -package backup - -import ( - "fmt" - "strings" - - "github.com/wild-cloud/wild-central/daemon/internal/config" -) - -type BackupConfig struct { - Repository string - Staging string - Retention RetentionPolicy - Backend BackendConfig -} - -type RetentionPolicy struct { - KeepDaily int - KeepWeekly int - KeepMonthly int - KeepYearly int -} - -type BackendConfig struct { - Type string - Endpoint string - Region string - Port int -} - -type BackupSecrets struct { - Password string - Credentials BackendCredentials -} - -type BackendCredentials struct { - S3 *S3Credentials - SFTP *SFTPCredentials - Azure *AzureCredentials - GCS *GCSCredentials -} - -type S3Credentials struct { - AccessKeyID string - SecretAccessKey string -} - -type SFTPCredentials struct { - Password string - PrivateKey string -} - -type AzureCredentials struct { - AccountName string - AccountKey string -} - -type GCSCredentials struct { - ProjectID string - ServiceAccountKey string -} - -func LoadBackupConfig(instanceName string) (*BackupConfig, *BackupSecrets, error) { - cfg, err := config.LoadInstanceConfig(instanceName) - if err != nil { - return nil, nil, fmt.Errorf("failed to load config: %w", err) - 
} - - secrets, err := config.LoadInstanceSecrets(instanceName) - if err != nil { - return nil, nil, fmt.Errorf("failed to load secrets: %w", err) - } - - backupCfg := &BackupConfig{ - Repository: cfg.Cloud.Backup.Repository, - Staging: cfg.Cloud.Backup.Staging, - Retention: RetentionPolicy{ - KeepDaily: cfg.Cloud.Backup.Retention.KeepDaily, - KeepWeekly: cfg.Cloud.Backup.Retention.KeepWeekly, - KeepMonthly: cfg.Cloud.Backup.Retention.KeepMonthly, - KeepYearly: cfg.Cloud.Backup.Retention.KeepYearly, - }, - Backend: BackendConfig{ - Type: DetectBackendType(cfg.Cloud.Backup.Repository), - Endpoint: cfg.Cloud.Backup.Backend.Endpoint, - Region: cfg.Cloud.Backup.Backend.Region, - Port: cfg.Cloud.Backup.Backend.Port, - }, - } - - backupSecrets := &BackupSecrets{ - Password: secrets.Cloud.Backup.Password, - Credentials: BackendCredentials{ - S3: secrets.Cloud.Backup.Credentials.S3, - SFTP: secrets.Cloud.Backup.Credentials.SFTP, - Azure: secrets.Cloud.Backup.Credentials.Azure, - GCS: secrets.Cloud.Backup.Credentials.GCS, - }, - } - - return backupCfg, backupSecrets, nil -} - -func DetectBackendType(repository string) string { - if strings.HasPrefix(repository, "/") { - return "local" - } else if strings.HasPrefix(repository, "sftp:") { - return "sftp" - } else if strings.HasPrefix(repository, "s3:") || strings.HasPrefix(repository, "b2:") { - return "s3" - } else if strings.HasPrefix(repository, "azure:") { - return "azure" - } else if strings.HasPrefix(repository, "gs:") { - return "gcs" - } else if strings.HasPrefix(repository, "rclone:") { - return "rclone" - } - return "unknown" -} - -func ValidateBackupConfig(cfg *BackupConfig, secrets *BackupSecrets) error { - if cfg.Repository == "" { - return fmt.Errorf("repository is required") - } - - if secrets.Password == "" { - return fmt.Errorf("repository password is required") - } - - // Validate backend-specific credentials - switch cfg.Backend.Type { - case "s3": - if secrets.Credentials.S3 == nil { - return fmt.Errorf("S3 
credentials required for S3 backend") - } - if secrets.Credentials.S3.AccessKeyID == "" || secrets.Credentials.S3.SecretAccessKey == "" { - return fmt.Errorf("S3 access key and secret key required") - } - case "sftp": - if secrets.Credentials.SFTP == nil { - return fmt.Errorf("SFTP credentials required for SFTP backend") - } - if secrets.Credentials.SFTP.Password == "" && secrets.Credentials.SFTP.PrivateKey == "" { - return fmt.Errorf("SFTP password or private key required") - } - case "azure": - if secrets.Credentials.Azure == nil { - return fmt.Errorf("Azure credentials required for Azure backend") - } - if secrets.Credentials.Azure.AccountName == "" || secrets.Credentials.Azure.AccountKey == "" { - return fmt.Errorf("Azure account name and key required") - } - case "gcs": - if secrets.Credentials.GCS == nil { - return fmt.Errorf("GCS credentials required for GCS backend") - } - if secrets.Credentials.GCS.ServiceAccountKey == "" { - return fmt.Errorf("GCS service account key required") - } - } - - return nil -} -``` - -**Estimated Effort**: 3 hours - ---- - -### Task 2.2: Restic Operations Module - -**File**: `wild-central-api/internal/backup/restic.go` (new file) - -**Implementation**: -```go -package backup - -import ( - "context" - "encoding/json" - "fmt" - "os" - "os/exec" - "strings" -) - -type ResticClient struct { - config *BackupConfig - secrets *BackupSecrets -} - -func NewResticClient(config *BackupConfig, secrets *BackupSecrets) *ResticClient { - return &ResticClient{ - config: config, - secrets: secrets, - } -} - -func (r *ResticClient) buildEnv() map[string]string { - env := map[string]string{ - "RESTIC_REPOSITORY": r.config.Repository, - "RESTIC_PASSWORD": r.secrets.Password, - } - - switch r.config.Backend.Type { - case "s3": - if r.secrets.Credentials.S3 != nil { - env["AWS_ACCESS_KEY_ID"] = r.secrets.Credentials.S3.AccessKeyID - env["AWS_SECRET_ACCESS_KEY"] = r.secrets.Credentials.S3.SecretAccessKey - } - if r.config.Backend.Endpoint != "" { - 
-			// NOTE: restic reads S3 credentials and region from the
-			// environment, but it has no endpoint variable; a custom
-			// endpoint (MinIO, Backblaze S3, etc.) must be embedded in the
-			// repository URL itself, e.g. s3:https://<endpoint>/<bucket>/<path>.
-		}
-		if r.config.Backend.Region != "" {
-			env["AWS_DEFAULT_REGION"] = r.config.Backend.Region
-		}
-
-	case "sftp":
-		// NOTE: restic's sftp backend authenticates through OpenSSH
-		// (keys, ssh_config, or ssh-agent); it does not accept a password
-		// via an environment variable, so key-based auth is assumed here.
-
-	case "azure":
-		if r.secrets.Credentials.Azure != nil {
-			env["AZURE_ACCOUNT_NAME"] = r.secrets.Credentials.Azure.AccountName
-			env["AZURE_ACCOUNT_KEY"] = r.secrets.Credentials.Azure.AccountKey
-		}
-	}
-
-	return env
-}
-
-func (r *ResticClient) Init(ctx context.Context) error {
-	cmd := exec.CommandContext(ctx, "restic", "init")
-
-	// Set environment variables
-	cmd.Env = os.Environ()
-	for k, v := range r.buildEnv() {
-		cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
-	}
-
-	output, err := cmd.CombinedOutput()
-	if err != nil {
-		return fmt.Errorf("restic init failed: %w: %s", err, string(output))
-	}
-
-	return nil
-}
-
-func (r *ResticClient) Backup(ctx context.Context, path string, tags []string) (string, error) {
-	args := []string{"backup", path}
-	for _, tag := range tags {
-		args = append(args, "--tag", tag)
-	}
-
-	cmd := exec.CommandContext(ctx, "restic", args...)
-
-	cmd.Env = os.Environ()
-	for k, v := range r.buildEnv() {
-		cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
-	}
-
-	output, err := cmd.CombinedOutput()
-	if err != nil {
-		return "", fmt.Errorf("restic backup failed: %w: %s", err, string(output))
-	}
-
-	// Parse snapshot ID from output
-	snapshotID := r.parseSnapshotID(string(output))
-
-	return snapshotID, nil
-}
-
-func (r *ResticClient) ListSnapshots(ctx context.Context, tags []string) ([]Snapshot, error) {
-	args := []string{"snapshots", "--json"}
-	for _, tag := range tags {
-		args = append(args, "--tag", tag)
-	}
-
-	cmd := exec.CommandContext(ctx, "restic", args...)
- - cmd.Env = os.Environ() - for k, v := range r.buildEnv() { - cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v)) - } - - output, err := cmd.Output() - if err != nil { - return nil, fmt.Errorf("restic snapshots failed: %w", err) - } - - var snapshots []Snapshot - if err := json.Unmarshal(output, &snapshots); err != nil { - return nil, fmt.Errorf("failed to parse snapshots: %w", err) - } - - return snapshots, nil -} - -func (r *ResticClient) Restore(ctx context.Context, snapshotID, targetPath string) error { - cmd := exec.CommandContext(ctx, "restic", "restore", snapshotID, "--target", targetPath) - - cmd.Env = os.Environ() - for k, v := range r.buildEnv() { - cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v)) - } - - output, err := cmd.CombinedOutput() - if err != nil { - return fmt.Errorf("restic restore failed: %w: %s", err, string(output)) - } - - return nil -} - -func (r *ResticClient) Stats(ctx context.Context) (*RepositoryStats, error) { - cmd := exec.CommandContext(ctx, "restic", "stats", "--json") - - cmd.Env = os.Environ() - for k, v := range r.buildEnv() { - cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v)) - } - - output, err := cmd.Output() - if err != nil { - return nil, fmt.Errorf("restic stats failed: %w", err) - } - - var stats RepositoryStats - if err := json.Unmarshal(output, &stats); err != nil { - return nil, fmt.Errorf("failed to parse stats: %w", err) - } - - return &stats, nil -} - -func (r *ResticClient) TestConnection(ctx context.Context) error { - cmd := exec.CommandContext(ctx, "restic", "cat", "config") - - cmd.Env = os.Environ() - for k, v := range r.buildEnv() { - cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v)) - } - - _, err := cmd.Output() - if err != nil { - return fmt.Errorf("connection test failed: %w", err) - } - - return nil -} - -func (r *ResticClient) parseSnapshotID(output string) string { - lines := strings.Split(output, "\n") - for _, line := range lines { - if strings.Contains(line, "snapshot") && 
strings.Contains(line, "saved") { - parts := strings.Fields(line) - for i, part := range parts { - if part == "snapshot" && i+1 < len(parts) { - return parts[i+1] - } - } - } - } - return "" -} - -type Snapshot struct { - ID string `json:"id"` - Time string `json:"time"` - Hostname string `json:"hostname"` - Tags []string `json:"tags"` - Paths []string `json:"paths"` -} - -type RepositoryStats struct { - TotalSize int64 `json:"total_size"` - TotalFileCount int64 `json:"total_file_count"` - SnapshotCount int `json:"snapshot_count"` -} -``` - -**Estimated Effort**: 4 hours - ---- - -### Task 2.3: Update Backup Flow to Upload to Restic - -**File**: `wild-central-api/internal/backup/backup.go` - -**Modify BackupApp function** to add restic upload after staging: - -```go -func (m *Manager) BackupApp(instanceName, appName string) (*BackupInfo, error) { - // ... existing Phase 1 code to create local backup ... - - // After local backup succeeds, upload to restic if configured - cfg, secrets, err := LoadBackupConfig(instanceName) - if err == nil && cfg.Repository != "" { - // Restic is configured, upload backup - client := NewResticClient(cfg, secrets) - - tags := []string{ - fmt.Sprintf("type:app"), - fmt.Sprintf("app:%s", appName), - fmt.Sprintf("instance:%s", instanceName), - } - - snapshotID, err := client.Backup(ctx, backupDir, tags) - if err != nil { - log.Printf("Warning: restic upload failed: %v", err) - // Don't fail the backup, local files still exist - } else { - info.SnapshotID = snapshotID - - // Clean up staging directory after successful upload - if err := os.RemoveAll(backupDir); err != nil { - log.Printf("Warning: failed to clean staging directory: %v", err) - } - } - } - - // Save final metadata - if err := m.saveBackupMetadata(backupDir, info); err != nil { - return info, fmt.Errorf("failed to save final metadata: %w", err) - } - - return info, nil -} -``` - -**Estimated Effort**: 2 hours - ---- - -### Task 2.4: API Client Updates - -**File**: 
-`wild-web-app/src/services/api/backups.ts`
-
-**Add configuration endpoints**:
-
-```typescript
-export interface BackupConfiguration {
-  repository: string;
-  staging: string;
-  retention: {
-    keepDaily: number;
-    keepWeekly: number;
-    keepMonthly: number;
-    keepYearly: number;
-  };
-  backend: {
-    type: string;
-    endpoint?: string;
-    region?: string;
-    port?: number;
-  };
-}
-
-export interface BackupConfigurationWithCredentials extends BackupConfiguration {
-  password: string;
-  credentials?: {
-    s3?: {
-      accessKeyId: string;
-      secretAccessKey: string;
-    };
-    sftp?: {
-      password?: string;
-      privateKey?: string;
-    };
-    azure?: {
-      accountName: string;
-      accountKey: string;
-    };
-    gcs?: {
-      projectId: string;
-      serviceAccountKey: string;
-    };
-  };
-}
-
-export interface RepositoryStatus {
-  initialized: boolean;
-  reachable: boolean;
-  lastBackup?: string;
-  snapshotCount: number;
-}
-
-export interface RepositoryStats {
-  repositorySize: number;
-  repositorySizeHuman: string;
-  snapshotCount: number;
-  fileCount: number;
-  uniqueChunks: number;
-  compressionRatio: number;
-  oldestSnapshot?: string;
-  latestSnapshot?: string;
-}
-
-export async function getBackupConfiguration(
-  instanceId: string
-): Promise<{ config: BackupConfiguration; status: RepositoryStatus }> {
-  const response = await api.get(`/instances/${instanceId}/backup/config`);
-  return response.data;
-}
-
-export async function updateBackupConfiguration(
-  instanceId: string,
-  config: BackupConfigurationWithCredentials
-): Promise<void> {
-  await api.put(`/instances/${instanceId}/backup/config`, config);
-}
-
-export async function testBackupConnection(
-  instanceId: string,
-  config: BackupConfigurationWithCredentials
-): Promise<RepositoryStatus> {
-  const response = await api.post(`/instances/${instanceId}/backup/test`, config);
-  return response.data;
-}
-
-export async function initializeBackupRepository(
-  instanceId: string,
-  config: BackupConfigurationWithCredentials
-): Promise<{ repositoryId: string }> {
-  const response = await api.post(`/instances/${instanceId}/backup/init`, config);
-  return response.data;
-}
-
-export async function getRepositoryStats(
-  instanceId: string
-): Promise<RepositoryStats> {
-  const response = await api.get(`/instances/${instanceId}/backup/stats`);
-  return response.data;
-}
-```
-
-**Estimated Effort**: 2 hours
-
----
-
-### Task 2.5: Configuration UI Components
-
-Create the following components in `wild-web-app/src/components/backup/`:
-
-**BackupConfigurationCard.tsx**:
-- Main configuration form
-- Backend type selector
-- Conditional credential inputs
-- Retention policy inputs
-- Test/Save/Cancel buttons
-
-**BackendSelector.tsx**:
-- Dropdown for backend types
-- Shows available backends with icons
-
-**CredentialsForm.tsx**:
-- Dynamic form based on selected backend
-- Password/key inputs with visibility toggle
-- Validation
-
-**RepositoryStatus.tsx**:
-- Display repository health
-- Show stats (size, snapshots, last backup)
-- Visual indicators
-
-**RetentionPolicyInputs.tsx**:
-- Number inputs for retention periods
-- Tooltips explaining each period
-
-**Estimated Effort**: 8 hours
-
----
-
-### Task 2.6: Integrate with BackupsPage
-
-**File**: `wild-web-app/src/router/pages/BackupsPage.tsx`
-
-**Add configuration section above backup list**:
-
-```typescript
-function BackupsPage() {
-  const { instanceId } = useParams();
-  const [showConfig, setShowConfig] = useState(false);
-
-  const { data: backupConfig } = useQuery({
-    queryKey: ['backup-config', instanceId],
-    queryFn: () => getBackupConfiguration(instanceId),
-  });
-
-  return (
-    <div>
-      {/* Repository Status Card */}
-      {backupConfig && (
-        <BackupStatusCard
-          config={backupConfig.config}
-          status={backupConfig.status}
-          onEdit={() => setShowConfig(true)}
-        />
-      )}
-
-      {/* Configuration Card (conditional) */}
-      {showConfig && (
-        <BackupConfigurationCard
-          onSave={() => setShowConfig(false)}
-          onCancel={() => setShowConfig(false)}
-        />
-      )}
-
-      {/* Existing backup list */}
-      <BackupListSection />
-    </div>
- ); -} -``` - -**Estimated Effort**: 3 hours - ---- - -### Task 2.7: Backup Configuration API Handlers - -**File**: `wild-central-api/internal/api/v1/handlers_backup.go` - -**Add new handlers**: - -```go -func (h *Handler) BackupConfigGet(c *gin.Context) { - instanceName := c.Param("name") - - cfg, secrets, err := backup.LoadBackupConfig(instanceName) - if err != nil { - c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()}) - return - } - - // Test repository status - var status backup.RepositoryStatus - if cfg.Repository != "" { - client := backup.NewResticClient(cfg, secrets) - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - status.Initialized = true - status.Reachable = client.TestConnection(ctx) == nil - - if stats, err := client.Stats(ctx); err == nil { - status.SnapshotCount = stats.SnapshotCount - } - } - - c.JSON(http.StatusOK, gin.H{ - "success": true, - "data": gin.H{ - "config": cfg, - "status": status, - }, - }) -} - -func (h *Handler) BackupConfigUpdate(c *gin.Context) { - instanceName := c.Param("name") - - var req backup.BackupConfigurationWithCredentials - if err := c.BindJSON(&req); err != nil { - c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()}) - return - } - - // Validate configuration - if err := backup.ValidateBackupConfig(&req.Config, &req.Secrets); err != nil { - c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()}) - return - } - - // Save to config.yaml and secrets.yaml - if err := config.SaveBackupConfig(instanceName, &req); err != nil { - c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()}) - return - } - - c.JSON(http.StatusOK, gin.H{ - "success": true, - "message": "Backup configuration updated successfully", - }) -} - -func (h *Handler) BackupConnectionTest(c *gin.Context) { - var req backup.BackupConfigurationWithCredentials - if err := c.BindJSON(&req); err != nil { - c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()}) - return - } - - 
client := backup.NewResticClient(&req.Config, &req.Secrets) - - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - status := backup.RepositoryStatus{ - Reachable: client.TestConnection(ctx) == nil, - } - - if status.Reachable { - if stats, err := client.Stats(ctx); err == nil { - status.Initialized = true - status.SnapshotCount = stats.SnapshotCount - } - } - - c.JSON(http.StatusOK, gin.H{ - "success": true, - "data": status, - }) -} - -func (h *Handler) BackupRepositoryInit(c *gin.Context) { - var req backup.BackupConfigurationWithCredentials - if err := c.BindJSON(&req); err != nil { - c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()}) - return - } - - client := backup.NewResticClient(&req.Config, &req.Secrets) - - ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) - defer cancel() - - if err := client.Init(ctx); err != nil { - c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()}) - return - } - - c.JSON(http.StatusOK, gin.H{ - "success": true, - "message": "Repository initialized successfully", - }) -} - -func (h *Handler) BackupStatsGet(c *gin.Context) { - instanceName := c.Param("name") - - cfg, secrets, err := backup.LoadBackupConfig(instanceName) - if err != nil { - c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()}) - return - } - - client := backup.NewResticClient(cfg, secrets) - - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - stats, err := client.Stats(ctx) - if err != nil { - c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()}) - return - } - - c.JSON(http.StatusOK, gin.H{ - "success": true, - "data": stats, - }) -} -``` - -**Register routes**: -```go -backupGroup := v1.Group("/instances/:name/backup") -{ - backupGroup.GET("/config", h.BackupConfigGet) - backupGroup.PUT("/config", h.BackupConfigUpdate) - backupGroup.POST("/test", h.BackupConnectionTest) - backupGroup.POST("/init", 
h.BackupRepositoryInit) - backupGroup.GET("/stats", h.BackupStatsGet) -} -``` - -**Estimated Effort**: 4 hours - ---- - -### Task 2.8: End-to-End Testing - -**Test scenarios**: -1. Configure local repository via UI -2. Configure S3 repository via UI -3. Test connection validation -4. Create backup and verify upload -5. Check repository stats -6. Test error handling - -**Estimated Effort**: 4 hours - ---- - -## Phase 3: Restore from Restic - -### Goal -Enable users to restore backups from restic snapshots. - -### Priority -🟢 **MEDIUM PRIORITY** (after Phase 2 complete) - -### Timeline -3-5 days - -### Task 3.1: List Snapshots API - -**File**: `wild-central-api/internal/api/v1/handlers_backup.go` - -**Implementation**: -```go -func (h *Handler) BackupSnapshotsList(c *gin.Context) { - instanceName := c.Param("name") - appName := c.Query("app") - - cfg, secrets, err := backup.LoadBackupConfig(instanceName) - if err != nil { - c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()}) - return - } - - client := backup.NewResticClient(cfg, secrets) - - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) - defer cancel() - - var tags []string - if appName != "" { - tags = append(tags, fmt.Sprintf("app:%s", appName)) - } - - snapshots, err := client.ListSnapshots(ctx, tags) - if err != nil { - c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()}) - return - } - - c.JSON(http.StatusOK, gin.H{ - "success": true, - "data": snapshots, - }) -} -``` - -**Estimated Effort**: 2 hours - ---- - -### Task 3.2: Restore Snapshot Function - -**File**: `wild-central-api/internal/backup/backup.go` - -**Implementation**: -```go -func (m *Manager) RestoreFromSnapshot(instanceName, snapshotID string) error { - ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute) - defer cancel() - - // Load restic config - cfg, secrets, err := LoadBackupConfig(instanceName) - if err != nil { - return fmt.Errorf("failed to load config: %w", err) 
- } - - client := NewResticClient(cfg, secrets) - - // Create temp directory for restore - tempDir := filepath.Join(cfg.Staging, "restore", snapshotID) - if err := os.MkdirAll(tempDir, 0755); err != nil { - return fmt.Errorf("failed to create temp directory: %w", err) - } - defer os.RemoveAll(tempDir) - - // Restore snapshot to temp directory - if err := client.Restore(ctx, snapshotID, tempDir); err != nil { - return fmt.Errorf("restic restore failed: %w", err) - } - - // Parse metadata to determine what to restore - metadataFile := filepath.Join(tempDir, "backup.json") - info, err := m.loadBackupMetadata(metadataFile) - if err != nil { - return fmt.Errorf("failed to load metadata: %w", err) - } - - // Restore databases - for _, file := range info.Files { - if strings.HasSuffix(file, "postgres.sql") { - if err := m.restorePostgres(ctx, info.AppName, filepath.Join(tempDir, "postgres.sql")); err != nil { - return fmt.Errorf("postgres restore failed: %w", err) - } - } else if strings.HasSuffix(file, "mysql.sql") { - if err := m.restoreMySQL(ctx, info.AppName, filepath.Join(tempDir, "mysql.sql")); err != nil { - return fmt.Errorf("mysql restore failed: %w", err) - } - } - } - - // Restore PVCs - for _, file := range info.Files { - if strings.HasSuffix(file, ".tar.gz") { - pvcName := strings.TrimSuffix(filepath.Base(file), ".tar.gz") - if err := m.restorePVC(ctx, info.AppName, pvcName, filepath.Join(tempDir, file)); err != nil { - return fmt.Errorf("pvc restore failed: %w", err) - } - } - } - - return nil -} - -func (m *Manager) restorePostgres(ctx context.Context, appName, dumpFile string) error { - dbName := appName - - podName, err := m.findPodInNamespace(ctx, "postgres", "app=postgres") - if err != nil { - return fmt.Errorf("postgres pod not found: %w", err) - } - - // Drop and recreate database - cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", "postgres", - podName, "--", "psql", "-U", "postgres", "-c", - fmt.Sprintf("DROP DATABASE IF EXISTS %s; CREATE 
DATABASE %s;", dbName, dbName))
-
-	if err := cmd.Run(); err != nil {
-		return fmt.Errorf("failed to recreate database: %w", err)
-	}
-
-	// Restore dump (stream from disk rather than loading into memory;
-	// dumps can be very large)
-	dumpFh, err := os.Open(dumpFile)
-	if err != nil {
-		return fmt.Errorf("failed to open dump: %w", err)
-	}
-	defer dumpFh.Close()
-
-	cmd = exec.CommandContext(ctx, "kubectl", "exec", "-i", "-n", "postgres",
-		podName, "--", "psql", "-U", "postgres", dbName)
-	cmd.Stdin = dumpFh
-
-	if err := cmd.Run(); err != nil {
-		return fmt.Errorf("psql restore failed: %w", err)
-	}
-
-	return nil
-}
-
-func (m *Manager) restoreMySQL(ctx context.Context, appName, dumpFile string) error {
-	// Similar implementation to restorePostgres: drop and recreate the
-	// database, then stream the dump into the mysql client using the
-	// password from the app's secret
-	return nil
-}
-
-func (m *Manager) restorePVC(ctx context.Context, namespace, pvcName, tarFile string) error {
-	podName, err := m.findPodUsingPVC(ctx, namespace, pvcName)
-	if err != nil {
-		return fmt.Errorf("no pod found using PVC: %w", err)
-	}
-
-	mountPath, err := m.getPVCMountPath(ctx, namespace, podName, pvcName)
-	if err != nil {
-		return fmt.Errorf("failed to get mount path: %w", err)
-	}
-
-	// Copy tar file to pod
-	cmd := exec.CommandContext(ctx, "kubectl", "cp", tarFile,
-		fmt.Sprintf("%s/%s:/tmp/restore.tar.gz", namespace, podName))
-
-	if err := cmd.Run(); err != nil {
-		return fmt.Errorf("kubectl cp failed: %w", err)
-	}
-
-	// Extract tar file
-	cmd = exec.CommandContext(ctx, "kubectl", "exec", "-n", namespace,
-		podName, "--", "tar", "xzf", "/tmp/restore.tar.gz", "-C", mountPath)
-
-	if err := cmd.Run(); err != nil {
-		return fmt.Errorf("tar extract failed: %w", err)
-	}
-
-	// Clean up temp file (best effort; ignore error)
-	cmd = exec.CommandContext(ctx, "kubectl", "exec", "-n", namespace,
-		podName, "--", "rm", "/tmp/restore.tar.gz")
-	_ = cmd.Run()
-
-	return nil
-}
-```
-
-**Estimated Effort**: 5 hours
-
----
-
-### Task 3.3: Restore API Handler
-
-**File**: `wild-central-api/internal/api/v1/handlers_backup.go`
-**Implementation**: -```go -func (h *Handler) BackupSnapshotRestore(c *gin.Context) { - instanceName := c.Param("name") - snapshotID := c.Param("snapshotId") - - // Start restore operation asynchronously - go func() { - if err := h.backupManager.RestoreFromSnapshot(instanceName, snapshotID); err != nil { - log.Printf("Restore failed: %v", err) - } - }() - - c.JSON(http.StatusAccepted, gin.H{ - "success": true, - "message": "Restore operation started", - }) -} -``` - -**Estimated Effort**: 1 hour - ---- - -### Task 3.4: Restore UI - -**File**: `wild-web-app/src/components/backup/RestoreDialog.tsx` - -**Implementation**: -Create dialog that: -- Lists available snapshots -- Shows snapshot details (date, size, files) -- Confirmation before restore -- Progress indicator - -**Estimated Effort**: 4 hours - ---- - -### Task 3.5: End-to-End Restore Testing - -**Test scenarios**: -1. List snapshots for app -2. Select snapshot to restore -3. Restore database -4. Restore PVCs -5. Verify application works after restore -6. 
Test error handling - -**Estimated Effort**: 3 hours - ---- - -## API Specifications - -### Complete API Reference - -``` -# Backup Operations -POST /api/v1/instances/{name}/backups/app/{appName} # Create app backup -POST /api/v1/instances/{name}/backups/cluster # Create cluster backup -GET /api/v1/instances/{name}/backups/app # List app backups -GET /api/v1/instances/{name}/backups/cluster # List cluster backups -DELETE /api/v1/instances/{name}/backups/app/{appName}/{id} # Delete app backup -DELETE /api/v1/instances/{name}/backups/cluster/{id} # Delete cluster backup - -# Backup Configuration (Phase 2) -GET /api/v1/instances/{name}/backup/config # Get backup configuration -PUT /api/v1/instances/{name}/backup/config # Update configuration -POST /api/v1/instances/{name}/backup/test # Test connection -POST /api/v1/instances/{name}/backup/init # Initialize repository -GET /api/v1/instances/{name}/backup/stats # Get repository stats - -# Restore Operations (Phase 3) -GET /api/v1/instances/{name}/backup/snapshots # List snapshots -POST /api/v1/instances/{name}/backup/snapshots/{id}/restore # Restore snapshot -``` - ---- - -## Web UI Design - -### Page Structure - -**BackupsPage Layout**: -``` -┌─────────────────────────────────────────────────┐ -│ Backups │ -├─────────────────────────────────────────────────┤ -│ │ -│ ┌─ Backup Status ─────────────────────────┐ │ -│ │ Repository: Configured ✓ │ │ -│ │ Last Backup: 2 hours ago │ │ -│ │ Total Size: 2.4 GB │ │ -│ │ Snapshots: 24 │ │ -│ │ [Edit Configuration] │ │ -│ └─────────────────────────────────────────┘ │ -│ │ -│ ┌─ Recent Backups ────────────────────────┐ │ -│ │ [Backup cards with restore/delete] │ │ -│ │ ... 
│ │ -│ └─────────────────────────────────────────┘ │ -│ │ -│ ┌─ Configuration (when editing) ──────────┐ │ -│ │ Backend Type: [S3 ▼] │ │ -│ │ Repository URI: [s3:bucket/path ] │ │ -│ │ Credentials: │ │ -│ │ Access Key ID: [••••••••••• ] │ │ -│ │ Secret Key: [•••••••••••••••• ] │ │ -│ │ Retention Policy: │ │ -│ │ Daily: [7] Weekly: [4] Monthly: [6] │ │ -│ │ [Test Connection] [Save] [Cancel] │ │ -│ └─────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────┘ -``` - -### Component Hierarchy - -``` -BackupsPage -├── BackupStatusCard (read-only) -│ ├── RepositoryStatus -│ ├── Stats (size, snapshots, last backup) -│ └── EditButton -│ -├── BackupListSection -│ └── BackupCard[] (existing) -│ -└── BackupConfigurationCard (conditional) - ├── BackendTypeSelect - ├── RepositoryUriInput - ├── CredentialsSection - │ ├── S3CredentialsForm (conditional) - │ ├── SFTPCredentialsForm (conditional) - │ └── ... - ├── RetentionPolicyInputs - └── ActionButtons - ├── TestConnectionButton - ├── SaveButton - └── CancelButton -``` - ---- - -## Testing Strategy - -### Phase 1 Testing - -**Unit Tests**: -- Manifest parsing -- Helper functions (contains, findPodInNamespace) -- Backup file creation - -**Integration Tests**: -- End-to-end Gitea backup (PostgreSQL + PVC) -- End-to-end Immich backup (PostgreSQL + multiple PVCs) -- Backup with no database -- Backup with no PVCs - -**Manual Tests**: -1. Create backup via web UI -2. Verify `.sql` file exists with actual data -3. Verify `.tar.gz` files exist with actual data -4. Check metadata accuracy -5. Test delete functionality - -### Phase 2 Testing - -**Unit Tests**: -- Backend type detection -- Environment variable mapping -- Configuration validation - -**Integration Tests**: -- Repository initialization (local, S3, SFTP) -- Backup upload to restic -- Snapshot listing -- Stats retrieval -- Connection testing - -**Manual Tests**: -1. Configure local repository via UI -2. Configure S3 repository via UI -3. 
Test connection validation before save -4. Create backup and verify in restic -5. Check repository stats display -6. Test error handling for bad credentials - -### Phase 3 Testing - -**Integration Tests**: -- Restore database from snapshot -- Restore PVC from snapshot -- Full app restore -- Handle missing/corrupted snapshots - -**Manual Tests**: -1. List snapshots in UI -2. Select and restore from snapshot -3. Verify database data after restore -4. Verify PVC data after restore -5. Verify application functions correctly - ---- - -## Deployment Guide - -### Phase 1 Deployment - -**Preparation**: -1. Update wild-central-api code -2. Build and test on development instance -3. Verify backup files created with real data -4. Test manual restore - -**Rollout**: -1. Deploy to staging environment -2. Create test backups for multiple apps -3. Verify all backup files exist -4. Manually restore one backup to verify -5. Deploy to production - -**Rollback Plan**: -- Previous version still creates metadata files -- No breaking changes to backup structure -- Users can manually copy backup files if needed - -### Phase 2 Deployment - -**Preparation**: -1. Install restic on Wild Central devices: `apt install restic` -2. Update wild-central-api with restic code -3. Update wild-web-app with configuration UI -4. Test on development with local repository -5. Test with S3 and SFTP backends - -**Migration**: -- Existing local backups remain accessible -- Users opt-in to restic by configuring repository -- Gradual migration: Phase 1 staging continues working - -**Rollout**: -1. Deploy backend API updates -2. Deploy web UI updates -3. Create user documentation with examples -4. Provide migration guide for existing setups - -**Rollback Plan**: -- Restic is optional: users can continue using local backups -- Configuration in config.yaml: easy to revert -- No data loss: existing backups preserved - -### Phase 3 Deployment - -**Preparation**: -1. Ensure Phase 2 is stable -2. 
Ensure at least one backup exists in restic -3. Test restore in staging environment - -**Rollout**: -1. Deploy restore functionality -2. Document restore procedures -3. Train users on restore process - ---- - -## Task Breakdown - -### Phase 1 Tasks (2-3 days) - -| Task | Description | Effort | Dependencies | -|------|-------------|--------|--------------| -| 1.1 | Manifest-based database detection | 2h | None | -| 1.2 | PostgreSQL backup via kubectl exec | 3h | 1.1 | -| 1.3 | MySQL backup via kubectl exec | 3h | 1.1 | -| 1.4 | PVC discovery and backup | 4h | 1.1 | -| 1.5 | Update BackupApp flow | 4h | 1.2, 1.3, 1.4 | -| 1.6 | Build and test | 4h | 1.5 | - -**Total**: 20 hours (2.5 days) - -### Phase 2 Tasks (5-7 days) - -| Task | Description | Effort | Dependencies | -|------|-------------|--------|--------------| -| 2.1 | Configuration management | 3h | Phase 1 done | -| 2.2 | Restic operations module | 4h | 2.1 | -| 2.3 | Update backup flow for restic | 2h | 2.2 | -| 2.4 | API client updates | 2h | Phase 1 done | -| 2.5 | Configuration UI components | 8h | 2.4 | -| 2.6 | Integrate with BackupsPage | 3h | 2.5 | -| 2.7 | Backup configuration API handlers | 4h | 2.1, 2.2 | -| 2.8 | End-to-end testing | 4h | 2.3, 2.6, 2.7 | - -**Total**: 30 hours (3.75 days) - -### Phase 3 Tasks (3-5 days) - -| Task | Description | Effort | Dependencies | -|------|-------------|--------|--------------| -| 3.1 | List snapshots API | 2h | Phase 2 done | -| 3.2 | Restore snapshot function | 5h | 3.1 | -| 3.3 | Restore API handler | 1h | 3.2 | -| 3.4 | Restore UI | 4h | 3.3 | -| 3.5 | End-to-end restore testing | 3h | 3.4 | - -**Total**: 15 hours (2 days) - -### Grand Total -**65 hours** across 3 phases (8-12 days total) - ---- - -## Success Criteria - -### Phase 1 Success -- ✅ App backups create actual database dumps (`.sql` files) -- ✅ App backups create actual PVC archives (`.tar.gz` files) -- ✅ Backup metadata accurately lists all files -- ✅ Backups organized in timestamped 
directories -- ✅ In-progress tracking works correctly -- ✅ Delete functionality works for both app and cluster backups -- ✅ No silent failures (clear error messages) -- ✅ Manual restore verified working - -### Phase 2 Success -- ✅ Users can configure restic repository via web UI -- ✅ Configuration persists to config.yaml/secrets.yaml -- ✅ Test connection validates before save -- ✅ Backups automatically upload to restic repository -- ✅ Repository stats display correctly in UI -- ✅ Local, S3, and SFTP backends supported and tested -- ✅ Clear error messages for authentication/connection failures -- ✅ Staging files cleaned after successful upload - -### Phase 3 Success -- ✅ Users can list available snapshots in UI -- ✅ Users can restore from any snapshot via UI -- ✅ Database restoration works correctly -- ✅ PVC restoration works correctly -- ✅ Application functional after restore -- ✅ Error handling for corrupted snapshots - -### Long-Term Metrics -- **Storage Efficiency**: Deduplication achieves 60-80% space savings -- **Reliability**: < 1% backup failures -- **Performance**: Backup TB-scale data in < 4 hours -- **User Satisfaction**: Backup/restore completes without support intervention - ---- - -## Dependencies and Prerequisites - -### External Dependencies - -**Restic** (backup tool): -- Installation: `apt install restic` -- Version: >= 0.16.0 recommended -- License: BSD 2-Clause (compatible) - -**kubectl** (Kubernetes CLI): -- Already required for Wild Cloud operations -- Used for database dumps and PVC backup - -### Infrastructure Prerequisites - -**Storage Requirements**: - -**Staging Directory**: -- Location: `/var/lib/wild-central/backup-staging` (default) -- Space: `max(largest_database, largest_pvc) + 20% buffer` -- Recommendation: Monitor space, warn if < 50GB free - -**Restic Repository**: -- Local: Sufficient disk space on target mount -- Network: Mounted filesystem (NFS/SMB) -- Cloud: Typically unlimited, check quota/billing - -**Network Requirements**: 
-- Outbound HTTPS (443) for S3/B2/cloud backends -- Outbound SSH (22 or custom) for SFTP -- No inbound ports needed - -### Security Considerations - -**Credentials Storage**: -- Stored in secrets.yaml -- Never logged or exposed in API responses -- Transmitted only via HTTPS to backend APIs - -**Encryption**: -- Restic: AES-256 encryption of all backup data -- Transport: TLS for cloud backends, SSH for SFTP -- At rest: Depends on backend (S3 server-side encryption, etc.) - -**Access Control**: -- API endpoints check instance ownership -- Repository password required for all restic operations -- Backend credentials validated before save - ---- - -## Philosophy Compliance Review - -### KISS (Keep It Simple, Stupid) - -✅ **What We're Doing Right**: -- Restic repository URI as simple string (native format) -- Backend type auto-detected from URI prefix -- Credentials organized by backend type -- No complex abstraction layers - -✅ **What We're Avoiding**: -- Custom backup format -- Complex configuration DSL -- Over-abstracted backend interfaces -- Scheduling/automation (not needed yet) - -### YAGNI (You Aren't Gonna Need It) - -✅ **Building Only What's Needed**: -- Basic configuration (repository, credentials, retention) -- Test connection before save -- Upload to restic after staging -- Display repository stats - -❌ **Not Building** (until proven needed): -- Automated scheduling -- Multiple repository support -- Backup verification automation -- Email notifications -- Bandwidth limiting -- Custom encryption options - -### No Future-Proofing - -✅ **Current Requirements Only**: -- Support TB-scale data (restic deduplication) -- Flexible storage destinations (restic backends) -- Storage constraints (upload to remote, not local-only) - -❌ **Not Speculating On**: -- "What if users want backup versioning rules?" -- "What if users need bandwidth control?" -- "What if users want custom encryption?" 
-- Build these features WHEN users ask, not before - -### Trust in Emergence - -✅ **Starting Simple**: -- Phase 1: Fix core backup (files actually created) -- Phase 2: Add restic upload (storage flexibility) -- Phase 3: Add restore from restic -- Phase 4+: Wait for user feedback - -**Let complexity emerge from actual needs**, not speculation. - ---- - -## Conclusion - -This complete implementation guide provides everything needed to implement a production-ready backup system for Wild Cloud across three phases: - -1. **Phase 1 (CRITICAL)**: Fix broken app backups by creating actual database dumps and PVC archives using manifest-based detection and kubectl exec -2. **Phase 2 (HIGH)**: Integrate restic for TB-scale data, flexible storage backends, and configuration via web UI -3. **Phase 3 (MEDIUM)**: Enable restore from restic snapshots - -All phases are designed following Wild Cloud's KISS/YAGNI philosophy: build only what's needed now, let complexity emerge from actual requirements, and trust that good architecture emerges from simplicity. - -The implementation is ready for a senior engineer to begin Phase 1 immediately with all necessary context, specifications, code examples, and guidance provided. - ---- - -**Document Version**: 1.0 -**Created**: 2025-11-26 -**Status**: Ready for implementation -**Next Action**: Begin Phase 1, Task 1.1