Wild Cloud Backup System - Complete Implementation Guide
Date: 2025-11-26
Status: 📋 READY FOR IMPLEMENTATION
Estimated Effort: Phase 1: 2-3 days | Phase 2: 5-7 days | Phase 3: 3-5 days
Table of Contents
- Executive Summary
- Background and Context
- Problem Analysis
- Architecture Overview
- Configuration Design
- Phase 1: Core Backup Fix
- Phase 2: Restic Integration
- Phase 3: Restore from Restic
- API Specifications
- Web UI Design
- Testing Strategy
- Deployment Guide
- Task Breakdown
- Success Criteria
Executive Summary
Current State
App backups are completely broken - they create only metadata files (backup.json) without any actual backup data:
- ❌ No database dump files (.sql, .dump)
- ❌ No PVC archive files (.tar.gz)
- ❌ Users cannot restore from these "backups"
- ✅ Cluster backups work correctly (different code path)
Root Cause
Database detection uses pod label-based discovery (looking for app=gitea in the postgres namespace), but database pods are shared infrastructure labeled app=postgres. Detection always returns an empty result, so no backup data is ever captured; only the backup.json metadata file is written.
Why This Matters
- Scale: Applications like Immich may host terabyte-scale photo libraries
- Storage: Wild Central devices may not have sufficient local storage
- Flexibility: Destinations must be configurable: local, NFS, S3, Backblaze B2, SFTP, etc.
- Deduplication: Critical for TB-scale data (60-80% space savings)
Solution: Three-Phase Approach
Phase 1 (CRITICAL - 2-3 days): Fix broken app backups
- Manifest-based database detection (declarative)
- kubectl exec for database dumps
- PVC discovery and backup
- Store files locally in staging directory
Phase 2 (HIGH PRIORITY - 5-7 days): Restic integration
- Upload staged files to restic repository
- Configuration via config.yaml and web UI
- Support multiple backends (local, S3, B2, SFTP)
- Repository initialization and testing
Phase 3 (MEDIUM PRIORITY - 3-5 days): Restore from restic
- List available snapshots
- Restore from any snapshot
- Database and PVC restoration
- Web UI for restore operations
Background and Context
Project Philosophy
Wild Cloud follows strict KISS/YAGNI principles:
- KISS: Keep implementations as simple as possible
- YAGNI: Build only what's needed now, not speculative features
- No future-proofing: Let complexity emerge from actual requirements
- Trust in emergence: Start simple, enhance when requirements proven
Key Design Decisions
- Manifest-based detection: Read app dependencies from manifest.yaml (declarative), not runtime pod discovery
- kubectl exec approach: Use standard Kubernetes operations for dumps and tar archives
- Restic for scale: Use battle-tested restic tool for TB-scale data and flexible backends
- Phased implementation: Fix core bugs first, add features incrementally
Why Restic?
Justified by actual requirements (not premature optimization):
- Scale: Handle TB-scale data (Immich with terabytes of photos)
- Flexibility: Multiple backends (local, S3, B2, SFTP, Azure, GCS)
- Efficiency: 60-80% space savings via deduplication
- Security: Built-in AES-256 encryption
- Reliability: Battle-tested, widely adopted
- Incremental: Only backup changed blocks
Problem Analysis
Critical Bug: App Backups Create No Files
Evidence from /home/payne/repos/wild-cloud-dev/.working/in-progress-fix.md:
Backup structure:
apps/
└── gitea/
└── 20241124T143022Z/
└── backup.json ← Only this file exists!
Expected structure:
apps/
└── gitea/
└── 20241124T143022Z/
├── backup.json
├── postgres.sql ← Missing!
└── data.tar.gz ← Missing!
Root Cause Analysis
File: wild-central-api/internal/backup/backup.go (lines 544-569)
func (m *Manager) detectDatabaseType(ctx context.Context, namespace, appLabel string) (string, error) {
// This looks for pods with label "app=gitea" in namespace "postgres"
// But database pods are labeled "app=postgres" in namespace "postgres"
// This ALWAYS returns empty result!
cmd := exec.CommandContext(ctx, "kubectl", "get", "pods",
"-n", namespace,
"-l", fmt.Sprintf("app=%s", appLabel), // ← Wrong label!
"-o", "jsonpath={.items[0].metadata.name}")
output, err := cmd.Output()
if err != nil || len(output) == 0 {
return "", nil // ← Returns empty, no backup created
}
// ...
}
Why It's Broken:
- Gitea backup tries to find a pod with label app=gitea in namespace postgres
- But the PostgreSQL pod is labeled app=postgres in namespace postgres
- Detection always fails → no database dump created
- Same problem for PVC detection → no PVC archive created
- Only the backup.json metadata file is written
Why Cluster Backups Work
Cluster backups don't use app-specific detection:
- Directly use kubectl get to find etcd pods
- Use hardcoded paths for config files
- Don't rely on app-based pod labels
- Actually create .tar.gz files with real data
Architecture Overview
System Components
┌─────────────────────────────────────────────────────────┐
│ Wild Cloud Backup System │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌─ Web UI (wild-web-app) ─────────────────────┐ │
│ │ - Backup configuration form │ │
│ │ - Repository status display │ │
│ │ - Backup creation/restore UI │ │
│ └──────────────────┬───────────────────────────┘ │
│ │ REST API │
│ ┌─ API Layer (wild-central-api) ───────────────┐ │
│ │ - Backup configuration endpoints │ │
│ │ - Backup/restore operation handlers │ │
│ │ - Restic integration layer │ │
│ └──────────────────┬───────────────────────────┘ │
│ │ │
│ ┌─ Backup Engine ────────────────────────────┐ │
│ │ - Manifest parser │ │
│ │ - Database backup (kubectl exec pg_dump) │ │
│ │ - PVC backup (kubectl exec tar) │ │
│ │ - Restic upload (Phase 2) │ │
│ └──────────────────┬───────────────────────────┘ │
│ │ │
│ ┌─ Storage Layer ────────────────────────────┐ │
│ │ Phase 1: Local staging directory │ │
│ │ Phase 2: Restic repository (local/remote) │ │
│ └─────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Data Flow
Phase 1 (Local Staging):
User clicks "Backup" → API Handler
↓
Read manifest.yaml (detect databases)
↓
kubectl exec pg_dump → postgres.sql
↓
kubectl exec tar → pvc-data.tar.gz
↓
Save to /var/lib/wild-central/backup-staging/
↓
Write backup.json metadata
Phase 2 (Restic Upload):
[Same as Phase 1] → Local staging files created
↓
restic backup <staging-dir>
↓
Upload to repository (S3/B2/local/etc)
↓
Clean staging directory
↓
Update metadata with snapshot ID
Phase 3 (Restore):
User selects snapshot → restic restore <snapshot-id>
↓
Download to staging directory
↓
kubectl exec psql < postgres.sql
↓
kubectl cp tar file → pod
↓
kubectl exec tar -xzf → restore PVC data
Configuration Design
Schema: config.yaml
cloud:
domain: "wildcloud.local"
dns:
ip: "192.168.8.50"
backup:
# Restic repository location (native restic URI format)
repository: "/mnt/backups/wild-cloud" # or "s3:bucket" or "sftp:user@host:/path"
# Local staging directory (always on Wild Central filesystem)
staging: "/var/lib/wild-central/backup-staging"
# Retention policy (restic forget flags)
retention:
keepDaily: 7
keepWeekly: 4
keepMonthly: 6
keepYearly: 2
# Backend-specific configuration (optional, backend-dependent)
backend:
# For S3-compatible backends (B2, Wasabi, MinIO)
endpoint: "s3.us-west-002.backblazeb2.com"
region: "us-west-002"
# For SFTP
port: 22
Schema: secrets.yaml
cloud:
backup:
# Restic repository encryption password
password: "strong-encryption-password"
# Backend credentials (conditional on backend type)
credentials:
# For S3/B2/S3-compatible (auto-detected from repository prefix)
s3:
accessKeyId: "KEY_ID"
secretAccessKey: "SECRET_KEY"
# For SFTP
sftp:
password: "ssh-password"
# OR
privateKey: |
-----BEGIN OPENSSH PRIVATE KEY-----
...
-----END OPENSSH PRIVATE KEY-----
# For Azure
azure:
accountName: "account"
accountKey: "key"
# For Google Cloud
gcs:
projectId: "project-id"
serviceAccountKey: |
{ "type": "service_account", ... }
Configuration Examples
Example 1: Local Testing
config.yaml:
cloud:
backup:
repository: "/mnt/external-drive/wild-cloud-backups"
staging: "/var/lib/wild-central/backup-staging"
retention:
keepDaily: 7
keepWeekly: 4
keepMonthly: 6
secrets.yaml:
cloud:
backup:
password: "test-backup-password-123"
Example 2: Backblaze B2
config.yaml:
cloud:
backup:
repository: "b2:wild-cloud-backups"
staging: "/var/lib/wild-central/backup-staging"
retention:
keepDaily: 7
keepWeekly: 4
keepMonthly: 6
backend:
endpoint: "s3.us-west-002.backblazeb2.com"
region: "us-west-002"
secrets.yaml:
cloud:
backup:
password: "strong-encryption-password"
credentials:
s3:
accessKeyId: "0020123456789abcdef"
secretAccessKey: "K002abcdefghijklmnop"
Example 3: AWS S3
config.yaml:
cloud:
backup:
repository: "s3:s3.amazonaws.com/my-wild-cloud-backups"
staging: "/var/lib/wild-central/backup-staging"
retention:
keepDaily: 14
keepWeekly: 8
keepMonthly: 12
backend:
region: "us-east-1"
secrets.yaml:
cloud:
backup:
password: "prod-encryption-password"
credentials:
s3:
accessKeyId: "AKIAIOSFODNN7EXAMPLE"
secretAccessKey: "wJalrXUtnFEMI/K7MDENG/bPxRfiCY"
Example 4: SFTP Remote Server
config.yaml:
cloud:
backup:
repository: "sftp:backup-user@backup.example.com:/wild-cloud-backups"
staging: "/var/lib/wild-central/backup-staging"
retention:
keepDaily: 7
keepWeekly: 4
keepMonthly: 6
backend:
port: 2222
secrets.yaml:
cloud:
backup:
password: "restic-repo-password"
credentials:
sftp:
privateKey: |
-----BEGIN OPENSSH PRIVATE KEY-----
...
-----END OPENSSH PRIVATE KEY-----
Example 5: NFS/SMB Mount (as Local Path)
config.yaml:
cloud:
backup:
repository: "/mnt/nas-backups/wild-cloud" # NFS mounted via OS
staging: "/var/lib/wild-central/backup-staging"
retention:
keepDaily: 7
keepWeekly: 4
keepMonthly: 6
secrets.yaml:
cloud:
backup:
password: "backup-encryption-password"
Backend Detection Logic
func DetectBackendType(repository string) string {
if strings.HasPrefix(repository, "/") {
return "local"
} else if strings.HasPrefix(repository, "sftp:") {
return "sftp"
} else if strings.HasPrefix(repository, "s3:") || strings.HasPrefix(repository, "b2:") {
return "s3"
} else if strings.HasPrefix(repository, "azure:") {
return "azure"
} else if strings.HasPrefix(repository, "gs:") {
return "gcs"
} else if strings.HasPrefix(repository, "rclone:") {
return "rclone"
}
return "unknown"
}
Environment Variable Mapping
func BuildResticEnv(config BackupConfig, secrets BackupSecrets) map[string]string {
env := map[string]string{
"RESTIC_REPOSITORY": config.Repository,
"RESTIC_PASSWORD": secrets.Password,
}
backendType := DetectBackendType(config.Repository)
switch backendType {
case "s3":
env["AWS_ACCESS_KEY_ID"] = secrets.Credentials.S3.AccessKeyID
env["AWS_SECRET_ACCESS_KEY"] = secrets.Credentials.S3.SecretAccessKey
if config.Backend.Endpoint != "" {
env["AWS_S3_ENDPOINT"] = config.Backend.Endpoint
}
if config.Backend.Region != "" {
env["AWS_DEFAULT_REGION"] = config.Backend.Region
}
case "sftp":
if secrets.Credentials.SFTP.Password != "" {
env["RESTIC_SFTP_PASSWORD"] = secrets.Credentials.SFTP.Password
}
// SSH key handling done via temp file
case "azure":
env["AZURE_ACCOUNT_NAME"] = secrets.Credentials.Azure.AccountName
env["AZURE_ACCOUNT_KEY"] = secrets.Credentials.Azure.AccountKey
case "gcs":
// Write service account key to temp file, set GOOGLE_APPLICATION_CREDENTIALS
}
return env
}
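The gcs case above is left as a comment. Below is a minimal sketch of that temp-file step; writeGCSCredentials is a hypothetical helper introduced here for illustration, and GOOGLE_PROJECT_ID / GOOGLE_APPLICATION_CREDENTIALS are the environment variables restic's Google Cloud Storage backend reads.
package backup

import (
	"fmt"
	"os"
)

// writeGCSCredentials writes the service-account JSON to a temp file and points
// restic's gs backend at it. The returned cleanup func should run after the
// restic command exits so the key material is not left on disk.
func writeGCSCredentials(env map[string]string, projectID, serviceAccountKey string) (cleanup func(), err error) {
	f, err := os.CreateTemp("", "gcs-sa-*.json")
	if err != nil {
		return nil, fmt.Errorf("failed to create temp credentials file: %w", err)
	}
	if _, err := f.WriteString(serviceAccountKey); err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, fmt.Errorf("failed to write credentials: %w", err)
	}
	if err := f.Close(); err != nil {
		os.Remove(f.Name())
		return nil, fmt.Errorf("failed to close credentials file: %w", err)
	}
	env["GOOGLE_PROJECT_ID"] = projectID
	env["GOOGLE_APPLICATION_CREDENTIALS"] = f.Name()
	return func() { os.Remove(f.Name()) }, nil
}
SFTP private keys would follow the same pattern: write the key to a temp file and pass it to ssh via restic's sftp.command option.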
Phase 1: Core Backup Fix
Goal
Fix critical bugs and create actual backup files (no restic yet).
Priority
🔴 CRITICAL - Users cannot restore from current backups
Timeline
2-3 days
Overview
Replace broken pod label-based detection with manifest-based detection. Use kubectl exec to create actual database dumps and PVC archives.
Task 1.1: Implement Manifest-Based Database Detection
File: wild-central-api/internal/backup/backup.go
Add New Structures:
type AppDependencies struct {
HasPostgres bool
HasMySQL bool
HasRedis bool
}
Implement Detection Function:
func (m *Manager) getAppDependencies(appName string) (*AppDependencies, error) {
manifestPath := filepath.Join(m.directoryPath, appName, "manifest.yaml")
manifest, err := directory.LoadManifest(manifestPath)
if err != nil {
return nil, fmt.Errorf("failed to load manifest: %w", err)
}
deps := &AppDependencies{
HasPostgres: contains(manifest.Requires, "postgres"),
HasMySQL: contains(manifest.Requires, "mysql"),
HasRedis: contains(manifest.Requires, "redis"),
}
return deps, nil
}
func contains(slice []string, item string) bool {
for _, s := range slice {
if s == item {
return true
}
}
return false
}
Changes Required:
- Add import: "github.com/wild-cloud/wild-central/daemon/internal/directory"
- Remove old detectDatabaseType() function (lines 544-569)
Acceptance Criteria:
- Reads manifest.yaml for app
- Correctly identifies postgres dependency
- Correctly identifies mysql dependency
- Returns error if manifest not found
- Unit test: parse manifest with postgres
- Unit test: parse manifest without databases
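A minimal sketch of the two unit tests listed above. It assumes manifest.yaml exposes the dependencies as a top-level requires list (implied by manifest.Requires) and parses the YAML directly with gopkg.in/yaml.v3 so the test stays self-contained; it reuses the contains helper from this task.
package backup

import (
	"testing"

	"gopkg.in/yaml.v3"
)

// testManifest mirrors only the field these tests need; the assumed YAML shape
// is a top-level "requires" list of dependency names.
type testManifest struct {
	Requires []string `yaml:"requires"`
}

func TestDetectsPostgresDependency(t *testing.T) {
	var m testManifest
	if err := yaml.Unmarshal([]byte("name: gitea\nrequires:\n  - postgres\n"), &m); err != nil {
		t.Fatalf("unmarshal failed: %v", err)
	}
	if !contains(m.Requires, "postgres") {
		t.Error("expected postgres dependency to be detected")
	}
}

func TestManifestWithoutDatabases(t *testing.T) {
	var m testManifest
	if err := yaml.Unmarshal([]byte("name: static-site\nrequires: []\n"), &m); err != nil {
		t.Fatalf("unmarshal failed: %v", err)
	}
	if contains(m.Requires, "postgres") || contains(m.Requires, "mysql") {
		t.Error("expected no database dependencies")
	}
}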
Estimated Effort: 2 hours
Task 1.2: Implement PostgreSQL Backup via kubectl exec
File: wild-central-api/internal/backup/backup.go
Implementation:
func (m *Manager) backupPostgres(ctx context.Context, instanceName, appName, backupDir string) (string, error) {
dbName := appName // Database name convention
// Find postgres pod in postgres namespace
podName, err := m.findPodInNamespace(ctx, "postgres", "app=postgres")
if err != nil {
return "", fmt.Errorf("postgres pod not found: %w", err)
}
// Execute pg_dump
dumpFile := filepath.Join(backupDir, "postgres.sql")
cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", "postgres",
podName, "--", "pg_dump", "-U", "postgres", dbName)
output, err := cmd.Output()
if err != nil {
return "", fmt.Errorf("pg_dump failed: %w", err)
}
// Write dump to file
if err := os.WriteFile(dumpFile, output, 0600); err != nil {
return "", fmt.Errorf("failed to write dump: %w", err)
}
return dumpFile, nil
}
// Helper function to find pod by label
func (m *Manager) findPodInNamespace(ctx context.Context, namespace, labelSelector string) (string, error) {
cmd := exec.CommandContext(ctx, "kubectl", "get", "pods",
"-n", namespace,
"-l", labelSelector,
"-o", "jsonpath={.items[0].metadata.name}")
output, err := cmd.Output()
if err != nil {
return "", fmt.Errorf("kubectl get pods failed: %w", err)
}
podName := strings.TrimSpace(string(output))
if podName == "" {
return "", fmt.Errorf("no pod found with label %s in namespace %s", labelSelector, namespace)
}
return podName, nil
}
Acceptance Criteria:
- Finds postgres pod correctly
- Executes pg_dump successfully
- Creates .sql file with actual data
- Handles errors gracefully
- Integration test: backup Gitea database
Estimated Effort: 3 hours
Task 1.3: Implement MySQL Backup via kubectl exec
File: wild-central-api/internal/backup/backup.go
Implementation:
func (m *Manager) backupMySQL(ctx context.Context, instanceName, appName, backupDir string) (string, error) {
dbName := appName
// Find mysql pod
podName, err := m.findPodInNamespace(ctx, "mysql", "app=mysql")
if err != nil {
return "", fmt.Errorf("mysql pod not found: %w", err)
}
// Get MySQL root password from secret
password, err := m.getMySQLPassword(ctx)
if err != nil {
return "", fmt.Errorf("failed to get mysql password: %w", err)
}
// Execute mysqldump
dumpFile := filepath.Join(backupDir, "mysql.sql")
cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", "mysql",
podName, "--", "mysqldump",
"-uroot",
fmt.Sprintf("-p%s", password),
"--single-transaction",
"--routines",
"--triggers",
dbName)
output, err := cmd.Output()
if err != nil {
return "", fmt.Errorf("mysqldump failed: %w", err)
}
if err := os.WriteFile(dumpFile, output, 0600); err != nil {
return "", fmt.Errorf("failed to write dump: %w", err)
}
return dumpFile, nil
}
func (m *Manager) getMySQLPassword(ctx context.Context) (string, error) {
cmd := exec.CommandContext(ctx, "kubectl", "get", "secret",
"-n", "mysql",
"mysql-root-password",
"-o", "jsonpath={.data.password}")
output, err := cmd.Output()
if err != nil {
return "", fmt.Errorf("failed to get secret: %w", err)
}
// Decode base64
decoded, err := base64.StdEncoding.DecodeString(string(output))
if err != nil {
return "", fmt.Errorf("failed to decode password: %w", err)
}
return string(decoded), nil
}
Acceptance Criteria:
- Finds mysql pod correctly
- Retrieves password from secret
- Executes mysqldump successfully
- Creates .sql file with actual data
- Handles errors gracefully
Estimated Effort: 3 hours
Task 1.4: Implement PVC Discovery and Backup
File: wild-central-api/internal/backup/backup.go
Implementation:
func (m *Manager) findAppPVCs(ctx context.Context, appName string) ([]string, error) {
// Get namespace for app (convention: app name)
namespace := appName
cmd := exec.CommandContext(ctx, "kubectl", "get", "pvc",
"-n", namespace,
"-o", "jsonpath={.items[*].metadata.name}")
output, err := cmd.Output()
if err != nil {
return nil, fmt.Errorf("kubectl get pvc failed: %w", err)
}
pvcNames := strings.Fields(string(output))
return pvcNames, nil
}
func (m *Manager) backupPVC(ctx context.Context, namespace, pvcName, backupDir string) (string, error) {
// Find pod using this PVC
podName, err := m.findPodUsingPVC(ctx, namespace, pvcName)
if err != nil {
return "", fmt.Errorf("no pod found using PVC %s: %w", pvcName, err)
}
// Get mount path for PVC
mountPath, err := m.getPVCMountPath(ctx, namespace, podName, pvcName)
if err != nil {
return "", fmt.Errorf("failed to get mount path: %w", err)
}
// Create tar archive of PVC data
tarFile := filepath.Join(backupDir, fmt.Sprintf("%s.tar.gz", pvcName))
cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", namespace,
podName, "--", "tar", "czf", "-", "-C", mountPath, ".")
output, err := cmd.Output()
if err != nil {
return "", fmt.Errorf("tar command failed: %w", err)
}
if err := os.WriteFile(tarFile, output, 0600); err != nil {
return "", fmt.Errorf("failed to write tar file: %w", err)
}
return tarFile, nil
}
func (m *Manager) findPodUsingPVC(ctx context.Context, namespace, pvcName string) (string, error) {
cmd := exec.CommandContext(ctx, "kubectl", "get", "pods",
"-n", namespace,
"-o", "json")
output, err := cmd.Output()
if err != nil {
return "", fmt.Errorf("kubectl get pods failed: %w", err)
}
// Parse JSON to find pod using this PVC
var podList struct {
Items []struct {
Metadata struct {
Name string `json:"name"`
} `json:"metadata"`
Spec struct {
Volumes []struct {
PersistentVolumeClaim *struct {
ClaimName string `json:"claimName"`
} `json:"persistentVolumeClaim"`
} `json:"volumes"`
} `json:"spec"`
} `json:"items"`
}
if err := json.Unmarshal(output, &podList); err != nil {
return "", fmt.Errorf("failed to parse pod list: %w", err)
}
for _, pod := range podList.Items {
for _, volume := range pod.Spec.Volumes {
if volume.PersistentVolumeClaim != nil &&
volume.PersistentVolumeClaim.ClaimName == pvcName {
return pod.Metadata.Name, nil
}
}
}
return "", fmt.Errorf("no pod found using PVC %s", pvcName)
}
func (m *Manager) getPVCMountPath(ctx context.Context, namespace, podName, pvcName string) (string, error) {
cmd := exec.CommandContext(ctx, "kubectl", "get", "pod",
"-n", namespace,
podName,
"-o", "json")
output, err := cmd.Output()
if err != nil {
return "", fmt.Errorf("kubectl get pod failed: %w", err)
}
var pod struct {
Spec struct {
Volumes []struct {
Name string `json:"name"`
PersistentVolumeClaim *struct {
ClaimName string `json:"claimName"`
} `json:"persistentVolumeClaim"`
} `json:"volumes"`
Containers []struct {
VolumeMounts []struct {
Name string `json:"name"`
MountPath string `json:"mountPath"`
} `json:"volumeMounts"`
} `json:"containers"`
} `json:"spec"`
}
if err := json.Unmarshal(output, &pod); err != nil {
return "", fmt.Errorf("failed to parse pod: %w", err)
}
// Find volume name for PVC
var volumeName string
for _, volume := range pod.Spec.Volumes {
if volume.PersistentVolumeClaim != nil &&
volume.PersistentVolumeClaim.ClaimName == pvcName {
volumeName = volume.Name
break
}
}
if volumeName == "" {
return "", fmt.Errorf("PVC %s not found in pod volumes", pvcName)
}
// Find mount path for volume
for _, container := range pod.Spec.Containers {
for _, mount := range container.VolumeMounts {
if mount.Name == volumeName {
return mount.MountPath, nil
}
}
}
return "", fmt.Errorf("mount path not found for volume %s", volumeName)
}
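The tar and pg_dump calls above buffer the entire output in memory via cmd.Output(), which will not hold up for the TB-scale PVCs this document targets. A hedged alternative (a sketch, not part of the existing code) streams kubectl's stdout straight into the staging file; the command stays the same, only the plumbing changes.
package backup

import (
	"context"
	"fmt"
	"os"
	"os/exec"
)

// streamCommandToFile runs the given command and streams its stdout directly
// into destPath instead of buffering the whole output in memory.
func streamCommandToFile(ctx context.Context, destPath string, name string, args ...string) error {
	out, err := os.OpenFile(destPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0600)
	if err != nil {
		return fmt.Errorf("failed to create %s: %w", destPath, err)
	}
	defer out.Close()

	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Stdout = out       // archive/dump bytes go straight to disk
	cmd.Stderr = os.Stderr // surface kubectl errors
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("%s failed: %w", name, err)
	}
	return nil
}
With this helper, backupPVC could call streamCommandToFile(ctx, tarFile, "kubectl", "exec", "-n", namespace, podName, "--", "tar", "czf", "-", "-C", mountPath, ".") and skip os.WriteFile entirely; the same applies to pg_dump in Task 1.2.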
Acceptance Criteria:
- Discovers PVCs in app namespace
- Finds pod using PVC
- Gets correct mount path
- Creates tar.gz with actual data
- Handles multiple PVCs
- Integration test: backup Immich PVCs
Estimated Effort: 4 hours
Task 1.5: Update BackupApp Flow
File: wild-central-api/internal/backup/backup.go
Replace BackupApp function (complete rewrite):
func (m *Manager) BackupApp(instanceName, appName string) (*BackupInfo, error) {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
defer cancel()
// Create timestamped backup directory
timestamp := time.Now().UTC().Format("20060102T150405Z")
stagingDir := filepath.Join(m.dataDir, "instances", instanceName, "backups", "staging")
backupDir := filepath.Join(stagingDir, "apps", appName, timestamp)
if err := os.MkdirAll(backupDir, 0755); err != nil {
return nil, fmt.Errorf("failed to create backup directory: %w", err)
}
// Initialize backup info with in_progress status
info := &BackupInfo{
Type: "app",
AppName: appName,
Status: "in_progress",
CreatedAt: time.Now().UTC().Format(time.RFC3339),
Files: []string{},
}
// Save initial metadata
if err := m.saveBackupMetadata(backupDir, info); err != nil {
return nil, fmt.Errorf("failed to save initial metadata: %w", err)
}
// Read app dependencies from manifest
deps, err := m.getAppDependencies(appName)
if err != nil {
info.Status = "failed"
info.Error = fmt.Sprintf("Failed to read manifest: %v", err)
m.saveBackupMetadata(backupDir, info)
return info, err
}
var backupFiles []string
// Backup PostgreSQL if required
if deps.HasPostgres {
file, err := m.backupPostgres(ctx, instanceName, appName, backupDir)
if err != nil {
info.Status = "failed"
info.Error = fmt.Sprintf("PostgreSQL backup failed: %v", err)
m.saveBackupMetadata(backupDir, info)
return info, err
}
backupFiles = append(backupFiles, file)
}
// Backup MySQL if required
if deps.HasMySQL {
file, err := m.backupMySQL(ctx, instanceName, appName, backupDir)
if err != nil {
info.Status = "failed"
info.Error = fmt.Sprintf("MySQL backup failed: %v", err)
m.saveBackupMetadata(backupDir, info)
return info, err
}
backupFiles = append(backupFiles, file)
}
// Discover and backup PVCs
pvcNames, err := m.findAppPVCs(ctx, appName)
if err != nil {
// Log warning but don't fail if no PVCs found
log.Printf("Warning: failed to find PVCs for %s: %v", appName, err)
} else {
for _, pvcName := range pvcNames {
file, err := m.backupPVC(ctx, appName, pvcName, backupDir)
if err != nil {
log.Printf("Warning: failed to backup PVC %s: %v", pvcName, err)
continue
}
backupFiles = append(backupFiles, file)
}
}
// Calculate total backup size
var totalSize int64
for _, file := range backupFiles {
stat, err := os.Stat(file)
if err == nil {
totalSize += stat.Size()
}
}
// Update final metadata
info.Status = "completed"
info.Files = backupFiles
info.Size = totalSize
info.Error = ""
if err := m.saveBackupMetadata(backupDir, info); err != nil {
return info, fmt.Errorf("failed to save final metadata: %w", err)
}
return info, nil
}
func (m *Manager) saveBackupMetadata(backupDir string, info *BackupInfo) error {
metadataFile := filepath.Join(backupDir, "backup.json")
data, err := json.MarshalIndent(info, "", " ")
if err != nil {
return fmt.Errorf("failed to marshal metadata: %w", err)
}
return os.WriteFile(metadataFile, data, 0644)
}
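For reference, a sketch of the BackupInfo struct this flow reads and writes. The field set is inferred from its uses in this document (SnapshotID is only populated in Phase 2); adjust to the existing struct if one is already defined.
// BackupInfo is the metadata persisted to backup.json.
type BackupInfo struct {
	Type       string   `json:"type"`      // "app" or "cluster"
	AppName    string   `json:"appName"`   // set for app backups
	Status     string   `json:"status"`    // "in_progress", "completed", "failed"
	CreatedAt  string   `json:"createdAt"` // RFC3339 timestamp
	Files      []string `json:"files"`     // staged dump/archive paths
	Size       int64    `json:"size"`      // total bytes across Files
	Error      string   `json:"error,omitempty"`
	SnapshotID string   `json:"snapshotId,omitempty"` // restic snapshot (Phase 2)
}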
Acceptance Criteria:
- Creates timestamped backup directories
- Reads manifest to detect dependencies
- Backs up databases if present
- Backs up PVCs if present
- Calculates accurate backup size
- Saves complete metadata
- Handles errors gracefully
- Integration test: Full Gitea backup
Estimated Effort: 4 hours
Task 1.6: Build and Test
Steps:
- Build wild-central-api
- Deploy to test environment
- Test Gitea backup (PostgreSQL + PVC)
- Test Immich backup (PostgreSQL + multiple PVCs)
- Verify backup files exist and have data
- Verify metadata accuracy
- Test manual restore
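To support the "verify backup files exist and have data" step, a small helper sketch (hypothetical, not required by the plan) that walks a backup directory and fails on empty or missing data files:
package backup

import (
	"fmt"
	"io/fs"
	"path/filepath"
)

// verifyBackupDir returns an error if the backup directory contains no data
// files or if any file is empty.
func verifyBackupDir(backupDir string) error {
	var dataFiles int
	err := filepath.WalkDir(backupDir, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		if info.Size() == 0 {
			return fmt.Errorf("backup file %s is empty", path)
		}
		if filepath.Base(path) != "backup.json" {
			dataFiles++
		}
		return nil
	})
	if err != nil {
		return err
	}
	if dataFiles == 0 {
		return fmt.Errorf("no data files found in %s (only metadata)", backupDir)
	}
	return nil
}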
Acceptance Criteria:
- All builds succeed
- App backups create actual files
- Metadata is accurate
- Manual restore works
Estimated Effort: 4 hours
Phase 2: Restic Integration
Goal
Upload staged backups to restic repository with flexible backends.
Priority
🟡 HIGH PRIORITY (after Phase 1 complete)
Timeline
5-7 days
Prerequisites
- Phase 1 completed and tested
- Restic installed on Wild Central device
- Backup destination configured (S3, B2, local, etc.)
Task 2.1: Configuration Management
File: wild-central-api/internal/backup/config.go (new file)
Implementation:
package backup
import (
"fmt"
"strings"
"github.com/wild-cloud/wild-central/daemon/internal/config"
)
type BackupConfig struct {
Repository string
Staging string
Retention RetentionPolicy
Backend BackendConfig
}
type RetentionPolicy struct {
KeepDaily int
KeepWeekly int
KeepMonthly int
KeepYearly int
}
type BackendConfig struct {
Type string
Endpoint string
Region string
Port int
}
type BackupSecrets struct {
Password string
Credentials BackendCredentials
}
type BackendCredentials struct {
S3 *S3Credentials
SFTP *SFTPCredentials
Azure *AzureCredentials
GCS *GCSCredentials
}
type S3Credentials struct {
AccessKeyID string
SecretAccessKey string
}
type SFTPCredentials struct {
Password string
PrivateKey string
}
type AzureCredentials struct {
AccountName string
AccountKey string
}
type GCSCredentials struct {
ProjectID string
ServiceAccountKey string
}
func LoadBackupConfig(instanceName string) (*BackupConfig, *BackupSecrets, error) {
cfg, err := config.LoadInstanceConfig(instanceName)
if err != nil {
return nil, nil, fmt.Errorf("failed to load config: %w", err)
}
secrets, err := config.LoadInstanceSecrets(instanceName)
if err != nil {
return nil, nil, fmt.Errorf("failed to load secrets: %w", err)
}
backupCfg := &BackupConfig{
Repository: cfg.Cloud.Backup.Repository,
Staging: cfg.Cloud.Backup.Staging,
Retention: RetentionPolicy{
KeepDaily: cfg.Cloud.Backup.Retention.KeepDaily,
KeepWeekly: cfg.Cloud.Backup.Retention.KeepWeekly,
KeepMonthly: cfg.Cloud.Backup.Retention.KeepMonthly,
KeepYearly: cfg.Cloud.Backup.Retention.KeepYearly,
},
Backend: BackendConfig{
Type: DetectBackendType(cfg.Cloud.Backup.Repository),
Endpoint: cfg.Cloud.Backup.Backend.Endpoint,
Region: cfg.Cloud.Backup.Backend.Region,
Port: cfg.Cloud.Backup.Backend.Port,
},
}
backupSecrets := &BackupSecrets{
Password: secrets.Cloud.Backup.Password,
Credentials: BackendCredentials{
S3: secrets.Cloud.Backup.Credentials.S3,
SFTP: secrets.Cloud.Backup.Credentials.SFTP,
Azure: secrets.Cloud.Backup.Credentials.Azure,
GCS: secrets.Cloud.Backup.Credentials.GCS,
},
}
return backupCfg, backupSecrets, nil
}
func DetectBackendType(repository string) string {
if strings.HasPrefix(repository, "/") {
return "local"
} else if strings.HasPrefix(repository, "sftp:") {
return "sftp"
} else if strings.HasPrefix(repository, "s3:") || strings.HasPrefix(repository, "b2:") {
return "s3"
} else if strings.HasPrefix(repository, "azure:") {
return "azure"
} else if strings.HasPrefix(repository, "gs:") {
return "gcs"
} else if strings.HasPrefix(repository, "rclone:") {
return "rclone"
}
return "unknown"
}
func ValidateBackupConfig(cfg *BackupConfig, secrets *BackupSecrets) error {
if cfg.Repository == "" {
return fmt.Errorf("repository is required")
}
if secrets.Password == "" {
return fmt.Errorf("repository password is required")
}
// Validate backend-specific credentials
switch cfg.Backend.Type {
case "s3":
if secrets.Credentials.S3 == nil {
return fmt.Errorf("S3 credentials required for S3 backend")
}
if secrets.Credentials.S3.AccessKeyID == "" || secrets.Credentials.S3.SecretAccessKey == "" {
return fmt.Errorf("S3 access key and secret key required")
}
case "sftp":
if secrets.Credentials.SFTP == nil {
return fmt.Errorf("SFTP credentials required for SFTP backend")
}
if secrets.Credentials.SFTP.Password == "" && secrets.Credentials.SFTP.PrivateKey == "" {
return fmt.Errorf("SFTP password or private key required")
}
case "azure":
if secrets.Credentials.Azure == nil {
return fmt.Errorf("Azure credentials required for Azure backend")
}
if secrets.Credentials.Azure.AccountName == "" || secrets.Credentials.Azure.AccountKey == "" {
return fmt.Errorf("Azure account name and key required")
}
case "gcs":
if secrets.Credentials.GCS == nil {
return fmt.Errorf("GCS credentials required for GCS backend")
}
if secrets.Credentials.GCS.ServiceAccountKey == "" {
return fmt.Errorf("GCS service account key required")
}
}
return nil
}
Estimated Effort: 3 hours
Task 2.2: Restic Operations Module
File: wild-central-api/internal/backup/restic.go (new file)
Implementation:
package backup
import (
"context"
"encoding/json"
"fmt"
"os"
"os/exec"
"strings"
)
type ResticClient struct {
config *BackupConfig
secrets *BackupSecrets
}
func NewResticClient(config *BackupConfig, secrets *BackupSecrets) *ResticClient {
return &ResticClient{
config: config,
secrets: secrets,
}
}
func (r *ResticClient) buildEnv() map[string]string {
env := map[string]string{
"RESTIC_REPOSITORY": r.config.Repository,
"RESTIC_PASSWORD": r.secrets.Password,
}
switch r.config.Backend.Type {
case "s3":
if r.secrets.Credentials.S3 != nil {
env["AWS_ACCESS_KEY_ID"] = r.secrets.Credentials.S3.AccessKeyID
env["AWS_SECRET_ACCESS_KEY"] = r.secrets.Credentials.S3.SecretAccessKey
}
if r.config.Backend.Endpoint != "" {
env["AWS_S3_ENDPOINT"] = r.config.Backend.Endpoint
}
if r.config.Backend.Region != "" {
env["AWS_DEFAULT_REGION"] = r.config.Backend.Region
}
case "sftp":
if r.secrets.Credentials.SFTP != nil && r.secrets.Credentials.SFTP.Password != "" {
env["RESTIC_SFTP_PASSWORD"] = r.secrets.Credentials.SFTP.Password
}
case "azure":
if r.secrets.Credentials.Azure != nil {
env["AZURE_ACCOUNT_NAME"] = r.secrets.Credentials.Azure.AccountName
env["AZURE_ACCOUNT_KEY"] = r.secrets.Credentials.Azure.AccountKey
}
}
return env
}
func (r *ResticClient) Init(ctx context.Context) error {
cmd := exec.CommandContext(ctx, "restic", "init")
// Set environment variables
cmd.Env = os.Environ()
for k, v := range r.buildEnv() {
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
}
output, err := cmd.CombinedOutput()
if err != nil {
return fmt.Errorf("restic init failed: %w: %s", err, string(output))
}
return nil
}
func (r *ResticClient) Backup(ctx context.Context, path string, tags []string) (string, error) {
args := []string{"backup", path}
for _, tag := range tags {
args = append(args, "--tag", tag)
}
cmd := exec.CommandContext(ctx, "restic", args...)
cmd.Env = os.Environ()
for k, v := range r.buildEnv() {
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
}
output, err := cmd.CombinedOutput()
if err != nil {
return "", fmt.Errorf("restic backup failed: %w: %s", err, string(output))
}
// Parse snapshot ID from output
snapshotID := r.parseSnapshotID(string(output))
return snapshotID, nil
}
func (r *ResticClient) ListSnapshots(ctx context.Context, tags []string) ([]Snapshot, error) {
args := []string{"snapshots", "--json"}
for _, tag := range tags {
args = append(args, "--tag", tag)
}
cmd := exec.CommandContext(ctx, "restic", args...)
cmd.Env = os.Environ()
for k, v := range r.buildEnv() {
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
}
output, err := cmd.Output()
if err != nil {
return nil, fmt.Errorf("restic snapshots failed: %w", err)
}
var snapshots []Snapshot
if err := json.Unmarshal(output, &snapshots); err != nil {
return nil, fmt.Errorf("failed to parse snapshots: %w", err)
}
return snapshots, nil
}
func (r *ResticClient) Restore(ctx context.Context, snapshotID, targetPath string) error {
cmd := exec.CommandContext(ctx, "restic", "restore", snapshotID, "--target", targetPath)
cmd.Env = os.Environ()
for k, v := range r.buildEnv() {
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
}
output, err := cmd.CombinedOutput()
if err != nil {
return fmt.Errorf("restic restore failed: %w: %s", err, string(output))
}
return nil
}
func (r *ResticClient) Stats(ctx context.Context) (*RepositoryStats, error) {
cmd := exec.CommandContext(ctx, "restic", "stats", "--json")
cmd.Env = os.Environ()
for k, v := range r.buildEnv() {
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
}
output, err := cmd.Output()
if err != nil {
return nil, fmt.Errorf("restic stats failed: %w", err)
}
var stats RepositoryStats
if err := json.Unmarshal(output, &stats); err != nil {
return nil, fmt.Errorf("failed to parse stats: %w", err)
}
return &stats, nil
}
func (r *ResticClient) TestConnection(ctx context.Context) error {
cmd := exec.CommandContext(ctx, "restic", "cat", "config")
cmd.Env = os.Environ()
for k, v := range r.buildEnv() {
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
}
_, err := cmd.Output()
if err != nil {
return fmt.Errorf("connection test failed: %w", err)
}
return nil
}
func (r *ResticClient) parseSnapshotID(output string) string {
lines := strings.Split(output, "\n")
for _, line := range lines {
if strings.Contains(line, "snapshot") && strings.Contains(line, "saved") {
parts := strings.Fields(line)
for i, part := range parts {
if part == "snapshot" && i+1 < len(parts) {
return parts[i+1]
}
}
}
}
return ""
}
type Snapshot struct {
ID string `json:"id"`
Time string `json:"time"`
Hostname string `json:"hostname"`
Tags []string `json:"tags"`
Paths []string `json:"paths"`
}
type RepositoryStats struct {
TotalSize int64 `json:"total_size"`
TotalFileCount int64 `json:"total_file_count"`
SnapshotCount int `json:"snapshot_count"`
}
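The retention policy from config.yaml is not yet wired into the module above. A hedged sketch of a Forget method that maps RetentionPolicy onto restic's forget flags (--keep-daily, --keep-weekly, --keep-monthly, --keep-yearly, --prune, --tag are standard restic flags); it would live in restic.go alongside the methods above and reuse its imports.
// Forget applies the configured retention policy to matching snapshots and
// prunes unreferenced data from the repository.
func (r *ResticClient) Forget(ctx context.Context, policy RetentionPolicy, tags []string) error {
	args := []string{"forget", "--prune"}
	if policy.KeepDaily > 0 {
		args = append(args, "--keep-daily", fmt.Sprintf("%d", policy.KeepDaily))
	}
	if policy.KeepWeekly > 0 {
		args = append(args, "--keep-weekly", fmt.Sprintf("%d", policy.KeepWeekly))
	}
	if policy.KeepMonthly > 0 {
		args = append(args, "--keep-monthly", fmt.Sprintf("%d", policy.KeepMonthly))
	}
	if policy.KeepYearly > 0 {
		args = append(args, "--keep-yearly", fmt.Sprintf("%d", policy.KeepYearly))
	}
	for _, tag := range tags {
		args = append(args, "--tag", tag)
	}
	cmd := exec.CommandContext(ctx, "restic", args...)
	cmd.Env = os.Environ()
	for k, v := range r.buildEnv() {
		cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
	}
	if output, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("restic forget failed: %w: %s", err, string(output))
	}
	return nil
}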
Estimated Effort: 4 hours
Task 2.3: Update Backup Flow to Upload to Restic
File: wild-central-api/internal/backup/backup.go
Modify BackupApp function to add restic upload after staging:
func (m *Manager) BackupApp(instanceName, appName string) (*BackupInfo, error) {
	// ... existing Phase 1 code to create local backup ...
	// After local backup succeeds, upload to restic if configured
	cfg, secrets, err := LoadBackupConfig(instanceName)
	if err == nil && cfg.Repository != "" {
		// Restic is configured, upload backup
		client := NewResticClient(cfg, secrets)
		tags := []string{
			"type:app",
			fmt.Sprintf("app:%s", appName),
			fmt.Sprintf("instance:%s", instanceName),
		}
		snapshotID, err := client.Backup(ctx, backupDir, tags)
		if err != nil {
			log.Printf("Warning: restic upload failed: %v", err)
			// Don't fail the backup, local files still exist
		} else {
			info.SnapshotID = snapshotID
		}
	}
	// Save final metadata before cleaning staging so backup.json survives
	if err := m.saveBackupMetadata(backupDir, info); err != nil {
		return info, fmt.Errorf("failed to save final metadata: %w", err)
	}
	// Clean up staged data files after a successful upload, keeping backup.json
	if info.SnapshotID != "" {
		for _, file := range info.Files {
			if err := os.Remove(file); err != nil {
				log.Printf("Warning: failed to clean staged file %s: %v", file, err)
			}
		}
	}
	return info, nil
}
Estimated Effort: 2 hours
Task 2.4: API Client Updates
File: wild-web-app/src/services/api/backups.ts
Add configuration endpoints:
export interface BackupConfiguration {
repository: string;
staging: string;
retention: {
keepDaily: number;
keepWeekly: number;
keepMonthly: number;
keepYearly: number;
};
backend: {
type: string;
endpoint?: string;
region?: string;
port?: number;
};
}
export interface BackupConfigurationWithCredentials extends BackupConfiguration {
password: string;
credentials?: {
s3?: {
accessKeyId: string;
secretAccessKey: string;
};
sftp?: {
password?: string;
privateKey?: string;
};
azure?: {
accountName: string;
accountKey: string;
};
gcs?: {
projectId: string;
serviceAccountKey: string;
};
};
}
export interface RepositoryStatus {
initialized: boolean;
reachable: boolean;
lastBackup?: string;
snapshotCount: number;
}
export interface RepositoryStats {
repositorySize: number;
repositorySizeHuman: string;
snapshotCount: number;
fileCount: number;
uniqueChunks: number;
compressionRatio: number;
oldestSnapshot?: string;
latestSnapshot?: string;
}
export async function getBackupConfiguration(
instanceId: string
): Promise<{ config: BackupConfiguration; status: RepositoryStatus }> {
const response = await api.get(`/instances/${instanceId}/backup/config`);
return response.data;
}
export async function updateBackupConfiguration(
instanceId: string,
config: BackupConfigurationWithCredentials
): Promise<void> {
await api.put(`/instances/${instanceId}/backup/config`, config);
}
export async function testBackupConnection(
instanceId: string,
config: BackupConfigurationWithCredentials
): Promise<RepositoryStatus> {
const response = await api.post(`/instances/${instanceId}/backup/test`, config);
return response.data;
}
export async function initializeBackupRepository(
instanceId: string,
config: BackupConfigurationWithCredentials
): Promise<{ repositoryId: string }> {
const response = await api.post(`/instances/${instanceId}/backup/init`, config);
return response.data;
}
export async function getRepositoryStats(
instanceId: string
): Promise<RepositoryStats> {
const response = await api.get(`/instances/${instanceId}/backup/stats`);
return response.data;
}
Estimated Effort: 2 hours
Task 2.5: Configuration UI Components
Create the following components in wild-web-app/src/components/backup/:
BackupConfigurationCard.tsx:
- Main configuration form
- Backend type selector
- Conditional credential inputs
- Retention policy inputs
- Test/Save/Cancel buttons
BackendSelector.tsx:
- Dropdown for backend types
- Shows available backends with icons
CredentialsForm.tsx:
- Dynamic form based on selected backend
- Password/key inputs with visibility toggle
- Validation
RepositoryStatus.tsx:
- Display repository health
- Show stats (size, snapshots, last backup)
- Visual indicators
RetentionPolicyInputs.tsx:
- Number inputs for retention periods
- Tooltips explaining each period
Estimated Effort: 8 hours
Task 2.6: Integrate with BackupsPage
File: wild-web-app/src/router/pages/BackupsPage.tsx
Add configuration section above backup list:
function BackupsPage() {
const { instanceId } = useParams();
const [showConfig, setShowConfig] = useState(false);
const { data: backupConfig } = useQuery({
queryKey: ['backup-config', instanceId],
queryFn: () => getBackupConfiguration(instanceId),
});
return (
<div className="space-y-6">
{/* Repository Status Card */}
{backupConfig && (
<RepositoryStatus
status={backupConfig.status}
onEditClick={() => setShowConfig(true)}
/>
)}
{/* Configuration Card (conditional) */}
{showConfig && (
<BackupConfigurationCard
instanceId={instanceId}
currentConfig={backupConfig?.config}
onSave={() => setShowConfig(false)}
onCancel={() => setShowConfig(false)}
/>
)}
{/* Existing backup list */}
<BackupList instanceId={instanceId} />
</div>
);
}
Estimated Effort: 3 hours
Task 2.7: Backup Configuration API Handlers
File: wild-central-api/internal/api/v1/handlers_backup.go
Add new handlers:
func (h *Handler) BackupConfigGet(c *gin.Context) {
instanceName := c.Param("name")
cfg, secrets, err := backup.LoadBackupConfig(instanceName)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
// Test repository status
var status backup.RepositoryStatus
if cfg.Repository != "" {
client := backup.NewResticClient(cfg, secrets)
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
status.Initialized = true
status.Reachable = client.TestConnection(ctx) == nil
if stats, err := client.Stats(ctx); err == nil {
status.SnapshotCount = stats.SnapshotCount
}
}
c.JSON(http.StatusOK, gin.H{
"success": true,
"data": gin.H{
"config": cfg,
"status": status,
},
})
}
func (h *Handler) BackupConfigUpdate(c *gin.Context) {
instanceName := c.Param("name")
var req backup.BackupConfigurationWithCredentials
if err := c.BindJSON(&req); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
// Validate configuration
if err := backup.ValidateBackupConfig(&req.Config, &req.Secrets); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
// Save to config.yaml and secrets.yaml
if err := config.SaveBackupConfig(instanceName, &req); err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
c.JSON(http.StatusOK, gin.H{
"success": true,
"message": "Backup configuration updated successfully",
})
}
func (h *Handler) BackupConnectionTest(c *gin.Context) {
var req backup.BackupConfigurationWithCredentials
if err := c.BindJSON(&req); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
client := backup.NewResticClient(&req.Config, &req.Secrets)
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
status := backup.RepositoryStatus{
Reachable: client.TestConnection(ctx) == nil,
}
if status.Reachable {
if stats, err := client.Stats(ctx); err == nil {
status.Initialized = true
status.SnapshotCount = stats.SnapshotCount
}
}
c.JSON(http.StatusOK, gin.H{
"success": true,
"data": status,
})
}
func (h *Handler) BackupRepositoryInit(c *gin.Context) {
var req backup.BackupConfigurationWithCredentials
if err := c.BindJSON(&req); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
client := backup.NewResticClient(&req.Config, &req.Secrets)
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := client.Init(ctx); err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
c.JSON(http.StatusOK, gin.H{
"success": true,
"message": "Repository initialized successfully",
})
}
func (h *Handler) BackupStatsGet(c *gin.Context) {
instanceName := c.Param("name")
cfg, secrets, err := backup.LoadBackupConfig(instanceName)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
client := backup.NewResticClient(cfg, secrets)
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
stats, err := client.Stats(ctx)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
c.JSON(http.StatusOK, gin.H{
"success": true,
"data": stats,
})
}
Register routes:
backupGroup := v1.Group("/instances/:name/backup")
{
backupGroup.GET("/config", h.BackupConfigGet)
backupGroup.PUT("/config", h.BackupConfigUpdate)
backupGroup.POST("/test", h.BackupConnectionTest)
backupGroup.POST("/init", h.BackupRepositoryInit)
backupGroup.GET("/stats", h.BackupStatsGet)
}
Estimated Effort: 4 hours
Task 2.8: End-to-End Testing
Test scenarios:
- Configure local repository via UI
- Configure S3 repository via UI
- Test connection validation
- Create backup and verify upload
- Check repository stats
- Test error handling
Estimated Effort: 4 hours
Phase 3: Restore from Restic
Goal
Enable users to restore backups from restic snapshots.
Priority
🟢 MEDIUM PRIORITY (after Phase 2 complete)
Timeline
3-5 days
Task 3.1: List Snapshots API
File: wild-central-api/internal/api/v1/handlers_backup.go
Implementation:
func (h *Handler) BackupSnapshotsList(c *gin.Context) {
instanceName := c.Param("name")
appName := c.Query("app")
cfg, secrets, err := backup.LoadBackupConfig(instanceName)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
client := backup.NewResticClient(cfg, secrets)
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
var tags []string
if appName != "" {
tags = append(tags, fmt.Sprintf("app:%s", appName))
}
snapshots, err := client.ListSnapshots(ctx, tags)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
c.JSON(http.StatusOK, gin.H{
"success": true,
"data": snapshots,
})
}
Estimated Effort: 2 hours
Task 3.2: Restore Snapshot Function
File: wild-central-api/internal/backup/backup.go
Implementation:
func (m *Manager) RestoreFromSnapshot(instanceName, snapshotID string) error {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
defer cancel()
// Load restic config
cfg, secrets, err := LoadBackupConfig(instanceName)
if err != nil {
return fmt.Errorf("failed to load config: %w", err)
}
client := NewResticClient(cfg, secrets)
// Create temp directory for restore
tempDir := filepath.Join(cfg.Staging, "restore", snapshotID)
if err := os.MkdirAll(tempDir, 0755); err != nil {
return fmt.Errorf("failed to create temp directory: %w", err)
}
defer os.RemoveAll(tempDir)
// Restore snapshot to temp directory
if err := client.Restore(ctx, snapshotID, tempDir); err != nil {
return fmt.Errorf("restic restore failed: %w", err)
}
// Parse metadata to determine what to restore
metadataFile := filepath.Join(tempDir, "backup.json")
info, err := m.loadBackupMetadata(metadataFile)
if err != nil {
return fmt.Errorf("failed to load metadata: %w", err)
}
// Restore databases
for _, file := range info.Files {
if strings.HasSuffix(file, "postgres.sql") {
if err := m.restorePostgres(ctx, info.AppName, filepath.Join(tempDir, "postgres.sql")); err != nil {
return fmt.Errorf("postgres restore failed: %w", err)
}
} else if strings.HasSuffix(file, "mysql.sql") {
if err := m.restoreMySQL(ctx, info.AppName, filepath.Join(tempDir, "mysql.sql")); err != nil {
return fmt.Errorf("mysql restore failed: %w", err)
}
}
}
// Restore PVCs
for _, file := range info.Files {
if strings.HasSuffix(file, ".tar.gz") {
pvcName := strings.TrimSuffix(filepath.Base(file), ".tar.gz")
if err := m.restorePVC(ctx, info.AppName, pvcName, filepath.Join(tempDir, file)); err != nil {
return fmt.Errorf("pvc restore failed: %w", err)
}
}
}
return nil
}
func (m *Manager) restorePostgres(ctx context.Context, appName, dumpFile string) error {
dbName := appName
podName, err := m.findPodInNamespace(ctx, "postgres", "app=postgres")
if err != nil {
return fmt.Errorf("postgres pod not found: %w", err)
}
// Drop and recreate database
cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", "postgres",
podName, "--", "psql", "-U", "postgres", "-c",
fmt.Sprintf("DROP DATABASE IF EXISTS %s; CREATE DATABASE %s;", dbName, dbName))
if err := cmd.Run(); err != nil {
return fmt.Errorf("failed to recreate database: %w", err)
}
// Restore dump
dumpData, err := os.ReadFile(dumpFile)
if err != nil {
return fmt.Errorf("failed to read dump: %w", err)
}
cmd = exec.CommandContext(ctx, "kubectl", "exec", "-i", "-n", "postgres",
podName, "--", "psql", "-U", "postgres", dbName)
cmd.Stdin = strings.NewReader(string(dumpData))
if err := cmd.Run(); err != nil {
return fmt.Errorf("psql restore failed: %w", err)
}
return nil
}
func (m *Manager) restoreMySQL(ctx context.Context, appName, dumpFile string) error {
	// Similar implementation to restorePostgres: recreate the database, then
	// replay the dump through the mysql client using the root password from the
	// secret (see the sketch after this code block)
	return nil
}
func (m *Manager) restorePVC(ctx context.Context, namespace, pvcName, tarFile string) error {
podName, err := m.findPodUsingPVC(ctx, namespace, pvcName)
if err != nil {
return fmt.Errorf("no pod found using PVC: %w", err)
}
mountPath, err := m.getPVCMountPath(ctx, namespace, podName, pvcName)
if err != nil {
return fmt.Errorf("failed to get mount path: %w", err)
}
// Copy tar file to pod
cmd := exec.CommandContext(ctx, "kubectl", "cp", tarFile,
fmt.Sprintf("%s/%s:/tmp/restore.tar.gz", namespace, podName))
if err := cmd.Run(); err != nil {
return fmt.Errorf("kubectl cp failed: %w", err)
}
// Extract tar file
cmd = exec.CommandContext(ctx, "kubectl", "exec", "-n", namespace,
podName, "--", "tar", "xzf", "/tmp/restore.tar.gz", "-C", mountPath)
if err := cmd.Run(); err != nil {
return fmt.Errorf("tar extract failed: %w", err)
}
// Clean up temp file
cmd = exec.CommandContext(ctx, "kubectl", "exec", "-n", namespace,
podName, "--", "rm", "/tmp/restore.tar.gz")
cmd.Run() // Ignore error
return nil
}
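restoreMySQL above is left as a stub. A minimal sketch that mirrors restorePostgres, reusing findPodInNamespace and getMySQLPassword from Phase 1 and feeding the dump to the mysql client over stdin; the namespace and label conventions match Task 1.3.
func (m *Manager) restoreMySQL(ctx context.Context, appName, dumpFile string) error {
	dbName := appName
	podName, err := m.findPodInNamespace(ctx, "mysql", "app=mysql")
	if err != nil {
		return fmt.Errorf("mysql pod not found: %w", err)
	}
	password, err := m.getMySQLPassword(ctx)
	if err != nil {
		return fmt.Errorf("failed to get mysql password: %w", err)
	}
	// Drop and recreate the database
	cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", "mysql",
		podName, "--", "mysql", "-uroot", fmt.Sprintf("-p%s", password),
		"-e", fmt.Sprintf("DROP DATABASE IF EXISTS %s; CREATE DATABASE %s;", dbName, dbName))
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("failed to recreate database: %w", err)
	}
	// Replay the dump through the mysql client over stdin
	dumpData, err := os.ReadFile(dumpFile)
	if err != nil {
		return fmt.Errorf("failed to read dump: %w", err)
	}
	cmd = exec.CommandContext(ctx, "kubectl", "exec", "-i", "-n", "mysql",
		podName, "--", "mysql", "-uroot", fmt.Sprintf("-p%s", password), dbName)
	cmd.Stdin = strings.NewReader(string(dumpData))
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("mysql restore failed: %w", err)
	}
	return nil
}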
Estimated Effort: 5 hours
Task 3.3: Restore API Handler
File: wild-central-api/internal/api/v1/handlers_backup.go
Implementation:
func (h *Handler) BackupSnapshotRestore(c *gin.Context) {
instanceName := c.Param("name")
snapshotID := c.Param("snapshotId")
// Start restore operation asynchronously
go func() {
if err := h.backupManager.RestoreFromSnapshot(instanceName, snapshotID); err != nil {
log.Printf("Restore failed: %v", err)
}
}()
c.JSON(http.StatusAccepted, gin.H{
"success": true,
"message": "Restore operation started",
})
}
Estimated Effort: 1 hour
Task 3.4: Restore UI
File: wild-web-app/src/components/backup/RestoreDialog.tsx
Implementation: Create dialog that:
- Lists available snapshots
- Shows snapshot details (date, size, files)
- Confirmation before restore
- Progress indicator
Estimated Effort: 4 hours
Task 3.5: End-to-End Restore Testing
Test scenarios:
- List snapshots for app
- Select snapshot to restore
- Restore database
- Restore PVCs
- Verify application works after restore
- Test error handling
Estimated Effort: 3 hours
API Specifications
Complete API Reference
# Backup Operations
POST /api/v1/instances/{name}/backups/app/{appName} # Create app backup
POST /api/v1/instances/{name}/backups/cluster # Create cluster backup
GET /api/v1/instances/{name}/backups/app # List app backups
GET /api/v1/instances/{name}/backups/cluster # List cluster backups
DELETE /api/v1/instances/{name}/backups/app/{appName}/{id} # Delete app backup
DELETE /api/v1/instances/{name}/backups/cluster/{id} # Delete cluster backup
# Backup Configuration (Phase 2)
GET /api/v1/instances/{name}/backup/config # Get backup configuration
PUT /api/v1/instances/{name}/backup/config # Update configuration
POST /api/v1/instances/{name}/backup/test # Test connection
POST /api/v1/instances/{name}/backup/init # Initialize repository
GET /api/v1/instances/{name}/backup/stats # Get repository stats
# Restore Operations (Phase 3)
GET /api/v1/instances/{name}/backup/snapshots # List snapshots
POST /api/v1/instances/{name}/backup/snapshots/{id}/restore # Restore snapshot
Web UI Design
Page Structure
BackupsPage Layout:
┌─────────────────────────────────────────────────┐
│ Backups │
├─────────────────────────────────────────────────┤
│ │
│ ┌─ Backup Status ─────────────────────────┐ │
│ │ Repository: Configured ✓ │ │
│ │ Last Backup: 2 hours ago │ │
│ │ Total Size: 2.4 GB │ │
│ │ Snapshots: 24 │ │
│ │ [Edit Configuration] │ │
│ └─────────────────────────────────────────┘ │
│ │
│ ┌─ Recent Backups ────────────────────────┐ │
│ │ [Backup cards with restore/delete] │ │
│ │ ... │ │
│ └─────────────────────────────────────────┘ │
│ │
│ ┌─ Configuration (when editing) ──────────┐ │
│ │ Backend Type: [S3 ▼] │ │
│ │ Repository URI: [s3:bucket/path ] │ │
│ │ Credentials: │ │
│ │ Access Key ID: [••••••••••• ] │ │
│ │ Secret Key: [•••••••••••••••• ] │ │
│ │ Retention Policy: │ │
│ │ Daily: [7] Weekly: [4] Monthly: [6] │ │
│ │ [Test Connection] [Save] [Cancel] │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────────┘
Component Hierarchy
BackupsPage
├── BackupStatusCard (read-only)
│ ├── RepositoryStatus
│ ├── Stats (size, snapshots, last backup)
│ └── EditButton
│
├── BackupListSection
│ └── BackupCard[] (existing)
│
└── BackupConfigurationCard (conditional)
├── BackendTypeSelect
├── RepositoryUriInput
├── CredentialsSection
│ ├── S3CredentialsForm (conditional)
│ ├── SFTPCredentialsForm (conditional)
│ └── ...
├── RetentionPolicyInputs
└── ActionButtons
├── TestConnectionButton
├── SaveButton
└── CancelButton
Testing Strategy
Phase 1 Testing
Unit Tests:
- Manifest parsing
- Helper functions (contains, findPodInNamespace)
- Backup file creation
Integration Tests:
- End-to-end Gitea backup (PostgreSQL + PVC)
- End-to-end Immich backup (PostgreSQL + multiple PVCs)
- Backup with no database
- Backup with no PVCs
Manual Tests:
- Create backup via web UI
- Verify .sql file exists with actual data
- Verify .tar.gz files exist with actual data
- Check metadata accuracy
- Test delete functionality
Phase 2 Testing
Unit Tests:
- Backend type detection
- Environment variable mapping
- Configuration validation
Integration Tests:
- Repository initialization (local, S3, SFTP)
- Backup upload to restic
- Snapshot listing
- Stats retrieval
- Connection testing
Manual Tests:
- Configure local repository via UI
- Configure S3 repository via UI
- Test connection validation before save
- Create backup and verify in restic
- Check repository stats display
- Test error handling for bad credentials
Phase 3 Testing
Integration Tests:
- Restore database from snapshot
- Restore PVC from snapshot
- Full app restore
- Handle missing/corrupted snapshots
Manual Tests:
- List snapshots in UI
- Select and restore from snapshot
- Verify database data after restore
- Verify PVC data after restore
- Verify application functions correctly
Deployment Guide
Phase 1 Deployment
Preparation:
- Update wild-central-api code
- Build and test on development instance
- Verify backup files created with real data
- Test manual restore
Rollout:
- Deploy to staging environment
- Create test backups for multiple apps
- Verify all backup files exist
- Manually restore one backup to verify
- Deploy to production
Rollback Plan:
- Previous version still creates metadata files
- No breaking changes to backup structure
- Users can manually copy backup files if needed
Phase 2 Deployment
Preparation:
- Install restic on Wild Central devices: apt install restic
- Update wild-central-api with restic code
- Update wild-web-app with configuration UI
- Test on development with local repository
- Test with S3 and SFTP backends
Migration:
- Existing local backups remain accessible
- Users opt-in to restic by configuring repository
- Gradual migration: Phase 1 staging continues working
Rollout:
- Deploy backend API updates
- Deploy web UI updates
- Create user documentation with examples
- Provide migration guide for existing setups
Rollback Plan:
- Restic is optional: users can continue using local backups
- Configuration in config.yaml: easy to revert
- No data loss: existing backups preserved
Phase 3 Deployment
Preparation:
- Ensure Phase 2 is stable
- Ensure at least one backup exists in restic
- Test restore in staging environment
Rollout:
- Deploy restore functionality
- Document restore procedures
- Train users on restore process
Task Breakdown
Phase 1 Tasks (2-3 days)
| Task | Description | Effort | Dependencies |
|---|---|---|---|
| 1.1 | Manifest-based database detection | 2h | None |
| 1.2 | PostgreSQL backup via kubectl exec | 3h | 1.1 |
| 1.3 | MySQL backup via kubectl exec | 3h | 1.1 |
| 1.4 | PVC discovery and backup | 4h | 1.1 |
| 1.5 | Update BackupApp flow | 4h | 1.2, 1.3, 1.4 |
| 1.6 | Build and test | 4h | 1.5 |
Total: 20 hours (2.5 days)
Phase 2 Tasks (5-7 days)
| Task | Description | Effort | Dependencies |
|---|---|---|---|
| 2.1 | Configuration management | 3h | Phase 1 done |
| 2.2 | Restic operations module | 4h | 2.1 |
| 2.3 | Update backup flow for restic | 2h | 2.2 |
| 2.4 | API client updates | 2h | Phase 1 done |
| 2.5 | Configuration UI components | 8h | 2.4 |
| 2.6 | Integrate with BackupsPage | 3h | 2.5 |
| 2.7 | Backup configuration API handlers | 4h | 2.1, 2.2 |
| 2.8 | End-to-end testing | 4h | 2.3, 2.6, 2.7 |
Total: 30 hours (3.75 days)
Phase 3 Tasks (3-5 days)
| Task | Description | Effort | Dependencies |
|---|---|---|---|
| 3.1 | List snapshots API | 2h | Phase 2 done |
| 3.2 | Restore snapshot function | 5h | 3.1 |
| 3.3 | Restore API handler | 1h | 3.2 |
| 3.4 | Restore UI | 4h | 3.3 |
| 3.5 | End-to-end restore testing | 3h | 3.4 |
Total: 15 hours (2 days)
Grand Total
65 hours across 3 phases (8-12 days total)
Success Criteria
Phase 1 Success
- ✅ App backups create actual database dumps (.sql files)
- ✅ App backups create actual PVC archives (.tar.gz files)
- ✅ Backup metadata accurately lists all files
- ✅ Backups organized in timestamped directories
- ✅ In-progress tracking works correctly
- ✅ Delete functionality works for both app and cluster backups
- ✅ No silent failures (clear error messages)
- ✅ Manual restore verified working
Phase 2 Success
- ✅ Users can configure restic repository via web UI
- ✅ Configuration persists to config.yaml/secrets.yaml
- ✅ Test connection validates before save
- ✅ Backups automatically upload to restic repository
- ✅ Repository stats display correctly in UI
- ✅ Local, S3, and SFTP backends supported and tested
- ✅ Clear error messages for authentication/connection failures
- ✅ Staging files cleaned after successful upload
Phase 3 Success
- ✅ Users can list available snapshots in UI
- ✅ Users can restore from any snapshot via UI
- ✅ Database restoration works correctly
- ✅ PVC restoration works correctly
- ✅ Application functional after restore
- ✅ Error handling for corrupted snapshots
Long-Term Metrics
- Storage Efficiency: Deduplication achieves 60-80% space savings
- Reliability: < 1% backup failures
- Performance: Backup TB-scale data in < 4 hours
- User Satisfaction: Backup/restore completes without support intervention
Dependencies and Prerequisites
External Dependencies
Restic (backup tool):
- Installation: apt install restic
- Version: >= 0.16.0 recommended
- License: BSD 2-Clause (compatible)
kubectl (Kubernetes CLI):
- Already required for Wild Cloud operations
- Used for database dumps and PVC backup
Infrastructure Prerequisites
Storage Requirements:
Staging Directory:
- Location: /var/lib/wild-central/backup-staging (default)
- Space: max(largest_database, largest_pvc) + 20% buffer
- Recommendation: Monitor space, warn if < 50GB free
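A hedged sketch of that free-space check; checkStagingSpace is a hypothetical helper using syscall.Statfs (Linux-only, which matches Wild Central devices), and the 50 GiB threshold mirrors the recommendation above.
package backup

import (
	"fmt"
	"syscall" // Linux-only; Wild Central devices are assumed to run Linux
)

const minFreeBytes int64 = 50 << 30 // 50 GiB warning threshold

// checkStagingSpace errors if the filesystem backing the staging directory
// cannot hold requiredBytes plus a 20% buffer, or if free space falls below
// the 50 GiB warning threshold.
func checkStagingSpace(stagingDir string, requiredBytes int64) error {
	var st syscall.Statfs_t
	if err := syscall.Statfs(stagingDir, &st); err != nil {
		return fmt.Errorf("statfs %s failed: %w", stagingDir, err)
	}
	free := int64(st.Bavail) * int64(st.Bsize)
	needed := requiredBytes + requiredBytes/5 // +20% buffer
	if free < needed {
		return fmt.Errorf("staging has %d bytes free, need at least %d", free, needed)
	}
	if free < minFreeBytes {
		return fmt.Errorf("staging free space is below the 50 GiB warning threshold")
	}
	return nil
}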
Restic Repository:
- Local: Sufficient disk space on target mount
- Network: Mounted filesystem (NFS/SMB)
- Cloud: Typically unlimited, check quota/billing
Network Requirements:
- Outbound HTTPS (443) for S3/B2/cloud backends
- Outbound SSH (22 or custom) for SFTP
- No inbound ports needed
Security Considerations
Credentials Storage:
- Stored in secrets.yaml
- Never logged or exposed in API responses
- Transmitted only via HTTPS to backend APIs
Encryption:
- Restic: AES-256 encryption of all backup data
- Transport: TLS for cloud backends, SSH for SFTP
- At rest: Depends on backend (S3 server-side encryption, etc.)
Access Control:
- API endpoints check instance ownership
- Repository password required for all restic operations
- Backend credentials validated before save
Philosophy Compliance Review
KISS (Keep It Simple, Stupid)
✅ What We're Doing Right:
- Restic repository URI as simple string (native format)
- Backend type auto-detected from URI prefix
- Credentials organized by backend type
- No complex abstraction layers
✅ What We're Avoiding:
- Custom backup format
- Complex configuration DSL
- Over-abstracted backend interfaces
- Scheduling/automation (not needed yet)
YAGNI (You Aren't Gonna Need It)
✅ Building Only What's Needed:
- Basic configuration (repository, credentials, retention)
- Test connection before save
- Upload to restic after staging
- Display repository stats
❌ Not Building (until proven needed):
- Automated scheduling
- Multiple repository support
- Backup verification automation
- Email notifications
- Bandwidth limiting
- Custom encryption options
No Future-Proofing
✅ Current Requirements Only:
- Support TB-scale data (restic deduplication)
- Flexible storage destinations (restic backends)
- Storage constraints (upload to remote, not local-only)
❌ Not Speculating On:
- "What if users want backup versioning rules?"
- "What if users need bandwidth control?"
- "What if users want custom encryption?"
- Build these features WHEN users ask, not before
Trust in Emergence
✅ Starting Simple:
- Phase 1: Fix core backup (files actually created)
- Phase 2: Add restic upload (storage flexibility)
- Phase 3: Add restore from restic
- Phase 4+: Wait for user feedback
Let complexity emerge from actual needs, not speculation.
Conclusion
This complete implementation guide provides everything needed to implement a production-ready backup system for Wild Cloud across three phases:
- Phase 1 (CRITICAL): Fix broken app backups by creating actual database dumps and PVC archives using manifest-based detection and kubectl exec
- Phase 2 (HIGH): Integrate restic for TB-scale data, flexible storage backends, and configuration via web UI
- Phase 3 (MEDIUM): Enable restore from restic snapshots
All phases are designed following Wild Cloud's KISS/YAGNI philosophy: build only what's needed now, let complexity emerge from actual requirements, and trust that good architecture emerges from simplicity.
The implementation is ready for a senior engineer to begin Phase 1 immediately with all necessary context, specifications, code examples, and guidance provided.
Document Version: 1.0
Created: 2025-11-26
Status: Ready for implementation
Next Action: Begin Phase 1, Task 1.1