2530 lines
72 KiB
Markdown
2530 lines
72 KiB
Markdown
# Wild Cloud Backup System - Complete Implementation Guide
|
|
|
|
**Date:** 2025-11-26
|
|
**Status:** 📋 READY FOR IMPLEMENTATION
|
|
**Estimated Effort:** Phase 1: 2-3 days | Phase 2: 5-7 days | Phase 3: 3-5 days
|
|
|
|
---
|
|
|
|
## Table of Contents
|
|
|
|
1. [Executive Summary](#executive-summary)
|
|
2. [Background and Context](#background-and-context)
|
|
3. [Problem Analysis](#problem-analysis)
|
|
4. [Architecture Overview](#architecture-overview)
|
|
5. [Configuration Design](#configuration-design)
|
|
6. [Phase 1: Core Backup Fix](#phase-1-core-backup-fix)
|
|
7. [Phase 2: Restic Integration](#phase-2-restic-integration)
|
|
8. [Phase 3: Restore from Restic](#phase-3-restore-from-restic)
|
|
9. [API Specifications](#api-specifications)
|
|
10. [Web UI Design](#web-ui-design)
|
|
11. [Testing Strategy](#testing-strategy)
|
|
12. [Deployment Guide](#deployment-guide)
|
|
13. [Task Breakdown](#task-breakdown)
|
|
14. [Success Criteria](#success-criteria)
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
### Current State
|
|
App backups are completely broken - they create only metadata files (`backup.json`) without any actual backup data:
|
|
- ❌ No database dump files (`.sql`, `.dump`)
|
|
- ❌ No PVC archive files (`.tar.gz`)
|
|
- ❌ Users cannot restore from these "backups"
|
|
- ✅ Cluster backups work correctly (different code path)
|
|
|
|
### Root Cause
|
|
Database detection uses pod label-based discovery (`app=gitea` in `postgres` namespace), but database pods are shared infrastructure labeled `app=postgres`. Detection always returns empty, so no backups are created.
|
|
|
|
### Why This Matters
|
|
- **Scale**: Applications like Immich may host terabyte-scale photo libraries
|
|
- **Storage**: Wild Central devices may not have sufficient local storage
|
|
- **Flexibility**: Need flexible destinations: local, NFS, S3, Backblaze B2, SFTP, etc.
|
|
- **Deduplication**: Critical for TB-scale data (60-80% space savings)
|
|
|
|
### Solution: Three-Phase Approach
|
|
|
|
**Phase 1 (CRITICAL - 2-3 days)**: Fix broken app backups
|
|
- Manifest-based database detection (declarative)
|
|
- kubectl exec for database dumps
|
|
- PVC discovery and backup
|
|
- Store files locally in staging directory
|
|
|
|
**Phase 2 (HIGH PRIORITY - 5-7 days)**: Restic integration
|
|
- Upload staged files to restic repository
|
|
- Configuration via config.yaml and web UI
|
|
- Support multiple backends (local, S3, B2, SFTP)
|
|
- Repository initialization and testing
|
|
|
|
**Phase 3 (MEDIUM PRIORITY - 3-5 days)**: Restore from restic
|
|
- List available snapshots
|
|
- Restore from any snapshot
|
|
- Database and PVC restoration
|
|
- Web UI for restore operations
|
|
|
|
---
|
|
|
|
## Background and Context
|
|
|
|
### Project Philosophy
|
|
|
|
Wild Cloud follows strict KISS/YAGNI principles:
|
|
- **KISS**: Keep implementations as simple as possible
|
|
- **YAGNI**: Build only what's needed now, not speculative features
|
|
- **No future-proofing**: Let complexity emerge from actual requirements
|
|
- **Trust in emergence**: Start simple, enhance when requirements proven
|
|
|
|
### Key Design Decisions
|
|
|
|
1. **Manifest-based detection**: Read app dependencies from `manifest.yaml` (declarative), not runtime pod discovery
|
|
2. **kubectl exec approach**: Use standard Kubernetes operations for dumps and tar archives
|
|
3. **Restic for scale**: Use battle-tested restic tool for TB-scale data and flexible backends
|
|
4. **Phased implementation**: Fix core bugs first, add features incrementally
|
|
|
|
### Why Restic?
|
|
|
|
**Justified by actual requirements** (not premature optimization):
|
|
- **Scale**: Handle TB-scale data (Immich with terabytes of photos)
|
|
- **Flexibility**: Multiple backends (local, S3, B2, SFTP, Azure, GCS)
|
|
- **Efficiency**: 60-80% space savings via deduplication
|
|
- **Security**: Built-in AES-256 encryption
|
|
- **Reliability**: Battle-tested, widely adopted
|
|
- **Incremental**: Only backup changed blocks
|
|
|
|
---
|
|
|
|
## Problem Analysis
|
|
|
|
### Critical Bug: App Backups Create No Files
|
|
|
|
**Evidence** from `/home/payne/repos/wild-cloud-dev/.working/in-progress-fix.md`:
|
|
|
|
```
|
|
Backup structure:
|
|
apps/
|
|
└── gitea/
|
|
└── 20241124T143022Z/
|
|
└── backup.json ← Only this file exists!
|
|
|
|
Expected structure:
|
|
apps/
|
|
└── gitea/
|
|
└── 20241124T143022Z/
|
|
├── backup.json
|
|
├── postgres.sql ← Missing!
|
|
└── data.tar.gz ← Missing!
|
|
```
|
|
|
|
### Root Cause Analysis
|
|
|
|
**File**: `wild-central-api/internal/backup/backup.go` (lines 544-569)
|
|
|
|
```go
|
|
func (m *Manager) detectDatabaseType(ctx context.Context, namespace, appLabel string) (string, error) {
|
|
// This looks for pods with label "app=gitea" in namespace "postgres"
|
|
// But database pods are labeled "app=postgres" in namespace "postgres"
|
|
// This ALWAYS returns empty result!
|
|
|
|
cmd := exec.CommandContext(ctx, "kubectl", "get", "pods",
|
|
"-n", namespace,
|
|
"-l", fmt.Sprintf("app=%s", appLabel), // ← Wrong label!
|
|
"-o", "jsonpath={.items[0].metadata.name}")
|
|
|
|
output, err := cmd.Output()
|
|
if err != nil || len(output) == 0 {
|
|
return "", nil // ← Returns empty, no backup created
|
|
}
|
|
// ...
|
|
}
|
|
```
|
|
|
|
**Why It's Broken**:
|
|
1. Gitea backup tries to find pod with label `app=gitea` in namespace `postgres`
|
|
2. But PostgreSQL pod is labeled `app=postgres` in namespace `postgres`
|
|
3. Detection always fails → no database dump created
|
|
4. Same problem for PVC detection → no PVC archive created
|
|
5. Only `backup.json` metadata file is written
|
|
|
|
### Why Cluster Backups Work
|
|
|
|
Cluster backups don't use app-specific detection:
|
|
- Directly use `kubectl get` to find etcd pods
|
|
- Use hardcoded paths for config files
|
|
- Don't rely on app-based pod labels
|
|
- Actually create `.tar.gz` files with real data
|
|
|
|
---
|
|
|
|
## Architecture Overview
|
|
|
|
### System Components
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────┐
|
|
│ Wild Cloud Backup System │
|
|
├─────────────────────────────────────────────────────────┤
|
|
│ │
|
|
│ ┌─ Web UI (wild-web-app) ─────────────────────┐ │
|
|
│ │ - Backup configuration form │ │
|
|
│ │ - Repository status display │ │
|
|
│ │ - Backup creation/restore UI │ │
|
|
│ └──────────────────┬───────────────────────────┘ │
|
|
│ │ REST API │
|
|
│ ┌─ API Layer (wild-central-api) ───────────────┐ │
|
|
│ │ - Backup configuration endpoints │ │
|
|
│ │ - Backup/restore operation handlers │ │
|
|
│ │ - Restic integration layer │ │
|
|
│ └──────────────────┬───────────────────────────┘ │
|
|
│ │ │
|
|
│ ┌─ Backup Engine ────────────────────────────┐ │
|
|
│ │ - Manifest parser │ │
|
|
│ │ - Database backup (kubectl exec pg_dump) │ │
|
|
│ │ - PVC backup (kubectl exec tar) │ │
|
|
│ │ - Restic upload (Phase 2) │ │
|
|
│ └──────────────────┬───────────────────────────┘ │
|
|
│ │ │
|
|
│ ┌─ Storage Layer ────────────────────────────┐ │
|
|
│ │ Phase 1: Local staging directory │ │
|
|
│ │ Phase 2: Restic repository (local/remote) │ │
|
|
│ └─────────────────────────────────────────────┘ │
|
|
└─────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
### Data Flow
|
|
|
|
**Phase 1 (Local Staging)**:
|
|
```
|
|
User clicks "Backup" → API Handler
|
|
↓
|
|
Read manifest.yaml (detect databases)
|
|
↓
|
|
kubectl exec pg_dump → postgres.sql
|
|
↓
|
|
kubectl exec tar → pvc-data.tar.gz
|
|
↓
|
|
Save to /var/lib/wild-central/backup-staging/
|
|
↓
|
|
Write backup.json metadata
|
|
```
|
|
|
|
**Phase 2 (Restic Upload)**:
|
|
```
|
|
[Same as Phase 1] → Local staging files created
|
|
↓
|
|
restic backup <staging-dir>
|
|
↓
|
|
Upload to repository (S3/B2/local/etc)
|
|
↓
|
|
Clean staging directory
|
|
↓
|
|
Update metadata with snapshot ID
|
|
```
|
|
|
|
**Phase 3 (Restore)**:
|
|
```
|
|
User selects snapshot → restic restore <snapshot-id>
|
|
↓
|
|
Download to staging directory
|
|
↓
|
|
kubectl exec psql < postgres.sql
|
|
↓
|
|
kubectl cp tar file → pod
|
|
↓
|
|
kubectl exec tar -xzf → restore PVC data
|
|
```
|
|
|
|
---
|
|
|
|
## Configuration Design
|
|
|
|
### Schema: config.yaml
|
|
|
|
```yaml
|
|
cloud:
|
|
domain: "wildcloud.local"
|
|
dns:
|
|
ip: "192.168.8.50"
|
|
|
|
backup:
|
|
# Restic repository location (native restic URI format)
|
|
repository: "/mnt/backups/wild-cloud" # or "s3:bucket" or "sftp:user@host:/path"
|
|
|
|
# Local staging directory (always on Wild Central filesystem)
|
|
staging: "/var/lib/wild-central/backup-staging"
|
|
|
|
# Retention policy (restic forget flags)
|
|
retention:
|
|
keepDaily: 7
|
|
keepWeekly: 4
|
|
keepMonthly: 6
|
|
keepYearly: 2
|
|
|
|
# Backend-specific configuration (optional, backend-dependent)
|
|
backend:
|
|
# For S3-compatible backends (B2, Wasabi, MinIO)
|
|
endpoint: "s3.us-west-002.backblazeb2.com"
|
|
region: "us-west-002"
|
|
|
|
# For SFTP
|
|
port: 22
|
|
```
|
|
|
|
### Schema: secrets.yaml
|
|
|
|
```yaml
|
|
cloud:
|
|
backup:
|
|
# Restic repository encryption password
|
|
password: "strong-encryption-password"
|
|
|
|
# Backend credentials (conditional on backend type)
|
|
credentials:
|
|
# For S3/B2/S3-compatible (auto-detected from repository prefix)
|
|
s3:
|
|
accessKeyId: "KEY_ID"
|
|
secretAccessKey: "SECRET_KEY"
|
|
|
|
# For SFTP
|
|
sftp:
|
|
password: "ssh-password"
|
|
# OR
|
|
privateKey: |
|
|
-----BEGIN OPENSSH PRIVATE KEY-----
|
|
...
|
|
-----END OPENSSH PRIVATE KEY-----
|
|
|
|
# For Azure
|
|
azure:
|
|
accountName: "account"
|
|
accountKey: "key"
|
|
|
|
# For Google Cloud
|
|
gcs:
|
|
projectId: "project-id"
|
|
serviceAccountKey: |
|
|
{ "type": "service_account", ... }
|
|
```
|
|
|
|
### Configuration Examples
|
|
|
|
#### Example 1: Local Testing
|
|
|
|
**config.yaml**:
|
|
```yaml
|
|
cloud:
|
|
backup:
|
|
repository: "/mnt/external-drive/wild-cloud-backups"
|
|
staging: "/var/lib/wild-central/backup-staging"
|
|
retention:
|
|
keepDaily: 7
|
|
keepWeekly: 4
|
|
keepMonthly: 6
|
|
```
|
|
|
|
**secrets.yaml**:
|
|
```yaml
|
|
cloud:
|
|
backup:
|
|
password: "test-backup-password-123"
|
|
```
|
|
|
|
#### Example 2: Backblaze B2
|
|
|
|
**config.yaml**:
|
|
```yaml
|
|
cloud:
|
|
backup:
|
|
repository: "b2:wild-cloud-backups"
|
|
staging: "/var/lib/wild-central/backup-staging"
|
|
retention:
|
|
keepDaily: 7
|
|
keepWeekly: 4
|
|
keepMonthly: 6
|
|
backend:
|
|
endpoint: "s3.us-west-002.backblazeb2.com"
|
|
region: "us-west-002"
|
|
```
|
|
|
|
**secrets.yaml**:
|
|
```yaml
|
|
cloud:
|
|
backup:
|
|
password: "strong-encryption-password"
|
|
credentials:
|
|
s3:
|
|
accessKeyId: "0020123456789abcdef"
|
|
secretAccessKey: "K002abcdefghijklmnop"
|
|
```
|
|
|
|
#### Example 3: AWS S3
|
|
|
|
**config.yaml**:
|
|
```yaml
|
|
cloud:
|
|
backup:
|
|
repository: "s3:s3.amazonaws.com/my-wild-cloud-backups"
|
|
staging: "/var/lib/wild-central/backup-staging"
|
|
retention:
|
|
keepDaily: 14
|
|
keepWeekly: 8
|
|
keepMonthly: 12
|
|
backend:
|
|
region: "us-east-1"
|
|
```
|
|
|
|
**secrets.yaml**:
|
|
```yaml
|
|
cloud:
|
|
backup:
|
|
password: "prod-encryption-password"
|
|
credentials:
|
|
s3:
|
|
accessKeyId: "AKIAIOSFODNN7EXAMPLE"
|
|
secretAccessKey: "wJalrXUtnFEMI/K7MDENG/bPxRfiCY"
|
|
```
|
|
|
|
#### Example 4: SFTP Remote Server
|
|
|
|
**config.yaml**:
|
|
```yaml
|
|
cloud:
|
|
backup:
|
|
repository: "sftp:backup-user@backup.example.com:/wild-cloud-backups"
|
|
staging: "/var/lib/wild-central/backup-staging"
|
|
retention:
|
|
keepDaily: 7
|
|
keepWeekly: 4
|
|
keepMonthly: 6
|
|
backend:
|
|
port: 2222
|
|
```
|
|
|
|
**secrets.yaml**:
|
|
```yaml
|
|
cloud:
|
|
backup:
|
|
password: "restic-repo-password"
|
|
credentials:
|
|
sftp:
|
|
privateKey: |
|
|
-----BEGIN OPENSSH PRIVATE KEY-----
|
|
...
|
|
-----END OPENSSH PRIVATE KEY-----
|
|
```
|
|
|
|
#### Example 5: NFS/SMB Mount (as Local Path)
|
|
|
|
**config.yaml**:
|
|
```yaml
|
|
cloud:
|
|
backup:
|
|
repository: "/mnt/nas-backups/wild-cloud" # NFS mounted via OS
|
|
staging: "/var/lib/wild-central/backup-staging"
|
|
retention:
|
|
keepDaily: 7
|
|
keepWeekly: 4
|
|
keepMonthly: 6
|
|
```
|
|
|
|
**secrets.yaml**:
|
|
```yaml
|
|
cloud:
|
|
backup:
|
|
password: "backup-encryption-password"
|
|
```
|
|
|
|
### Backend Detection Logic
|
|
|
|
```go
|
|
func DetectBackendType(repository string) string {
|
|
if strings.HasPrefix(repository, "/") {
|
|
return "local"
|
|
} else if strings.HasPrefix(repository, "sftp:") {
|
|
return "sftp"
|
|
} else if strings.HasPrefix(repository, "s3:") || strings.HasPrefix(repository, "b2:") {
|
|
return "s3"
|
|
} else if strings.HasPrefix(repository, "azure:") {
|
|
return "azure"
|
|
} else if strings.HasPrefix(repository, "gs:") {
|
|
return "gcs"
|
|
} else if strings.HasPrefix(repository, "rclone:") {
|
|
return "rclone"
|
|
}
|
|
return "unknown"
|
|
}
|
|
```
|
|
|
|
### Environment Variable Mapping
|
|
|
|
```go
|
|
func BuildResticEnv(config BackupConfig, secrets BackupSecrets) map[string]string {
|
|
env := map[string]string{
|
|
"RESTIC_REPOSITORY": config.Repository,
|
|
"RESTIC_PASSWORD": secrets.Password,
|
|
}
|
|
|
|
backendType := DetectBackendType(config.Repository)
|
|
|
|
switch backendType {
|
|
case "s3":
|
|
env["AWS_ACCESS_KEY_ID"] = secrets.Credentials.S3.AccessKeyID
|
|
env["AWS_SECRET_ACCESS_KEY"] = secrets.Credentials.S3.SecretAccessKey
|
|
|
|
if config.Backend.Endpoint != "" {
|
|
env["AWS_S3_ENDPOINT"] = config.Backend.Endpoint
|
|
}
|
|
if config.Backend.Region != "" {
|
|
env["AWS_DEFAULT_REGION"] = config.Backend.Region
|
|
}
|
|
|
|
case "sftp":
|
|
if secrets.Credentials.SFTP.Password != "" {
|
|
env["RESTIC_SFTP_PASSWORD"] = secrets.Credentials.SFTP.Password
|
|
}
|
|
// SSH key handling done via temp file
|
|
|
|
case "azure":
|
|
env["AZURE_ACCOUNT_NAME"] = secrets.Credentials.Azure.AccountName
|
|
env["AZURE_ACCOUNT_KEY"] = secrets.Credentials.Azure.AccountKey
|
|
|
|
case "gcs":
|
|
// Write service account key to temp file, set GOOGLE_APPLICATION_CREDENTIALS
|
|
}
|
|
|
|
return env
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Phase 1: Core Backup Fix
|
|
|
|
### Goal
|
|
Fix critical bugs and create actual backup files (no restic yet).
|
|
|
|
### Priority
|
|
🔴 **CRITICAL** - Users cannot restore from current backups
|
|
|
|
### Timeline
|
|
2-3 days
|
|
|
|
### Overview
|
|
|
|
Replace broken pod label-based detection with manifest-based detection. Use kubectl exec to create actual database dumps and PVC archives.
|
|
|
|
### Task 1.1: Implement Manifest-Based Database Detection
|
|
|
|
**File**: `wild-central-api/internal/backup/backup.go`
|
|
|
|
**Add New Structures**:
|
|
```go
|
|
type AppDependencies struct {
|
|
HasPostgres bool
|
|
HasMySQL bool
|
|
HasRedis bool
|
|
}
|
|
```
|
|
|
|
**Implement Detection Function**:
|
|
```go
|
|
func (m *Manager) getAppDependencies(appName string) (*AppDependencies, error) {
|
|
manifestPath := filepath.Join(m.directoryPath, appName, "manifest.yaml")
|
|
|
|
manifest, err := directory.LoadManifest(manifestPath)
|
|
if err != nil {
|
|
return nil, fmt.Errorf("failed to load manifest: %w", err)
|
|
}
|
|
|
|
deps := &AppDependencies{
|
|
HasPostgres: contains(manifest.Requires, "postgres"),
|
|
HasMySQL: contains(manifest.Requires, "mysql"),
|
|
HasRedis: contains(manifest.Requires, "redis"),
|
|
}
|
|
|
|
return deps, nil
|
|
}
|
|
|
|
func contains(slice []string, item string) bool {
|
|
for _, s := range slice {
|
|
if s == item {
|
|
return true
|
|
}
|
|
}
|
|
return false
|
|
}
|
|
```
|
|
|
|
**Changes Required**:
|
|
- Add import: `"github.com/wild-cloud/wild-central/daemon/internal/directory"`
|
|
- Remove old `detectDatabaseType()` function (lines 544-569)
|
|
|
|
**Acceptance Criteria**:
|
|
- Reads manifest.yaml for app
|
|
- Correctly identifies postgres dependency
|
|
- Correctly identifies mysql dependency
|
|
- Returns error if manifest not found
|
|
- Unit test: parse manifest with postgres
|
|
- Unit test: parse manifest without databases
|
|
|
|
**Estimated Effort**: 2 hours
|
|
|
|
---
|
|
|
|
### Task 1.2: Implement PostgreSQL Backup via kubectl exec
|
|
|
|
**File**: `wild-central-api/internal/backup/backup.go`
|
|
|
|
**Implementation**:
|
|
```go
|
|
func (m *Manager) backupPostgres(ctx context.Context, instanceName, appName, backupDir string) (string, error) {
|
|
dbName := appName // Database name convention
|
|
|
|
// Find postgres pod in postgres namespace
|
|
podName, err := m.findPodInNamespace(ctx, "postgres", "app=postgres")
|
|
if err != nil {
|
|
return "", fmt.Errorf("postgres pod not found: %w", err)
|
|
}
|
|
|
|
// Execute pg_dump
|
|
dumpFile := filepath.Join(backupDir, "postgres.sql")
|
|
cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", "postgres",
|
|
podName, "--", "pg_dump", "-U", "postgres", dbName)
|
|
|
|
output, err := cmd.Output()
|
|
if err != nil {
|
|
return "", fmt.Errorf("pg_dump failed: %w", err)
|
|
}
|
|
|
|
// Write dump to file
|
|
if err := os.WriteFile(dumpFile, output, 0600); err != nil {
|
|
return "", fmt.Errorf("failed to write dump: %w", err)
|
|
}
|
|
|
|
return dumpFile, nil
|
|
}
|
|
|
|
// Helper function to find pod by label
|
|
func (m *Manager) findPodInNamespace(ctx context.Context, namespace, labelSelector string) (string, error) {
|
|
cmd := exec.CommandContext(ctx, "kubectl", "get", "pods",
|
|
"-n", namespace,
|
|
"-l", labelSelector,
|
|
"-o", "jsonpath={.items[0].metadata.name}")
|
|
|
|
output, err := cmd.Output()
|
|
if err != nil {
|
|
return "", fmt.Errorf("kubectl get pods failed: %w", err)
|
|
}
|
|
|
|
podName := strings.TrimSpace(string(output))
|
|
if podName == "" {
|
|
return "", fmt.Errorf("no pod found with label %s in namespace %s", labelSelector, namespace)
|
|
}
|
|
|
|
return podName, nil
|
|
}
|
|
```
|
|
|
|
**Acceptance Criteria**:
|
|
- Finds postgres pod correctly
|
|
- Executes pg_dump successfully
|
|
- Creates .sql file with actual data
|
|
- Handles errors gracefully
|
|
- Integration test: backup Gitea database
|
|
|
|
**Estimated Effort**: 3 hours
|
|
|
|
---
|
|
|
|
### Task 1.3: Implement MySQL Backup via kubectl exec
|
|
|
|
**File**: `wild-central-api/internal/backup/backup.go`
|
|
|
|
**Implementation**:
|
|
```go
|
|
func (m *Manager) backupMySQL(ctx context.Context, instanceName, appName, backupDir string) (string, error) {
|
|
dbName := appName
|
|
|
|
// Find mysql pod
|
|
podName, err := m.findPodInNamespace(ctx, "mysql", "app=mysql")
|
|
if err != nil {
|
|
return "", fmt.Errorf("mysql pod not found: %w", err)
|
|
}
|
|
|
|
// Get MySQL root password from secret
|
|
password, err := m.getMySQLPassword(ctx)
|
|
if err != nil {
|
|
return "", fmt.Errorf("failed to get mysql password: %w", err)
|
|
}
|
|
|
|
// Execute mysqldump
|
|
dumpFile := filepath.Join(backupDir, "mysql.sql")
|
|
cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", "mysql",
|
|
podName, "--", "mysqldump",
|
|
"-uroot",
|
|
fmt.Sprintf("-p%s", password),
|
|
"--single-transaction",
|
|
"--routines",
|
|
"--triggers",
|
|
dbName)
|
|
|
|
output, err := cmd.Output()
|
|
if err != nil {
|
|
return "", fmt.Errorf("mysqldump failed: %w", err)
|
|
}
|
|
|
|
if err := os.WriteFile(dumpFile, output, 0600); err != nil {
|
|
return "", fmt.Errorf("failed to write dump: %w", err)
|
|
}
|
|
|
|
return dumpFile, nil
|
|
}
|
|
|
|
func (m *Manager) getMySQLPassword(ctx context.Context) (string, error) {
|
|
cmd := exec.CommandContext(ctx, "kubectl", "get", "secret",
|
|
"-n", "mysql",
|
|
"mysql-root-password",
|
|
"-o", "jsonpath={.data.password}")
|
|
|
|
output, err := cmd.Output()
|
|
if err != nil {
|
|
return "", fmt.Errorf("failed to get secret: %w", err)
|
|
}
|
|
|
|
// Decode base64
|
|
decoded, err := base64.StdEncoding.DecodeString(string(output))
|
|
if err != nil {
|
|
return "", fmt.Errorf("failed to decode password: %w", err)
|
|
}
|
|
|
|
return string(decoded), nil
|
|
}
|
|
```
|
|
|
|
**Acceptance Criteria**:
|
|
- Finds mysql pod correctly
|
|
- Retrieves password from secret
|
|
- Executes mysqldump successfully
|
|
- Creates .sql file with actual data
|
|
- Handles errors gracefully
|
|
|
|
**Estimated Effort**: 3 hours
|
|
|
|
---
|
|
|
|
### Task 1.4: Implement PVC Discovery and Backup
|
|
|
|
**File**: `wild-central-api/internal/backup/backup.go`
|
|
|
|
**Implementation**:
|
|
```go
|
|
func (m *Manager) findAppPVCs(ctx context.Context, appName string) ([]string, error) {
|
|
// Get namespace for app (convention: app name)
|
|
namespace := appName
|
|
|
|
cmd := exec.CommandContext(ctx, "kubectl", "get", "pvc",
|
|
"-n", namespace,
|
|
"-o", "jsonpath={.items[*].metadata.name}")
|
|
|
|
output, err := cmd.Output()
|
|
if err != nil {
|
|
return nil, fmt.Errorf("kubectl get pvc failed: %w", err)
|
|
}
|
|
|
|
pvcNames := strings.Fields(string(output))
|
|
return pvcNames, nil
|
|
}
|
|
|
|
func (m *Manager) backupPVC(ctx context.Context, namespace, pvcName, backupDir string) (string, error) {
|
|
// Find pod using this PVC
|
|
podName, err := m.findPodUsingPVC(ctx, namespace, pvcName)
|
|
if err != nil {
|
|
return "", fmt.Errorf("no pod found using PVC %s: %w", pvcName, err)
|
|
}
|
|
|
|
// Get mount path for PVC
|
|
mountPath, err := m.getPVCMountPath(ctx, namespace, podName, pvcName)
|
|
if err != nil {
|
|
return "", fmt.Errorf("failed to get mount path: %w", err)
|
|
}
|
|
|
|
// Create tar archive of PVC data
|
|
tarFile := filepath.Join(backupDir, fmt.Sprintf("%s.tar.gz", pvcName))
|
|
cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", namespace,
|
|
podName, "--", "tar", "czf", "-", "-C", mountPath, ".")
|
|
|
|
output, err := cmd.Output()
|
|
if err != nil {
|
|
return "", fmt.Errorf("tar command failed: %w", err)
|
|
}
|
|
|
|
if err := os.WriteFile(tarFile, output, 0600); err != nil {
|
|
return "", fmt.Errorf("failed to write tar file: %w", err)
|
|
}
|
|
|
|
return tarFile, nil
|
|
}
|
|
|
|
func (m *Manager) findPodUsingPVC(ctx context.Context, namespace, pvcName string) (string, error) {
|
|
cmd := exec.CommandContext(ctx, "kubectl", "get", "pods",
|
|
"-n", namespace,
|
|
"-o", "json")
|
|
|
|
output, err := cmd.Output()
|
|
if err != nil {
|
|
return "", fmt.Errorf("kubectl get pods failed: %w", err)
|
|
}
|
|
|
|
// Parse JSON to find pod using this PVC
|
|
var podList struct {
|
|
Items []struct {
|
|
Metadata struct {
|
|
Name string `json:"name"`
|
|
} `json:"metadata"`
|
|
Spec struct {
|
|
Volumes []struct {
|
|
PersistentVolumeClaim *struct {
|
|
ClaimName string `json:"claimName"`
|
|
} `json:"persistentVolumeClaim"`
|
|
} `json:"volumes"`
|
|
} `json:"spec"`
|
|
} `json:"items"`
|
|
}
|
|
|
|
if err := json.Unmarshal(output, &podList); err != nil {
|
|
return "", fmt.Errorf("failed to parse pod list: %w", err)
|
|
}
|
|
|
|
for _, pod := range podList.Items {
|
|
for _, volume := range pod.Spec.Volumes {
|
|
if volume.PersistentVolumeClaim != nil &&
|
|
volume.PersistentVolumeClaim.ClaimName == pvcName {
|
|
return pod.Metadata.Name, nil
|
|
}
|
|
}
|
|
}
|
|
|
|
return "", fmt.Errorf("no pod found using PVC %s", pvcName)
|
|
}
|
|
|
|
func (m *Manager) getPVCMountPath(ctx context.Context, namespace, podName, pvcName string) (string, error) {
|
|
cmd := exec.CommandContext(ctx, "kubectl", "get", "pod",
|
|
"-n", namespace,
|
|
podName,
|
|
"-o", "json")
|
|
|
|
output, err := cmd.Output()
|
|
if err != nil {
|
|
return "", fmt.Errorf("kubectl get pod failed: %w", err)
|
|
}
|
|
|
|
var pod struct {
|
|
Spec struct {
|
|
Volumes []struct {
|
|
Name string `json:"name"`
|
|
PersistentVolumeClaim *struct {
|
|
ClaimName string `json:"claimName"`
|
|
} `json:"persistentVolumeClaim"`
|
|
} `json:"volumes"`
|
|
Containers []struct {
|
|
VolumeMounts []struct {
|
|
Name string `json:"name"`
|
|
MountPath string `json:"mountPath"`
|
|
} `json:"volumeMounts"`
|
|
} `json:"containers"`
|
|
} `json:"spec"`
|
|
}
|
|
|
|
if err := json.Unmarshal(output, &pod); err != nil {
|
|
return "", fmt.Errorf("failed to parse pod: %w", err)
|
|
}
|
|
|
|
// Find volume name for PVC
|
|
var volumeName string
|
|
for _, volume := range pod.Spec.Volumes {
|
|
if volume.PersistentVolumeClaim != nil &&
|
|
volume.PersistentVolumeClaim.ClaimName == pvcName {
|
|
volumeName = volume.Name
|
|
break
|
|
}
|
|
}
|
|
|
|
if volumeName == "" {
|
|
return "", fmt.Errorf("PVC %s not found in pod volumes", pvcName)
|
|
}
|
|
|
|
// Find mount path for volume
|
|
for _, container := range pod.Spec.Containers {
|
|
for _, mount := range container.VolumeMounts {
|
|
if mount.Name == volumeName {
|
|
return mount.MountPath, nil
|
|
}
|
|
}
|
|
}
|
|
|
|
return "", fmt.Errorf("mount path not found for volume %s", volumeName)
|
|
}
|
|
```
|
|
|
|
**Acceptance Criteria**:
|
|
- Discovers PVCs in app namespace
|
|
- Finds pod using PVC
|
|
- Gets correct mount path
|
|
- Creates tar.gz with actual data
|
|
- Handles multiple PVCs
|
|
- Integration test: backup Immich PVCs
|
|
|
|
**Estimated Effort**: 4 hours
|
|
|
|
---
|
|
|
|
### Task 1.5: Update BackupApp Flow
|
|
|
|
**File**: `wild-central-api/internal/backup/backup.go`
|
|
|
|
**Replace BackupApp function** (complete rewrite):
|
|
|
|
```go
|
|
func (m *Manager) BackupApp(instanceName, appName string) (*BackupInfo, error) {
|
|
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
|
|
defer cancel()
|
|
|
|
// Create timestamped backup directory
|
|
timestamp := time.Now().UTC().Format("20060102T150405Z")
|
|
stagingDir := filepath.Join(m.dataDir, "instances", instanceName, "backups", "staging")
|
|
backupDir := filepath.Join(stagingDir, "apps", appName, timestamp)
|
|
|
|
if err := os.MkdirAll(backupDir, 0755); err != nil {
|
|
return nil, fmt.Errorf("failed to create backup directory: %w", err)
|
|
}
|
|
|
|
// Initialize backup info with in_progress status
|
|
info := &BackupInfo{
|
|
Type: "app",
|
|
AppName: appName,
|
|
Status: "in_progress",
|
|
CreatedAt: time.Now().UTC().Format(time.RFC3339),
|
|
Files: []string{},
|
|
}
|
|
|
|
// Save initial metadata
|
|
if err := m.saveBackupMetadata(backupDir, info); err != nil {
|
|
return nil, fmt.Errorf("failed to save initial metadata: %w", err)
|
|
}
|
|
|
|
// Read app dependencies from manifest
|
|
deps, err := m.getAppDependencies(appName)
|
|
if err != nil {
|
|
info.Status = "failed"
|
|
info.Error = fmt.Sprintf("Failed to read manifest: %v", err)
|
|
m.saveBackupMetadata(backupDir, info)
|
|
return info, err
|
|
}
|
|
|
|
var backupFiles []string
|
|
|
|
// Backup PostgreSQL if required
|
|
if deps.HasPostgres {
|
|
file, err := m.backupPostgres(ctx, instanceName, appName, backupDir)
|
|
if err != nil {
|
|
info.Status = "failed"
|
|
info.Error = fmt.Sprintf("PostgreSQL backup failed: %v", err)
|
|
m.saveBackupMetadata(backupDir, info)
|
|
return info, err
|
|
}
|
|
backupFiles = append(backupFiles, file)
|
|
}
|
|
|
|
// Backup MySQL if required
|
|
if deps.HasMySQL {
|
|
file, err := m.backupMySQL(ctx, instanceName, appName, backupDir)
|
|
if err != nil {
|
|
info.Status = "failed"
|
|
info.Error = fmt.Sprintf("MySQL backup failed: %v", err)
|
|
m.saveBackupMetadata(backupDir, info)
|
|
return info, err
|
|
}
|
|
backupFiles = append(backupFiles, file)
|
|
}
|
|
|
|
// Discover and backup PVCs
|
|
pvcNames, err := m.findAppPVCs(ctx, appName)
|
|
if err != nil {
|
|
// Log warning but don't fail if no PVCs found
|
|
log.Printf("Warning: failed to find PVCs for %s: %v", appName, err)
|
|
} else {
|
|
for _, pvcName := range pvcNames {
|
|
file, err := m.backupPVC(ctx, appName, pvcName, backupDir)
|
|
if err != nil {
|
|
log.Printf("Warning: failed to backup PVC %s: %v", pvcName, err)
|
|
continue
|
|
}
|
|
backupFiles = append(backupFiles, file)
|
|
}
|
|
}
|
|
|
|
// Calculate total backup size
|
|
var totalSize int64
|
|
for _, file := range backupFiles {
|
|
stat, err := os.Stat(file)
|
|
if err == nil {
|
|
totalSize += stat.Size()
|
|
}
|
|
}
|
|
|
|
// Update final metadata
|
|
info.Status = "completed"
|
|
info.Files = backupFiles
|
|
info.Size = totalSize
|
|
info.Error = ""
|
|
|
|
if err := m.saveBackupMetadata(backupDir, info); err != nil {
|
|
return info, fmt.Errorf("failed to save final metadata: %w", err)
|
|
}
|
|
|
|
return info, nil
|
|
}
|
|
|
|
func (m *Manager) saveBackupMetadata(backupDir string, info *BackupInfo) error {
|
|
metadataFile := filepath.Join(backupDir, "backup.json")
|
|
data, err := json.MarshalIndent(info, "", " ")
|
|
if err != nil {
|
|
return fmt.Errorf("failed to marshal metadata: %w", err)
|
|
}
|
|
return os.WriteFile(metadataFile, data, 0644)
|
|
}
|
|
```
|
|
|
|
**Acceptance Criteria**:
|
|
- Creates timestamped backup directories
|
|
- Reads manifest to detect dependencies
|
|
- Backs up databases if present
|
|
- Backs up PVCs if present
|
|
- Calculates accurate backup size
|
|
- Saves complete metadata
|
|
- Handles errors gracefully
|
|
- Integration test: Full Gitea backup
|
|
|
|
**Estimated Effort**: 4 hours
|
|
|
|
---
|
|
|
|
### Task 1.6: Build and Test
|
|
|
|
**Steps**:
|
|
1. Build wild-central-api
|
|
2. Deploy to test environment
|
|
3. Test Gitea backup (PostgreSQL + PVC)
|
|
4. Test Immich backup (PostgreSQL + multiple PVCs)
|
|
5. Verify backup files exist and have data
|
|
6. Verify metadata accuracy
|
|
7. Test manual restore
|
|
|
|
**Acceptance Criteria**:
|
|
- All builds succeed
|
|
- App backups create actual files
|
|
- Metadata is accurate
|
|
- Manual restore works
|
|
|
|
**Estimated Effort**: 4 hours
|
|
|
|
---
|
|
|
|
## Phase 2: Restic Integration
|
|
|
|
### Goal
|
|
Upload staged backups to restic repository with flexible backends.
|
|
|
|
### Priority
|
|
🟡 **HIGH PRIORITY** (after Phase 1 complete)
|
|
|
|
### Timeline
|
|
5-7 days
|
|
|
|
### Prerequisites
|
|
- Phase 1 completed and tested
|
|
- Restic installed on Wild Central device
|
|
- Backup destination configured (S3, B2, local, etc.)
|
|
|
|
### Task 2.1: Configuration Management
|
|
|
|
**File**: `wild-central-api/internal/backup/config.go` (new file)
|
|
|
|
**Implementation**:
|
|
```go
|
|
package backup
|
|
|
|
import (
|
|
"fmt"
|
|
"strings"
|
|
|
|
"github.com/wild-cloud/wild-central/daemon/internal/config"
|
|
)
|
|
|
|
type BackupConfig struct {
|
|
Repository string
|
|
Staging string
|
|
Retention RetentionPolicy
|
|
Backend BackendConfig
|
|
}
|
|
|
|
type RetentionPolicy struct {
|
|
KeepDaily int
|
|
KeepWeekly int
|
|
KeepMonthly int
|
|
KeepYearly int
|
|
}
|
|
|
|
type BackendConfig struct {
|
|
Type string
|
|
Endpoint string
|
|
Region string
|
|
Port int
|
|
}
|
|
|
|
type BackupSecrets struct {
|
|
Password string
|
|
Credentials BackendCredentials
|
|
}
|
|
|
|
type BackendCredentials struct {
|
|
S3 *S3Credentials
|
|
SFTP *SFTPCredentials
|
|
Azure *AzureCredentials
|
|
GCS *GCSCredentials
|
|
}
|
|
|
|
type S3Credentials struct {
|
|
AccessKeyID string
|
|
SecretAccessKey string
|
|
}
|
|
|
|
type SFTPCredentials struct {
|
|
Password string
|
|
PrivateKey string
|
|
}
|
|
|
|
type AzureCredentials struct {
|
|
AccountName string
|
|
AccountKey string
|
|
}
|
|
|
|
type GCSCredentials struct {
|
|
ProjectID string
|
|
ServiceAccountKey string
|
|
}
|
|
|
|
func LoadBackupConfig(instanceName string) (*BackupConfig, *BackupSecrets, error) {
|
|
cfg, err := config.LoadInstanceConfig(instanceName)
|
|
if err != nil {
|
|
return nil, nil, fmt.Errorf("failed to load config: %w", err)
|
|
}
|
|
|
|
secrets, err := config.LoadInstanceSecrets(instanceName)
|
|
if err != nil {
|
|
return nil, nil, fmt.Errorf("failed to load secrets: %w", err)
|
|
}
|
|
|
|
backupCfg := &BackupConfig{
|
|
Repository: cfg.Cloud.Backup.Repository,
|
|
Staging: cfg.Cloud.Backup.Staging,
|
|
Retention: RetentionPolicy{
|
|
KeepDaily: cfg.Cloud.Backup.Retention.KeepDaily,
|
|
KeepWeekly: cfg.Cloud.Backup.Retention.KeepWeekly,
|
|
KeepMonthly: cfg.Cloud.Backup.Retention.KeepMonthly,
|
|
KeepYearly: cfg.Cloud.Backup.Retention.KeepYearly,
|
|
},
|
|
Backend: BackendConfig{
|
|
Type: DetectBackendType(cfg.Cloud.Backup.Repository),
|
|
Endpoint: cfg.Cloud.Backup.Backend.Endpoint,
|
|
Region: cfg.Cloud.Backup.Backend.Region,
|
|
Port: cfg.Cloud.Backup.Backend.Port,
|
|
},
|
|
}
|
|
|
|
backupSecrets := &BackupSecrets{
|
|
Password: secrets.Cloud.Backup.Password,
|
|
Credentials: BackendCredentials{
|
|
S3: secrets.Cloud.Backup.Credentials.S3,
|
|
SFTP: secrets.Cloud.Backup.Credentials.SFTP,
|
|
Azure: secrets.Cloud.Backup.Credentials.Azure,
|
|
GCS: secrets.Cloud.Backup.Credentials.GCS,
|
|
},
|
|
}
|
|
|
|
return backupCfg, backupSecrets, nil
|
|
}
|
|
|
|
func DetectBackendType(repository string) string {
|
|
if strings.HasPrefix(repository, "/") {
|
|
return "local"
|
|
} else if strings.HasPrefix(repository, "sftp:") {
|
|
return "sftp"
|
|
} else if strings.HasPrefix(repository, "s3:") || strings.HasPrefix(repository, "b2:") {
|
|
return "s3"
|
|
} else if strings.HasPrefix(repository, "azure:") {
|
|
return "azure"
|
|
} else if strings.HasPrefix(repository, "gs:") {
|
|
return "gcs"
|
|
} else if strings.HasPrefix(repository, "rclone:") {
|
|
return "rclone"
|
|
}
|
|
return "unknown"
|
|
}
|
|
|
|
func ValidateBackupConfig(cfg *BackupConfig, secrets *BackupSecrets) error {
|
|
if cfg.Repository == "" {
|
|
return fmt.Errorf("repository is required")
|
|
}
|
|
|
|
if secrets.Password == "" {
|
|
return fmt.Errorf("repository password is required")
|
|
}
|
|
|
|
// Validate backend-specific credentials
|
|
switch cfg.Backend.Type {
|
|
case "s3":
|
|
if secrets.Credentials.S3 == nil {
|
|
return fmt.Errorf("S3 credentials required for S3 backend")
|
|
}
|
|
if secrets.Credentials.S3.AccessKeyID == "" || secrets.Credentials.S3.SecretAccessKey == "" {
|
|
return fmt.Errorf("S3 access key and secret key required")
|
|
}
|
|
case "sftp":
|
|
if secrets.Credentials.SFTP == nil {
|
|
return fmt.Errorf("SFTP credentials required for SFTP backend")
|
|
}
|
|
if secrets.Credentials.SFTP.Password == "" && secrets.Credentials.SFTP.PrivateKey == "" {
|
|
return fmt.Errorf("SFTP password or private key required")
|
|
}
|
|
case "azure":
|
|
if secrets.Credentials.Azure == nil {
|
|
return fmt.Errorf("Azure credentials required for Azure backend")
|
|
}
|
|
if secrets.Credentials.Azure.AccountName == "" || secrets.Credentials.Azure.AccountKey == "" {
|
|
return fmt.Errorf("Azure account name and key required")
|
|
}
|
|
case "gcs":
|
|
if secrets.Credentials.GCS == nil {
|
|
return fmt.Errorf("GCS credentials required for GCS backend")
|
|
}
|
|
if secrets.Credentials.GCS.ServiceAccountKey == "" {
|
|
return fmt.Errorf("GCS service account key required")
|
|
}
|
|
}
|
|
|
|
return nil
|
|
}
|
|
```
|
|
|
|
**Estimated Effort**: 3 hours
|
|
|
|
---
|
|
|
|
### Task 2.2: Restic Operations Module
|
|
|
|
**File**: `wild-central-api/internal/backup/restic.go` (new file)
|
|
|
|
**Implementation**:
|
|
```go
|
|
package backup
|
|
|
|
import (
|
|
"context"
|
|
"encoding/json"
|
|
"fmt"
|
|
"os"
|
|
"os/exec"
|
|
"strings"
|
|
)
|
|
|
|
type ResticClient struct {
|
|
config *BackupConfig
|
|
secrets *BackupSecrets
|
|
}
|
|
|
|
func NewResticClient(config *BackupConfig, secrets *BackupSecrets) *ResticClient {
|
|
return &ResticClient{
|
|
config: config,
|
|
secrets: secrets,
|
|
}
|
|
}
|
|
|
|
func (r *ResticClient) buildEnv() map[string]string {
|
|
env := map[string]string{
|
|
"RESTIC_REPOSITORY": r.config.Repository,
|
|
"RESTIC_PASSWORD": r.secrets.Password,
|
|
}
|
|
|
|
switch r.config.Backend.Type {
|
|
case "s3":
|
|
if r.secrets.Credentials.S3 != nil {
|
|
env["AWS_ACCESS_KEY_ID"] = r.secrets.Credentials.S3.AccessKeyID
|
|
env["AWS_SECRET_ACCESS_KEY"] = r.secrets.Credentials.S3.SecretAccessKey
|
|
}
|
|
if r.config.Backend.Endpoint != "" {
|
|
env["AWS_S3_ENDPOINT"] = r.config.Backend.Endpoint
|
|
}
|
|
if r.config.Backend.Region != "" {
|
|
env["AWS_DEFAULT_REGION"] = r.config.Backend.Region
|
|
}
|
|
|
|
case "sftp":
|
|
if r.secrets.Credentials.SFTP != nil && r.secrets.Credentials.SFTP.Password != "" {
|
|
env["RESTIC_SFTP_PASSWORD"] = r.secrets.Credentials.SFTP.Password
|
|
}
|
|
|
|
case "azure":
|
|
if r.secrets.Credentials.Azure != nil {
|
|
env["AZURE_ACCOUNT_NAME"] = r.secrets.Credentials.Azure.AccountName
|
|
env["AZURE_ACCOUNT_KEY"] = r.secrets.Credentials.Azure.AccountKey
|
|
}
|
|
}
|
|
|
|
return env
|
|
}
|
|
|
|
func (r *ResticClient) Init(ctx context.Context) error {
|
|
cmd := exec.CommandContext(ctx, "restic", "init")
|
|
|
|
// Set environment variables
|
|
cmd.Env = os.Environ()
|
|
for k, v := range r.buildEnv() {
|
|
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
|
|
}
|
|
|
|
output, err := cmd.CombinedOutput()
|
|
if err != nil {
|
|
return fmt.Errorf("restic init failed: %w: %s", err, string(output))
|
|
}
|
|
|
|
return nil
|
|
}
|
|
|
|
func (r *ResticClient) Backup(ctx context.Context, path string, tags []string) (string, error) {
|
|
args := []string{"backup", path}
|
|
for _, tag := range tags {
|
|
args = append(args, "--tag", tag)
|
|
}
|
|
|
|
cmd := exec.CommandContext(ctx, "restic", args...)
|
|
|
|
cmd.Env = os.Environ()
|
|
for k, v := range r.buildEnv() {
|
|
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
|
|
}
|
|
|
|
output, err := cmd.CombinedOutput()
|
|
if err != nil {
|
|
return "", fmt.Errorf("restic backup failed: %w: %s", err, string(output))
|
|
}
|
|
|
|
// Parse snapshot ID from output
|
|
snapshotID := r.parseSnapshotID(string(output))
|
|
|
|
return snapshotID, nil
|
|
}
|
|
|
|
func (r *ResticClient) ListSnapshots(ctx context.Context, tags []string) ([]Snapshot, error) {
|
|
args := []string{"snapshots", "--json"}
|
|
for _, tag := range tags {
|
|
args = append(args, "--tag", tag)
|
|
}
|
|
|
|
cmd := exec.CommandContext(ctx, "restic", args...)
|
|
|
|
cmd.Env = os.Environ()
|
|
for k, v := range r.buildEnv() {
|
|
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
|
|
}
|
|
|
|
output, err := cmd.Output()
|
|
if err != nil {
|
|
return nil, fmt.Errorf("restic snapshots failed: %w", err)
|
|
}
|
|
|
|
var snapshots []Snapshot
|
|
if err := json.Unmarshal(output, &snapshots); err != nil {
|
|
return nil, fmt.Errorf("failed to parse snapshots: %w", err)
|
|
}
|
|
|
|
return snapshots, nil
|
|
}
|
|
|
|
func (r *ResticClient) Restore(ctx context.Context, snapshotID, targetPath string) error {
|
|
cmd := exec.CommandContext(ctx, "restic", "restore", snapshotID, "--target", targetPath)
|
|
|
|
cmd.Env = os.Environ()
|
|
for k, v := range r.buildEnv() {
|
|
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
|
|
}
|
|
|
|
output, err := cmd.CombinedOutput()
|
|
if err != nil {
|
|
return fmt.Errorf("restic restore failed: %w: %s", err, string(output))
|
|
}
|
|
|
|
return nil
|
|
}
|
|
|
|
func (r *ResticClient) Stats(ctx context.Context) (*RepositoryStats, error) {
|
|
cmd := exec.CommandContext(ctx, "restic", "stats", "--json")
|
|
|
|
cmd.Env = os.Environ()
|
|
for k, v := range r.buildEnv() {
|
|
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
|
|
}
|
|
|
|
output, err := cmd.Output()
|
|
if err != nil {
|
|
return nil, fmt.Errorf("restic stats failed: %w", err)
|
|
}
|
|
|
|
var stats RepositoryStats
|
|
if err := json.Unmarshal(output, &stats); err != nil {
|
|
return nil, fmt.Errorf("failed to parse stats: %w", err)
|
|
}
|
|
|
|
return &stats, nil
|
|
}
|
|
|
|
func (r *ResticClient) TestConnection(ctx context.Context) error {
|
|
cmd := exec.CommandContext(ctx, "restic", "cat", "config")
|
|
|
|
cmd.Env = os.Environ()
|
|
for k, v := range r.buildEnv() {
|
|
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
|
|
}
|
|
|
|
_, err := cmd.Output()
|
|
if err != nil {
|
|
return fmt.Errorf("connection test failed: %w", err)
|
|
}
|
|
|
|
return nil
|
|
}
|
|
|
|
func (r *ResticClient) parseSnapshotID(output string) string {
|
|
lines := strings.Split(output, "\n")
|
|
for _, line := range lines {
|
|
if strings.Contains(line, "snapshot") && strings.Contains(line, "saved") {
|
|
parts := strings.Fields(line)
|
|
for i, part := range parts {
|
|
if part == "snapshot" && i+1 < len(parts) {
|
|
return parts[i+1]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
return ""
|
|
}
|
|
|
|
type Snapshot struct {
|
|
ID string `json:"id"`
|
|
Time string `json:"time"`
|
|
Hostname string `json:"hostname"`
|
|
Tags []string `json:"tags"`
|
|
Paths []string `json:"paths"`
|
|
}
|
|
|
|
type RepositoryStats struct {
|
|
TotalSize int64 `json:"total_size"`
|
|
TotalFileCount int64 `json:"total_file_count"`
|
|
SnapshotCount int `json:"snapshot_count"`
|
|
}
|
|
```
|
|
|
|
**Estimated Effort**: 4 hours
|
|
|
|
---
|
|
|
|
### Task 2.3: Update Backup Flow to Upload to Restic
|
|
|
|
**File**: `wild-central-api/internal/backup/backup.go`
|
|
|
|
**Modify BackupApp function** to add restic upload after staging:
|
|
|
|
```go
|
|
func (m *Manager) BackupApp(instanceName, appName string) (*BackupInfo, error) {
|
|
// ... existing Phase 1 code to create local backup ...
|
|
|
|
// After local backup succeeds, upload to restic if configured
|
|
cfg, secrets, err := LoadBackupConfig(instanceName)
|
|
if err == nil && cfg.Repository != "" {
|
|
// Restic is configured, upload backup
|
|
client := NewResticClient(cfg, secrets)
|
|
|
|
tags := []string{
|
|
fmt.Sprintf("type:app"),
|
|
fmt.Sprintf("app:%s", appName),
|
|
fmt.Sprintf("instance:%s", instanceName),
|
|
}
|
|
|
|
snapshotID, err := client.Backup(ctx, backupDir, tags)
|
|
if err != nil {
|
|
log.Printf("Warning: restic upload failed: %v", err)
|
|
// Don't fail the backup, local files still exist
|
|
} else {
|
|
info.SnapshotID = snapshotID
|
|
|
|
// Clean up staging directory after successful upload
|
|
if err := os.RemoveAll(backupDir); err != nil {
|
|
log.Printf("Warning: failed to clean staging directory: %v", err)
|
|
}
|
|
}
|
|
}
|
|
|
|
// Save final metadata
|
|
if err := m.saveBackupMetadata(backupDir, info); err != nil {
|
|
return info, fmt.Errorf("failed to save final metadata: %w", err)
|
|
}
|
|
|
|
return info, nil
|
|
}
|
|
```
|
|
|
|
**Estimated Effort**: 2 hours
|
|
|
|
---
|
|
|
|
### Task 2.4: API Client Updates
|
|
|
|
**File**: `wild-web-app/src/services/api/backups.ts`
|
|
|
|
**Add configuration endpoints**:
|
|
|
|
```typescript
|
|
export interface BackupConfiguration {
|
|
repository: string;
|
|
staging: string;
|
|
retention: {
|
|
keepDaily: number;
|
|
keepWeekly: number;
|
|
keepMonthly: number;
|
|
keepYearly: number;
|
|
};
|
|
backend: {
|
|
type: string;
|
|
endpoint?: string;
|
|
region?: string;
|
|
port?: number;
|
|
};
|
|
}
|
|
|
|
export interface BackupConfigurationWithCredentials extends BackupConfiguration {
|
|
password: string;
|
|
credentials?: {
|
|
s3?: {
|
|
accessKeyId: string;
|
|
secretAccessKey: string;
|
|
};
|
|
sftp?: {
|
|
password?: string;
|
|
privateKey?: string;
|
|
};
|
|
azure?: {
|
|
accountName: string;
|
|
accountKey: string;
|
|
};
|
|
gcs?: {
|
|
projectId: string;
|
|
serviceAccountKey: string;
|
|
};
|
|
};
|
|
}
|
|
|
|
export interface RepositoryStatus {
|
|
initialized: boolean;
|
|
reachable: boolean;
|
|
lastBackup?: string;
|
|
snapshotCount: number;
|
|
}
|
|
|
|
export interface RepositoryStats {
|
|
repositorySize: number;
|
|
repositorySizeHuman: string;
|
|
snapshotCount: number;
|
|
fileCount: number;
|
|
uniqueChunks: number;
|
|
compressionRatio: number;
|
|
oldestSnapshot?: string;
|
|
latestSnapshot?: string;
|
|
}
|
|
|
|
export async function getBackupConfiguration(
|
|
instanceId: string
|
|
): Promise<{ config: BackupConfiguration; status: RepositoryStatus }> {
|
|
const response = await api.get(`/instances/${instanceId}/backup/config`);
|
|
return response.data;
|
|
}
|
|
|
|
export async function updateBackupConfiguration(
|
|
instanceId: string,
|
|
config: BackupConfigurationWithCredentials
|
|
): Promise<void> {
|
|
await api.put(`/instances/${instanceId}/backup/config`, config);
|
|
}
|
|
|
|
export async function testBackupConnection(
|
|
instanceId: string,
|
|
config: BackupConfigurationWithCredentials
|
|
): Promise<RepositoryStatus> {
|
|
const response = await api.post(`/instances/${instanceId}/backup/test`, config);
|
|
return response.data;
|
|
}
|
|
|
|
export async function initializeBackupRepository(
|
|
instanceId: string,
|
|
config: BackupConfigurationWithCredentials
|
|
): Promise<{ repositoryId: string }> {
|
|
const response = await api.post(`/instances/${instanceId}/backup/init`, config);
|
|
return response.data;
|
|
}
|
|
|
|
export async function getRepositoryStats(
|
|
instanceId: string
|
|
): Promise<RepositoryStats> {
|
|
const response = await api.get(`/instances/${instanceId}/backup/stats`);
|
|
return response.data;
|
|
}
|
|
```
|
|
|
|
**Estimated Effort**: 2 hours
|
|
|
|
---
|
|
|
|
### Task 2.5: Configuration UI Components
|
|
|
|
Create the following components in `wild-web-app/src/components/backup/`:
|
|
|
|
**BackupConfigurationCard.tsx**:
|
|
- Main configuration form
|
|
- Backend type selector
|
|
- Conditional credential inputs
|
|
- Retention policy inputs
|
|
- Test/Save/Cancel buttons
|
|
|
|
**BackendSelector.tsx**:
|
|
- Dropdown for backend types
|
|
- Shows available backends with icons
|
|
|
|
**CredentialsForm.tsx**:
|
|
- Dynamic form based on selected backend
|
|
- Password/key inputs with visibility toggle
|
|
- Validation
|
|
|
|
**RepositoryStatus.tsx**:
|
|
- Display repository health
|
|
- Show stats (size, snapshots, last backup)
|
|
- Visual indicators
|
|
|
|
**RetentionPolicyInputs.tsx**:
|
|
- Number inputs for retention periods
|
|
- Tooltips explaining each period
|
|
|
|
**Estimated Effort**: 8 hours
|
|
|
|
---
|
|
|
|
### Task 2.6: Integrate with BackupsPage
|
|
|
|
**File**: `wild-web-app/src/router/pages/BackupsPage.tsx`
|
|
|
|
**Add configuration section above backup list**:
|
|
|
|
```typescript
|
|
function BackupsPage() {
|
|
const { instanceId } = useParams();
|
|
const [showConfig, setShowConfig] = useState(false);
|
|
|
|
const { data: backupConfig } = useQuery({
|
|
queryKey: ['backup-config', instanceId],
|
|
queryFn: () => getBackupConfiguration(instanceId),
|
|
});
|
|
|
|
return (
|
|
<div className="space-y-6">
|
|
{/* Repository Status Card */}
|
|
{backupConfig && (
|
|
<RepositoryStatus
|
|
status={backupConfig.status}
|
|
onEditClick={() => setShowConfig(true)}
|
|
/>
|
|
)}
|
|
|
|
{/* Configuration Card (conditional) */}
|
|
{showConfig && (
|
|
<BackupConfigurationCard
|
|
instanceId={instanceId}
|
|
currentConfig={backupConfig?.config}
|
|
onSave={() => setShowConfig(false)}
|
|
onCancel={() => setShowConfig(false)}
|
|
/>
|
|
)}
|
|
|
|
{/* Existing backup list */}
|
|
<BackupList instanceId={instanceId} />
|
|
</div>
|
|
);
|
|
}
|
|
```
|
|
|
|
**Estimated Effort**: 3 hours
|
|
|
|
---
|
|
|
|
### Task 2.7: Backup Configuration API Handlers
|
|
|
|
**File**: `wild-central-api/internal/api/v1/handlers_backup.go`
|
|
|
|
**Add new handlers**:
|
|
|
|
```go
|
|
func (h *Handler) BackupConfigGet(c *gin.Context) {
|
|
instanceName := c.Param("name")
|
|
|
|
cfg, secrets, err := backup.LoadBackupConfig(instanceName)
|
|
if err != nil {
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
|
return
|
|
}
|
|
|
|
// Test repository status
|
|
var status backup.RepositoryStatus
|
|
if cfg.Repository != "" {
|
|
client := backup.NewResticClient(cfg, secrets)
|
|
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
|
|
defer cancel()
|
|
|
|
status.Initialized = true
|
|
status.Reachable = client.TestConnection(ctx) == nil
|
|
|
|
if stats, err := client.Stats(ctx); err == nil {
|
|
status.SnapshotCount = stats.SnapshotCount
|
|
}
|
|
}
|
|
|
|
c.JSON(http.StatusOK, gin.H{
|
|
"success": true,
|
|
"data": gin.H{
|
|
"config": cfg,
|
|
"status": status,
|
|
},
|
|
})
|
|
}
|
|
|
|
func (h *Handler) BackupConfigUpdate(c *gin.Context) {
|
|
instanceName := c.Param("name")
|
|
|
|
var req backup.BackupConfigurationWithCredentials
|
|
if err := c.BindJSON(&req); err != nil {
|
|
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
|
return
|
|
}
|
|
|
|
// Validate configuration
|
|
if err := backup.ValidateBackupConfig(&req.Config, &req.Secrets); err != nil {
|
|
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
|
return
|
|
}
|
|
|
|
// Save to config.yaml and secrets.yaml
|
|
if err := config.SaveBackupConfig(instanceName, &req); err != nil {
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
|
return
|
|
}
|
|
|
|
c.JSON(http.StatusOK, gin.H{
|
|
"success": true,
|
|
"message": "Backup configuration updated successfully",
|
|
})
|
|
}
|
|
|
|
func (h *Handler) BackupConnectionTest(c *gin.Context) {
|
|
var req backup.BackupConfigurationWithCredentials
|
|
if err := c.BindJSON(&req); err != nil {
|
|
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
|
return
|
|
}
|
|
|
|
client := backup.NewResticClient(&req.Config, &req.Secrets)
|
|
|
|
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
|
|
defer cancel()
|
|
|
|
status := backup.RepositoryStatus{
|
|
Reachable: client.TestConnection(ctx) == nil,
|
|
}
|
|
|
|
if status.Reachable {
|
|
if stats, err := client.Stats(ctx); err == nil {
|
|
status.Initialized = true
|
|
status.SnapshotCount = stats.SnapshotCount
|
|
}
|
|
}
|
|
|
|
c.JSON(http.StatusOK, gin.H{
|
|
"success": true,
|
|
"data": status,
|
|
})
|
|
}
|
|
|
|
func (h *Handler) BackupRepositoryInit(c *gin.Context) {
|
|
var req backup.BackupConfigurationWithCredentials
|
|
if err := c.BindJSON(&req); err != nil {
|
|
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
|
|
return
|
|
}
|
|
|
|
client := backup.NewResticClient(&req.Config, &req.Secrets)
|
|
|
|
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
|
|
defer cancel()
|
|
|
|
if err := client.Init(ctx); err != nil {
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
|
return
|
|
}
|
|
|
|
c.JSON(http.StatusOK, gin.H{
|
|
"success": true,
|
|
"message": "Repository initialized successfully",
|
|
})
|
|
}
|
|
|
|
func (h *Handler) BackupStatsGet(c *gin.Context) {
|
|
instanceName := c.Param("name")
|
|
|
|
cfg, secrets, err := backup.LoadBackupConfig(instanceName)
|
|
if err != nil {
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
|
return
|
|
}
|
|
|
|
client := backup.NewResticClient(cfg, secrets)
|
|
|
|
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
|
|
defer cancel()
|
|
|
|
stats, err := client.Stats(ctx)
|
|
if err != nil {
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
|
return
|
|
}
|
|
|
|
c.JSON(http.StatusOK, gin.H{
|
|
"success": true,
|
|
"data": stats,
|
|
})
|
|
}
|
|
```
|
|
|
|
**Register routes**:
|
|
```go
|
|
backupGroup := v1.Group("/instances/:name/backup")
|
|
{
|
|
backupGroup.GET("/config", h.BackupConfigGet)
|
|
backupGroup.PUT("/config", h.BackupConfigUpdate)
|
|
backupGroup.POST("/test", h.BackupConnectionTest)
|
|
backupGroup.POST("/init", h.BackupRepositoryInit)
|
|
backupGroup.GET("/stats", h.BackupStatsGet)
|
|
}
|
|
```
|
|
|
|
**Estimated Effort**: 4 hours
|
|
|
|
---
|
|
|
|
### Task 2.8: End-to-End Testing
|
|
|
|
**Test scenarios**:
|
|
1. Configure local repository via UI
|
|
2. Configure S3 repository via UI
|
|
3. Test connection validation
|
|
4. Create backup and verify upload
|
|
5. Check repository stats
|
|
6. Test error handling
|
|
|
|
**Estimated Effort**: 4 hours
|
|
|
|
---
|
|
|
|
## Phase 3: Restore from Restic
|
|
|
|
### Goal
|
|
Enable users to restore backups from restic snapshots.
|
|
|
|
### Priority
|
|
🟢 **MEDIUM PRIORITY** (after Phase 2 complete)
|
|
|
|
### Timeline
|
|
3-5 days
|
|
|
|
### Task 3.1: List Snapshots API
|
|
|
|
**File**: `wild-central-api/internal/api/v1/handlers_backup.go`
|
|
|
|
**Implementation**:
|
|
```go
|
|
func (h *Handler) BackupSnapshotsList(c *gin.Context) {
|
|
instanceName := c.Param("name")
|
|
appName := c.Query("app")
|
|
|
|
cfg, secrets, err := backup.LoadBackupConfig(instanceName)
|
|
if err != nil {
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
|
return
|
|
}
|
|
|
|
client := backup.NewResticClient(cfg, secrets)
|
|
|
|
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
|
|
defer cancel()
|
|
|
|
var tags []string
|
|
if appName != "" {
|
|
tags = append(tags, fmt.Sprintf("app:%s", appName))
|
|
}
|
|
|
|
snapshots, err := client.ListSnapshots(ctx, tags)
|
|
if err != nil {
|
|
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
|
|
return
|
|
}
|
|
|
|
c.JSON(http.StatusOK, gin.H{
|
|
"success": true,
|
|
"data": snapshots,
|
|
})
|
|
}
|
|
```
|
|
|
|
**Estimated Effort**: 2 hours
|
|
|
|
---
|
|
|
|
### Task 3.2: Restore Snapshot Function
|
|
|
|
**File**: `wild-central-api/internal/backup/backup.go`
|
|
|
|
**Implementation**:
|
|
```go
|
|
func (m *Manager) RestoreFromSnapshot(instanceName, snapshotID string) error {
|
|
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
|
|
defer cancel()
|
|
|
|
// Load restic config
|
|
cfg, secrets, err := LoadBackupConfig(instanceName)
|
|
if err != nil {
|
|
return fmt.Errorf("failed to load config: %w", err)
|
|
}
|
|
|
|
client := NewResticClient(cfg, secrets)
|
|
|
|
// Create temp directory for restore
|
|
tempDir := filepath.Join(cfg.Staging, "restore", snapshotID)
|
|
if err := os.MkdirAll(tempDir, 0755); err != nil {
|
		return fmt.Errorf("failed to create temp directory: %w", err)
	}
	defer os.RemoveAll(tempDir)

	// Restore the snapshot into the temp directory
	if err := client.Restore(ctx, snapshotID, tempDir); err != nil {
		return fmt.Errorf("restic restore failed: %w", err)
	}

	// Parse metadata to determine what to restore
	metadataFile := filepath.Join(tempDir, "backup.json")
	info, err := m.loadBackupMetadata(metadataFile)
	if err != nil {
		return fmt.Errorf("failed to load metadata: %w", err)
	}

	// Restore databases
	for _, file := range info.Files {
		if strings.HasSuffix(file, "postgres.sql") {
			if err := m.restorePostgres(ctx, info.AppName, filepath.Join(tempDir, "postgres.sql")); err != nil {
				return fmt.Errorf("postgres restore failed: %w", err)
			}
		} else if strings.HasSuffix(file, "mysql.sql") {
			if err := m.restoreMySQL(ctx, info.AppName, filepath.Join(tempDir, "mysql.sql")); err != nil {
				return fmt.Errorf("mysql restore failed: %w", err)
			}
		}
	}

	// Restore PVCs
	for _, file := range info.Files {
		if strings.HasSuffix(file, ".tar.gz") {
			pvcName := strings.TrimSuffix(filepath.Base(file), ".tar.gz")
			if err := m.restorePVC(ctx, info.AppName, pvcName, filepath.Join(tempDir, file)); err != nil {
				return fmt.Errorf("pvc restore failed: %w", err)
			}
		}
	}

	return nil
}

func (m *Manager) restorePostgres(ctx context.Context, appName, dumpFile string) error {
	dbName := appName

	podName, err := m.findPodInNamespace(ctx, "postgres", "app=postgres")
	if err != nil {
		return fmt.Errorf("postgres pod not found: %w", err)
	}

	// Drop and recreate the database. Separate -c flags keep DROP DATABASE out
	// of an implicit multi-statement transaction, which PostgreSQL rejects.
	cmd := exec.CommandContext(ctx, "kubectl", "exec", "-n", "postgres",
		podName, "--", "psql", "-U", "postgres",
		"-c", fmt.Sprintf("DROP DATABASE IF EXISTS %s;", dbName),
		"-c", fmt.Sprintf("CREATE DATABASE %s;", dbName))
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("failed to recreate database: %w", err)
	}

	// Stream the dump into psql instead of loading it into memory
	dump, err := os.Open(dumpFile)
	if err != nil {
		return fmt.Errorf("failed to open dump: %w", err)
	}
	defer dump.Close()

	cmd = exec.CommandContext(ctx, "kubectl", "exec", "-i", "-n", "postgres",
		podName, "--", "psql", "-U", "postgres", dbName)
	cmd.Stdin = dump
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("psql restore failed: %w", err)
	}

	return nil
}

func (m *Manager) restoreMySQL(ctx context.Context, appName, dumpFile string) error {
	// Same flow as restorePostgres: exec into the MySQL pod, drop and recreate
	// the database, then stream the dump through the mysql client using the
	// root password from the MySQL secret.
	return nil
}

func (m *Manager) restorePVC(ctx context.Context, namespace, pvcName, tarFile string) error {
	podName, err := m.findPodUsingPVC(ctx, namespace, pvcName)
	if err != nil {
		return fmt.Errorf("no pod found using PVC: %w", err)
	}

	mountPath, err := m.getPVCMountPath(ctx, namespace, podName, pvcName)
	if err != nil {
		return fmt.Errorf("failed to get mount path: %w", err)
	}

	// Copy the tar archive into the pod
	cmd := exec.CommandContext(ctx, "kubectl", "cp", tarFile,
		fmt.Sprintf("%s/%s:/tmp/restore.tar.gz", namespace, podName))
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("kubectl cp failed: %w", err)
	}

	// Extract the archive into the PVC mount path (existing files are overwritten)
	cmd = exec.CommandContext(ctx, "kubectl", "exec", "-n", namespace,
		podName, "--", "tar", "xzf", "/tmp/restore.tar.gz", "-C", mountPath)
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("tar extract failed: %w", err)
	}

	// Clean up the temp file (best effort)
	cmd = exec.CommandContext(ctx, "kubectl", "exec", "-n", namespace,
		podName, "--", "rm", "/tmp/restore.tar.gz")
	cmd.Run() // Ignore error

	return nil
}
```
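
The restore flow above depends on a metadata helper that is not shown in this task. A minimal sketch, assuming the `backup.json` written during Phase 1 is plain JSON and that only `AppName` and `Files` are needed here; the JSON field names and the extra `Timestamp` field are assumptions, not the actual Phase 1 schema:

```go
// BackupInfo mirrors the backup.json metadata written during Phase 1.
// Only AppName and Files are used by RestoreFromSnapshot.
type BackupInfo struct {
	AppName   string   `json:"appName"`
	Timestamp string   `json:"timestamp"`
	Files     []string `json:"files"`
}

func (m *Manager) loadBackupMetadata(path string) (*BackupInfo, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("failed to read metadata: %w", err)
	}
	var info BackupInfo
	if err := json.Unmarshal(data, &info); err != nil {
		return nil, fmt.Errorf("failed to parse metadata: %w", err)
	}
	return &info, nil
}
```
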
**Estimated Effort**: 5 hours

---

### Task 3.3: Restore API Handler

**File**: `wild-central-api/internal/api/v1/handlers_backup.go`

**Implementation**:

```go
func (h *Handler) BackupSnapshotRestore(c *gin.Context) {
	instanceName := c.Param("name")
	snapshotID := c.Param("snapshotId")

	// Start restore operation asynchronously
	go func() {
		if err := h.backupManager.RestoreFromSnapshot(instanceName, snapshotID); err != nil {
			log.Printf("Restore failed: %v", err)
		}
	}()

	c.JSON(http.StatusAccepted, gin.H{
		"success": true,
		"message": "Restore operation started",
	})
}
```
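
The handler also has to be registered against the Phase 3 route listed in the API specifications below. A minimal sketch, assuming a gin router group named `v1` rooted at `/api/v1` (the group variable and the place where routes are registered are illustrative):

```go
// Wire the restore endpoint into the existing v1 route group.
v1.POST("/instances/:name/backup/snapshots/:snapshotId/restore", h.BackupSnapshotRestore)
```

Because the restore runs in a background goroutine, the 202 response only acknowledges that the operation started; surfacing completion or failure to the UI (for the progress indicator in Task 3.4) needs either a status endpoint or log inspection, which is left for later.
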
**Estimated Effort**: 1 hour

---

### Task 3.4: Restore UI

**File**: `wild-web-app/src/components/backup/RestoreDialog.tsx`

**Implementation**:

Create a dialog that:

- Lists available snapshots
- Shows snapshot details (date, size, files)
- Requires confirmation before restoring
- Shows restore progress

**Estimated Effort**: 4 hours

---
### Task 3.5: End-to-End Restore Testing

**Test scenarios**:

1. List snapshots for app
2. Select snapshot to restore
3. Restore database
4. Restore PVCs
5. Verify application works after restore
6. Test error handling

**Estimated Effort**: 3 hours

---
## API Specifications

### Complete API Reference

```
# Backup Operations
POST   /api/v1/instances/{name}/backups/app/{appName}          # Create app backup
POST   /api/v1/instances/{name}/backups/cluster                # Create cluster backup
GET    /api/v1/instances/{name}/backups/app                    # List app backups
GET    /api/v1/instances/{name}/backups/cluster                # List cluster backups
DELETE /api/v1/instances/{name}/backups/app/{appName}/{id}     # Delete app backup
DELETE /api/v1/instances/{name}/backups/cluster/{id}           # Delete cluster backup

# Backup Configuration (Phase 2)
GET    /api/v1/instances/{name}/backup/config                  # Get backup configuration
PUT    /api/v1/instances/{name}/backup/config                  # Update configuration
POST   /api/v1/instances/{name}/backup/test                    # Test connection
POST   /api/v1/instances/{name}/backup/init                    # Initialize repository
GET    /api/v1/instances/{name}/backup/stats                   # Get repository stats

# Restore Operations (Phase 3)
GET    /api/v1/instances/{name}/backup/snapshots               # List snapshots
POST   /api/v1/instances/{name}/backup/snapshots/{id}/restore  # Restore snapshot
```

---
## Web UI Design

### Page Structure

**BackupsPage Layout**:

```
┌─────────────────────────────────────────────────┐
│ Backups                                         │
├─────────────────────────────────────────────────┤
│                                                 │
│  ┌─ Backup Status ─────────────────────────┐    │
│  │ Repository: Configured ✓                │    │
│  │ Last Backup: 2 hours ago                │    │
│  │ Total Size: 2.4 GB                      │    │
│  │ Snapshots: 24                           │    │
│  │ [Edit Configuration]                    │    │
│  └─────────────────────────────────────────┘    │
│                                                 │
│  ┌─ Recent Backups ────────────────────────┐    │
│  │ [Backup cards with restore/delete]      │    │
│  │ ...                                     │    │
│  └─────────────────────────────────────────┘    │
│                                                 │
│  ┌─ Configuration (when editing) ──────────┐    │
│  │ Backend Type: [S3 ▼]                    │    │
│  │ Repository URI: [s3:bucket/path       ] │    │
│  │ Credentials:                            │    │
│  │   Access Key ID: [•••••••••••         ] │    │
│  │   Secret Key:    [••••••••••••••••    ] │    │
│  │ Retention Policy:                       │    │
│  │   Daily: [7]  Weekly: [4]  Monthly: [6] │    │
│  │ [Test Connection] [Save] [Cancel]       │    │
│  └─────────────────────────────────────────┘    │
└─────────────────────────────────────────────────┘
```

### Component Hierarchy

```
BackupsPage
├── BackupStatusCard (read-only)
│   ├── RepositoryStatus
│   ├── Stats (size, snapshots, last backup)
│   └── EditButton
│
├── BackupListSection
│   └── BackupCard[] (existing)
│
└── BackupConfigurationCard (conditional)
    ├── BackendTypeSelect
    ├── RepositoryUriInput
    ├── CredentialsSection
    │   ├── S3CredentialsForm (conditional)
    │   ├── SFTPCredentialsForm (conditional)
    │   └── ...
    ├── RetentionPolicyInputs
    └── ActionButtons
        ├── TestConnectionButton
        ├── SaveButton
        └── CancelButton
```

---
## Testing Strategy

### Phase 1 Testing

**Unit Tests** (a test sketch follows this list):

- Manifest parsing
- Helper functions (`contains`, `findPodInNamespace`)
- Backup file creation
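
As a sketch of the level these unit tests operate at, here is a table-driven test for the `contains` helper. It assumes `contains` is a plain slice-membership check with signature `contains([]string, string) bool`; adjust if the actual Phase 1 helper differs.

```go
func TestContains(t *testing.T) {
	cases := []struct {
		name string
		list []string
		item string
		want bool
	}{
		{"present", []string{"postgres", "redis"}, "postgres", true},
		{"absent", []string{"postgres", "redis"}, "mysql", false},
		{"empty list", nil, "postgres", false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := contains(tc.list, tc.item); got != tc.want {
				t.Errorf("contains(%v, %q) = %v, want %v", tc.list, tc.item, got, tc.want)
			}
		})
	}
}
```
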
**Integration Tests**:

- End-to-end Gitea backup (PostgreSQL + PVC)
- End-to-end Immich backup (PostgreSQL + multiple PVCs)
- Backup with no database
- Backup with no PVCs

**Manual Tests**:

1. Create backup via web UI
2. Verify `.sql` file exists with actual data
3. Verify `.tar.gz` files exist with actual data
4. Check metadata accuracy
5. Test delete functionality

### Phase 2 Testing

**Unit Tests** (a helper sketch follows this list):

- Backend type detection
- Environment variable mapping
- Configuration validation
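
To make "backend type detection" and "environment variable mapping" concrete, here is a minimal sketch of the kind of helpers these tests would exercise. The function names and the `BackupConfig` fields are illustrative assumptions rather than the actual Task 2.1/2.2 code; the environment variable names are the ones the restic CLI documents (`RESTIC_REPOSITORY`, `RESTIC_PASSWORD`, and the AWS variables for S3-compatible backends).

```go
// BackupConfig is an illustrative configuration shape; the real fields are
// defined in Task 2.1.
type BackupConfig struct {
	Repository      string // e.g. "s3:s3.amazonaws.com/bucket/path" or "/mnt/backups/restic"
	Password        string
	AccessKeyID     string
	SecretAccessKey string
}

// detectBackendType infers the backend from the restic repository URI prefix.
func detectBackendType(repository string) string {
	switch {
	case strings.HasPrefix(repository, "s3:"):
		return "s3"
	case strings.HasPrefix(repository, "b2:"):
		return "b2"
	case strings.HasPrefix(repository, "sftp:"):
		return "sftp"
	default:
		return "local"
	}
}

// resticEnv maps stored configuration onto the environment variables the
// restic CLI reads.
func resticEnv(cfg BackupConfig) []string {
	env := []string{
		"RESTIC_REPOSITORY=" + cfg.Repository,
		"RESTIC_PASSWORD=" + cfg.Password,
	}
	if detectBackendType(cfg.Repository) == "s3" {
		env = append(env,
			"AWS_ACCESS_KEY_ID="+cfg.AccessKeyID,
			"AWS_SECRET_ACCESS_KEY="+cfg.SecretAccessKey,
		)
	}
	return env
}
```
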
**Integration Tests**:

- Repository initialization (local, S3, SFTP)
- Backup upload to restic
- Snapshot listing
- Stats retrieval
- Connection testing

**Manual Tests**:

1. Configure local repository via UI
2. Configure S3 repository via UI
3. Test connection validation before save
4. Create backup and verify in restic
5. Check repository stats display
6. Test error handling for bad credentials

### Phase 3 Testing

**Integration Tests**:

- Restore database from snapshot
- Restore PVC from snapshot
- Full app restore
- Handle missing/corrupted snapshots

**Manual Tests**:

1. List snapshots in UI
2. Select and restore from snapshot
3. Verify database data after restore
4. Verify PVC data after restore
5. Verify application functions correctly

---
## Deployment Guide

### Phase 1 Deployment

**Preparation**:

1. Update wild-central-api code
2. Build and test on a development instance
3. Verify backup files are created with real data
4. Test a manual restore

**Rollout**:

1. Deploy to staging environment
2. Create test backups for multiple apps
3. Verify all backup files exist
4. Manually restore one backup to verify
5. Deploy to production

**Rollback Plan**:

- Previous version still creates metadata files
- No breaking changes to backup structure
- Users can manually copy backup files if needed

### Phase 2 Deployment

**Preparation**:

1. Install restic on Wild Central devices: `apt install restic`
2. Update wild-central-api with restic code
3. Update wild-web-app with configuration UI
4. Test on development with a local repository
5. Test with S3 and SFTP backends

**Migration**:

- Existing local backups remain accessible
- Users opt in to restic by configuring a repository
- Gradual migration: Phase 1 staging continues working

**Rollout**:

1. Deploy backend API updates
2. Deploy web UI updates
3. Create user documentation with examples
4. Provide a migration guide for existing setups

**Rollback Plan**:

- Restic is optional: users can continue using local backups
- Configuration lives in config.yaml: easy to revert
- No data loss: existing backups preserved

### Phase 3 Deployment

**Preparation**:

1. Ensure Phase 2 is stable
2. Ensure at least one backup exists in restic
3. Test restore in staging environment

**Rollout**:

1. Deploy restore functionality
2. Document restore procedures
3. Train users on the restore process

---
## Task Breakdown

### Phase 1 Tasks (2-3 days)

| Task | Description | Effort | Dependencies |
|------|-------------|--------|--------------|
| 1.1 | Manifest-based database detection | 2h | None |
| 1.2 | PostgreSQL backup via kubectl exec | 3h | 1.1 |
| 1.3 | MySQL backup via kubectl exec | 3h | 1.1 |
| 1.4 | PVC discovery and backup | 4h | 1.1 |
| 1.5 | Update BackupApp flow | 4h | 1.2, 1.3, 1.4 |
| 1.6 | Build and test | 4h | 1.5 |

**Total**: 20 hours (2.5 days)

### Phase 2 Tasks (5-7 days)

| Task | Description | Effort | Dependencies |
|------|-------------|--------|--------------|
| 2.1 | Configuration management | 3h | Phase 1 done |
| 2.2 | Restic operations module | 4h | 2.1 |
| 2.3 | Update backup flow for restic | 2h | 2.2 |
| 2.4 | API client updates | 2h | Phase 1 done |
| 2.5 | Configuration UI components | 8h | 2.4 |
| 2.6 | Integrate with BackupsPage | 3h | 2.5 |
| 2.7 | Backup configuration API handlers | 4h | 2.1, 2.2 |
| 2.8 | End-to-end testing | 4h | 2.3, 2.6, 2.7 |

**Total**: 30 hours (3.75 days)

### Phase 3 Tasks (3-5 days)

| Task | Description | Effort | Dependencies |
|------|-------------|--------|--------------|
| 3.1 | List snapshots API | 2h | Phase 2 done |
| 3.2 | Restore snapshot function | 5h | 3.1 |
| 3.3 | Restore API handler | 1h | 3.2 |
| 3.4 | Restore UI | 4h | 3.3 |
| 3.5 | End-to-end restore testing | 3h | 3.4 |

**Total**: 15 hours (2 days)

### Grand Total

**65 hours** across 3 phases (8-12 days total)

---
## Success Criteria

### Phase 1 Success

- ✅ App backups create actual database dumps (`.sql` files)
- ✅ App backups create actual PVC archives (`.tar.gz` files)
- ✅ Backup metadata accurately lists all files
- ✅ Backups organized in timestamped directories
- ✅ In-progress tracking works correctly
- ✅ Delete functionality works for both app and cluster backups
- ✅ No silent failures (clear error messages)
- ✅ Manual restore verified working

### Phase 2 Success

- ✅ Users can configure restic repository via web UI
- ✅ Configuration persists to config.yaml/secrets.yaml
- ✅ Test connection validates before save
- ✅ Backups automatically upload to restic repository
- ✅ Repository stats display correctly in UI
- ✅ Local, S3, and SFTP backends supported and tested
- ✅ Clear error messages for authentication/connection failures
- ✅ Staging files cleaned after successful upload

### Phase 3 Success

- ✅ Users can list available snapshots in UI
- ✅ Users can restore from any snapshot via UI
- ✅ Database restoration works correctly
- ✅ PVC restoration works correctly
- ✅ Application functional after restore
- ✅ Error handling for corrupted snapshots

### Long-Term Metrics

- **Storage Efficiency**: Deduplication achieves 60-80% space savings
- **Reliability**: < 1% backup failures
- **Performance**: Backup TB-scale data in < 4 hours
- **User Satisfaction**: Backup/restore completes without support intervention

---
## Dependencies and Prerequisites

### External Dependencies

**Restic** (backup tool; a startup check sketch follows this list):

- Installation: `apt install restic`
- Version: >= 0.16.0 recommended
- License: BSD 2-Clause (compatible)
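
A minimal sketch of how the API could verify this dependency at startup; `checkRestic` is an illustrative name, and strict ">= 0.16.0" version parsing is omitted to keep it short.

```go
// checkRestic confirms the restic binary is installed and logs its version.
func checkRestic(ctx context.Context) error {
	if _, err := exec.LookPath("restic"); err != nil {
		return fmt.Errorf("restic not found in PATH (install with `apt install restic`): %w", err)
	}
	out, err := exec.CommandContext(ctx, "restic", "version").CombinedOutput()
	if err != nil {
		return fmt.Errorf("restic version check failed: %w", err)
	}
	log.Printf("backup: using %s", strings.TrimSpace(string(out)))
	return nil
}
```
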
**kubectl** (Kubernetes CLI):

- Already required for Wild Cloud operations
- Used for database dumps and PVC backup

### Infrastructure Prerequisites

**Storage Requirements**:

**Staging Directory** (a free-space check sketch follows this list):

- Location: `/var/lib/wild-central/backup-staging` (default)
- Space: `max(largest_database, largest_pvc) + 20% buffer`
- Recommendation: Monitor space, warn if < 50GB free
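
A minimal sketch of the free-space warning described above, assuming a Linux host (it relies on `syscall.Statfs`); the function name and the hard-coded 50 GB threshold are illustrative.

```go
const minStagingFreeBytes = 50 * 1024 * 1024 * 1024 // 50 GB

// checkStagingSpace warns when the staging filesystem is running low.
func checkStagingSpace(path string) error {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return fmt.Errorf("statfs %s: %w", path, err)
	}
	free := st.Bavail * uint64(st.Bsize)
	if free < minStagingFreeBytes {
		return fmt.Errorf("staging directory %s has only %d bytes free (< 50 GB)", path, free)
	}
	return nil
}
```
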
**Restic Repository**:

- Local: Sufficient disk space on target mount
- Network: Mounted filesystem (NFS/SMB)
- Cloud: Typically unlimited, check quota/billing

**Network Requirements**:

- Outbound HTTPS (443) for S3/B2/cloud backends
- Outbound SSH (22 or custom) for SFTP
- No inbound ports needed

### Security Considerations

**Credentials Storage**:

- Stored in secrets.yaml
- Never logged or exposed in API responses
- Transmitted only via HTTPS to backend APIs

**Encryption**:

- Restic: AES-256 encryption of all backup data
- Transport: TLS for cloud backends, SSH for SFTP
- At rest: Depends on backend (S3 server-side encryption, etc.)

**Access Control**:

- API endpoints check instance ownership
- Repository password required for all restic operations
- Backend credentials validated before save

---
## Philosophy Compliance Review

### KISS (Keep It Simple, Stupid)

✅ **What We're Doing Right**:

- Restic repository URI as a simple string (native format)
- Backend type auto-detected from URI prefix
- Credentials organized by backend type
- No complex abstraction layers

✅ **What We're Avoiding**:

- Custom backup format
- Complex configuration DSL
- Over-abstracted backend interfaces
- Scheduling/automation (not needed yet)

### YAGNI (You Aren't Gonna Need It)

✅ **Building Only What's Needed**:

- Basic configuration (repository, credentials, retention)
- Test connection before save
- Upload to restic after staging
- Display repository stats

❌ **Not Building** (until proven needed):

- Automated scheduling
- Multiple repository support
- Backup verification automation
- Email notifications
- Bandwidth limiting
- Custom encryption options

### No Future-Proofing

✅ **Current Requirements Only**:

- Support TB-scale data (restic deduplication)
- Flexible storage destinations (restic backends)
- Storage constraints (upload to remote, not local-only)

❌ **Not Speculating On**:

- "What if users want backup versioning rules?"
- "What if users need bandwidth control?"
- "What if users want custom encryption?"

Build these features WHEN users ask, not before.

### Trust in Emergence

✅ **Starting Simple**:

- Phase 1: Fix core backup (files actually created)
- Phase 2: Add restic upload (storage flexibility)
- Phase 3: Add restore from restic
- Phase 4+: Wait for user feedback

**Let complexity emerge from actual needs**, not speculation.

---
## Conclusion

This implementation guide provides everything needed to build a production-ready backup system for Wild Cloud across three phases:

1. **Phase 1 (CRITICAL)**: Fix broken app backups by creating actual database dumps and PVC archives using manifest-based detection and kubectl exec
2. **Phase 2 (HIGH)**: Integrate restic for TB-scale data, flexible storage backends, and configuration via the web UI
3. **Phase 3 (MEDIUM)**: Enable restore from restic snapshots

All phases follow Wild Cloud's KISS/YAGNI philosophy: build only what's needed now, let complexity emerge from actual requirements, and trust that good architecture emerges from simplicity.

With the context, specifications, code examples, and guidance above, a senior engineer can begin Phase 1 immediately.

---

**Document Version**: 1.0
**Created**: 2025-11-26
**Status**: Ready for implementation
**Next Action**: Begin Phase 1, Task 1.1