Files
wild-cloud-dev/ai/wildcloud-v.PoC/setup-process.md
2025-10-11 18:08:04 +00:00

12 KiB

Wild Cloud Setup Process & Infrastructure

Wild Cloud provides a complete, production-ready Kubernetes infrastructure designed for personal use. It combines enterprise-grade technologies to create a self-hosted cloud platform with automated deployment, HTTPS certificates, and web management interfaces.

Setup Phases Overview

The Wild Cloud setup follows a sequential, dependency-aware process:

  1. Environment Setup - Install required tools and dependencies
  2. DNS/Network Foundation - Set up dnsmasq for DNS and PXE booting
  3. Cluster Infrastructure - Deploy Talos Linux nodes and Kubernetes cluster
  4. Cluster Services - Install core services (ingress, storage, certificates, etc.)

Phase 1: Environment Setup

Dependencies Installation

Script: scripts/setup-utils.sh

Required Tools:

  • kubectl - Kubernetes CLI
  • gomplate - Template processor for configuration
  • kustomize - Kubernetes configuration management
  • yq - YAML processor
  • restic - Backup tool
  • talosctl - Talos Linux cluster management

Project Initialization

Command: wild-init

Creates the basic Wild Cloud directory structure:

  • .wildcloud/ - Project marker and cache
  • config.yaml - Main configuration file
  • secrets.yaml - Sensitive data storage
  • Basic project scaffolding

Phase 2: DNS/Network Foundation

dnsmasq Infrastructure

Location: setup/dnsmasq/ Requirements: Dedicated Linux machine with static IP

Services Provided:

  1. LAN DNS Server

    • Forwards internal domains (*.internal.domain.com) to cluster
    • Forwards external domains (*.domain.com) to cluster
    • Provides DNS resolution for entire network
  2. PXE Boot Server

    • Enables network booting for cluster node installation
    • DHCP/TFTP services for Talos Linux deployment
    • Automated node provisioning

Network Configuration Example:

network:
  subnet: 192.168.1.0/24
  gateway: 192.168.1.1
  dnsmasq_ip: 192.168.1.50
  dhcp_range: 192.168.1.100-200
  metallb_pool: 192.168.1.80-89
  control_plane_vip: 192.168.1.90
  node_ips: 192.168.1.91-93

Phase 3: Cluster Infrastructure Setup

Talos Linux Foundation

Command: wild-setup-cluster

Talos Configuration:

  • Version: v1.11.0 (configurable)
  • Immutable OS: Designed specifically for Kubernetes
  • System Extensions:
    • Intel microcode updates
    • iSCSI tools for storage
    • gVisor container runtime
    • NVIDIA GPU support (optional)
    • Additional system utilities

Cluster Setup Process

1. Configuration Generation

Script: wild-cluster-config-generate

  • Generates base Talos configurations (controlplane.yaml, worker.yaml)
  • Creates cluster secrets using talosctl gen config
  • Establishes foundation for all node configurations

2. Node Setup (Atomic Operations)

Script: wild-node-setup <node-name> [options]

Complete Node Lifecycle Management:

  • Hardware Detection: Discovers network interfaces and storage devices
  • Configuration Generation: Creates node-specific patches and final configs
  • Deployment: Applies Talos configuration to the node

Options:

  • --detect: Force hardware re-detection
  • --no-deploy: Generate configuration only, skip deployment

Integration with Cluster Setup:

  • wild-setup-cluster automatically calls wild-node-setup for each node
  • Individual node failures don't break cluster setup
  • Clear retry instructions for failed nodes

Cluster Architecture

Control Plane:

  • 3 nodes for high availability
  • Virtual IP (VIP) for load balancing
  • etcd distributed across all control plane nodes

Worker Nodes:

  • Variable count (configured during setup)
  • Dedicated workload execution
  • Storage participation via Longhorn

Networking:

  • All nodes on same LAN segment
  • Sequential IP assignment
  • MetalLB integration for load balancing

Phase 4: Cluster Services Installation

Services Deployment Process

Command: wild-setup-services [options]

  • --fetch: Fetch fresh templates before setup
  • --no-deploy: Configure only, skip deployment

New Architecture: Per-service atomic operations

  • Uses wild-service-setup <service> for each service in dependency order
  • Each service handles complete lifecycle: fetch → configure → deploy
  • Dependency validation before each service deployment
  • Fail-fast with clear recovery instructions

Individual Service Management: wild-service-setup <service> [options]

  • Default: Configure and deploy using existing templates
  • --fetch: Fetch fresh templates before setup
  • --no-deploy: Configure only, skip deployment

Core Services (Installed in Order)

1. MetalLB Load Balancer

Location: setup/cluster-services/metallb/

  • Purpose: Provides load balancing for bare metal clusters
  • Functionality: Assigns external IPs to Kubernetes services
  • Configuration: IP address pool from local network range
  • Integration: Foundation for ingress traffic routing

2. Longhorn Distributed Storage

Location: setup/cluster-services/longhorn/

  • Purpose: Distributed block storage for persistent volumes
  • Features:
    • Cross-node data replication
    • Snapshot and backup capabilities
    • Volume expansion and management
    • Web-based management interface
  • Storage: Uses local disks from all cluster nodes

3. Traefik Ingress Controller

Location: setup/cluster-services/traefik/

  • Purpose: HTTP/HTTPS reverse proxy and ingress controller
  • Features:
    • Automatic service discovery
    • TLS termination
    • Load balancing and routing
    • Gateway API support
  • Integration: Works with MetalLB for external traffic

4. CoreDNS

Location: setup/cluster-services/coredns/

  • Purpose: DNS resolution for cluster services
  • Integration: Connects with external DNS providers
  • Functionality: Service discovery and DNS forwarding

5. cert-manager

Location: setup/cluster-services/cert-manager/

  • Purpose: Automatic TLS certificate management
  • Features:
    • Let's Encrypt integration
    • Automatic certificate issuance and renewal
    • Multiple certificate authorities support
    • Certificate lifecycle management

6. ExternalDNS

Location: setup/cluster-services/externaldns/

  • Purpose: Automatic DNS record management
  • Functionality:
    • Syncs Kubernetes services with DNS providers
    • Automatic A/CNAME record creation
    • Supports multiple DNS providers (Cloudflare, Route53, etc.)

7. Kubernetes Dashboard

Location: setup/cluster-services/kubernetes-dashboard/

  • Purpose: Web UI for cluster management
  • Access: https://dashboard.internal.domain.com
  • Authentication: Token-based access via wild-dashboard-token
  • Features: Resource management, monitoring, troubleshooting

8. NFS Storage (Optional)

Location: setup/cluster-services/nfs/

  • Purpose: Network file system for shared storage
  • Use Cases: Media storage, backups, shared data
  • Integration: Mounted as persistent volumes in applications

9. Docker Registry

Location: setup/cluster-services/docker-registry/

  • Purpose: Private container registry
  • Features: Store custom images locally
  • Integration: Used by applications and CI/CD pipelines

Infrastructure Components Deep Dive

DNS and Domain Architecture

Internet → External DNS → MetalLB LoadBalancer → Traefik → Kubernetes Services
                               ↑
                           Internal DNS (dnsmasq)
                               ↑
                         Internal Network

Domain Types:

  • External: app.domain.com - Public-facing services
  • Internal: app.internal.domain.com - Admin interfaces only
  • Resolution: dnsmasq forwards all domain traffic to cluster

Certificate and TLS Management

Automatic Certificate Flow:

  1. Service deployed with ingress annotation
  2. cert-manager detects certificate requirement
  3. Let's Encrypt challenge initiated
  4. Certificate issued and stored in Kubernetes secret
  5. Traefik uses certificate for TLS termination
  6. Automatic renewal before expiration

Storage Architecture

Longhorn Distributed Storage:

  • Block-level replication across nodes
  • Default 3-replica policy for data durability
  • Automatic failover and recovery
  • Snapshot and backup capabilities
  • Web UI for management and monitoring

Storage Classes:

  • longhorn - Default replicated storage
  • longhorn-single - Single replica for non-critical data
  • nfs - Shared network storage (if configured)

Network Traffic Flow

External Request Flow:

  1. DNS resolution via dnsmasq → cluster IP
  2. Traffic hits MetalLB load balancer
  3. MetalLB forwards to Traefik ingress
  4. Traefik terminates TLS and routes to service
  5. Service forwards to appropriate pod
  6. Response follows reverse path

High Availability Features

Control Plane HA:

  • 3 control plane nodes with leader election
  • Virtual IP for API server access
  • etcd cluster with automatic failover
  • Distributed workload scheduling

Storage HA:

  • Longhorn 3-way replication
  • Automatic replica placement across nodes
  • Node failure recovery
  • Data integrity verification

Networking HA:

  • MetalLB speaker pods on all nodes
  • Automatic load balancer failover
  • Multiple ingress controller replicas

Hardware Requirements

Minimum Specifications

  • Nodes: 3 control plane + optional workers
  • RAM: 8GB minimum per node (16GB+ recommended)
  • CPU: 4 cores minimum per node
  • Storage: 100GB+ local storage per node
  • Network: Gigabit ethernet connectivity

Network Requirements

  • All nodes on same LAN segment
  • Static IP assignments or DHCP reservations
  • dnsmasq server accessible by all nodes
  • Internet connectivity for image pulls and Let's Encrypt
  • Control Plane: 16GB RAM, 8 cores, 200GB NVMe SSD
  • Workers: 32GB RAM, 16 cores, 500GB NVMe SSD
  • Network: Dedicated VLAN or network segment
  • Redundancy: UPS protection, dual network interfaces

Configuration Management

Configuration Files

  • config.yaml - Main configuration (domains, network, apps)
  • secrets.yaml - Sensitive data (passwords, API keys, certificates)
  • .wildcloud/ - Cache and temporary files

Template System

gomplate Integration:

  • All configurations processed as templates
  • Access to config and secrets via template variables
  • Dynamic configuration generation
  • Environment-specific customization

Configuration Commands

# Read configuration values
wild-config cluster.name
wild-config apps.ghost.domain

# Set configuration values
wild-config-set cloud.domain "example.com"
wild-config-set cluster.nodeCount 5

# Secret management
wild-secret apps.database.password
wild-secret-set apps.api.key "secret-value"

Setup Commands Reference

Complete Setup

wild-init              # Initialize project
wild-setup             # Complete automated setup

Phase-by-Phase Setup

wild-setup-cluster     # Cluster infrastructure only
wild-setup-services    # Cluster services only

Individual Operations

wild-cluster-config-generate     # Generate base configs
wild-node-setup <node-name>      # Complete node setup (detect → configure → deploy)
wild-node-setup <node-name> --detect    # Force hardware re-detection
wild-node-setup <node-name> --no-deploy # Configuration only
wild-dashboard-token            # Get dashboard access
wild-health                     # System health check

Troubleshooting and Validation

Health Checks

wild-health                     # Overall system status
kubectl get nodes              # Node status
kubectl get pods -A            # All pod status
talosctl health                # Talos cluster health

Service Validation

kubectl get svc -n metallb-system     # MetalLB status
kubectl get pods -n longhorn-system   # Storage status
kubectl get pods -n traefik          # Ingress status
kubectl get certificates -A           # Certificate status

Log Analysis

talosctl logs -f machined             # Talos system logs
kubectl logs -n traefik deployment/traefik  # Ingress logs
kubectl logs -n cert-manager deployment/cert-manager  # Certificate logs

This comprehensive setup process creates a production-ready personal cloud infrastructure with enterprise-grade reliability, security, and management capabilities.