Update docs.

2025-08-31 14:30:09 -07:00
parent 3b8b6de338
commit 1aa9f1050d
22 changed files with 230 additions and 1083 deletions
--- a/docs/MAINTENANCE.md
+++ b/docs/MAINTENANCE.md
@@ -1,328 +1,22 @@
 # Maintenance Guide

-This guide covers essential maintenance tasks for your personal cloud infrastructure, including troubleshooting, backups, updates, and security best practices.
+Keep your wild cloud running smoothly.
+
+- [Security Best Practices](./guides/security.md)
+- [Monitoring](./guides/monitoring.md)
+- [Backup and Restore](./guides/backup-and-restore.md)
+
+## Upgrade
+
+- [Upgrade applications](./guides/upgrade-applications.md)
+- [Upgrade kubernetes](./guides/upgrade-kubernetes.md)
+- [Upgrade Talos](./guides/upgrade-talos.md)
+- [Upgrade Wild Cloud](./guides/upgrade-wild-cloud.md)

 ## Troubleshooting

-### General Troubleshooting Steps
-
-1. **Check Component Status**:
-   ```bash
-   # Check all pods across all namespaces
-   kubectl get pods -A
-   
-   # Look for pods that aren't Running or Ready
-   kubectl get pods -A | grep -v "Running\|Completed"
-   ```
-
-2. **View Detailed Pod Information**:
-   ```bash
-   # Get detailed info about problematic pods
-   kubectl describe pod <pod-name> -n <namespace>
-   
-   # Check pod logs
-   kubectl logs <pod-name> -n <namespace>
-   ```
-
-3. **Run Validation Script**:
-   ```bash
-   ./infrastructure_setup/validate_setup.sh
-   ```
-
-4. **Check Node Status**:
-   ```bash
-   kubectl get nodes
-   kubectl describe node <node-name>
-   ```
-
-### Common Issues
-
-#### Certificate Problems
-
-If services show invalid certificates:
-
-1. Check certificate status:
-   ```bash
-   kubectl get certificates -A
-   ```
-
-2. Examine certificate details:
-   ```bash
-   kubectl describe certificate <cert-name> -n <namespace>
-   ```
-
-3. Check for cert-manager issues:
-   ```bash
-   kubectl get pods -n cert-manager
-   kubectl logs -l app=cert-manager -n cert-manager
-   ```
-
-4. Verify the Cloudflare API token is correctly set up:
-   ```bash
-   kubectl get secret cloudflare-api-token -n internal
-   ```
-
-#### DNS Issues
-
-If DNS resolution isn't working properly:
-
-1. Check CoreDNS status:
-   ```bash
-   kubectl get pods -n kube-system -l k8s-app=kube-dns
-   kubectl logs -l k8s-app=kube-dns -n kube-system
-   ```
-
-2. Verify CoreDNS configuration:
-   ```bash
-   kubectl get configmap -n kube-system coredns -o yaml
-   ```
-
-3. Test DNS resolution from inside the cluster:
-   ```bash
-   kubectl run -i --tty --rm debug --image=busybox --restart=Never -- nslookup kubernetes.default
-   ```
-
-#### Service Connectivity
-
-If services can't communicate:
-
-1. Check network policies:
-   ```bash
-   kubectl get networkpolicies -A
-   ```
-
-2. Verify service endpoints:
-   ```bash
-   kubectl get endpoints -n <namespace>
-   ```
-
-3. Test connectivity from within the cluster:
-   ```bash
-   kubectl run -i --tty --rm debug --image=busybox --restart=Never -- wget -O- <service-name>.<namespace>
-   ```
-
-## Backup and Restore
-
-### What to Back Up
-
-1. **Persistent Data**:
-   - Database volumes
-   - Application storage
-   - Configuration files
-
-2. **Kubernetes Resources**:
-   - Custom Resource Definitions (CRDs)
-   - Deployments, Services, Ingresses
-   - Secrets and ConfigMaps
-
-### Backup Methods
-
-#### Simple Backup Script
-
-Create a backup script at `bin/backup.sh` (to be implemented):
-
-```bash
-#!/bin/bash
-# Simple backup script for your personal cloud
-# This is a placeholder for future implementation
-
-BACKUP_DIR="/path/to/backups/$(date +%Y-%m-%d)"
-mkdir -p "$BACKUP_DIR"
-
-# Back up Kubernetes resources
-kubectl get all -A -o yaml > "$BACKUP_DIR/all-resources.yaml"
-kubectl get secrets -A -o yaml > "$BACKUP_DIR/secrets.yaml"
-kubectl get configmaps -A -o yaml > "$BACKUP_DIR/configmaps.yaml"
-
-# Back up persistent volumes
-# TODO: Add logic to back up persistent volume data
-
-echo "Backup completed: $BACKUP_DIR"
-```
-
-#### Using Velero (Recommended for Future)
-
-[Velero](https://velero.io/) is a powerful backup solution for Kubernetes:
-
-```bash
-# Install Velero (future implementation)
-helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
-helm install velero vmware-tanzu/velero --namespace velero --create-namespace
-
-# Create a backup
-velero backup create my-backup --include-namespaces default,internal
-
-# Restore from backup
-velero restore create --from-backup my-backup
-```
-
-### Database Backups
-
-For database services, set up regular dumps:
-
-```bash
-# PostgreSQL backup (placeholder)
-kubectl exec <postgres-pod> -n <namespace> -- pg_dump -U <username> <database> > backup.sql
-
-# MariaDB/MySQL backup (placeholder)
-kubectl exec <mariadb-pod> -n <namespace> -- mysqldump -u root -p<password> <database> > backup.sql
-```
-
-## Updates
-
-### Updating Kubernetes (K3s)
-
-1. Check current version:
-   ```bash
-   k3s --version
-   ```
-
-2. Update K3s:
-   ```bash
-   curl -sfL https://get.k3s.io | sh -
-   ```
-
-3. Verify the update:
-   ```bash
-   k3s --version
-   kubectl get nodes
-   ```
-
-### Updating Infrastructure Components
-
-1. Update the repository:
-   ```bash
-   git pull
-   ```
-
-2. Re-run the setup script:
-   ```bash
-   ./infrastructure_setup/setup-all.sh
-   ```
-
-3. Or update specific components:
-   ```bash
-   ./infrastructure_setup/setup-cert-manager.sh
-   ./infrastructure_setup/setup-dashboard.sh
-   # etc.
-   ```
-
-### Updating Applications
-
-For Helm chart applications:
-
-```bash
-# Update Helm repositories
-helm repo update
-
-# Upgrade a specific application
-./bin/helm-install <chart-name> --upgrade
-```
-
-For services deployed with `deploy-service`:
-
-```bash
-# Edit the service YAML
-nano services/<service-name>/service.yaml
-
-# Apply changes
-kubectl apply -f services/<service-name>/service.yaml
-```
-
-## Security
-
-### Best Practices
-
-1. **Keep Everything Updated**:
-   - Regularly update K3s
-   - Update all infrastructure components
-   - Keep application images up to date
-
-2. **Network Security**:
-   - Use internal services whenever possible
-   - Limit exposed services to only what's necessary
-   - Configure your home router's firewall properly
-
-3. **Access Control**:
-   - Use strong passwords for all services
-   - Implement a secrets management strategy
-   - Rotate API tokens and keys regularly
-
-4. **Regular Audits**:
-   - Review running services periodically
-   - Check for unused or outdated deployments
-   - Monitor resource usage for anomalies
-
-### Security Scanning (Future Implementation)
-
-Tools to consider implementing:
-
-1. **Trivy** for image scanning:
-   ```bash
-   # Example Trivy usage (placeholder)
-   trivy image <your-image>
-   ```
-
-2. **kube-bench** for Kubernetes security checks:
-   ```bash
-   # Example kube-bench usage (placeholder)
-   kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
-   ```
-
-3. **Falco** for runtime security monitoring:
-   ```bash
-   # Example Falco installation (placeholder)
-   helm repo add falcosecurity https://falcosecurity.github.io/charts
-   helm install falco falcosecurity/falco --namespace falco --create-namespace
-   ```
-
-## System Health Monitoring
-
-### Basic Monitoring
-
-Check system health with:
-
-```bash
-# Node resource usage
-kubectl top nodes
-
-# Pod resource usage
-kubectl top pods -A
-
-# Persistent volume claims
-kubectl get pvc -A
-```
-
-### Advanced Monitoring (Future Implementation)
-
-Consider implementing:
-
-1. **Prometheus + Grafana** for comprehensive monitoring:
-   ```bash
-   # Placeholder for future implementation
-   helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
-   helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
-   ```
-
-2. **Loki** for log aggregation:
-   ```bash
-   # Placeholder for future implementation
-   helm repo add grafana https://grafana.github.io/helm-charts
-   helm install loki grafana/loki-stack --namespace logging --create-namespace
-   ```
-
-## Additional Resources
-
-This document will be expanded in the future with:
-
-- Detailed backup and restore procedures
-- Monitoring setup instructions
-- Comprehensive security hardening guide
-- Automated maintenance scripts
-
-For now, refer to the following external resources:
-
-- [K3s Documentation](https://docs.k3s.io/)
-- [Kubernetes Troubleshooting Guide](https://kubernetes.io/docs/tasks/debug/)
-- [Velero Backup Documentation](https://velero.io/docs/latest/)
-- [Kubernetes Security Best Practices](https://kubernetes.io/docs/concepts/security/)
+- [Cluster issues](./guides/troubleshoot-cluster.md)
+- [DNS issues](./guides/troubleshoot-dns.md)
+- [Service connectivity issues](./guides/troubleshoot-service-connectivity.md)
+- [TLS certificate issues](./guides/troubleshoot-tls-certificates.md)
+- [Visibility issues](./guides/troubleshoot-visibility.md)
--- a/docs/SETUP.md
+++ b/docs/SETUP.md
@@ -1,23 +1,3 @@
 # Setting Up Your Wild Cloud

-Install dependencies:
-
-```bash
-scripts/setup-utils.sh
-```
-
-Add the `bin` directory to your path.
-
-Initialize a personal wild-cloud in any empty directory, for example:
-
-```bash
-cd ~
-mkdir ~/my-wild-cloud
-cd my-wild-cloud
-```
-
-Run:
-
-```bash
-wild-setup
-```
+Visit https://mywildcloud.org/get-started for full wild cloud setup instructions.
--- a/docs/SETUP_FULL.md
+++ b/docs/SETUP_FULL.md
@@ -1,114 +0,0 @@
-# Wild Cloud Setup
-
-## Hardware prerequisites
-
-Procure the following before setup:
-
-- Any machine for running setup and managing your cloud.
-- One small machine for dnsmasq (running Ubuntu linux)
-- Three machines for control nodes (2GB memory, 100GB hard drive).
-- Any number of worker node machines.
-- A network switch connecting all these machines to your router.
-- A network router (e.g. Fluke 2) connected to the Internet.
-- A domain of your choice registerd (or managed) on Cloudflare.
-
-## Setup
-
-Clone this repo (you probably already did this).
-
-```bash
-source env.sh
-```
-
-Initialize a personal wild-cloud in any empty directory, for example:
-
-```bash
-cd ~
-mkdir ~/my-wild-cloud
-cd my-wild-cloud
-
-wild-setup-scaffold
-```
-
-## Download Cluster Node Boot Assets
-
-We use Talos linux for node operating systems. Run this script to download the OS for use in the rest of the setup.
-
-```bash
-# Generate node boot assets (PXE, iPXE, ISO)
-wild-cluster-node-boot-assets-download
-```
-
-## Dnsmasq
-
-- Install a Linux machine on your LAN. Record it's IP address in your `config:cloud.dns.ip`.
-- Ensure it is accessible with ssh.
-
-```bash
-# Install dnsmasq with PXE boot support
-wild-dnsmasq-install --install
-```
-
-## Cluster Setup
-
-### Cluster Infrastructure Setup
-
-```bash
-# Configure network, cluster settings, and register nodes
-wild-setup-cluster
-```
-
-This interactive script will:
-- Configure network settings (router IP, DNS, DHCP range)
-- Configure cluster settings (Talos version, schematic ID, MetalLB pool)
-- Help you register control plane and worker nodes by detecting their hardware
-- Generate machine configurations for each node
-- Apply machine configurations to nodes
-- Bootstrap the cluster after the first node.
-
-### Install Cluster Services
-
-```bash
-wild-setup-services
-```
-
-## Installing Wild Cloud Apps
-
-```bash
-# List available applications
-wild-apps-list
-
-# Deploy an application
-wild-app-deploy <app-name>
-
-# Check app status
-wild-app-doctor <app-name>
-
-# Remove an application
-wild-app-delete <app-name>
-```
-
-## Individual Node Management
-
-If you need to manage individual nodes:
-
-```bash
-# Generate patch for a specific node
-wild-cluster-node-patch-generate <node-ip>
-
-# Generate final machine config (uses existing patch)
-wild-cluster-node-machine-config-generate <node-ip>
-
-# Apply configuration with options
-wild-cluster-node-up <node-ip> [--insecure] [--skip-patch] [--dry-run]
-```
-
-## Asset Management
-
-```bash
-# Download/cache boot assets (kernel, initramfs, ISO, iPXE)
-wild-cluster-node-boot-assets-download
-
-# Install dnsmasq with specific schematic
-wild-dnsmasq-install --schematic-id <id> --install
-```
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -1,15 +0,0 @@
-# Cluster
-
-- LAN
-- cluster
-
-## LAN
-
-- router
-
-## Cluster
-
-- nameserver
-- node
-- master
-- load balancer
--- a/docs/guides/app-workflow.md
+++ b/docs/guides/app-workflow.md
@@ -43,4 +43,4 @@ wild-app-deploy <app>  # Deploys to Kubernetes

 ## App Directory Structure

-Your wild-cloud apps are stored in the `apps/` directory. You can change them however you like. You should keep them all in git and make commits anytime you change something. Some `wild` commands will overwrite files in your app directory (like when you are updating apps, or updating your configuration) so you'll want to review any changes made to your files after using them using `git`.
+Your wild-cloud apps are stored in the `apps/` directory. You can change them however you like. You should keep them all in git and make commits anytime you change something. Some `wild` commands will overwrite files in your app directory (like when you are updating apps, or updating your configuration) so you'll want to review any changes made to your files after using them using `git`.
--- a/docs/guides/backup-and-restore.md
+++ b/docs/guides/backup-and-restore.md
@@ -0,0 +1,3 @@
+# Backup and Restore
+
+TBD
--- a/docs/guides/monitoring.md
+++ b/docs/guides/monitoring.md
@@ -0,0 +1,50 @@
+# System Health Monitoring
+
+## Basic Monitoring
+
+Check system health with:
+
+```bash
+# Node resource usage
+kubectl top nodes
+
+# Pod resource usage
+kubectl top pods -A
+
+# Persistent volume claims
+kubectl get pvc -A
+```
+
+## Advanced Monitoring (Future Implementation)
+
+Consider implementing:
+
+1. **Prometheus + Grafana** for comprehensive monitoring:
+   ```bash
+   # Placeholder for future implementation
+   helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+   helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
+   ```
+
+2. **Loki** for log aggregation:
+   ```bash
+   # Placeholder for future implementation
+   helm repo add grafana https://grafana.github.io/helm-charts
+   helm install loki grafana/loki-stack --namespace logging --create-namespace
+   ```
+
+## Additional Resources
+
+This document will be expanded in the future with:
+
+- Detailed backup and restore procedures
+- Monitoring setup instructions
+- Comprehensive security hardening guide
+- Automated maintenance scripts
+
+For now, refer to the following external resources:
+
+- [K3s Documentation](https://docs.k3s.io/)
+- [Kubernetes Troubleshooting Guide](https://kubernetes.io/docs/tasks/debug/)
+- [Velero Backup Documentation](https://velero.io/docs/latest/)
+- [Kubernetes Security Best Practices](https://kubernetes.io/docs/concepts/security/)
--- a/docs/guides/node-setup.md
+++ b/docs/guides/node-setup.md
@@ -1,246 +0,0 @@
-# Node Setup Guide
-
-This guide covers setting up Talos Linux nodes for your Kubernetes cluster using USB boot.
-
-## Overview
-
-There are two main approaches for booting Talos nodes:
-
-1. **USB Boot** (covered here) - Boot from a custom USB drive with system extensions
-2. **PXE Boot** - Network boot using dnsmasq setup (see `setup/dnsmasq/README.md`)
-
-## USB Boot Setup
-
-### Prerequisites
-
-- Target hardware for Kubernetes nodes
-- USB drive (8GB+ recommended)
-- Admin access to create bootable USB drives
-
-### Step 1: Upload Schematic and Download Custom Talos ISO
-
-First, upload the system extensions schematic to Talos Image Factory, then download the custom ISO.
-
-```bash
-# Upload schematic configuration to get schematic ID
-wild-talos-schema
-
-# Download custom ISO with system extensions
-wild-talos-iso
-```
-
-The custom ISO includes system extensions (iscsi-tools, util-linux-tools, intel-ucode, gvisor) needed for the cluster and is saved to `.wildcloud/iso/talos-v1.10.3-metal-amd64.iso`.
-
-### Step 2: Create Bootable USB Drive
-
-#### Linux (Recommended)
-
-```bash
-# Find your USB device (be careful to select the right device!)
-lsblk
-sudo dmesg | tail  # Check for recently connected USB devices
-
-# Create bootable USB (replace /dev/sdX with your USB device)
-sudo dd if=.wildcloud/iso/talos-v1.10.3-metal-amd64.iso of=/dev/sdX bs=4M status=progress sync
-
-# Verify the write completed
-sync
-```
-
-**⚠️ Warning**: Double-check the device path (`/dev/sdX`). Writing to the wrong device will destroy data!
-
-#### macOS
-
-```bash
-# Find your USB device
-diskutil list
-
-# Unmount the USB drive (replace diskX with your USB device)
-diskutil unmountDisk /dev/diskX
-
-# Create bootable USB
-sudo dd if=.wildcloud/iso/talos-v1.10.3-metal-amd64.iso of=/dev/rdiskX bs=4m
-
-# Eject when complete
-diskutil eject /dev/diskX
-```
-
-#### Windows
-
-Use one of these tools:
-
-1. **Rufus** (Recommended)
-
-   - Download from https://rufus.ie/
-   - Select the Talos ISO file
-   - Choose your USB drive
-   - Use "DD Image" mode
-   - Click "START"
-
-2. **Balena Etcher**
-
-   - Download from https://www.balena.io/etcher/
-   - Flash from file → Select Talos ISO
-   - Select target USB drive
-   - Flash!
-
-3. **Command Line** (Windows 10/11)
-
-   ```cmd
-   # List disks to find USB drive number
-   diskpart
-   list disk
-   exit
-
-   # Write ISO (replace X with your USB disk number)
-   dd if=.wildcloud\iso\talos-v1.10.3-metal-amd64.iso of=\\.\PhysicalDriveX bs=4M --progress
-   ```
-
-### Step 3: Boot Target Machine
-
-1. **Insert USB** into target machine
-2. **Boot from USB**:
-   - Restart machine and enter BIOS/UEFI (usually F2, F12, DEL, or ESC during startup)
-   - Change boot order to prioritize USB drive
-   - Or use one-time boot menu (usually F12)
-3. **Talos will boot** in maintenance mode with a DHCP IP
-
-### Step 4: Hardware Detection and Configuration
-
-Once the machine boots, it will be in maintenance mode with a DHCP IP address.
-
-```bash
-# Find the node's maintenance IP (check your router/DHCP server)
-# Then detect hardware and register the node
-cd setup/cluster-nodes
-./detect-node-hardware.sh <maintenance-ip> <node-number>
-
-# Example: Node got DHCP IP 192.168.8.150, registering as node 1
-./detect-node-hardware.sh 192.168.8.150 1
-```
-
-This script will:
-
-- Discover network interface names (e.g., `enp4s0`)
-- List available disks for installation
-- Update `config.yaml` with node-specific hardware settings
-
-### Step 5: Generate and Apply Configuration
-
-```bash
-# Generate machine configurations with detected hardware
-./generate-machine-configs.sh
-
-# Apply configuration (node will reboot with static IP)
-talosctl apply-config --insecure -n <maintenance-ip> --file final/controlplane-node-<number>.yaml
-
-# Example:
-talosctl apply-config --insecure -n 192.168.8.150 --file final/controlplane-node-1.yaml
-```
-
-### Step 6: Verify Installation
-
-After reboot, the node should come up with its assigned static IP:
-
-```bash
-# Check connectivity (node 1 should be at 192.168.8.31)
-ping 192.168.8.31
-
-# Verify system extensions are installed
-talosctl -e 192.168.8.31 -n 192.168.8.31 get extensions
-
-# Check for iscsi tools
-talosctl -e 192.168.8.31 -n 192.168.8.31 list /usr/local/bin/ | grep iscsi
-```
-
-## Repeat for Additional Nodes
-
-For each additional control plane node:
-
-1. Boot with the same USB drive
-2. Run hardware detection with the new maintenance IP and node number
-3. Generate and apply configurations
-4. Verify the node comes up at its static IP
-
-Example for node 2:
-
-```bash
-./detect-node-hardware.sh 192.168.8.151 2
-./generate-machine-configs.sh
-talosctl apply-config --insecure -n 192.168.8.151 --file final/controlplane-node-2.yaml
-```
-
-## Cluster Bootstrap
-
-Once all control plane nodes are configured:
-
-```bash
-# Bootstrap the cluster using the VIP
-talosctl bootstrap -n 192.168.8.30
-
-# Get kubeconfig
-talosctl kubeconfig
-
-# Verify cluster
-kubectl get nodes
-```
-
-## Troubleshooting
-
-### USB Boot Issues
-
-- **Machine won't boot from USB**: Check BIOS boot order, disable Secure Boot if needed
-- **Talos doesn't start**: Verify ISO was written correctly, try re-creating USB
-- **Network issues**: Ensure DHCP is available on your network
-
-### Hardware Detection Issues
-
-- **Node not accessible**: Check IP assignment, firewall settings
-- **Wrong interface detected**: Manual override in `config.yaml` if needed
-- **Disk not found**: Verify disk size (must be >10GB), check disk health
-
-### Installation Issues
-
-- **Static IP not assigned**: Check network configuration in machine config
-- **Extensions not installed**: Verify ISO includes extensions, check upgrade logs
-- **Node won't join cluster**: Check certificates, network connectivity to VIP
-
-### Checking Logs
-
-```bash
-# View system logs
-talosctl -e <node-ip> -n <node-ip> logs machined
-
-# Check kernel messages
-talosctl -e <node-ip> -n <node-ip> dmesg
-
-# Monitor services
-talosctl -e <node-ip> -n <node-ip> get services
-```
-
-## System Extensions Included
-
-The custom ISO includes these extensions:
-
-- **siderolabs/iscsi-tools**: iSCSI initiator tools for persistent storage
-- **siderolabs/util-linux-tools**: Utility tools including fstrim for storage
-- **siderolabs/intel-ucode**: Intel CPU microcode updates (harmless on AMD)
-- **siderolabs/gvisor**: Container runtime sandbox (optional security enhancement)
-
-These extensions enable:
-
-- Longhorn distributed storage
-- Improved security isolation
-- CPU microcode updates
-- Storage optimization tools
-
-## Next Steps
-
-After all nodes are configured:
-
-1. **Install CNI**: Deploy a Container Network Interface (Cilium, Calico, etc.)
-2. **Install CSI**: Deploy Container Storage Interface (Longhorn for persistent storage)
-3. **Deploy workloads**: Your applications and services
-4. **Monitor cluster**: Set up monitoring and logging
-
-See the main project documentation for application deployment guides.
--- a/docs/guides/security.md
+++ b/docs/guides/security.md
@@ -0,0 +1,46 @@
+# Security
+
+## Best Practices
+
+1. **Keep Everything Updated**:
+   - Regularly update K3s
+   - Update all infrastructure components
+   - Keep application images up to date
+
+2. **Network Security**:
+   - Use internal services whenever possible
+   - Limit exposed services to only what's necessary
+   - Configure your home router's firewall properly
+
+3. **Access Control**:
+   - Use strong passwords for all services
+   - Implement a secrets management strategy
+   - Rotate API tokens and keys regularly
+
+4. **Regular Audits**:
+   - Review running services periodically
+   - Check for unused or outdated deployments
+   - Monitor resource usage for anomalies
+
+## Security Scanning (Future Implementation)
+
+Tools to consider implementing:
+
+1. **Trivy** for image scanning:
+   ```bash
+   # Example Trivy usage (placeholder)
+   trivy image <your-image>
+   ```
+
+2. **kube-bench** for Kubernetes security checks:
+   ```bash
+   # Example kube-bench usage (placeholder)
+   kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
+   ```
+
+3. **Falco** for runtime security monitoring:
+   ```bash
+   # Example Falco installation (placeholder)
+   helm repo add falcosecurity https://falcosecurity.github.io/charts
+   helm install falco falcosecurity/falco --namespace falco --create-namespace
+   ```
--- a/docs/guides/taslos.md
+++ b/docs/guides/taslos.md
@@ -0,0 +1,18 @@
+# Talos
+
+
+## System Extensions Included
+
+The custom ISO includes these extensions:
+
+- **siderolabs/iscsi-tools**: iSCSI initiator tools for persistent storage
+- **siderolabs/util-linux-tools**: Utility tools including fstrim for storage
+- **siderolabs/intel-ucode**: Intel CPU microcode updates (harmless on AMD)
+- **siderolabs/gvisor**: Container runtime sandbox (optional security enhancement)
+
+These extensions enable:
+
+- Longhorn distributed storage
+- Improved security isolation
+- CPU microcode updates
+- Storage optimization tools
--- a/docs/guides/troubleshoot-cluster.md
+++ b/docs/guides/troubleshoot-cluster.md
@@ -0,0 +1,19 @@
+# Troubleshoot Wild Cloud Cluster issues
+
+## General Troubleshooting Steps
+
+1. **Check Node Status**:
+   ```bash
+   kubectl get nodes
+   kubectl describe node <node-name>
+   ```
+
+1. **Check Component Status**:
+   ```bash
+   # Check all pods across all namespaces
+   kubectl get pods -A
+   
+   # Look for pods that aren't Running or Ready
+   kubectl get pods -A | grep -v "Running\|Completed"
+   ```
+
--- a/docs/guides/troubleshoot-dns.md
+++ b/docs/guides/troubleshoot-dns.md
@@ -0,0 +1,20 @@
+# Troubleshoot DNS
+
+If DNS resolution isn't working properly:
+
+1. Check CoreDNS status:
+   ```bash
+   kubectl get pods -n kube-system -l k8s-app=kube-dns
+   kubectl logs -l k8s-app=kube-dns -n kube-system
+   ```
+
+2. Verify CoreDNS configuration:
+   ```bash
+   kubectl get configmap -n kube-system coredns -o yaml
+   ```
+
+3. Test DNS resolution from inside the cluster:
+   ```bash
+   kubectl run -i --tty --rm debug --image=busybox --restart=Never -- nslookup kubernetes.default
+   ```
+
--- a/docs/guides/troubleshoot-service-connectivity.md
+++ b/docs/guides/troubleshoot-service-connectivity.md
@@ -0,0 +1,18 @@
+# Troubleshoot Service Connectivity
+
+If services can't communicate:
+
+1. Check network policies:
+   ```bash
+   kubectl get networkpolicies -A
+   ```
+
+2. Verify service endpoints:
+   ```bash
+   kubectl get endpoints -n <namespace>
+   ```
+
+3. Test connectivity from within the cluster:
+   ```bash
+   kubectl run -i --tty --rm debug --image=busybox --restart=Never -- wget -O- <service-name>.<namespace>
+   ```
--- a/docs/guides/troubleshoot-tls-certificates.md
+++ b/docs/guides/troubleshoot-tls-certificates.md
@@ -0,0 +1,24 @@
+# Troubleshoot TLS Certificates
+
+If services show invalid certificates:
+
+1. Check certificate status:
+   ```bash
+   kubectl get certificates -A
+   ```
+
+2. Examine certificate details:
+   ```bash
+   kubectl describe certificate <cert-name> -n <namespace>
+   ```
+
+3. Check for cert-manager issues:
+   ```bash
+   kubectl get pods -n cert-manager
+   kubectl logs -l app=cert-manager -n cert-manager
+   ```
+
+4. Verify the Cloudflare API token is correctly set up:
+   ```bash
+   kubectl get secret cloudflare-api-token -n internal
+   ```
--- a/docs/guides/troubleshoot-visibility.md
+++ b/docs/guides/troubleshoot-visibility.md
@@ -1,4 +1,4 @@
-# Troubleshooting Service Visibility
+# Troubleshoot Service Visibility

 This guide covers common issues with accessing services from outside the cluster and how to diagnose and fix them.

--- a/docs/guides/upgrade-applications.md
+++ b/docs/guides/upgrade-applications.md
@@ -0,0 +1,3 @@
+# Upgrade Applications
+
+TBD
--- a/docs/guides/upgrade-kubernetes.md
+++ b/docs/guides/upgrade-kubernetes.md
@@ -0,0 +1,3 @@
+# Upgrade Kubernetes
+
+TBD
--- a/docs/guides/upgrade-talos.md
+++ b/docs/guides/upgrade-talos.md
@@ -0,0 +1,3 @@
+# Upgrade Talos
+
+TBD
--- a/docs/guides/upgrade-wild-cloud.md
+++ b/docs/guides/upgrade-wild-cloud.md
@@ -0,0 +1,3 @@
+# Upgrade Wild Cloud
+
+TBD
--- a/docs/lan-routers/GL-iNet.md
+++ b/docs/lan-routers/GL-iNet.md
@@ -1,12 +0,0 @@
-# GL-iNet LAN Router Setup
-
-- Applications > Dynamic DNS > Enable DDNS
-  - Enable
-  - Use Host Name as your CNAME at Cloudflare.
-- Network > LAN > Address Reservation
-  - Add all cluster nodes.
-- Network > Port Forwarding
-  - Add TCP, port 22 to your bastion
-  - Add TCP/UDP, port 443 to your cluster load balancer.
-- Network > DNS > DNS Server Settings
-  - Set to cluster DNS server IP
--- a/docs/learning/visibility.md
+++ b/docs/learning/visibility.md
@@ -1,331 +0,0 @@
-# Understanding Network Visibility in Kubernetes
-
-This guide explains how applications deployed on our Kubernetes cluster become accessible from both internal and external networks. Whether you're deploying a public-facing website or an internal admin panel, this document will help you understand the journey from deployment to accessibility.
-
-## The Visibility Pipeline
-
-When you deploy an application to the cluster, making it accessible involves several coordinated components working together:
-
-1. **Kubernetes Services** - Direct traffic to your application pods
-2. **Ingress Controllers** - Route external HTTP/HTTPS traffic to services
-3. **Load Balancers** - Assign external IPs to services
-4. **DNS Management** - Map domain names to IPs
-5. **TLS Certificates** - Secure connections with HTTPS
-
-Let's walk through how each part works and how they interconnect.
-
-## From Deployment to Visibility
-
-### 1. Application Deployment
-
-Your journey begins with deploying your application on Kubernetes. This typically involves:
-
-```yaml
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: my-app
-  namespace: my-namespace
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: my-app
-  template:
-    metadata:
-      labels:
-        app: my-app
-    spec:
-      containers:
-        - name: my-app
-          image: myapp:latest
-          ports:
-            - containerPort: 80
-```
-
-This creates pods running your application, but they're not yet accessible outside their namespace.
-
-### 2. Kubernetes Service: Internal Connectivity
-
-A Kubernetes Service provides a stable endpoint to access your pods:
-
-```yaml
-apiVersion: v1
-kind: Service
-metadata:
-  name: my-app
-  namespace: my-namespace
-spec:
-  selector:
-    app: my-app
-  ports:
-    - port: 80
-      targetPort: 80
-  type: ClusterIP
-```
-
-With this `ClusterIP` service, your application is accessible within the cluster at `my-app.my-namespace.svc.cluster.local`, but not from outside.
-
-### 3. Ingress: Defining HTTP Routes
-
-For HTTP/HTTPS traffic, an Ingress resource defines routing rules:
-
-```yaml
-apiVersion: networking.k8s.io/v1
-kind: Ingress
-metadata:
-  name: my-app
-  namespace: my-namespace
-  annotations:
-    kubernetes.io/ingress.class: "traefik"
-    external-dns.alpha.kubernetes.io/target: "CLOUD_DOMAIN"
-    external-dns.alpha.kubernetes.io/ttl: "60"
-spec:
-  rules:
-    - host: my-app.CLOUD_DOMAIN
-      http:
-        paths:
-          - path: /
-            pathType: Prefix
-            backend:
-              service:
-                name: my-app
-                port:
-                  number: 80
-  tls:
-    - hosts:
-        - my-app.CLOUD_DOMAIN
-      secretName: wildcard-wild-cloud-tls
-```
-
-This Ingress tells the cluster to route requests for `my-app.CLOUD_DOMAIN` to your service. The annotations provide hints to other systems like ExternalDNS.
-
-### 4. Traefik: The Ingress Controller
-
-Our cluster uses Traefik as the ingress controller. Traefik watches for Ingress resources and configures itself to handle the routing rules. It acts as a reverse proxy and edge router, handling:
-
-- HTTP/HTTPS routing
-- TLS termination
-- Load balancing
-- Path-based routing
-- Host-based routing
-
-Traefik runs as a service in the cluster with its own external IP (provided by MetalLB).
-
-### 5. MetalLB: Assigning External IPs
-
-Since we're running on-premises (not in a cloud that provides load balancers), we use MetalLB to assign external IPs to services. MetalLB manages a pool of IP addresses from our local network:
-
-```yaml
-apiVersion: metallb.io/v1beta1
-kind: IPAddressPool
-metadata:
-  name: default
-  namespace: metallb-system
-spec:
-  addresses:
-    - 192.168.8.240-192.168.8.250
-```
-
-This allows Traefik and any other LoadBalancer services to receive a real IP address from our network.
-
-### 6. ExternalDNS: Automated DNS Management
-
-ExternalDNS automatically creates and updates DNS records in our CloudFlare DNS zone:
-
-```yaml
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: external-dns
-  namespace: externaldns
-spec:
-  # ...
-  template:
-    spec:
-      containers:
-        - name: external-dns
-          image: registry.k8s.io/external-dns/external-dns
-          args:
-            - --source=service
-            - --source=ingress
-            - --provider=cloudflare
-            - --txt-owner-id=wild-cloud
-```
-
-ExternalDNS watches Kubernetes Services and Ingresses (the two `--source` flags above) for hostnames and `external-dns.alpha.kubernetes.io/*` annotations, then creates the corresponding DNS records in Cloudflare, making your applications discoverable by domain name.
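-
-The Ingress shown earlier covers the `--source=ingress` case. For the `--source=service` case, a sketch might look like the following. The Service name, selector, and port are hypothetical and exist only for illustration; the `hostname` and `ttl` annotations are standard ExternalDNS annotations.
-
-```yaml
-# Hypothetical LoadBalancer Service picked up by ExternalDNS (--source=service).
-apiVersion: v1
-kind: Service
-metadata:
-  name: my-tcp-app            # assumed name, not from the original docs
-  namespace: my-namespace
-  annotations:
-    # Hostname ExternalDNS should create a record for
-    external-dns.alpha.kubernetes.io/hostname: my-tcp-app.CLOUD_DOMAIN
-    external-dns.alpha.kubernetes.io/ttl: "60"
-spec:
-  type: LoadBalancer           # receives its IP from MetalLB
-  selector:
-    app: my-tcp-app
-  ports:
-    - port: 5432
-      targetPort: 5432
-```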
-
-### 7. Cert-Manager: TLS Certificate Automation
-
-To secure connections with HTTPS, we use cert-manager to automatically obtain and renew TLS certificates:
-
-```yaml
-apiVersion: cert-manager.io/v1
-kind: Certificate
-metadata:
-  name: wildcard-wild-cloud-io
-  namespace: default
-spec:
-  secretName: wildcard-wild-cloud-tls
-  dnsNames:
-    - "*.CLOUD_DOMAIN"
-    - "CLOUD_DOMAIN"
-  issuerRef:
-    name: letsencrypt-prod
-    kind: ClusterIssuer
-```
-
-Cert-manager handles:
-
-- Certificate request and issuance
-- DNS validation (for wildcard certificates)
-- Automatic renewal
-- Secret storage of certificates
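-
-The `Certificate` above refers to a `ClusterIssuer` named `letsencrypt-prod`. A minimal sketch of such an issuer, assuming Let's Encrypt with DNS-01 validation through Cloudflare, could look like this. The email address and the Secret name and key holding the API token are placeholders, not values taken from this project.
-
-```yaml
-# Hypothetical ClusterIssuer; email and secret names are assumptions.
-apiVersion: cert-manager.io/v1
-kind: ClusterIssuer
-metadata:
-  name: letsencrypt-prod
-spec:
-  acme:
-    server: https://acme-v02.api.letsencrypt.org/directory
-    email: admin@example.com            # placeholder contact address
-    privateKeySecretRef:
-      name: letsencrypt-prod            # stores the ACME account key
-    solvers:
-      # DNS-01 is required for wildcard names such as *.CLOUD_DOMAIN
-      - dns01:
-          cloudflare:
-            apiTokenSecretRef:
-              name: cloudflare-api-token  # assumed Secret name
-              key: api-token              # assumed key within the Secret
-```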
-
-## The Two Visibility Paths
-
-In our infrastructure, we support two primary visibility paths:
-
-### Public Services (External Access)
-
-Public services are those meant to be accessible from the public internet:
-
-1. **Service**: Kubernetes ClusterIP service (internal)
-2. **Ingress**: Defines routing with hostname like `service-name.CLOUD_DOMAIN`
-3. **DNS**: ExternalDNS creates a CNAME record pointing to `CLOUD_DOMAIN`
-4. **TLS**: Uses wildcard certificate for `*.CLOUD_DOMAIN`
-5. **IP Addressing**: Traffic reaches the MetalLB-assigned IP for Traefik
-6. **Network**: Traffic flows from external internet → router → MetalLB IP → Traefik → Kubernetes Service → Application Pods
-
-**Deploy a public service with:**
-
-```bash
-./bin/deploy-service --type public --name myservice
-```
-
-### Internal Services (Private Access)
-
-Internal services are restricted to the internal network:
-
-1. **Service**: Kubernetes ClusterIP service (internal)
-2. **Ingress**: Defines routing with hostname like `service-name.internal.CLOUD_DOMAIN`
-3. **DNS**: ExternalDNS creates an A record pointing to the internal load balancer IP
-4. **TLS**: Uses wildcard certificate for `*.internal.CLOUD_DOMAIN`
-5. **IP Addressing**: Traffic reaches the MetalLB-assigned IP for Traefik
-6. **Network**: Traffic flows from internal network → MetalLB IP → Traefik → Service → Pods
-7. **Security**: Traefik middleware restricts access to internal network IPs
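-
-The access restriction in step 7 could be expressed as a Traefik `Middleware`. This is a minimal sketch, not the project's actual manifest: the name `internal-only`, the namespace, and the `192.168.8.0/24` range are assumptions (the range is inferred from the MetalLB pool shown earlier), and it assumes the Traefik v3 CRDs. Older Traefik v2 releases use the `traefik.containo.us/v1alpha1` API group and call the option `ipWhiteList`.
-
-```yaml
-# Hypothetical middleware limiting access to the local network.
-apiVersion: traefik.io/v1alpha1
-kind: Middleware
-metadata:
-  name: internal-only          # assumed name
-  namespace: my-namespace
-spec:
-  ipAllowList:
-    sourceRange:
-      - 192.168.8.0/24         # assumed internal network range
-```
-
-An Ingress would then opt in with the annotation `traefik.ingress.kubernetes.io/router.middlewares: my-namespace-internal-only@kubernetescrd`.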
-
-**Deploy an internal service with:**
-
-```bash
-./bin/deploy-service --type internal --name adminpanel
-```
-
-## How It All Works Together
-
-1. **You deploy** an application using our deploy-service script
-2. **Kubernetes** schedules and runs your application pods
-3. **Services** provide a stable endpoint for your pods
-4. **Traefik** configures routing based on Ingress definitions
-5. **MetalLB** assigns real network IPs to LoadBalancer services
-6. **ExternalDNS** creates DNS records for your services
-7. **Cert-Manager** ensures valid TLS certificates for HTTPS
-
-### Network Flow Diagram
-
-```mermaid
-flowchart TD
- subgraph Internet["Internet"]
-        User("User Browser")
-        CloudDNS("Cloudflare DNS")
-  end
- subgraph Cluster["Cluster"]
-        Router("Router")
-        MetalLB("MetalLB")
-        Traefik("Traefik Ingress")
-        IngSvc("Service")
-        IngPods("Application Pods")
-        Ingress("Ingress")
-        CertManager("cert-manager")
-        WildcardCert("Wildcard Certificate")
-        ExtDNS("ExternalDNS")
-  end
-    User -- "1\. DNS Query" --> CloudDNS
-    CloudDNS -- "2\. IP Address" --> User
-    User -- "3\. HTTPS Request" --> Router
-    Router -- "4\. Forward" --> MetalLB
-    MetalLB -- "5\. Route" --> Traefik
-    Traefik -- "6\. Route" --> Ingress
-    Ingress -- "7\. Forward" --> IngSvc
-    IngSvc -- "8\. Balance" --> IngPods
-    ExtDNS -- "A. Update DNS" --> CloudDNS
-    Ingress -- "B. Configure" --> ExtDNS
-    CertManager -- "C. Issue Cert" --> WildcardCert
-    Ingress -- "D. Use" --> WildcardCert
-
-     User:::internet
-     CloudDNS:::internet
-     Router:::cluster
-     MetalLB:::cluster
-     Traefik:::cluster
-     IngSvc:::cluster
-     IngPods:::cluster
-     Ingress:::cluster
-     CertManager:::cluster
-     WildcardCert:::cluster
-     ExtDNS:::cluster
-    classDef internet fill:#fcfcfc,stroke:#333
-    classDef cluster fill:#a6f3ff,stroke:#333
-    style User fill:#C8E6C9
-    style CloudDNS fill:#C8E6C9
-    style Router fill:#C8E6C9
-    style MetalLB fill:#C8E6C9
-    style Traefik fill:#C8E6C9
-    style IngSvc fill:#C8E6C9
-    style IngPods fill:#C8E6C9
-    style Ingress fill:#C8E6C9
-    style CertManager fill:#C8E6C9
-    style WildcardCert fill:#C8E6C9
-    style ExtDNS fill:#C8E6C9
-```
-
-A successful deployment creates a chain of connections:
-
-```
-Internet → DNS (domain name) → External IP → Traefik → Kubernetes Service → Application Pod
-```
-
-## Behind the Scenes: The Technical Magic
-
-When you use our `deploy-service` script, several things happen:
-
-1. **Template Processing**: The script processes a YAML template for your service type, using environment variables to customize it
-2. **Namespace Management**: Creates or uses your service's namespace
-3. **Resource Application**: Applies the generated YAML to create/update all Kubernetes resources
-4. **DNS Configuration**: ExternalDNS detects the new resources and creates DNS records
-5. **Certificate Management**: Cert-manager ensures TLS certificates exist or creates new ones
-6. **Secret Distribution**: For internal services, certificates are copied to the appropriate namespaces
-
-## Troubleshooting Visibility Issues
-
-When services aren't accessible, the issue usually lies in one of these areas:
-
-1. **DNS Resolution**: Domain not resolving to the correct IP
-2. **Certificate Problems**: Invalid, expired, or missing TLS certificates
-3. **Ingress Configuration**: Incorrect routing rules or annotations
-4. **Network Issues**: Firewall rules or internal/external network segregation
-
-Our [Visibility Troubleshooting Guide](/docs/troubleshooting/VISIBILITY.md) provides detailed steps for diagnosing these issues.
-
-## Conclusion
-
-The visibility layer in our infrastructure combines several systems: Traefik, MetalLB, ExternalDNS, and cert-manager. While complex under the hood, it gives developers a streamlined way to deploy applications with proper networking, DNS, and security.
-
-By understanding these components and their relationships, you'll be better equipped to deploy applications and diagnose any visibility issues that arise.
-
-## Further Reading
-
-- [Traefik Documentation](https://doc.traefik.io/traefik/)
-- [ExternalDNS Project](https://github.com/kubernetes-sigs/external-dns)
-- [Cert-Manager Documentation](https://cert-manager.io/docs/)
-- [MetalLB Project](https://metallb.universe.tf/)
--- a/docs/tutorial/README.md
+++ b/docs/tutorial/README.md
@@ -1,19 +0,0 @@
-# Welcome to the Wild Cloud tutorial!
-
-## Hi! I'm Paul.
-
-Welcome! I am SO excited you're here!
-
-Why am I so excited?? When I was an eight-year-old kid, I had a computer named the Commodore 64. One of the coolest things about it was that it came with a User Manual that told you all about how to not just use that computer, but to actually _use computers_. It taught me how to write my own programs and run them! That experience of wonder, that I could write something and have it do something, is the single biggest reason why I have spent the last 40 years working with computers.
-
-When I was 12, I found out I could plug a cartridge into the back of my Commodore, plug a telephone line into it (maybe some of you don't even know what that is anymore!), and _actually call_ other people's computers in my city. We developed such a sense of community, connecting our computers together and leaving each other messages about the things we were thinking. It was a tiny taste of the early Internet.
-
-I had a similar experience when I was 19 and installed something called the "World Wide Web" on the computers I managed in a computer lab at college. My heart skipped a beat when I clicked on a few "links" and actually saw an image from a computer in Israel just magically appear on my screen! It felt like I was teleported to the other side of the world. Pretty amazing for a kid who had rarely been out of Nebraska!
-
-Everything in those days was basically free. My Commodore cost $200, and people connected to each other out of pure curiosity. If you wanted to be a presence on the Internet, you could just connect your computer to it and people around the world could visit you! _All_ of the early websites were entirely non-commercial. No ads! No sign-ups! No monthly subscription fees! It felt like the whole world was coming together to build something amazing for everyone.
-
-Of course, as we all know, it didn't stay that way. After college, I had to figure out ways to pay for Internet connections myself. At some point search engines decided to make money by selling ads on their pages... and then providing ad services to other pages ("monetize," they called it). Then commercial companies found out about it and wanted to sell books and shoes to other people, and the government decided they wanted to capture that tax money. Instead of making the free and open software better, and the open communities stronger, and encouraging people to participate by running their own computers and software, companies started inviting people to connect _inside_ computers they controlled. "Hey! You don't have to do all that stuff," they would say. "You can just jump on our servers for free!"
-
-So people stopped being curious about what we could do with our computers together, and they got a login name, and they couldn't do their own things on their own computers anymore, and their data became the property of the company whose computer they were using, and those companies started working together to make it faster to go to their own computers, and to make it go very, very slow if you wanted to let people come to your computer, or even to forbid having people come to your computer entirely. So now, we are _safe_ and _simple_ and _secure_ and we get whatever the companies want to give us, which seems to usually be ads (so many ads) or monthly fee increases, and they really, really love getting our attention and putting it where they want it. Mostly, it's just all so... boring. So boring.
-
-So, why am I excited you're here? Because with this project, this Wild Cloud project, I think I just might be able to pass on some of that sense of wonder that captured me so many years ago!