Compare commits: 8947da88eb...main

9 commits:

- 47cdf16b54
- b9cf5c3760
- e098267c81
- 97999fa099
- 5e3c49a17f
- 72176d5401
- 998b6fa369
- f0b0cc0965
- f76e374ef0
@@ -1,11 +1,31 @@
# Custom Dictionary Words
amplifier
charliermarsh
dnsmasq
dnsmasq
dpkg
esbenp
ftpd
GOARCH
GOARCH
goimports
golangci
GOOS
gopls
ipxe
oklch
pnpm
postinst
postrm
prerm
pyproject
ruff
shadcn
talos
tsconfig
venv
vite
vitest
wildcloud
wildcloud
wildd
.envrc (13 lines changed)
@@ -1,13 +0,0 @@
# API dev
export WILD_CENTRAL_ENV=development
export WILD_CENTRAL_DATA=$PWD/data
export WILD_DIRECTORY=$PWD/directory

# CLI/App dev
export WILD_DAEMON_URL=http://localhost:5055
export WILD_CLI_DATA=$HOME/.wildcloud

# Source activate.sh in interactive shells
if [[ $- == *i* ]]; then
  source ./activate.sh
fi
.envrc.example (new file, 17 lines)
@@ -0,0 +1,17 @@
# API dev
export WILD_CENTRAL_ENV=development
export WILD_API_DATA_DIR=$PWD/data
export WILD_DIRECTORY=$PWD/directory

# CORS configuration for production (comma-separated origins)
# Defaults to localhost development origins if not set
# export WILD_CORS_ORIGINS=https://app.wildcloud.com,https://www.wildcloud.com

# CLI/App dev
export WILD_API_URI=http://localhost:5055
export WILD_CLI_DATA=$HOME/.wildcloud

# Source activate.sh in interactive shells
if [[ $- == *i* ]]; then
  source ./activate.sh
fi
.gitignore (2 lines changed)
@@ -20,7 +20,7 @@ uv.lock
.claude/

# Wild Cloud
data/
.envrc

# Development working dir
.working/
.gitmodules (15 lines changed)
@@ -1,3 +1,18 @@
[submodule "wild-directory"]
	path = wild-directory
	url = https://git.civilsociety.dev/wild-cloud/wild-directory.git
[submodule "wild-cli"]
	path = wild-cli
	url = https://git.civilsociety.dev/wild-cloud/wild-cli.git
[submodule "wild-cloud-poc"]
	path = wild-cloud-poc
	url = https://git.civilsociety.dev/wild-cloud/wild-cloud-poc.git
[submodule "wild-central-api"]
	path = wild-central-api
	url = https://git.civilsociety.dev/wild-cloud/wild-central-api.git
[submodule "wild-web-app"]
	path = wild-web-app
	url = https://git.civilsociety.dev/wild-cloud/wild-web-app.git
[submodule "wild-central"]
	path = wild-central
	url = https://git.civilsociety.dev/wild-cloud/wild-central.git
CLAUDE.md (205 lines changed)
@@ -4,18 +4,197 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Project Overview

This project is called "Wild Cloud Central". It consists of the following components:
This project is called "Wild Cloud". Wild Cloud is a platform for managing and orchestrating cloud-native applications on local networks using a network appliance called "Wild Central".

- **Wild Daemon**:
  - @daemon/README.md
  - A web server that provides an API for managing Wild Cloud instances.
- **Wild CLI**:
  - @cli/README.md
  - A command-line interface for interacting with the Wild Daemon and managing Wild Cloud clusters.
- **Wild App**:
  - @app/README.md
  - A web-based interface for managing Wild Cloud instances, hosted on Wild Central.

Wild Central is a lightweight server that runs on a local machine (e.g., a Raspberry Pi) and provides an API for users to manage their Wild Cloud instances. The Wild Cloud API is implemented in the wild-central-api project (@wild-central-api/README.md). Wild Central devices can be set up using our apt package, implemented in the wild-central project (@wild-central/README.md).

A Wild Cloud instance is a Kubernetes (k8s) environment that runs Wild Cloud services and applications. Wild Cloud instances can be created, managed, and monitored using the Wild Cloud API running on a Wild Central device.

Wild Cloud applications are custom packages designed to be deployed to Wild Cloud instances. They consist of kustomize templates and a Wild Cloud app manifest file that describes the application and how it should be configured and deployed in a Wild Cloud instance. Wild Cloud applications are stored in a "Wild Directory". The directory contained in the wild-directory folder is the official Wild Directory (@wild-directory/README.md).

The Wild Cloud API maintains data for each Wild Cloud instance in its configured WILD_API_DATA_DIR. A data directory is intended to be checked into version control (e.g., git) to track changes to the configuration of Wild Cloud instances and their deployed applications over time. These are designed to follow infrastructure-as-code principles, allowing experienced devops users to manage their Wild Cloud instances using familiar tools and workflows.
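For example, an operator could version a data directory like this (a minimal sketch; the commands are ordinary git, and the directory contents are whatever the API writes there):

```bash
# Track Wild Cloud instance configuration in git
cd "$WILD_API_DATA_DIR"
git init
git add .
git commit -m "Snapshot Wild Cloud instance configuration"
```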
We provide a command-line interface (CLI) tool called Wild CLI, implemented in the wild-cli project, that allows users to interact with the Wild Cloud API and manage their Wild Cloud instances from the terminal. This lets users automate tasks and integrate Wild Cloud management into their existing workflows (@wild-cli/README.md).

To make Wild Cloud more accessible to less-experienced users, the Wild Central device hosts a web-based interface for managing Wild Cloud instances, which is implemented in the wild-web-app project (@wild-web-app/README.md).

## Additional Documentation

### Info about Talos

- @ai/talos-v1.11/README.md
- @ai/talos-v1.11/architecture-and-components.md
- @ai/talos-v1.11/cli-essentials.md
- @ai/talos-v1.11/cluster-operations.md
- @ai/talos-v1.11/discovery-and-networking.md
- @ai/talos-v1.11/etcd-management.md
- @ai/talos-v1.11/bare-metal-administration.md
- @ai/talos-v1.11/troubleshooting-guide.md
## Implementation Philosophy

## Core Philosophy

The project embodies a Zen-like minimalism that values simplicity and clarity above all. This approach reflects:

- **Wabi-sabi philosophy**: Embracing simplicity and the essential. Each line serves a clear purpose without unnecessary embellishment.
- **KISS**: The solution should be as simple as possible, but no simpler.
- **YAGNI**: Avoid building features or abstractions that aren't immediately needed. The code handles what's needed now rather than anticipating every possible future scenario.
- **Trust in emergence**: Complex systems work best when built from simple, well-defined components that do one thing well.
- **Pragmatic trust**: The developer trusts external systems enough to interact with them directly, handling failures as they occur rather than assuming they'll happen.
- **Consistency is key**: Uniform patterns and conventions make the codebase easier to understand and maintain. If you introduce a new pattern, make sure it's consistently applied. There should be one obvious way to do things.

This development philosophy values clear, concise documentation, readable code, and the belief that good architecture emerges from simplicity rather than being imposed through complexity.

## Core Design Principles

### 1. Ruthless Simplicity

- **KISS principle taken to heart**: Keep everything as simple as possible, but no simpler
- **Minimize abstractions**: Every layer of abstraction must justify its existence
- **Start minimal, grow as needed**: Begin with the simplest implementation that meets current needs
- **Avoid future-proofing**: Don't build for hypothetical future requirements
- **Question everything**: Regularly challenge complexity in the codebase

### 2. Architectural Integrity with Minimal Implementation

- **Preserve key architectural patterns**: Maintain clear boundaries and responsibilities
- **Simplify implementations**: Maintain pattern benefits with dramatically simpler code
- **Scrappy but structured**: Lightweight implementations of solid architectural foundations
- **End-to-end thinking**: Focus on complete flows rather than perfect components

### 3. Library vs Custom Code

Choosing between custom code and external libraries is a judgment call that evolves with your requirements. There's no rigid rule - it's about understanding trade-offs and being willing to revisit decisions as needs change.

#### The Evolution Pattern

Your approach might naturally evolve:

- **Start simple**: Custom code for basic needs (20 lines handles it)
- **Growing complexity**: Switch to a library when requirements expand
- **Hitting limits**: Back to custom when you outgrow the library's capabilities

This isn't failure - it's natural evolution. Each stage was the right choice at that time.

#### When Custom Code Makes Sense

Custom code often wins when:

- The need is simple and well-understood
- You want code perfectly tuned to your exact requirements
- Libraries would require significant "hacking" or workarounds
- The problem is unique to your domain
- You need full control over the implementation

#### When Libraries Make Sense

Libraries shine when:

- They solve complex problems you'd rather not tackle (auth, crypto, video encoding)
- They align well with your needs without major modifications
- The problem is well-solved with mature, battle-tested solutions
- Configuration alone can adapt them to your requirements
- The complexity they handle far exceeds the integration cost

#### Making the Judgment Call

Ask yourself:

- How well does this library align with our actual needs?
- Are we fighting the library or working with it?
- Is the integration clean or does it require workarounds?
- Will our future requirements likely stay within this library's capabilities?
- Is the problem complex enough to justify the dependency?

#### Recognizing Misalignment

Watch for signs you're fighting your current approach:

- Spending more time working around the library than using it
- Your simple custom solution has grown complex and fragile
- You're monkey-patching or heavily wrapping a library
- The library's assumptions fundamentally conflict with your needs

#### Stay Flexible

Remember that complexity isn't destroyed, only moved. Libraries shift complexity from your code to someone else's - that's often a great trade, but recognize what you're doing.

The key is avoiding lock-in. Keep library integration points minimal and isolated so you can switch approaches when needed. There's no shame in moving from custom to library or library to custom. Requirements change, understanding deepens, and the right answer today might not be the right answer tomorrow. Make the best decision with current information, and be ready to evolve.

## Technical Implementation Guidelines

### API Layer

- Implement only essential endpoints
- Minimal middleware with focused validation
- Clear error responses with useful messages
- Consistent patterns across endpoints

### Storage

- Prefer simple file storage
- Simple schema focused on current needs

## Development Approach

### Vertical Slices

- Implement complete end-to-end functionality slices
- Start with core user journeys
- Get data flowing through all layers early
- Add features horizontally only after core flows work

### Iterative Implementation

- 80/20 principle: Focus on high-value, low-effort features first
- One working feature > multiple partial features
- Validate with real usage before enhancing
- Be willing to refactor early work as patterns emerge

### Testing Strategy

- Focus on critical path testing initially
- Add unit tests for complex logic and edge cases
- Testing pyramid: 60% unit, 30% integration, 10% end-to-end

### Error Handling

- Handle common errors robustly
- Log detailed information for debugging
- Provide clear error messages to users
- Fail fast and visibly during development

## Decision-Making Framework

When faced with implementation decisions, ask these questions:

1. **Necessity**: "Do we actually need this right now?"
2. **Simplicity**: "What's the simplest way to solve this problem?"
3. **Directness**: "Can we solve this more directly?"
4. **Value**: "Does the complexity add proportional value?"
5. **Maintenance**: "How easy will this be to understand and change later?"

## Areas to Embrace Complexity

Some areas justify additional complexity:

1. **Security**: Never compromise on security fundamentals
2. **Data integrity**: Ensure data consistency and reliability
3. **Core user experience**: Make the primary user flows smooth and reliable
4. **Error visibility**: Make problems obvious and diagnosable

## Areas to Aggressively Simplify

Push for extreme simplicity in these areas:

1. **Internal abstractions**: Minimize layers between components
2. **Generic "future-proof" code**: Resist solving non-existent problems
3. **Edge case handling**: Handle the common cases well first
4. **Framework usage**: Use only what you need from frameworks
5. **State management**: Keep state simple and explicit

## Remember

- It's easier to add complexity later than to remove it
- Code you don't write has no bugs
- Favor clarity over cleverness
- The best code is often the simplest

This philosophy document serves as the foundational guide for all implementation decisions in the project.

Read all of the following for context:

- @ai/BUILDING_WILD_CENTRAL.md
README.md (65 lines changed)
@@ -1,12 +1,63 @@
# Wild Cloud Development Environment

## Support
## Overview

- **Documentation**: See `docs/` directory for detailed guides
- **Issues**: Report problems on the project issue tracker
- **API Reference**: Available at `/api/v1/` endpoints when service is running

This project includes a Claude Code-assisted environment for working on all Wild Cloud components at once. Each component of the Wild Cloud project is included as a submodule in this repo. This includes:

## Documentation

- [Wild Central](wild-central/README.md): The network appliance.
- [Wild Central API](wild-central-api/README.md): The daemon running on the network appliance.
- [Wild CLI](wild-cli/README.md): A command line interface to the API.
- [Wild Web App](wild-web-app/README.md): A web interface to the API.
- [Wild Directory](wild-directory/README.md): Managed apps to be deployed on wild clouds.

- [Developer Guide](docs/DEVELOPER.md) - Development setup, testing, and API reference
- [Maintainer Guide](docs/MAINTAINER.md) - Package management and repository deployment

... and, until this milestone is complete:

- [Wild Cloud PoC](wild-cloud-poc/README.md): The proof-of-concept project that includes functional scripts for running a wild cloud within a development environment. This will be retired as soon as the CLI and web app are at feature parity.
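Since every component lives in a submodule, a fresh checkout needs them initialized (the clone URL below is a placeholder for this repository's actual URL):

```bash
# Clone with all component submodules (URL is illustrative)
git clone --recurse-submodules https://git.civilsociety.dev/wild-cloud/<this-repo>.git

# Or initialize them in an existing checkout
git submodule update --init --recursive
```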
## Dev environment setup

### Prerequisites

- Go 1.21+
- Docker (for testing)
- make

```bash
sudo apt update
sudo apt install make direnv
echo 'eval "$(direnv hook bash)"' >> $HOME/.bashrc
source $HOME/.bashrc

# Node.js and pnpm setup
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
source $HOME/.bashrc
nvm install --lts

curl -fsSL https://get.pnpm.io/install.sh | sh -
source $HOME/.bashrc
pnpm install -g @anthropic-ai/claude-code

# Golang setup
wget https://go.dev/dl/go1.24.5.linux-arm64.tar.gz
sudo tar -C /usr/local -xzf ./go1.24.5.linux-arm64.tar.gz
echo 'export PATH="$PATH:$HOME/go/bin:/usr/local/go/bin"' >> $HOME/.bashrc
source $HOME/.bashrc
rm ./go1.24.5.linux-arm64.tar.gz
go install -v github.com/go-delve/delve/cmd/dlv@latest

# Python setup
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uv sync

# Runtime dependencies
./scripts/install-wild-cloud-dependencies.sh
```

## Development

You will need to create a data directory and link it in your `.envrc`. The CLI and API will both work against this data dir.
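One way to do that, using the `.envrc.example` added in this change (the data path is illustrative):

```bash
# Create a local data directory for the API and CLI to share
mkdir -p ./data

# Copy the example env (which exports WILD_API_DATA_DIR=$PWD/data) and load it
cp .envrc.example .envrc
direnv allow
```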
Open the [VSCode workspace](./wild-cloud-dev.code-workspace) in VS Code. Within VS Code, you will be able to launch the API and Web App. The web app will reload as you modify files. The API, however, will need to be rebuilt/reloaded after you make changes.

Both the CLI and the web app use the API as their backend, so any non-CLI/web-app functionality should be in the API. When updating the API, it is important to ensure both the CLI and the web app work with the updates.
@@ -1,260 +0,0 @@
# Building Wild Cloud Central

The first version of Wild Cloud, the Proof of Concept version (v.PoC), was built as a collection of shell scripts that users would run from their local machines. This works well for early adopters who are comfortable with the command line, Talos, and Kubernetes.

To make Wild Cloud more accessible to a broader audience, we are developing Wild Central. Central is a single-purpose machine on a LAN that will deliver:

- Wild Daemon: A lightweight service that runs on a local machine (e.g., a Raspberry Pi) to manage Wild Cloud instances on the local network.
- Wild App: A web-based interface (to Wild Daemon) for managing Wild Cloud instances.
- Wild CLI: A command-line interface (to Wild Daemon) for advanced users who prefer to manage Wild Cloud from the terminal.

## Background info

### Info about Wild Cloud v.PoC

- @docs/agent-context/wildcloud-v.PoC/README.md
- @docs/agent-context/wildcloud-v.PoC/overview.md
- @docs/agent-context/wildcloud-v.PoC/project-architecture.md
- @docs/agent-context/wildcloud-v.PoC/bin-scripts.md
- @docs/agent-context/wildcloud-v.PoC/configuration-system.md
- @docs/agent-context/wildcloud-v.PoC/setup-process.md
- @docs/agent-context/wildcloud-v.PoC/apps-system.md

### Info about Talos

- @docs/agent-context/talos-v1.11/README.md
- @docs/agent-context/talos-v1.11/architecture-and-components.md
- @docs/agent-context/talos-v1.11/cli-essentials.md
- @docs/agent-context/talos-v1.11/cluster-operations.md
- @docs/agent-context/talos-v1.11/discovery-and-networking.md
- @docs/agent-context/talos-v1.11/etcd-management.md
- @docs/agent-context/talos-v1.11/bare-metal-administration.md
- @docs/agent-context/talos-v1.11/troubleshooting-guide.md

## Architecture

### Old v.PoC Architecture

- WC_ROOT: The scripts used to set up and manage the Wild Cloud cluster. Currently, this is a set of shell scripts in $WC_ROOT/bin.
- WC_HOME: During setup, the user creates a Wild Cloud project directory (WC_HOME) on their local machine. This directory holds all configuration, secrets, and k8s manifests for their specific Wild Cloud deployment.
- Wild Cloud Apps Directory: The Wild Cloud apps are stored in the `apps/` directory within the WC_ROOT repository. Users can deploy these apps to their cluster using the scripts in WC_ROOT/bin.
- dnsmasq server: Scripts help the operator set up a dnsmasq server on a separate machine to provide LAN DNS services during node bootstrapping.

### New Wild Central Architecture

#### wildd: The Wild Cloud Daemon

wildd is a long-running service that provides an API and web interface for managing one or more Wild Cloud clusters. It runs on a dedicated device within the user's network.

wildd replaces functionality from the v.PoC scripts and the dnsmasq server. It is one API for managing multiple wild cloud instances on the LAN.

Both wild-app and wild-cli communicate with wildd to perform actions.

See: @daemon/BUILDING_WILD_DAEMON.md
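As a quick smoke test once wildd is running, you can probe its API (the port comes from the dev `.envrc`; the path follows the `/api/v1/` convention mentioned in the repo README, but treat the exact endpoint as illustrative):

```bash
# Illustrative probe of a locally running wildd
curl http://localhost:5055/api/v1/
```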
#### wild-app

The web application that provides the user interface for Wild Cloud on Wild Central. It communicates with wildd to perform actions and display information.

See: @/app/BUILDING_WILD_APP.md

#### wild-cli

A command-line interface for advanced users who prefer to manage Wild Cloud from the terminal. It communicates with wildd to perform actions.

Mirrors all of the wild-* scripts from v.PoC, but adapted for the new architecture:

- One golang client (wild-cli) replaces many bash scripts (wild-*).
- Wrapper around the wildd API instead of direct file manipulation.
- Multi-cloud: v.PoC scripts set the instance context with the WC_HOME environment variable. In Central, wild-cli follows the "context" pattern like kubectl and talosctl, using `--context` or `WILD_CONTEXT` to select which wild cloud instance to manage, or defaulting to the "current" context (see the illustrative example below).
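For instance (the subcommand and context name here are hypothetical; only the `--context`/`WILD_CONTEXT` selection mechanism is described above):

```bash
# Select an instance per invocation (subcommand is illustrative)
wild-cli --context home-cluster status

# Or set a default context for the shell
export WILD_CONTEXT=home-cluster
wild-cli status
```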
See: @cli/BUILDING_WILD_CLI.md

#### Wild Central Data

Configured with the $WILD_CENTRAL_DATA environment variable (default: /var/lib/wild-central).

Replaces multiple WC_HOMEs. All wild clouds managed on the LAN are configured here. These are still in easy-to-read YAML format and can be edited directly or through the webapp.

Wild Central data also holds the local app directory, logs, artifacts, and overall state data. A hypothetical layout is sketched below.
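A purely hypothetical sketch of what such a layout could look like (none of these subdirectory names are confirmed by this document; only the kinds of contents are):

```
/var/lib/wild-central/   # $WILD_CENTRAL_DATA (hypothetical layout)
├── clouds/              # one subtree of YAML config per managed wild cloud
├── directory/           # local app directory
├── artifacts/           # downloaded artifacts
└── logs/
```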
#### Wild Cloud Apps Directory

The Wild Cloud apps are stored in the `apps/` directory within the WC_ROOT repository. Users can deploy these apps to their cluster using the webapp or wild-cli.

#### dnsmasq server

The Wild Daemon (wildd) includes functionality to manage a dnsmasq server on the same device, providing LAN DNS services during node bootstrapping.

## Packaging and Installation

Ultimately, the daemon, app, and cli will be packaged together for easy installation on a Raspberry Pi or similar device.

See @ai/WILD_CENTRAL_PACKAGING.md

## Implementation Philosophy

## Core Philosophy

The project embodies a Zen-like minimalism that values simplicity and clarity above all. This approach reflects:

- **Wabi-sabi philosophy**: Embracing simplicity and the essential. Each line serves a clear purpose without unnecessary embellishment.
- **KISS**: The solution should be as simple as possible, but no simpler.
- **YAGNI**: Avoid building features or abstractions that aren't immediately needed. The code handles what's needed now rather than anticipating every possible future scenario.
- **Trust in emergence**: Complex systems work best when built from simple, well-defined components that do one thing well.
- **Pragmatic trust**: The developer trusts external systems enough to interact with them directly, handling failures as they occur rather than assuming they'll happen.
- **Consistency is key**: Uniform patterns and conventions make the codebase easier to understand and maintain. If you introduce a new pattern, make sure it's consistently applied. There should be one obvious way to do things.

This development philosophy values clear, concise documentation, readable code, and the belief that good architecture emerges from simplicity rather than being imposed through complexity.

## Core Design Principles

### 1. Ruthless Simplicity

- **KISS principle taken to heart**: Keep everything as simple as possible, but no simpler
- **Minimize abstractions**: Every layer of abstraction must justify its existence
- **Start minimal, grow as needed**: Begin with the simplest implementation that meets current needs
- **Avoid future-proofing**: Don't build for hypothetical future requirements
- **Question everything**: Regularly challenge complexity in the codebase

### 2. Architectural Integrity with Minimal Implementation

- **Preserve key architectural patterns**: Maintain clear boundaries and responsibilities
- **Simplify implementations**: Maintain pattern benefits with dramatically simpler code
- **Scrappy but structured**: Lightweight implementations of solid architectural foundations
- **End-to-end thinking**: Focus on complete flows rather than perfect components

### 3. Library vs Custom Code

Choosing between custom code and external libraries is a judgment call that evolves with your requirements. There's no rigid rule - it's about understanding trade-offs and being willing to revisit decisions as needs change.

#### The Evolution Pattern

Your approach might naturally evolve:

- **Start simple**: Custom code for basic needs (20 lines handles it)
- **Growing complexity**: Switch to a library when requirements expand
- **Hitting limits**: Back to custom when you outgrow the library's capabilities

This isn't failure - it's natural evolution. Each stage was the right choice at that time.

#### When Custom Code Makes Sense

Custom code often wins when:

- The need is simple and well-understood
- You want code perfectly tuned to your exact requirements
- Libraries would require significant "hacking" or workarounds
- The problem is unique to your domain
- You need full control over the implementation

#### When Libraries Make Sense

Libraries shine when:

- They solve complex problems you'd rather not tackle (auth, crypto, video encoding)
- They align well with your needs without major modifications
- The problem is well-solved with mature, battle-tested solutions
- Configuration alone can adapt them to your requirements
- The complexity they handle far exceeds the integration cost

#### Making the Judgment Call

Ask yourself:

- How well does this library align with our actual needs?
- Are we fighting the library or working with it?
- Is the integration clean or does it require workarounds?
- Will our future requirements likely stay within this library's capabilities?
- Is the problem complex enough to justify the dependency?

#### Recognizing Misalignment

Watch for signs you're fighting your current approach:

- Spending more time working around the library than using it
- Your simple custom solution has grown complex and fragile
- You're monkey-patching or heavily wrapping a library
- The library's assumptions fundamentally conflict with your needs

#### Stay Flexible

Remember that complexity isn't destroyed, only moved. Libraries shift complexity from your code to someone else's - that's often a great trade, but recognize what you're doing.

The key is avoiding lock-in. Keep library integration points minimal and isolated so you can switch approaches when needed. There's no shame in moving from custom to library or library to custom. Requirements change, understanding deepens, and the right answer today might not be the right answer tomorrow. Make the best decision with current information, and be ready to evolve.

## Technical Implementation Guidelines

### API Layer

- Implement only essential endpoints
- Minimal middleware with focused validation
- Clear error responses with useful messages
- Consistent patterns across endpoints

### Storage

- Prefer simple file storage
- Simple schema focused on current needs

## Development Approach

### Vertical Slices

- Implement complete end-to-end functionality slices
- Start with core user journeys
- Get data flowing through all layers early
- Add features horizontally only after core flows work

### Iterative Implementation

- 80/20 principle: Focus on high-value, low-effort features first
- One working feature > multiple partial features
- Validate with real usage before enhancing
- Be willing to refactor early work as patterns emerge

### Testing Strategy

- Focus on critical path testing initially
- Add unit tests for complex logic and edge cases
- Testing pyramid: 60% unit, 30% integration, 10% end-to-end

### Error Handling

- Handle common errors robustly
- Log detailed information for debugging
- Provide clear error messages to users
- Fail fast and visibly during development

## Decision-Making Framework

When faced with implementation decisions, ask these questions:

1. **Necessity**: "Do we actually need this right now?"
2. **Simplicity**: "What's the simplest way to solve this problem?"
3. **Directness**: "Can we solve this more directly?"
4. **Value**: "Does the complexity add proportional value?"
5. **Maintenance**: "How easy will this be to understand and change later?"

## Areas to Embrace Complexity

Some areas justify additional complexity:

1. **Security**: Never compromise on security fundamentals
2. **Data integrity**: Ensure data consistency and reliability
3. **Core user experience**: Make the primary user flows smooth and reliable
4. **Error visibility**: Make problems obvious and diagnosable

## Areas to Aggressively Simplify

Push for extreme simplicity in these areas:

1. **Internal abstractions**: Minimize layers between components
2. **Generic "future-proof" code**: Resist solving non-existent problems
3. **Edge case handling**: Handle the common cases well first
4. **Framework usage**: Use only what you need from frameworks
5. **State management**: Keep state simple and explicit

## Remember

- It's easier to add complexity later than to remove it
- Code you don't write has no bugs
- Favor clarity over cleverness
- The best code is often the simplest

This philosophy document serves as the foundational guide for all implementation decisions in the project.
@@ -1,102 +0,0 @@
# Packaging Wild Central

## Desired Experience

This is the desired experience for installing Wild Cloud Central on a fresh Debian/Ubuntu system:

### APT Repository (Recommended)

```bash
# Download and install GPG key
curl -fsSL https://mywildcloud.org/apt/wild-cloud-central.gpg | sudo tee /usr/share/keyrings/wild-cloud-central-archive-keyring.gpg > /dev/null

# Add repository (modern .sources format)
sudo tee /etc/apt/sources.list.d/wild-cloud-central.sources << 'EOF'
Types: deb
URIs: https://mywildcloud.org/apt
Suites: stable
Components: main
Signed-By: /usr/share/keyrings/wild-cloud-central-archive-keyring.gpg
EOF

# Update and install
sudo apt update
sudo apt install wild-cloud-central
```

### Manual Installation

Download the latest `.deb` package from the [releases page](https://github.com/wildcloud/wild-central/releases) and install:

```bash
sudo dpkg -i wild-cloud-central_*.deb
sudo apt-get install -f  # Fix any dependency issues
```

## Quick Start

1. **Configure the service** (optional):

   ```bash
   sudo cp /etc/wild-cloud-central/config.yaml.example /etc/wild-cloud-central/config.yaml
   sudo nano /etc/wild-cloud-central/config.yaml
   ```

2. **Start the service**:

   ```bash
   sudo systemctl enable wild-cloud-central
   sudo systemctl start wild-cloud-central
   ```

3. **Access the web interface**:
   Open http://your-server-ip in your browser

## Developer tooling

Makefile commands for packaging:

Build targets (compile binaries):

    make build       - Build for current architecture
    make build-arm64 - Build arm64 binary
    make build-amd64 - Build amd64 binary
    make build-all   - Build all architectures

Package targets (create .deb packages):

    make package       - Create .deb package for current arch
    make package-arm64 - Create arm64 .deb package
    make package-amd64 - Create amd64 .deb package
    make package-all   - Create all .deb packages

Repository targets:

    make repo        - Build APT repository from packages
    make deploy-repo - Deploy repository to server

Quality assurance:

    make check - Run all checks (fmt + vet + test)
    make fmt   - Format Go code
    make vet   - Run go vet
    make test  - Run tests

Development:

    make run        - Run application locally
    make clean      - Remove all build artifacts
    make deps-check - Verify and tidy dependencies
    make version    - Show build information
    make install    - Install to system

Directory structure:

    build/              - Intermediate build artifacts
    dist/bin/           - Final binaries for distribution
    dist/packages/      - OS packages (.deb files)
    dist/repositories/  - APT repository for deployment

Example workflows:

    make check && make build - Safe development build
    make clean && make repo  - Full release build
@@ -59,47 +59,6 @@ apps/
- **AI/ML**: vLLM
- **Infrastructure**: Memcached, NFS

### `/setup/` - Infrastructure Templates

**Purpose**: Cluster and service deployment templates

```
setup/
├── README.md
├── cluster-nodes/             # Talos node configuration
│   ├── init-cluster.sh        # Cluster initialization script
│   ├── patch.templates/       # Node-specific config templates
│   │   ├── controlplane.yaml  # Control plane template
│   │   └── worker.yaml        # Worker node template
│   └── talos-schemas.yaml     # Version mappings
├── cluster-services/          # Core Kubernetes services
│   ├── README.md
│   ├── metallb/               # Load balancer
│   ├── traefik/               # Ingress controller
│   ├── cert-manager/          # Certificate management
│   ├── longhorn/              # Distributed storage
│   ├── coredns/               # DNS resolution
│   ├── externaldns/           # DNS record management
│   ├── kubernetes-dashboard/  # Web UI
│   └── ...
├── dnsmasq/                   # DNS and PXE boot server
├── home-scaffold/             # User directory templates
└── operator/                  # Additional operator tools
```

### `/experimental/` - Development Projects

**Purpose**: Experimental features and development tools

```
experimental/
├── daemon/             # Go API daemon
│   ├── main.go         # API server
│   ├── Makefile        # Build automation
│   └── README.md
└── app/                # React dashboard
    ├── src/            # React source code
    ├── package.json    # Dependencies
    ├── pnpm-lock.yaml  # Lock file
    └── README.md
```

### `/scripts/` - Utility Scripts

**Purpose**: Installation and utility scripts
@@ -1,23 +0,0 @@
# Maintenance Guide

Keep your wild cloud running smoothly.

- [Security Best Practices](./guides/security.md)
- [Monitoring](./guides/monitoring.md)
- [Making backups](./guides/making-backups.md)
- [Restoring backups](./guides/restoring-backups.md)

## Upgrade

- [Upgrade applications](./guides/upgrade-applications.md)
- [Upgrade kubernetes](./guides/upgrade-kubernetes.md)
- [Upgrade Talos](./guides/upgrade-talos.md)
- [Upgrade Wild Cloud](./guides/upgrade-wild-cloud.md)

## Troubleshooting

- [Cluster issues](./guides/troubleshoot-cluster.md)
- [DNS issues](./guides/troubleshoot-dns.md)
- [Service connectivity issues](./guides/troubleshoot-service-connectivity.md)
- [TLS certificate issues](./guides/troubleshoot-tls-certificates.md)
- [Visibility issues](./guides/troubleshoot-visibility.md)

@@ -1,3 +0,0 @@
# Setting Up Your Wild Cloud

Visit https://mywildcloud.org/get-started for full wild cloud setup instructions.
@@ -1,265 +0,0 @@
# Making Backups

This guide covers how to create backups of your wild-cloud infrastructure using the integrated backup system.

## Overview

The wild-cloud backup system creates encrypted, deduplicated snapshots using restic. It backs up three main components:

- **Applications**: Database dumps and persistent volume data
- **Cluster**: Kubernetes resources and etcd state
- **Configuration**: Wild-cloud repository and settings

## Prerequisites

Before making backups, ensure you have:

1. **Environment configured**: Run `source env.sh` to load backup configuration
2. **Restic repository**: Backup repository configured in `config.yaml`
3. **Backup password**: Set in wild-cloud secrets
4. **Staging directory**: Configured path for temporary backup files

## Backup Components

### Applications (`wild-app-backup`)

Backs up individual applications, including:

- **Database dumps**: PostgreSQL/MySQL databases in compressed custom format
- **PVC data**: Application files streamed directly for restic deduplication
- **Auto-discovery**: Finds databases and PVCs based on the app's manifest.yaml

### Cluster Resources (`wild-backup --cluster-only`)

Backs up cluster-wide resources:

- **Kubernetes resources**: All pods, services, deployments, secrets, configmaps
- **Storage definitions**: PersistentVolumes, PVCs, StorageClasses
- **etcd snapshot**: Complete cluster state for disaster recovery

### Configuration (`wild-backup --home-only`)

Backs up wild-cloud configuration:

- **Repository contents**: All app definitions, manifests, configurations
- **Settings**: Wild-cloud configuration files and customizations

## Making Backups

### Full System Backup (Recommended)

Create a complete backup of everything:

```bash
# Backup all components (apps + cluster + config)
wild-backup
```

This is equivalent to:

```bash
wild-backup --home --apps --cluster
```

### Selective Backups

#### Applications Only

```bash
# All applications
wild-backup --apps-only

# Single application
wild-app-backup discourse

# Multiple applications
wild-app-backup discourse gitea immich
```

#### Cluster Only

```bash
# Kubernetes resources + etcd
wild-backup --cluster-only
```

#### Configuration Only

```bash
# Wild-cloud repository
wild-backup --home-only
```

### Excluding Components

Skip specific components:

```bash
# Skip config, backup apps + cluster
wild-backup --no-home

# Skip applications, backup config + cluster
wild-backup --no-apps

# Skip cluster resources, backup config + apps
wild-backup --no-cluster
```

## Backup Process Details

### Application Backup Process

1. **Discovery**: Parses `manifest.yaml` to find database and PVC dependencies
2. **Database backup**: Creates compressed custom-format dumps
3. **PVC backup**: Streams files directly to staging for restic deduplication
4. **Staging**: Organizes files in a clean directory structure
5. **Upload**: Creates individual restic snapshots per application

### Cluster Backup Process

1. **Resource export**: Exports all Kubernetes resources to YAML
2. **etcd snapshot**: Creates a point-in-time etcd backup via talosctl (see the sketch below)
3. **Upload**: Creates a single restic snapshot for cluster state
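The etcd step corresponds roughly to the following talosctl invocation (the node address and output path here are illustrative, not the tool's exact internal command):

```bash
# Point-in-time etcd snapshot taken from a control plane node
talosctl --nodes <control-plane-ip> etcd snapshot ./backup-staging/cluster/etcd-snapshot.db
```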
### Restic Snapshots

Each backup creates tagged restic snapshots:

```bash
# View all snapshots
restic snapshots

# Filter by component
restic snapshots --tag discourse  # Specific app
restic snapshots --tag cluster    # Cluster resources
restic snapshots --tag wc-home    # Wild-cloud config
```

## Where Backup Files Are Staged

Before uploading to your restic repository, backup files are organized in a staging directory. This temporary area lets you see exactly what's being backed up and helps with deduplication.

Here's what the staging area looks like:

```
backup-staging/
├── apps/
│   ├── discourse/
│   │   ├── database_20250816T120000Z.dump
│   │   ├── globals_20250816T120000Z.sql
│   │   └── discourse/
│   │       └── data/        # All the actual files
│   ├── gitea/
│   │   ├── database_20250816T120000Z.dump
│   │   └── gitea-data/
│   │       └── data/        # Git repositories, etc.
│   └── immich/
│       ├── database_20250816T120000Z.dump
│       └── immich-data/
│           └── upload/      # Photos and videos
└── cluster/
    ├── all-resources.yaml   # All running services
    ├── secrets.yaml         # Passwords and certificates
    ├── configmaps.yaml      # Configuration data
    └── etcd-snapshot.db     # Complete cluster state
```

This staging approach means you can examine backup contents before they're uploaded, and restic can efficiently deduplicate files that haven't changed.

## Advanced Usage

### Custom Backup Scripts

Applications can provide custom backup logic:

```bash
# Create apps/myapp/backup.sh for custom behavior
chmod +x apps/myapp/backup.sh

# wild-app-backup will use the custom script if present
wild-app-backup myapp
```
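This guide doesn't specify the script's interface, so the skeleton below is only a guess at one plausible shape (the staging path, namespace, and mount point are all illustrative):

```bash
#!/usr/bin/env bash
# Hypothetical apps/myapp/backup.sh - stages app data for restic upload
set -euo pipefail

STAGING="./backup-staging/apps/myapp"
mkdir -p "$STAGING"

# Stream app files out of the running pod into the staging area
kubectl exec -n myapp deploy/myapp -- tar -C /var/lib/myapp -cf - . \
  | tar -C "$STAGING" -xf -
```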
### Monitoring Backup Status

```bash
# Check recent snapshots
restic snapshots | head -20

# Check specific app backups
restic snapshots --tag discourse

# Verify backup integrity
restic check
```

### Backup Automation

Set up automated backups with cron:

```bash
# Daily full backup at 2 AM
0 2 * * * cd /data/repos/payne-cloud && source env.sh && wild-backup

# Hourly app backups during business hours
0 9-17 * * * cd /data/repos/payne-cloud && source env.sh && wild-backup --apps-only
```

## Performance Considerations

### Large PVCs (like Immich photos)

The streaming backup approach provides:

- **First backup**: Full transfer time (all files processed)
- **Subsequent backups**: Only changed files processed (dramatically faster)
- **Storage efficiency**: Restic deduplication reduces storage usage

### Network Usage

- **Database dumps**: Compressed at source, efficient transfer
- **PVC data**: Uncompressed transfer, but restic handles deduplication
- **etcd snapshots**: Small files, minimal impact

## Troubleshooting

### Common Issues

**"No databases or PVCs found"**

- App has no `manifest.yaml` with database dependencies
- No PVCs with matching labels in app namespace
- Create a custom `backup.sh` script for special cases

**"kubectl not found"**

- Ensure kubectl is installed and configured
- Check cluster connectivity with `kubectl get nodes`

**"Staging directory not set"**

- Configure `cloud.backup.staging` in `config.yaml`
- Ensure the directory exists and is writable

**"Could not create etcd backup"**

- Ensure `talosctl` is installed for Talos clusters
- Check control plane node connectivity
- Verify etcd pods are accessible in the kube-system namespace

### Backup Verification

Always verify backups periodically:

```bash
# Check restic repository integrity
restic check

# List recent snapshots
restic snapshots --compact

# Test restore to a different directory
restic restore latest --target /tmp/restore-test
```

## Security Notes

- **Encryption**: All backups are encrypted with your backup password
- **Secrets**: Kubernetes secrets are included in cluster backups
- **Access control**: Secure your backup repository and passwords
- **Network**: Consider bandwidth usage for large initial backups

## Next Steps

- [Restoring Backups](restoring-backups.md) - Learn how to restore from backups
- Configure automated backup schedules
- Set up backup monitoring and alerting
- Test disaster recovery procedures
@@ -1,50 +0,0 @@
# System Health Monitoring

## Basic Monitoring

Check system health with:

```bash
# Node resource usage
kubectl top nodes

# Pod resource usage
kubectl top pods -A

# Persistent volume claims
kubectl get pvc -A
```

## Advanced Monitoring (Future Implementation)

Consider implementing:

1. **Prometheus + Grafana** for comprehensive monitoring:

   ```bash
   # Placeholder for future implementation
   helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
   helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
   ```

2. **Loki** for log aggregation:

   ```bash
   # Placeholder for future implementation
   helm repo add grafana https://grafana.github.io/helm-charts
   helm install loki grafana/loki-stack --namespace logging --create-namespace
   ```

## Additional Resources

This document will be expanded in the future with:

- Detailed backup and restore procedures
- Monitoring setup instructions
- Comprehensive security hardening guide
- Automated maintenance scripts

For now, refer to the following external resources:

- [K3s Documentation](https://docs.k3s.io/)
- [Kubernetes Troubleshooting Guide](https://kubernetes.io/docs/tasks/debug/)
- [Velero Backup Documentation](https://velero.io/docs/latest/)
- [Kubernetes Security Best Practices](https://kubernetes.io/docs/concepts/security/)
@@ -1,294 +0,0 @@
# Restoring Backups

This guide will walk you through restoring your applications and cluster from wild-cloud backups. Hopefully you'll never need this, but when you do, it's critical that the process works smoothly.

## Understanding Restore Types

Your wild-cloud backup system can restore different types of data depending on what you need to recover:

**Application restores** bring back individual applications by restoring their database contents and file storage. This is what you'll use most often - maybe you accidentally deleted something in Discourse, or Gitea got corrupted, or you want to roll back Immich to before a bad update.

**Cluster restores** are for disaster recovery scenarios where you need to rebuild your entire Kubernetes cluster from scratch. This includes restoring all the cluster's configuration and even its internal state.

**Configuration restores** bring back your wild-cloud repository and settings, which contain all the "recipes" for how your infrastructure should be set up.

## Before You Start Restoring

Make sure you have everything needed to perform restores. You need to be in your wild-cloud directory with the environment loaded (`source env.sh`). Your backup repository and password should be configured and working - you can test this by running `restic snapshots` to see your available backups.

Most importantly, make sure you have kubectl access to your cluster, since restores involve creating temporary pods and manipulating storage.

## Restoring Applications

### Basic Application Restore

The most common restore scenario is bringing back a single application. To restore the latest backup of an app:

```bash
wild-app-restore discourse
```

This restores both the database and all file storage for the discourse app. The restore system automatically figures out what the app needs based on its manifest file and what was backed up.

If you want to restore from a specific backup instead of the latest:

```bash
wild-app-restore discourse abc123
```

Where `abc123` is the snapshot ID from `restic snapshots --tag discourse`.

### Partial Restores

Sometimes you only need to restore part of an application. Maybe the database is fine but the files got corrupted, or vice versa.

To restore only the database:

```bash
wild-app-restore discourse --db-only
```

To restore only the file storage:

```bash
wild-app-restore discourse --pvc-only
```

To restore without database roles and permissions (if they're causing conflicts):

```bash
wild-app-restore discourse --skip-globals
```

### Finding Available Backups

To see what backups are available for an app:

```bash
wild-app-restore discourse --list
```

This shows recent snapshots with their IDs, timestamps, and what was included.

## How Application Restores Work

Understanding what happens during a restore can help when things don't go as expected.

### Database Restoration

When restoring a database, the system first downloads the backup files from your restic repository. It then prepares the database by creating any needed roles, disconnecting existing users, and dropping/recreating the database to ensure a clean restore.

For PostgreSQL databases, it uses `pg_restore` with parallel processing to speed up large database imports. For MySQL, it uses standard mysql import commands. The system also handles database ownership and permissions automatically.
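The PostgreSQL path corresponds roughly to a command like the following (flags, job count, and paths are illustrative, not the tool's exact invocation):

```bash
# Parallel restore of a compressed custom-format dump
pg_restore --jobs=4 --clean --if-exists --dbname=discourse \
  /tmp/database_20250816T120000Z.dump
```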
### File Storage Restoration
|
||||
|
||||
File storage (PVC) restoration is more complex because it involves safely replacing files that might be actively used by running applications.
|
||||
|
||||
First, the system creates a safety snapshot using Longhorn. This means if something goes wrong during the restore, you can get back to where you started. Then it scales your application down to zero replicas so no pods are using the storage.
|
||||
|
||||
Next, it creates a temporary utility pod with the PVC mounted and copies all the backup files into place, preserving file permissions and structure. Once the data is restored and verified, it removes the utility pod and scales your application back up.
|
||||
|
||||
If everything worked correctly, the safety snapshot is automatically deleted. If something went wrong, the safety snapshot is preserved so you can recover manually.
|
||||
|
||||
## Cluster Disaster Recovery
|
||||
|
||||
Cluster restoration is much less common but critical when you need to rebuild your entire infrastructure.
|
||||
|
||||
### Restoring Kubernetes Resources
|
||||
|
||||
To restore all cluster resources from a backup:
|
||||
|
||||
```bash
|
||||
# Download cluster backup
|
||||
restic restore --tag cluster latest --target ./restore/
|
||||
|
||||
# Apply all resources
|
||||
kubectl apply -f restore/cluster/all-resources.yaml
|
||||
```
|
||||
|
||||
You can also restore specific types of resources:
|
||||
```bash
|
||||
kubectl apply -f restore/cluster/secrets.yaml
|
||||
kubectl apply -f restore/cluster/configmaps.yaml
|
||||
```
|
||||
|
||||
### Restoring etcd State
|
||||
|
||||
**Warning: This is extremely dangerous and will affect your entire cluster.**
|
||||
|
||||
etcd restoration should only be done when rebuilding a cluster from scratch. For Talos clusters:
|
||||
|
||||
```bash
|
||||
talosctl --nodes <control-plane-ip> etcd restore --from ./restore/cluster/etcd-snapshot.db
|
||||
```
|
||||
|
||||
This command stops etcd, replaces its data with the backup, and restarts the cluster. Expect significant downtime while the cluster rebuilds itself.
|
||||
|
||||
## Common Disaster Recovery Scenarios

### Complete Application Loss

When an entire application is gone (namespace deleted, pods corrupted, etc.):

```bash
# Make sure the namespace exists
kubectl create namespace discourse --dry-run=client -o yaml | kubectl apply -f -

# Apply the application manifests if needed
kubectl apply -f apps/discourse/

# Restore the application data
wild-app-restore discourse
```

### Complete Cluster Rebuild

When rebuilding a cluster from scratch:

First, build your new cluster infrastructure and install wild-cloud components. Then configure backup access so you can reach your backup repository.

Restore cluster state:

```bash
restic restore --tag cluster latest --target ./restore/
# Apply etcd snapshot using appropriate method for your cluster type
```

Finally, restore all applications:

```bash
# See what applications are backed up
wild-app-restore --list

# Restore each application individually
wild-app-restore discourse
wild-app-restore gitea
wild-app-restore immich
```

### Rolling Back After Bad Changes

Sometimes you need to undo recent changes to an application:

```bash
# See available snapshots
wild-app-restore discourse --list

# Restore from before the problematic changes
wild-app-restore discourse abc123
```

## Cross-Cluster Migration

You can use backups to move applications between clusters:

On the source cluster, create a fresh backup:

```bash
wild-app-backup discourse
```

On the target cluster, deploy the application manifests:

```bash
kubectl apply -f apps/discourse/
```

Then restore the data:

```bash
wild-app-restore discourse
```
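For this to work, the target cluster's tooling must be able to reach the same backup repository as the source. Assuming the wild-cloud tools pick up the standard restic environment variables, the configuration might look like this (repository URL and password file are placeholders):

```bash
# Hypothetical repository settings; substitute your actual repository and credentials
export RESTIC_REPOSITORY=s3:s3.example.com/wild-cloud-backups
export RESTIC_PASSWORD_FILE=~/.wildcloud/restic-password
```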
## Verifying Successful Restores

After any restore, verify that everything is working correctly.

For databases, check that you can connect and see expected data:

```bash
kubectl exec -n postgres deploy/postgres-deployment -- \
  psql -U postgres -d discourse -c "SELECT count(*) FROM posts;"
```

For file storage, check that files exist and applications can start:

```bash
kubectl get pods -n discourse
kubectl logs -n discourse deployment/discourse
```

For web applications, test that you can access them:

```bash
curl -f https://discourse.example.com/latest.json
```

## When Things Go Wrong

### No Snapshots Found

If the restore system can't find backups for an application, check that snapshots exist:

```bash
restic snapshots --tag discourse
```

Make sure you're using the correct app name and that backups were actually created successfully.
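To rule out a repository-level problem, you can also list every snapshot regardless of tag and verify the repository's integrity:

```bash
# List all snapshots in the repository, not just one app's
restic snapshots

# Verify repository integrity
restic check
```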
### Database Restore Failures

Database restores can fail if the target database isn't accessible or if there are permission issues. Check that your postgres or mysql pods are running and that you can connect to them manually.
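A quick way to confirm connectivity, reusing the deployment names from the examples above:

```bash
# Is the postgres pod running?
kubectl get pods -n postgres

# Can you open a session and list databases?
kubectl exec -n postgres deploy/postgres-deployment -- psql -U postgres -c '\l'
```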
Review the restore error messages carefully; they usually indicate whether the problem is with the backup file, database connectivity, or permissions.

### PVC Restore Failures

If PVC restoration fails, check that you have sufficient disk space and that the PVC isn't being used by other pods. The error messages will usually indicate what went wrong.
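To see whether the claim is still attached somewhere, inspect it directly; the claim name here is illustrative:

```bash
# Check PVC status and which pods are still using it
kubectl get pvc -n discourse
kubectl describe pvc discourse-data -n discourse | grep -A2 "Used By"
```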
Most importantly, remember that safety snapshots are preserved when PVC restores fail. You can see them with:

```bash
kubectl get snapshot.longhorn.io -n longhorn-system -l app=wild-app-restore
```

These snapshots let you recover to the pre-restore state if needed.

### Application Won't Start After Restore

If pods fail to start after restoration, check file permissions and ownership. Sometimes the restoration process doesn't perfectly preserve the exact permissions that the application expects.
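Recent events and a look inside the volume usually surface permission problems quickly; the data path here is hypothetical, so use your application's actual directory:

```bash
# Look for permission-related events
kubectl get events -n discourse --sort-by=.lastTimestamp

# If a pod does come up, inspect ownership of the restored files
kubectl exec -n discourse deploy/discourse -- ls -la /var/www/discourse  # hypothetical path
```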
You can also try scaling the application to zero and back to one, which sometimes resolves transient issues:

```bash
kubectl scale deployment/discourse -n discourse --replicas=0
kubectl scale deployment/discourse -n discourse --replicas=1
```

## Manual Recovery

When automated restore fails, you can always fall back to manual extraction and restoration:

```bash
# Extract backup files to local directory
restic restore --tag discourse latest --target ./manual-restore/

# Manually copy database dump to postgres pod
kubectl cp ./manual-restore/discourse/database_*.dump \
  postgres/postgres-deployment-xxx:/tmp/

# Manually restore database
kubectl exec -n postgres deploy/postgres-deployment -- \
  pg_restore -U postgres -d discourse /tmp/database_*.dump
```

For file restoration, you'd need to create a utility pod and manually copy files into the PVC.
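A sketch of that utility-pod approach; the claim name (`discourse-data`) and the extracted-files path are illustrative and depend on your backup layout:

```bash
# Start a throwaway pod with the PVC mounted at /data
kubectl run restore-util -n discourse --image=busybox --restart=Never \
  --overrides='{
    "apiVersion": "v1",
    "spec": {
      "containers": [{
        "name": "restore-util",
        "image": "busybox",
        "command": ["sleep", "3600"],
        "volumeMounts": [{"name": "data", "mountPath": "/data"}]
      }],
      "volumes": [{"name": "data", "persistentVolumeClaim": {"claimName": "discourse-data"}}]
    }
  }'

# Copy the extracted files into the volume
kubectl cp ./manual-restore/discourse/files discourse/restore-util:/data

# Clean up
kubectl delete pod restore-util -n discourse
```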
## Best Practices

Test your restore procedures regularly in a non-production environment. It's much better to discover issues with your backup system during a planned test than during an actual emergency.

Always communicate with users before performing restores, especially if they involve downtime. Document any manual steps you had to take so you can improve the automated process.

After any significant restore, monitor your applications more closely than usual for a few days. Sometimes problems don't surface immediately.

## Security and Access Control

Restore operations are powerful and can be destructive. Make sure only trusted administrators can perform restores, and consider requiring approval or coordination before major restoration operations.

Be aware that cluster restores include all secrets, so they potentially expose passwords, API keys, and certificates. Ensure your backup repository is properly secured.

Remember that Longhorn safety snapshots are preserved when things go wrong. These snapshots may contain sensitive data, so clean them up appropriately once you've resolved any issues.

## What's Next

The best way to get comfortable with restore operations is to practice them in a safe environment. Set up a test cluster and practice restoring applications and data.

Consider creating runbooks for your most likely disaster scenarios, including the specific commands and verification steps for your infrastructure.

Read the [Making Backups](making-backups.md) guide to ensure you're creating the backups you'll need for successful recovery.
@@ -1,19 +0,0 @@
# Troubleshoot Wild Cloud Cluster issues

## General Troubleshooting Steps

1. **Check Node Status**:

   ```bash
   kubectl get nodes
   kubectl describe node <node-name>
   ```

2. **Check Component Status**:

   ```bash
   # Check all pods across all namespaces
   kubectl get pods -A

   # Look for pods that aren't Running or Ready
   kubectl get pods -A | grep -v "Running\|Completed"
   ```
@@ -1,20 +0,0 @@
# Troubleshoot DNS

If DNS resolution isn't working properly:

1. Check CoreDNS status:

   ```bash
   kubectl get pods -n kube-system -l k8s-app=kube-dns
   kubectl logs -l k8s-app=kube-dns -n kube-system
   ```

2. Verify CoreDNS configuration:

   ```bash
   kubectl get configmap -n kube-system coredns -o yaml
   ```

3. Test DNS resolution from inside the cluster:

   ```bash
   kubectl run -i --tty --rm debug --image=busybox --restart=Never -- nslookup kubernetes.default
   ```
@@ -1,18 +0,0 @@
# Troubleshoot Service Connectivity

If services can't communicate:

1. Check network policies:

   ```bash
   kubectl get networkpolicies -A
   ```

2. Verify service endpoints:

   ```bash
   kubectl get endpoints -n <namespace>
   ```

3. Test connectivity from within the cluster:

   ```bash
   kubectl run -i --tty --rm debug --image=busybox --restart=Never -- wget -O- <service-name>.<namespace>
   ```
@@ -1,24 +0,0 @@
# Troubleshoot TLS Certificates

If services show invalid certificates:

1. Check certificate status:

   ```bash
   kubectl get certificates -A
   ```

2. Examine certificate details:

   ```bash
   kubectl describe certificate <cert-name> -n <namespace>
   ```

3. Check for cert-manager issues:

   ```bash
   kubectl get pods -n cert-manager
   kubectl logs -l app=cert-manager -n cert-manager
   ```

4. Verify the Cloudflare API token is correctly set up:

   ```bash
   kubectl get secret cloudflare-api-token -n internal
   ```
@@ -1,246 +0,0 @@
# Troubleshoot Service Visibility

This guide covers common issues with accessing services from outside the cluster and how to diagnose and fix them.

## Common Issues

External access to your services might fail for several reasons:

1. **DNS Resolution Issues** - Domain names not resolving to the correct IP address
2. **Network Connectivity Issues** - Traffic can't reach the cluster's external IP
3. **TLS Certificate Issues** - Invalid or missing certificates
4. **Ingress/Service Configuration Issues** - Incorrectly configured routing

## Diagnostic Steps

### 1. Check DNS Resolution

**Symptoms:**

- Browser shows "site cannot be reached" or "server IP address could not be found"
- `ping` or `nslookup` commands fail for your domain
- Your service DNS records don't appear in CloudFlare or your DNS provider

**Checks:**
```bash
# Check if your domain resolves (from outside the cluster)
nslookup yourservice.yourdomain.com

# Check if ExternalDNS is running
kubectl get pods -n externaldns

# Check ExternalDNS logs for errors
kubectl logs -n externaldns -l app=external-dns | grep -i error
kubectl logs -n externaldns -l app=external-dns | grep -i "your-service-name"

# Check if CloudFlare API token is configured correctly
kubectl get secret cloudflare-api-token -n externaldns
```
**Common Issues:**

a) **ExternalDNS Not Running**: The ExternalDNS pod is not running or has errors.

b) **Cloudflare API Token Issues**: The API token is invalid, expired, or doesn't have the right permissions.

c) **Domain Filter Mismatch**: ExternalDNS is configured with a `--domain-filter` that doesn't match your domain.

d) **Annotations Missing**: Service or Ingress is missing the required ExternalDNS annotations.

**Solutions:**

```bash
# 1. Recreate CloudFlare API token secret
kubectl create secret generic cloudflare-api-token \
  --namespace externaldns \
  --from-literal=api-token="your-api-token" \
  --dry-run=client -o yaml | kubectl apply -f -

# 2. Check and set proper annotations on your Ingress:
kubectl annotate ingress your-ingress -n your-namespace \
  external-dns.alpha.kubernetes.io/hostname=your-service.your-domain.com

# 3. Restart ExternalDNS
kubectl rollout restart deployment -n externaldns external-dns
```
### 2. Check Network Connectivity

**Symptoms:**

- DNS resolves to the correct IP but the service is still unreachable
- Only some services are unreachable while others work
- Network timeout errors

**Checks:**

```bash
# Check if MetalLB is running
kubectl get pods -n metallb-system

# Check MetalLB IP address pool
kubectl get ipaddresspools.metallb.io -n metallb-system

# Verify the service has an external IP
kubectl get svc -n your-namespace your-service
```

**Common Issues:**

a) **MetalLB Configuration**: The IP pool doesn't match your network or is exhausted.

b) **Firewall Issues**: Firewall is blocking traffic to your cluster's external IP.

c) **Router Configuration**: NAT or port forwarding issues if using a router.

**Solutions:**

```bash
# 1. Check and update MetalLB configuration
kubectl apply -f infrastructure_setup/metallb/metallb-pool.yaml

# 2. Check service external IP assignment
kubectl describe svc -n your-namespace your-service
```
### 3. Check TLS Certificates

**Symptoms:**

- Browser shows certificate errors
- "Your connection is not private" warnings
- Cert-manager logs show errors

**Checks:**

```bash
# Check certificate status
kubectl get certificates -A

# Check cert-manager logs
kubectl logs -n cert-manager -l app=cert-manager

# Check if your ingress is using the correct certificate
kubectl get ingress -n your-namespace your-ingress -o yaml
```

**Common Issues:**

a) **Certificate Issuance Failures**: DNS validation or HTTP validation failing.

b) **Wrong Secret Referenced**: Ingress is referencing a non-existent certificate secret.

c) **Expired Certificate**: Certificate has expired and wasn't renewed.

**Solutions:**

```bash
# 1. Check and recreate certificates
kubectl apply -f infrastructure_setup/cert-manager/wildcard-certificate.yaml

# 2. Update ingress to use correct secret
kubectl patch ingress your-ingress -n your-namespace --type=json \
  -p='[{"op": "replace", "path": "/spec/tls/0/secretName", "value": "correct-secret-name"}]'
```
### 4. Check Ingress Configuration

**Symptoms:**

- HTTP 404, 503, or other error codes
- Service accessible from inside cluster but not outside
- Traffic routed to wrong service

**Checks:**

```bash
# Check ingress status
kubectl get ingress -n your-namespace

# Check Traefik logs
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik

# Check ingress configuration
kubectl describe ingress -n your-namespace your-ingress
```

**Common Issues:**

a) **Incorrect Service Targeting**: Ingress is pointing to wrong service or port.

b) **Traefik Configuration**: IngressClass or middleware issues.

c) **Path Configuration**: Incorrect path prefixes or regex.

**Solutions:**

```bash
# 1. Verify ingress configuration
kubectl edit ingress -n your-namespace your-ingress

# 2. Check that the referenced service exists
kubectl get svc -n your-namespace

# 3. Restart Traefik if needed
kubectl rollout restart deployment -n kube-system traefik
```
## Advanced Diagnostics

For more complex issues, you can use port-forwarding to test services directly:

```bash
# Port-forward the service directly
kubectl port-forward -n your-namespace svc/your-service 8080:80

# Then test locally
curl http://localhost:8080
```

You can also deploy a debug pod to test connectivity from inside the cluster:

```bash
# Start a debug pod
kubectl run -i --tty --rm debug --image=busybox --restart=Never -- sh

# Inside the pod, test DNS and connectivity
nslookup your-service.your-namespace.svc.cluster.local
wget -O- http://your-service.your-namespace.svc.cluster.local
```
## ExternalDNS Specifics

ExternalDNS can be particularly troublesome. Here are specific debugging steps:

1. **Check Log Level**: Set `--log-level=debug` for more detailed logs
2. **Check Domain Filter**: Ensure `--domain-filter` includes your domain
3. **Check Provider**: Ensure `--provider=cloudflare` (or your DNS provider)
4. **Verify API Permissions**: CloudFlare token needs Zone.Zone and Zone.DNS permissions
5. **Check TXT Records**: ExternalDNS uses TXT records for ownership tracking

```bash
# Enable verbose logging by adding the flag to the container args
# (kubectl set env only manages environment variables, so patch the args instead;
# this assumes external-dns is the first container in the pod spec)
kubectl patch deployment external-dns -n externaldns --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--log-level=debug"}]'

# Check for specific domain errors
kubectl logs -n externaldns -l app=external-dns | grep -i yourservice.yourdomain.com
```
## CloudFlare Specific Issues

When using CloudFlare, additional issues may arise:

1. **API Rate Limiting**: CloudFlare may rate limit frequent API calls
2. **DNS Propagation**: Changes may take time to propagate through CloudFlare's CDN
3. **Proxied Records**: The `external-dns.alpha.kubernetes.io/cloudflare-proxied` annotation controls whether CloudFlare proxies traffic (see the example after this list)
4. **Access Restrictions**: CloudFlare Access or Page Rules may restrict access
5. **API Token Permissions**: The token must have Zone:Zone:Read and Zone:DNS:Edit permissions
6. **Zone Detection**: If using subdomains, ensure the parent domain is included in the domain filter
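For instance, to publish a DNS-only (unproxied) record for an ingress, you can set that annotation explicitly; the ingress name and namespace are placeholders:

```bash
kubectl annotate ingress your-ingress -n your-namespace \
  external-dns.alpha.kubernetes.io/cloudflare-proxied="false" --overwrite
```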
Check CloudFlare dashboard for:

- DNS record existence
- API access logs
- DNS settings including proxy status
- Any error messages or rate limit warnings
@@ -1,3 +0,0 @@
# Upgrade Applications

TBD

@@ -1,3 +0,0 @@
# Upgrade Kubernetes

TBD

@@ -1,3 +0,0 @@
# Upgrade Talos

TBD

@@ -1,3 +0,0 @@
# Upgrade Wild Cloud

TBD
Submodule wild-central added at 44ebbbd42c
Submodule wild-central-api added at c8fd702d1b
Submodule wild-cli added at 77571e8062
Submodule wild-cloud-poc added at 2684c46de4
Submodule wild-directory updated: db621755b3...351f58b80d
Submodule wild-web-app added at b324540ce0