# Cluster Node Setup

This directory contains automation for setting up Talos Kubernetes cluster nodes with static IP configuration.

## Hardware Detection and Setup (Recommended)

The automated setup discovers hardware configuration from nodes in maintenance mode and generates machine configurations with the correct interface names and disk paths. To set up control plane nodes, work through the prerequisites and workflow below.

### Prerequisites

1. `source .env` (a minimal sketch of the expected variables follows this list)
2. Boot the nodes with the Talos ISO in maintenance mode
3. Nodes must be accessible on the network
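
The setup scripts expect a few environment variables to be present before they run. As a rough, hypothetical sketch of what `.env` might contain (the exact variables depend on your checkout; `WC_HOME` is assumed here because the setup scripts check for it):

```bash
# Hypothetical example only - adjust the variable names and paths to your environment.
export WC_HOME="$HOME/wild-cloud"   # root of your personal cloud checkout (assumed variable)
```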

### Hardware Discovery Workflow

```bash
# ONE-TIME CLUSTER INITIALIZATION (run once per cluster)
./init-cluster.sh

# FOR EACH CONTROL PLANE NODE:

# 1. Boot the node with the Talos ISO (it will get a DHCP IP in maintenance mode)

# 2. Detect hardware and update config.yaml
./detect-node-hardware.sh <maintenance-ip> <node-number>

# Example: the node boots at 192.168.8.168; register it as node 1
./detect-node-hardware.sh 192.168.8.168 1

# 3. Generate machine configs for registered nodes
./generate-machine-configs.sh

# 4. Apply the configuration - the node will reboot with its static IP
talosctl apply-config --insecure -n 192.168.8.168 --file final/controlplane-node-1.yaml

# 5. Wait for the reboot; the node should come up at its target static IP (192.168.8.31)

# Repeat steps 1-5 for additional control plane nodes
```

The `detect-node-hardware.sh` script will:

- Connect to nodes in maintenance mode via talosctl
- Discover active ethernet interfaces (e.g., `enp4s0` instead of hardcoded `eth0`)
- Discover available installation disks (>10GB)
- Update `config.yaml` with per-node hardware configuration
- Provide next steps for machine config generation

The `init-cluster.sh` script will:

- Generate Talos cluster secrets and base configurations (once per cluster)
- Set up the talosctl context with cluster certificates
- Configure the VIP endpoint for cluster communication

The `generate-machine-configs.sh` script will:

- Check which nodes have been hardware-detected
- Compile network configuration templates with the discovered hardware settings
- Create final machine configurations for registered nodes only
- Include system extensions for Longhorn (iscsi-tools, util-linux-tools)
- Update the talosctl context with registered node IPs

### Cluster Bootstrap

After all control plane nodes are configured with static IPs:

```bash
# Bootstrap the cluster using any control node
talosctl bootstrap --nodes 192.168.8.31 --endpoint 192.168.8.31
```

## Manual Talos Cluster Node Setup

This is an alternative, fully manual method for setting up cluster nodes. We're still working toward the simplest bootstrapping experience; the automated hardware-detection workflow above is the recommended approach.

Copy this entire directory to your personal cloud folder and modify it as necessary as you install. We suggest putting it in `cluster/bootstrap`.
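
For example (the destination path is hypothetical; use whatever layout your personal cloud repo follows):

```bash
# Copy the bootstrap material into your own repo and work from there.
cp -r . ~/my-cloud/cluster/bootstrap
cd ~/my-cloud/cluster/bootstrap
```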

```bash

# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl.sha256"
echo "$(cat kubectl.sha256) kubectl" | sha256sum --check
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# Install talosctl
curl -sL https://talos.dev/install | sh

# In your LAN router (which is your DHCP server), reserve an address outside
# the DHCP pool for the cluster VIP below.
CLUSTER_NAME=test-cluster
VIP=192.168.8.20 # Non-DHCP

# Boot your nodes with the ISO and put their IP addresses here. Pin them in DHCP.
# Nodes must all be on the same switch.
# TODO: How to set these static on boot?
CONTROL_NODE_1=192.168.8.21
CONTROL_NODE_2=192.168.8.22
CONTROL_NODE_3=192.168.8.23

# Generate cluster config files (including PKI and tokens)
cd generated
talosctl gen secrets -o secrets.yaml
talosctl gen config --with-secrets secrets.yaml $CLUSTER_NAME https://$VIP:6443
talosctl config merge ./talosconfig
cd ..

# If the disk you want to install Talos on isn't /dev/sda, update the disk in
# patch/controlplane.yaml and patch/worker.yaml. If a previous install attempt
# failed because /dev/sda could not be found, list the disks available on that
# node with:
#
#   talosctl -n $CONTROL_NODE_1 get disks --insecure

# See https://www.talos.dev/v1.10/talos-guides/configuration/patching/
talosctl machineconfig patch generated/controlplane.yaml --patch @patch/controlplane.yaml -o final/controlplane.yaml
talosctl machineconfig patch generated/worker.yaml --patch @patch/worker.yaml -o final/worker.yaml

# Apply control plane config
talosctl apply-config --insecure -n $CONTROL_NODE_1,$CONTROL_NODE_2,$CONTROL_NODE_3 --file final/controlplane.yaml

# Bootstrap the cluster via one control plane node
# (the VIP is not available until after bootstrap)
talosctl bootstrap -n $CONTROL_NODE_1

# Merge the new cluster's credentials into your kubeconfig
talosctl kubeconfig

# You are now ready to use both `talosctl` and `kubectl` against your new cluster.
# Verify the cluster is ready
kubectl get nodes
```

## Complete Example

Here's a complete example of setting up a 3-node control plane:

```bash
# CLUSTER INITIALIZATION (once per cluster)
./init-cluster.sh

# NODE 1
# Boot node with Talos ISO, it gets DHCP IP 192.168.8.168
./detect-node-hardware.sh 192.168.8.168 1
./generate-machine-configs.sh
talosctl apply-config --insecure -n 192.168.8.168 --file final/controlplane-node-1.yaml
# Node reboots and comes up at 192.168.8.31

# NODE 2
# Boot second node with Talos ISO, it gets DHCP IP 192.168.8.169
./detect-node-hardware.sh 192.168.8.169 2
./generate-machine-configs.sh
talosctl apply-config --insecure -n 192.168.8.169 --file final/controlplane-node-2.yaml
# Node reboots and comes up at 192.168.8.32

# NODE 3
# Boot third node with Talos ISO, it gets DHCP IP 192.168.8.170
./detect-node-hardware.sh 192.168.8.170 3
./generate-machine-configs.sh
talosctl apply-config --insecure -n 192.168.8.170 --file final/controlplane-node-3.yaml
# Node reboots and comes up at 192.168.8.33

# CLUSTER BOOTSTRAP (use a node IP, not the VIP - the VIP is not active until after bootstrap)
talosctl bootstrap -n 192.168.8.31
talosctl kubeconfig
kubectl get nodes
```

## Configuration Details

### Per-Node Configuration

Each control plane node has its own configuration block in `config.yaml`:

```yaml
cluster:
  nodes:
    control:
      vip: 192.168.8.30
      node1:
        ip: 192.168.8.31
        interface: enp4s0  # Discovered automatically
        disk: /dev/sdb     # Selected during hardware detection
      node2:
        ip: 192.168.8.32
        # interface and disk added after hardware detection
      node3:
        ip: 192.168.8.33
        # interface and disk added after hardware detection
```

Worker nodes use DHCP by default. You can use the same hardware detection process for worker nodes if static IPs are needed.
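
If you do want to join a worker, a minimal sketch looks like the following (this assumes the generation step also produces a `final/worker.yaml`, and the worker's DHCP address shown here is hypothetical):

```bash
# Hypothetical worker join - substitute the worker's actual maintenance-mode IP.
talosctl apply-config --insecure -n 192.168.8.150 --file final/worker.yaml
```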

## Talosconfig Management

### Context Naming and Conflicts

When running `talosctl config merge ./generated/talosconfig`, if a context with the same name already exists, talosctl will create an enumerated version (e.g., `demo-cluster-2`).

**For a clean setup:**

- Delete existing contexts before merging: `talosctl config contexts` then `talosctl config context <name> --remove`
- Or use `--force` to overwrite: `talosctl config merge ./generated/talosconfig --force`

**Recommended approach for new clusters:**

```bash
# Remove old context if rebuilding cluster
talosctl config context demo-cluster --remove || true

# Merge new configuration
talosctl config merge ./generated/talosconfig
talosctl config endpoint 192.168.8.30
talosctl config node 192.168.8.31 # Add nodes as they are registered
```

### Context Configuration Timeline

1. **After first node hardware detection**: Merge the talosconfig and set the endpoint and first node
2. **After additional nodes**: Add them to the existing context with `talosctl config node <ip1> <ip2> <ip3>`
3. **Before cluster bootstrap**: Ensure all control plane nodes are in the node list (see the sketch below)
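
As a short sketch of that timeline (IPs taken from the example cluster above; `talosctl config node` replaces the node list each time it is run):

```bash
# After node 1 is registered
talosctl config merge ./generated/talosconfig
talosctl config endpoint 192.168.8.30
talosctl config node 192.168.8.31

# After nodes 2 and 3 are registered
talosctl config node 192.168.8.31 192.168.8.32 192.168.8.33

# All control plane nodes are now in the node list; safe to bootstrap
talosctl bootstrap -n 192.168.8.31
```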

### System Extensions

All nodes include:

- `siderolabs/iscsi-tools`: Required for Longhorn storage
- `siderolabs/util-linux-tools`: Utility tools for storage operations
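
The generation scripts take care of wiring these extensions in. For reference only, a Talos Image Factory schematic requesting the same extensions would look roughly like this (illustrative; not a file used by these scripts):

```yaml
# Image Factory schematic (factory.talos.dev) - illustrative only.
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
      - siderolabs/util-linux-tools
```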

### Hardware Detection

The `detect-node-hardware.sh` script automatically discovers:

- **Network interfaces**: Finds active ethernet interfaces (no more hardcoded `eth0`)
- **Installation disks**: Lists available disks >10GB for interactive selection
- **Per-node settings**: Updates `config.yaml` with hardware-specific configuration

This eliminates the need to manually configure hardware settings and handles different hardware configurations across nodes.

### Template Structure

Configuration templates are stored in `patch.templates/` and use gomplate syntax:

- `controlplane-node-1.yaml`: Template for first control plane node
- `controlplane-node-2.yaml`: Template for second control plane node
- `controlplane-node-3.yaml`: Template for third control plane node
- `worker.yaml`: Template for worker nodes

Templates use per-node variables from `config.yaml`:

- `{{ .cluster.nodes.control.node1.ip }}`
- `{{ .cluster.nodes.control.node1.interface }}`
- `{{ .cluster.nodes.control.node1.disk }}`
- `{{ .cluster.nodes.control.vip }}`

The `wild-compile-template-dir` command processes all templates and outputs compiled configurations to the `patch/` directory.
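
To make the template/variable relationship concrete, here is a rough, hypothetical fragment of what a `patch.templates/controlplane-node-1.yaml` patch might look like (the real file may differ; the `/24` prefix is an assumption):

```yaml
# Hypothetical gomplate-templated Talos machine config patch - illustrative only.
machine:
  install:
    disk: {{ .cluster.nodes.control.node1.disk }}
  network:
    interfaces:
      - interface: {{ .cluster.nodes.control.node1.interface }}
        addresses:
          - {{ .cluster.nodes.control.node1.ip }}/24
        vip:
          ip: {{ .cluster.nodes.control.vip }}
```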

## Troubleshooting

### Hardware Detection Issues

```bash
# Check if node is accessible in maintenance mode
talosctl -n <NODE_IP> version --insecure

# View available network interfaces
talosctl -n <NODE_IP> get links --insecure

# View available disks
talosctl -n <NODE_IP> get disks --insecure
```

### Manual Hardware Discovery

If the automatic detection fails, you can manually inspect hardware:

```bash
# Find active ethernet interfaces
talosctl -n <NODE_IP> get links --insecure -o json | jq -s '.[] | select(.spec.operationalState == "up" and .spec.type == "ether" and .metadata.id != "lo") | .metadata.id'

# Find suitable installation disks
talosctl -n <NODE_IP> get disks --insecure -o json | jq -s '.[] | select(.spec.size > 10000000000) | .metadata.id'
```

### Node Status

```bash
# View machine configuration (only works after config is applied)
talosctl -n <NODE_IP> get machineconfig
```