From ddac8775b18fd013675eb9dab8a13ad18fa36cf0 Mon Sep 17 00:00:00 2001
From: Paul Payne
Date: Sat, 28 Jun 2025 09:42:27 -0700
Subject: [PATCH] Remove README.md for cluster node setup

---
 setup/cluster-nodes/README.md | 235 ----------------------------------
 1 file changed, 235 deletions(-)
 delete mode 100644 setup/cluster-nodes/README.md

diff --git a/setup/cluster-nodes/README.md b/setup/cluster-nodes/README.md
deleted file mode 100644
index 4df923e..0000000
--- a/setup/cluster-nodes/README.md
+++ /dev/null
@@ -1,235 +0,0 @@

# Cluster Node Setup

This directory contains automation for setting up Talos Kubernetes cluster nodes with static IP configuration.

## Hardware Detection and Setup (Recommended)

The automated setup discovers hardware configuration from nodes in maintenance mode and generates machine configurations with the correct interface names and disk paths.

### Prerequisites

1. `source .env`
2. Boot nodes with the Talos ISO in maintenance mode
3. Nodes must be accessible on the network

### Hardware Discovery Workflow

```bash
# ONE-TIME CLUSTER INITIALIZATION (run once per cluster)
./init-cluster.sh

# FOR EACH CONTROL PLANE NODE:

# 1. Boot the node with the Talos ISO (it will get a DHCP IP in maintenance mode)

# 2. Detect hardware and update config.yaml
./detect-node-hardware.sh <node-ip> <node-number>

# Example: node boots at 192.168.8.168, register it as node 1
./detect-node-hardware.sh 192.168.8.168 1

# 3. Generate machine configs for the registered nodes
./generate-machine-configs.sh

# 4. Apply the configuration; the node will reboot with its static IP
talosctl apply-config --insecure -n 192.168.8.168 --file final/controlplane-node-1.yaml

# 5.
#    Wait for the reboot; the node should come up at its target static IP (192.168.8.31)

# Repeat steps 1-5 for additional control plane nodes
```

The `detect-node-hardware.sh` script will:

- Connect to nodes in maintenance mode via talosctl
- Discover active ethernet interfaces (e.g., `enp4s0` instead of a hardcoded `eth0`)
- Discover available installation disks (>10GB)
- Update `config.yaml` with per-node hardware configuration
- Provide next steps for machine config generation

The `init-cluster.sh` script will:

- Generate Talos cluster secrets and base configurations (once per cluster)
- Set up the talosctl context with the cluster certificates
- Configure the VIP endpoint for cluster communication

The `generate-machine-configs.sh` script will:

- Check which nodes have been hardware-detected
- Compile network configuration templates with the discovered hardware settings
- Create final machine configurations for registered nodes only
- Include the system extensions needed for Longhorn (iscsi-tools, util-linux-tools)
- Update the talosctl context with the registered node IPs

### Cluster Bootstrap

After all control plane nodes are configured with static IPs:

```bash
# Bootstrap the cluster using any control plane node
talosctl bootstrap --nodes 192.168.8.31 --endpoint 192.168.8.31

# Get a kubeconfig
talosctl kubeconfig

# Verify the cluster is ready
kubectl get nodes
```

## Complete Example

Here's a complete example of setting up a 3-node control plane:

```bash
# CLUSTER INITIALIZATION (once per cluster)
./init-cluster.sh

# NODE 1
# Boot the node with the Talos ISO; it gets DHCP IP 192.168.8.168
./detect-node-hardware.sh 192.168.8.168 1
./generate-machine-configs.sh
talosctl apply-config --insecure -n 192.168.8.168 --file final/controlplane-node-1.yaml
# Node reboots and comes up at 192.168.8.31

# NODE 2
# Boot the second node with the Talos ISO; it gets DHCP IP 192.168.8.169
./detect-node-hardware.sh 192.168.8.169 2
./generate-machine-configs.sh
talosctl \
  apply-config --insecure -n 192.168.8.169 --file final/controlplane-node-2.yaml
# Node reboots and comes up at 192.168.8.32

# NODE 3
# Boot the third node with the Talos ISO; it gets DHCP IP 192.168.8.170
./detect-node-hardware.sh 192.168.8.170 3
./generate-machine-configs.sh
talosctl apply-config --insecure -n 192.168.8.170 --file final/controlplane-node-3.yaml
# Node reboots and comes up at 192.168.8.33

# CLUSTER BOOTSTRAP (target a control plane node, not the VIP)
talosctl bootstrap -n 192.168.8.31
talosctl kubeconfig
kubectl get nodes
```

## Configuration Details

### Per-Node Configuration

Each control plane node has its own configuration block in `config.yaml`:

```yaml
cluster:
  nodes:
    control:
      vip: 192.168.8.30
      node1:
        ip: 192.168.8.31
        interface: enp4s0 # Discovered automatically
        disk: /dev/sdb    # Selected during hardware detection
      node2:
        ip: 192.168.8.32
        # interface and disk added after hardware detection
      node3:
        ip: 192.168.8.33
        # interface and disk added after hardware detection
```

Worker nodes use DHCP by default. The same hardware detection process can be used for worker nodes if static IPs are needed.

## Talosconfig Management

### Context Naming and Conflicts

When running `talosctl config merge ./generated/talosconfig`, if a context with the same name already exists, talosctl creates an enumerated version (e.g., `demo-cluster-2`).
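After such a collision, the merged file carries both contexts. A rough sketch of the resulting talosconfig layout (context names and endpoint values are hypothetical, certificate fields omitted):

```yaml
context: demo-cluster-2   # talosctl switched to the enumerated name
contexts:
  demo-cluster:           # stale context from an earlier cluster build
    endpoints:
      - 192.168.8.30
  demo-cluster-2:         # freshly merged context with new certificates
    endpoints:
      - 192.168.8.30
```

The stale context keeps working against nothing once the cluster is rebuilt, which is why removing it first is cleaner.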
**For a clean setup:**

- Delete existing contexts before merging: list them with `talosctl config contexts`, then remove with `talosctl config context <name> --remove`
- Or use `--force` to overwrite: `talosctl config merge ./generated/talosconfig --force`

**Recommended approach for new clusters:**

```bash
# Remove the old context if rebuilding the cluster
talosctl config context demo-cluster --remove || true

# Merge the new configuration
talosctl config merge ./generated/talosconfig
talosctl config endpoint 192.168.8.30
talosctl config node 192.168.8.31 # Add nodes as they are registered
```

### Context Configuration Timeline

1. **After first node hardware detection**: Merge the talosconfig and set the endpoint and first node
2. **After additional nodes**: Add them to the existing context with `talosctl config node <node-ips>`
3. **Before cluster bootstrap**: Ensure all control plane nodes are in the node list

### System Extensions

All nodes include:

- `siderolabs/iscsi-tools`: Required for Longhorn storage
- `siderolabs/util-linux-tools`: Utility tools for storage operations

### Hardware Detection

The `detect-node-hardware.sh` script automatically discovers:

- **Network interfaces**: Finds active ethernet interfaces (no more hardcoded `eth0`)
- **Installation disks**: Lists available disks >10GB for interactive selection
- **Per-node settings**: Updates `config.yaml` with hardware-specific configuration

This eliminates the need to manually configure hardware settings and handles different hardware configurations across nodes.
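After detection runs, it is worth confirming that a node's block in `config.yaml` actually gained its hardware fields. A crude local sketch of such a check (the file path and values here are hypothetical, matching the layout shown above):

```shell
# hypothetical config.yaml with a hardware-detected node1
cat > /tmp/config.yaml <<'EOF'
cluster:
  nodes:
    control:
      vip: 192.168.8.30
      node1:
        ip: 192.168.8.31
        interface: enp4s0
        disk: /dev/sdb
EOF

# crude check that node1 has been hardware-detected (interface and disk present)
grep -A 3 'node1:' /tmp/config.yaml | grep -E 'interface|disk'
# → the interface and disk lines
```

A yaml-aware tool would be more robust, but a grep like this is enough to catch a node that was never registered.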
### Template Structure

Configuration templates are stored in `patch.templates/` and use gomplate syntax:

- `controlplane-node-1.yaml`: Template for the first control plane node
- `controlplane-node-2.yaml`: Template for the second control plane node
- `controlplane-node-3.yaml`: Template for the third control plane node
- `worker.yaml`: Template for worker nodes

Templates use per-node variables from `config.yaml`:

- `{{ .cluster.nodes.control.node1.ip }}`
- `{{ .cluster.nodes.control.node1.interface }}`
- `{{ .cluster.nodes.control.node1.disk }}`
- `{{ .cluster.nodes.control.vip }}`

The `wild-compile-template-dir` command processes all templates and writes the compiled configurations to the `patch/` directory.

## Troubleshooting

### Hardware Detection Issues

```bash
# Check whether the node is accessible in maintenance mode
talosctl -n <node-ip> version --insecure

# View available network interfaces
talosctl -n <node-ip> get links --insecure

# View available disks
talosctl -n <node-ip> get disks --insecure
```

### Manual Hardware Discovery

If automatic detection fails, you can inspect the hardware manually:

```bash
# Find active ethernet interfaces
talosctl -n <node-ip> get links --insecure -o json | jq -s '.[] | select(.spec.operationalState == "up" and .spec.type == "ether" and .metadata.id != "lo") | .metadata.id'

# Find suitable installation disks (>10GB)
talosctl -n <node-ip> get disks --insecure -o json | jq -s '.[] | select(.spec.size > 10000000000) | .metadata.id'
```

### Node Status

```bash
# View the machine configuration (only works after the config has been applied)
talosctl -n <node-ip> get machineconfig
```
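The disk-size filter in the manual discovery command can be sanity-checked locally before pointing it at a node. A sketch using hypothetical sample documents shaped like `talosctl get disks -o json` output:

```shell
# hypothetical disk resources (ids and sizes invented for illustration)
cat > /tmp/disks.json <<'EOF'
{"metadata":{"id":"sda"},"spec":{"size":500107862016}}
{"metadata":{"id":"nvme0n1"},"spec":{"size":256060514304}}
{"metadata":{"id":"loop0"},"spec":{"size":4194304}}
EOF

# same filter as the manual discovery command: keep only disks larger than 10GB
jq -s -r '.[] | select(.spec.size > 10000000000) | .metadata.id' /tmp/disks.json
# → sda, nvme0n1 (loop0 is filtered out)
```

`-s` slurps the newline-delimited resource documents into one array so `.[]` can iterate them, and `-r` prints bare ids without quotes.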