Remove README.md for cluster node setup
This commit is contained in:
@@ -1,235 +0,0 @@
|
||||
# Cluster Node Setup
|
||||
|
||||
This directory contains automation for setting up Talos Kubernetes cluster nodes with static IP configuration.
|
||||
|
||||
## Hardware Detection and Setup (Recommended)
|
||||
|
||||
The automated setup discovers hardware configuration from nodes in maintenance mode and generates machine configurations with the correct interface names and disk paths.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
1. `source .env`
|
||||
2. Boot nodes with Talos ISO in maintenance mode
|
||||
3. Nodes must be accessible on the network
|
||||
|
||||
### Hardware Discovery Workflow
|
||||
|
||||
```bash
|
||||
# ONE-TIME CLUSTER INITIALIZATION (run once per cluster)
|
||||
./init-cluster.sh
|
||||
|
||||
# FOR EACH CONTROL PLANE NODE:
|
||||
|
||||
# 1. Boot node with Talos ISO (it will get a DHCP IP in maintenance mode)
|
||||
# 2. Detect hardware and update config.yaml
|
||||
./detect-node-hardware.sh <maintenance-ip> <node-number>
|
||||
|
||||
# Example: Node boots at 192.168.8.168, register as node 1
|
||||
./detect-node-hardware.sh 192.168.8.168 1
|
||||
|
||||
# 3. Generate machine config for registered nodes
|
||||
./generate-machine-configs.sh
|
||||
|
||||
# 4. Apply configuration - node will reboot with static IP
|
||||
talosctl apply-config --insecure -n 192.168.8.168 --file final/controlplane-node-1.yaml
|
||||
|
||||
# 5. Wait for reboot, node should come up at its target static IP (192.168.8.31)
|
||||
|
||||
# Repeat steps 1-5 for additional control plane nodes
|
||||
```
|
||||
|
||||
The `detect-node-hardware.sh` script will:
|
||||
|
||||
- Connect to nodes in maintenance mode via talosctl
|
||||
- Discover active ethernet interfaces (e.g., `enp4s0` instead of hardcoded `eth0`)
|
||||
- Discover available installation disks (>10GB)
|
||||
- Update `config.yaml` with per-node hardware configuration
|
||||
- Provide next steps for machine config generation
|
||||
|
||||
The `init-cluster.sh` script will:
|
||||
|
||||
- Generate Talos cluster secrets and base configurations (once per cluster)
|
||||
- Set up talosctl context with cluster certificates
|
||||
- Configure VIP endpoint for cluster communication
|
||||
|
||||
The `generate-machine-configs.sh` script will:
|
||||
|
||||
- Check which nodes have been hardware-detected
|
||||
- Compile network configuration templates with discovered hardware settings
|
||||
- Create final machine configurations for registered nodes only
|
||||
- Include system extensions for Longhorn (iscsi-tools, util-linux-tools)
|
||||
- Update talosctl context with registered node IPs
|
||||
|
||||
### Cluster Bootstrap
|
||||
|
||||
After all control plane nodes are configured with static IPs:
|
||||
|
||||
```bash
|
||||
# Bootstrap the cluster using any control node
|
||||
talosctl bootstrap --nodes 192.168.8.31 --endpoint 192.168.8.31
|
||||
|
||||
|
||||
# Get kubeconfig
|
||||
talosctl kubeconfig
|
||||
|
||||
# Verify cluster is ready
|
||||
kubectl get nodes
|
||||
```
|
||||
|
||||
## Complete Example
|
||||
|
||||
Here's a complete example of setting up a 3-node control plane:
|
||||
|
||||
```bash
|
||||
# CLUSTER INITIALIZATION (once per cluster)
|
||||
./init-cluster.sh
|
||||
|
||||
# NODE 1
|
||||
# Boot node with Talos ISO, it gets DHCP IP 192.168.8.168
|
||||
./detect-node-hardware.sh 192.168.8.168 1
|
||||
./generate-machine-configs.sh
|
||||
talosctl apply-config --insecure -n 192.168.8.168 --file final/controlplane-node-1.yaml
|
||||
# Node reboots and comes up at 192.168.8.31
|
||||
|
||||
# NODE 2
|
||||
# Boot second node with Talos ISO, it gets DHCP IP 192.168.8.169
|
||||
./detect-node-hardware.sh 192.168.8.169 2
|
||||
./generate-machine-configs.sh
|
||||
talosctl apply-config --insecure -n 192.168.8.169 --file final/controlplane-node-2.yaml
|
||||
# Node reboots and comes up at 192.168.8.32
|
||||
|
||||
# NODE 3
|
||||
# Boot third node with Talos ISO, it gets DHCP IP 192.168.8.170
|
||||
./detect-node-hardware.sh 192.168.8.170 3
|
||||
./generate-machine-configs.sh
|
||||
talosctl apply-config --insecure -n 192.168.8.170 --file final/controlplane-node-3.yaml
|
||||
# Node reboots and comes up at 192.168.8.33
|
||||
|
||||
# CLUSTER BOOTSTRAP
|
||||
talosctl bootstrap -n 192.168.8.30
|
||||
talosctl kubeconfig
|
||||
kubectl get nodes
|
||||
```
|
||||
|
||||
## Configuration Details
|
||||
|
||||
### Per-Node Configuration
|
||||
|
||||
Each control plane node has its own configuration block in `config.yaml`:
|
||||
|
||||
```yaml
|
||||
cluster:
|
||||
nodes:
|
||||
control:
|
||||
vip: 192.168.8.30
|
||||
node1:
|
||||
ip: 192.168.8.31
|
||||
interface: enp4s0 # Discovered automatically
|
||||
disk: /dev/sdb # Selected during hardware detection
|
||||
node2:
|
||||
ip: 192.168.8.32
|
||||
# interface and disk added after hardware detection
|
||||
node3:
|
||||
ip: 192.168.8.33
|
||||
# interface and disk added after hardware detection
|
||||
```
|
||||
|
||||
Worker nodes use DHCP by default. You can use the same hardware detection process for worker nodes if static IPs are needed.
|
||||
|
||||
## Talosconfig Management
|
||||
|
||||
### Context Naming and Conflicts
|
||||
|
||||
When running `talosctl config merge ./generated/talosconfig`, if a context with the same name already exists, talosctl will create an enumerated version (e.g., `demo-cluster-2`).
|
||||
|
||||
**For a clean setup:**
|
||||
|
||||
- Delete existing contexts before merging: `talosctl config contexts` then `talosctl config context <name> --remove`
|
||||
- Or use `--force` to overwrite: `talosctl config merge ./generated/talosconfig --force`
|
||||
|
||||
**Recommended approach for new clusters:**
|
||||
|
||||
```bash
|
||||
# Remove old context if rebuilding cluster
|
||||
talosctl config context demo-cluster --remove || true
|
||||
|
||||
# Merge new configuration
|
||||
talosctl config merge ./generated/talosconfig
|
||||
talosctl config endpoint 192.168.8.30
|
||||
talosctl config node 192.168.8.31 # Add nodes as they are registered
|
||||
```
|
||||
|
||||
### Context Configuration Timeline
|
||||
|
||||
1. **After first node hardware detection**: Merge talosconfig and set endpoint/first node
|
||||
2. **After additional nodes**: Add them to the existing context with `talosctl config node <ip1> <ip2> <ip3>`
|
||||
3. **Before cluster bootstrap**: Ensure all control plane nodes are in the node list
|
||||
|
||||
### System Extensions
|
||||
|
||||
All nodes include:
|
||||
|
||||
- `siderolabs/iscsi-tools`: Required for Longhorn storage
|
||||
- `siderolabs/util-linux-tools`: Utility tools for storage operations
|
||||
|
||||
### Hardware Detection
|
||||
|
||||
The `detect-node-hardware.sh` script automatically discovers:
|
||||
|
||||
- **Network interfaces**: Finds active ethernet interfaces (no more hardcoded `eth0`)
|
||||
- **Installation disks**: Lists available disks >10GB for interactive selection
|
||||
- **Per-node settings**: Updates `config.yaml` with hardware-specific configuration
|
||||
|
||||
This eliminates the need to manually configure hardware settings and handles different hardware configurations across nodes.
|
||||
|
||||
### Template Structure
|
||||
|
||||
Configuration templates are stored in `patch.templates/` and use gomplate syntax:
|
||||
|
||||
- `controlplane-node-1.yaml`: Template for first control plane node
|
||||
- `controlplane-node-2.yaml`: Template for second control plane node
|
||||
- `controlplane-node-3.yaml`: Template for third control plane node
|
||||
- `worker.yaml`: Template for worker nodes
|
||||
|
||||
Templates use per-node variables from `config.yaml`:
|
||||
|
||||
- `{{ .cluster.nodes.control.node1.ip }}`
|
||||
- `{{ .cluster.nodes.control.node1.interface }}`
|
||||
- `{{ .cluster.nodes.control.node1.disk }}`
|
||||
- `{{ .cluster.nodes.control.vip }}`
|
||||
|
||||
The `wild-compile-template-dir` command processes all templates and outputs compiled configurations to the `patch/` directory.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Hardware Detection Issues
|
||||
|
||||
```bash
|
||||
# Check if node is accessible in maintenance mode
|
||||
talosctl -n <NODE_IP> version --insecure
|
||||
|
||||
# View available network interfaces
|
||||
talosctl -n <NODE_IP> get links --insecure
|
||||
|
||||
# View available disks
|
||||
talosctl -n <NODE_IP> get disks --insecure
|
||||
```
|
||||
|
||||
### Manual Hardware Discovery
|
||||
|
||||
If the automatic detection fails, you can manually inspect hardware:
|
||||
|
||||
```bash
|
||||
# Find active ethernet interfaces
|
||||
talosctl -n <NODE_IP> get links --insecure -o json | jq -s '.[] | select(.spec.operationalState == "up" and .spec.type == "ether" and .metadata.id != "lo") | .metadata.id'
|
||||
|
||||
# Find suitable installation disks
|
||||
talosctl -n <NODE_IP> get disks --insecure -o json | jq -s '.[] | select(.spec.size > 10000000000) | .metadata.id'
|
||||
```
|
||||
|
||||
### Node Status
|
||||
|
||||
```bash
|
||||
# View machine configuration (only works after config is applied)
|
||||
talosctl -n <NODE_IP> get machineconfig
|
||||
```
|
Reference in New Issue
Block a user