wild-cloud/my-scaffold/docs/node-setup.md
Paul Payne f1fe4f9cc2 Settle on v1 setup method. Test run completed successfully from bootstrap to service setup.
2025-06-24 15:12:53 -07:00

Node Setup Guide

This guide covers setting up Talos Linux nodes for your Kubernetes cluster using USB boot.

Overview

There are two main approaches for booting Talos nodes:

  1. USB Boot (covered here) - Boot from a custom USB drive with system extensions
  2. PXE Boot - Network boot using dnsmasq setup (see setup/dnsmasq/README.md)

USB Boot Setup

Prerequisites

  • Target hardware for Kubernetes nodes
  • USB drive (8GB+ recommended)
  • Admin access to create bootable USB drives

Step 1: Upload Schematic and Download Custom Talos ISO

First, upload the system extensions schematic to Talos Image Factory, then download the custom ISO.

# Upload schematic configuration to get schematic ID
wild-talos-schema

# Download custom ISO with system extensions
wild-talos-iso

The custom ISO includes the system extensions the cluster needs (iscsi-tools, util-linux-tools, intel-ucode, gvisor). It is saved to .wildcloud/iso/talos-v1.10.3-metal-amd64.iso.
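
For orientation, the two wrapper scripts correspond roughly to the manual Image Factory steps sketched below. This is not the scripts' actual implementation, which is not shown in this guide. The file name schematic.yaml and the helper names write_schematic and talos_iso_url are illustrative assumptions; the extension list, Talos version, and architecture are taken from the text above.

```shell
# Write a schematic listing the four system extensions named in this guide.
# The output path (default: schematic.yaml) is a hypothetical example.
write_schematic() {
  cat > "${1:-schematic.yaml}" <<'EOF'
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
      - siderolabs/util-linux-tools
      - siderolabs/intel-ucode
      - siderolabs/gvisor
EOF
}

# Build the Image Factory ISO URL for a schematic ID ($1) and Talos version ($2).
talos_iso_url() {
  printf 'https://factory.talos.dev/image/%s/%s/metal-amd64.iso\n' "$1" "$2"
}

# Manual usage (needs network access, so it is left commented out):
#   write_schematic schematic.yaml
#   curl -sS -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics
#   curl -fL -o talos.iso "$(talos_iso_url <schematic-id> v1.10.3)"
```

The POST returns a small JSON document containing the schematic ID, which then goes into the download URL.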

Step 2: Create Bootable USB Drive

Linux

# Find your USB device (be careful to select the right device!)
lsblk
sudo dmesg | tail  # Check for recently connected USB devices

# Create bootable USB (replace /dev/sdX with your USB device)
sudo dd if=.wildcloud/iso/talos-v1.10.3-metal-amd64.iso of=/dev/sdX bs=4M status=progress conv=fsync

# Flush any remaining buffered writes before removing the drive
sync

⚠️ Warning: Double-check the device path (/dev/sdX). Writing to the wrong device will destroy data!
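
One way to reduce the risk on Linux is to refuse to write unless the kernel flags the target as removable. The helper below is a minimal sketch: the name is_removable and the SYSFS_ROOT variable are assumptions made for this example, not part of the project's scripts. Some USB SSDs and enclosures report themselves as non-removable, so treat a failed check as a prompt to look again, not as proof.

```shell
# Hypothetical guard: succeed only if the kernel flags the device as removable.
# SYSFS_ROOT exists only so the check can be exercised against a fake tree;
# it defaults to the real /sys.
is_removable() {
  name=${1##*/}    # /dev/sdX -> sdX
  flag="${SYSFS_ROOT:-/sys}/block/$name/removable"
  [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

# Usage before running dd:
#   is_removable /dev/sdX || { echo "/dev/sdX is not flagged removable" >&2; exit 1; }
```

Running lsblk -o NAME,SIZE,RM,TRAN shows the same removable flag (RM) alongside the transport type, which helps confirm that the device is attached over USB.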

macOS

# Find your USB device
diskutil list

# Unmount the USB drive (replace diskX with your USB device)
diskutil unmountDisk /dev/diskX

# Create bootable USB
sudo dd if=.wildcloud/iso/talos-v1.10.3-metal-amd64.iso of=/dev/rdiskX bs=4m

# Eject when complete
diskutil eject /dev/diskX

Windows

Use one of these tools:

  1. Rufus (Recommended)

    • Download from https://rufus.ie/
    • Select the Talos ISO file
    • Choose your USB drive
    • Use "DD Image" mode
    • Click "START"
  2. balenaEtcher

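  3. Command Line (Windows 10/11, requires a third-party dd port; dd is not included with Windows)

    # List disks to find the USB drive number (list disk and exit are typed at the DISKPART> prompt)
    diskpart
    list disk
    exit
    
    # Write ISO from an elevated prompt (replace X with your USB disk number)
    dd if=.wildcloud\iso\talos-v1.10.3-metal-amd64.iso of=\\.\PhysicalDriveX bs=4M --progress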
    

Step 3: Boot Target Machine

  1. Insert USB into target machine
  2. Boot from USB:
    • Restart machine and enter BIOS/UEFI (usually F2, F12, DEL, or ESC during startup)
    • Change boot order to prioritize USB drive
    • Or use one-time boot menu (usually F12)
  3. Talos will boot in maintenance mode with a DHCP IP

Step 4: Hardware Detection and Configuration

Once the machine boots, it will be in maintenance mode with a DHCP IP address.
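
If the router's DHCP lease table is not handy, the maintenance IP can often be found by probing the subnet for the Talos API port (50000). The function below is a rough sketch rather than project tooling: find_talos_hosts and the PROBE override are names invented for this example, and it assumes a /24 network and an nc that supports -z and -w. An open port 50000 is a strong hint that a host runs Talos, not a guarantee.

```shell
# List hosts in a /24 that accept TCP connections on port 50000 (Talos API).
# $1 is the first three octets, e.g. 192.168.8 (the example subnet in this guide).
# PROBE is a hypothetical override hook; by default nc performs the check.
find_talos_hosts() {
  prefix=$1
  i=1
  while [ "$i" -le 254 ]; do
    if ${PROBE:-nc -z -w 1} "$prefix.$i" 50000 >/dev/null 2>&1; then
      echo "$prefix.$i"
    fi
    i=$((i + 1))
  done
}

# Usage:
#   find_talos_hosts 192.168.8
```

The scan is sequential with a one-second timeout per address, so a full sweep can take a few minutes.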

# Find the node's maintenance IP (check your router/DHCP server)
# Then detect hardware and register the node
cd setup/cluster-nodes
./detect-node-hardware.sh <maintenance-ip> <node-number>

# Example: Node got DHCP IP 192.168.8.150, registering as node 1
./detect-node-hardware.sh 192.168.8.150 1

This script will:

  • Discover network interface names (e.g., enp4s0)
  • List available disks for installation
  • Update config.yaml with node-specific hardware settings
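
To inspect the same information by hand, talosctl can query a node in maintenance mode directly. The sketch below only shows the kind of lookup involved; what detect-node-hardware.sh actually runs is not shown in this guide. The function name show_node_hardware and the TALOSCTL override are assumptions for this example.

```shell
# Query a node in maintenance mode for its network links and disks.
# TALOSCTL is a hypothetical override hook (defaults to talosctl on PATH).
show_node_hardware() {
  ip=$1
  ${TALOSCTL:-talosctl} get links --insecure -n "$ip"
  ${TALOSCTL:-talosctl} get disks --insecure -n "$ip"
}

# Usage:
#   show_node_hardware 192.168.8.150
```

The --insecure flag is needed because the node has no machine configuration, and therefore no client certificates, until Step 5.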

Step 5: Generate and Apply Configuration

# Generate machine configurations with detected hardware
./generate-machine-configs.sh

# Apply configuration (node will reboot with static IP)
talosctl apply-config --insecure -n <maintenance-ip> --file final/controlplane-node-<number>.yaml

# Example:
talosctl apply-config --insecure -n 192.168.8.150 --file final/controlplane-node-1.yaml

Step 6: Verify Installation

After reboot, the node should come up with its assigned static IP:

# Check connectivity (node 1 should be at 192.168.8.31)
ping 192.168.8.31

# Verify system extensions are installed
talosctl -e 192.168.8.31 -n 192.168.8.31 get extensions

# Check for iscsi tools
talosctl -e 192.168.8.31 -n 192.168.8.31 list /usr/local/bin/ | grep iscsi
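
Because the node reboots after apply-config, the first ping may fail while it is still coming up. A small retry loop avoids guessing when to check. This is a sketch under assumed defaults: the name wait_for_node, the PING_CMD override, and the default of 30 attempts at 5-second intervals are all arbitrary choices for this example.

```shell
# Retry a reachability check until it succeeds or the attempts run out.
# $1 = IP, $2 = max attempts (default 30), $3 = seconds between attempts (default 5).
# PING_CMD is a hypothetical override hook; the default sends one ICMP echo.
wait_for_node() {
  ip=$1
  tries=${2:-30}
  delay=${3:-5}
  n=0
  while [ "$n" -lt "$tries" ]; do
    if ${PING_CMD:-ping -c 1} "$ip" >/dev/null 2>&1; then
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
#   wait_for_node 192.168.8.31 && talosctl -e 192.168.8.31 -n 192.168.8.31 get extensions
```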

Repeat for Additional Nodes

For each additional control plane node:

  1. Boot with the same USB drive
  2. Run hardware detection with the new maintenance IP and node number
  3. Generate and apply configurations
  4. Verify the node comes up at its static IP

Example for node 2:

./detect-node-hardware.sh 192.168.8.151 2
./generate-machine-configs.sh
talosctl apply-config --insecure -n 192.168.8.151 --file final/controlplane-node-2.yaml
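
When several nodes are pending, it can help to print the per-node commands for review before running anything. The helper below is only a convenience sketch; node_commands is a made-up name, and it prints the commands instead of executing them.

```shell
# Print (not run) the three per-node commands from this guide.
# $1 = maintenance IP, $2 = node number.
node_commands() {
  ip=$1
  num=$2
  echo "./detect-node-hardware.sh $ip $num"
  echo "./generate-machine-configs.sh"
  echo "talosctl apply-config --insecure -n $ip --file final/controlplane-node-$num.yaml"
}

# Usage (the address for node 3 is a placeholder, not taken from this guide):
#   node_commands 192.168.8.151 2
#   node_commands 192.168.8.152 3
```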

Cluster Bootstrap

Once all control plane nodes are configured:

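# Bootstrap etcd once, against a single control plane node (node 1 here).
# Use the node's own IP rather than the VIP (192.168.8.30): the VIP only becomes active after etcd is running.
talosctl bootstrap -e 192.168.8.31 -n 192.168.8.31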

# Get kubeconfig
talosctl kubeconfig

# Verify cluster
kubectl get nodes
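
Nodes usually take a little while to report Ready after bootstrap. Instead of re-running kubectl get nodes, kubectl can block until they are ready. The wrapper below is a sketch; wait_for_cluster and the KUBECTL override are names assumed for this example, and the 300s default timeout is arbitrary.

```shell
# Block until every node reports Ready, or fail after the timeout.
# $1 = timeout (default 300s). KUBECTL is a hypothetical override hook.
wait_for_cluster() {
  ${KUBECTL:-kubectl} wait --for=condition=Ready nodes --all --timeout="${1:-300s}"
}

# Usage:
#   wait_for_cluster 600s
```

If the machine configuration disables the default CNI, nodes stay NotReady until a CNI is installed and this command will time out.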

Troubleshooting

USB Boot Issues

  • Machine won't boot from USB: Check BIOS boot order, disable Secure Boot if needed
  • Talos doesn't start: Verify ISO was written correctly, try re-creating USB
  • Network issues: Ensure DHCP is available on your network

Hardware Detection Issues

  • Node not accessible: Check IP assignment, firewall settings
  • Wrong interface detected: Override the interface name manually in config.yaml
  • Disk not found: Verify disk size (must be >10GB), check disk health

Installation Issues

  • Static IP not assigned: Check network configuration in machine config
  • Extensions not installed: Verify ISO includes extensions, check upgrade logs
  • Node won't join cluster: Check certificates, network connectivity to VIP

Checking Logs

# View system logs
talosctl -e <node-ip> -n <node-ip> logs machined

# Check kernel messages
talosctl -e <node-ip> -n <node-ip> dmesg

# Monitor services
talosctl -e <node-ip> -n <node-ip> get services

System Extensions Included

The custom ISO includes these extensions:

  • siderolabs/iscsi-tools: iSCSI initiator tools for persistent storage
  • siderolabs/util-linux-tools: Utility tools including fstrim for storage
  • siderolabs/intel-ucode: Intel CPU microcode updates (harmless on AMD)
  • siderolabs/gvisor: Container runtime sandbox (optional security enhancement)

These extensions enable:

  • Longhorn distributed storage
  • Improved security isolation
  • CPU microcode updates
  • Storage optimization tools

Next Steps

After all nodes are configured:

  1. Install CNI: Deploy a Container Network Interface plugin (Cilium, Calico, etc.)
  2. Install CSI: Deploy a Container Storage Interface driver (Longhorn for persistent storage)
  3. Deploy workloads: Your applications and services
  4. Monitor cluster: Set up monitoring and logging

See the main project documentation for application deployment guides.