wild-cloud-dev/ai/talos-v1.11/bare-metal-administration.md
2025-10-11 18:08:04 +00:00

Bare Metal Talos Administration Guide

This guide covers bare metal specific operations, configurations, and best practices for Talos Linux clusters.

META-Based Network Configuration

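Talos supports META-based network configuration for bare metal deployments: the network settings are stored in the META partition, so they can be embedded in the disk image and are available before a machine configuration is applied.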

Basic META Configuration

# Static addressing for a single interface (shown in machine configuration syntax)
machine:
  network:
    interfaces:
      - interface: eth0
        addresses:
          - 192.168.1.100/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.1.1
        mtu: 1500
    nameservers:
      - 8.8.8.8
      - 1.1.1.1

Advanced Network Configurations

VLAN Configuration

machine:
  network:
    interfaces:
      - interface: eth0  # Parent device
        vlans:
          - vlanId: 100  # VLAN 100 on eth0
            addresses:
              - 192.168.100.10/24
            routes:
              - network: 192.168.100.0/24

Interface Bonding

machine:
  network:
    interfaces:
      - interface: bond0
        bond:
          mode: 802.3ad
          lacpRate: fast
          xmitHashPolicy: layer3+4
          miimon: 100
          updelay: 200
          downdelay: 200
          interfaces:
            - eth0
            - eth1
        addresses:
          - 192.168.1.100/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.1.1

Bridge Configuration

machine:
  network:
    interfaces:
      - interface: br0
        bridge:
          stp:
            enabled: false
          interfaces:
            - eth0
            - eth1
        addresses:
          - 192.168.1.100/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.1.1

Network Troubleshooting Commands

# Check interface configuration
talosctl -n <IP> get addresses
talosctl -n <IP> get routes
talosctl -n <IP> get links

# Check network configuration
talosctl -n <IP> get networkconfig -o yaml

# Inspect interfaces and traffic counters
talosctl -n <IP> list /sys/class/net
talosctl -n <IP> read /proc/net/dev
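The counters in `/proc/net/dev` are easier to scan once reduced to errors and drops per interface. The following is a minimal sketch, assuming the standard Linux column layout of that file; `net_dev_errors` is a hypothetical helper name, not a talosctl feature.

```shell
# Summarize per-interface error and drop counters from /proc/net/dev text on stdin.
# net_dev_errors is a hypothetical helper, not part of talosctl.
net_dev_errors() {
  awk 'NR > 2 {
    sub(/:/, " ")   # the colon may be glued to the first counter, e.g. eth0:123
    printf "%s rx_errs=%s rx_drop=%s tx_errs=%s tx_drop=%s\n", $1, $4, $5, $12, $13
  }'
}

# Usage against a node (requires cluster access):
#   talosctl -n <IP> read /proc/net/dev | net_dev_errors
```

Non-zero error or drop counters on a bond member or bridge port are usually the first thing to check when a link misbehaves.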

Disk Encryption for Bare Metal

LUKS2 Encryption Configuration

machine:
  systemDiskEncryption:
    state:
      provider: luks2
      keys:
        - slot: 0
          static:
            passphrase: "your-secure-passphrase"
    ephemeral:
      provider: luks2
      keys:
        - slot: 0
          nodeID: {}

TPM-Based Encryption

machine:
  systemDiskEncryption:
    state:
      provider: luks2
      keys:
        - slot: 0
          tpm: {}
    ephemeral:
      provider: luks2
      keys:
        - slot: 0
          tpm: {}

Key Management Operations

# Check encryption status
talosctl -n <IP> get encryptionconfig -o yaml

# Rotate encryption keys
talosctl -n <IP> apply-config --file updated-config.yaml --mode staged

SecureBoot Implementation

UKI (Unified Kernel Image) Setup

SecureBoot requires UKI format images with embedded signatures.

Generate SecureBoot Keys

# Generate the SecureBoot (UKI) signing key and certificate (written to _out/ by default)
talosctl gen secureboot uki --common-name "SecureBoot Key"

# Generate PCR signing key
talosctl gen secureboot pcr

# Generate UEFI database entries for key enrollment
talosctl gen secureboot database --enrolled-certificate _out/uki-signing-cert.pem

Machine Configuration for SecureBoot

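# SecureBoot is enabled by booting a signed SecureBoot image with SecureBoot
# turned on in firmware; the machine configuration only covers disk encryption.
machine:
  systemDiskEncryption:
    state:
      provider: luks2
      keys:
        - slot: 0
          tpm:
            # Refuse to enroll the key unless SecureBoot is active
            checkSecurebootStatusOnEnroll: true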

UEFI Configuration

  • Enable SecureBoot in UEFI firmware
  • Enroll platform keys and certificates
  • Configure TPM 2.0 for PCR measurements
  • Set boot order for UKI images

Hardware-Specific Configurations

Performance Tuning for Bare Metal

CPU Governor Configuration

machine:
  sysfs:
    "devices.system.cpu.cpu0.cpufreq.scaling_governor": "performance"
    "devices.system.cpu.cpu1.cpufreq.scaling_governor": "performance"

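Listing every CPU by hand gets tedious on large servers. The sketch below generates the same `machine.sysfs` entries for a given CPU count; `governor_sysfs_yaml` is a hypothetical helper name, and the count and governor are values you supply.

```shell
# Print machine.sysfs entries that set one governor on CPUs 0..(count-1).
# governor_sysfs_yaml is a hypothetical helper; count and governor are caller-supplied.
governor_sysfs_yaml() {
  count=$1
  governor=${2:-performance}
  echo "machine:"
  echo "  sysfs:"
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '    "devices.system.cpu.cpu%d.cpufreq.scaling_governor": "%s"\n' "$i" "$governor"
    i=$((i + 1))
  done
}

# Usage: generate a patch for a 4-CPU machine
#   governor_sysfs_yaml 4 > cpu-governor-patch.yaml
```

The CPU count can be read from a node with `talosctl -n <IP> read /proc/cpuinfo | grep -c '^processor'`.
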
Hardware Vulnerability Mitigations

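machine:
  install:
    # Applied at install/upgrade time; UKI-based boots take kernel args from the image instead
    extraKernelArgs:
      - mitigations=auto  # Default balanced approach
      # - mitigations=off  # Maximum performance (less secure); use instead of auto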

IOMMU Configuration

machine:
  install:
    extraKernelArgs:
      - intel_iommu=on
      - iommu=pt
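Kernel arguments only take effect after the node boots with them, so it helps to confirm what the running kernel actually received. This is a minimal sketch that extracts one argument from `/proc/cmdline` text; `kernel_arg_value` is a hypothetical helper name.

```shell
# Print the value of a kernel argument from a /proc/cmdline string on stdin.
# Returns 1 if the argument is absent. kernel_arg_value is a hypothetical helper.
kernel_arg_value() {
  name=$1
  tr ' ' '\n' | awk -F= -v n="$name" '
    $1 == n {
      if (index($0, "=")) { sub(/^[^=]*=/, ""); print } else { print "" }
      found = 1
    }
    END { exit found ? 0 : 1 }'
}

# Usage against a node (requires cluster access):
#   talosctl -n <IP> read /proc/cmdline | kernel_arg_value mitigations
```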

Memory Management

machine:
  install:
    extraKernelArgs:
      - hugepages=1024  # 1024 pages of the default huge page size (2 MiB on x86_64, 2 GiB total)
      - transparent_hugepage=never
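The `hugepages=` value is a page count, not a size, so the reserved memory depends on the huge page size in use. The sketch below computes the total from `/proc/meminfo` text; `hugepages_total_mib` is a hypothetical helper name.

```shell
# Compute total reserved huge page memory in MiB from /proc/meminfo text on stdin.
# hugepages_total_mib is a hypothetical helper.
hugepages_total_mib() {
  awk '
    $1 == "HugePages_Total:" { pages = $2 }
    $1 == "Hugepagesize:"    { size_kb = $2 }   # reported in kB
    END { printf "%d\n", pages * size_kb / 1024 }'
}

# Usage against a node (requires cluster access):
#   talosctl -n <IP> read /proc/meminfo | hugepages_total_mib
```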

Ingress Firewall for Bare Metal

Basic Firewall Configuration

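# Ingress firewall rules are separate configuration documents appended to the
# machine configuration, not fields under machine.network.
apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: block
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: allow-talos-api
portSelector:
  ports:
    - 50000
    - 50001
  protocol: tcp
ingress:
  - subnet: 192.168.1.0/24
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: allow-kubernetes-api
portSelector:
  ports:
    - 6443
  protocol: tcp
ingress:
  - subnet: 0.0.0.0/0
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: allow-etcd
portSelector:
  ports:
    - 2379
    - 2380
  protocol: tcp
ingress:
  - subnet: 192.168.1.0/24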

Advanced Firewall Rules

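apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: block
---
# Talos does not run SSH; restrict the Talos API to the management network instead
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: allow-talos-api-management
portSelector:
  ports:
    - 50000
  protocol: tcp
ingress:
  - subnet: 10.0.1.0/24  # Management network only
---
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: allow-monitoring
portSelector:
  ports:
    - 9100  # Node exporter
    - 10250 # kubelet metrics
  protocol: tcp
ingress:
  - subnet: 192.168.1.0/24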

System Extensions for Bare Metal

Common Bare Metal Extensions

# Image Factory schematic (machine.install.extensions is deprecated;
# extensions are built into the boot/installer image instead)
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
      - siderolabs/util-linux-tools
      - siderolabs/drbd

Storage Extensions

# Image Factory schematic
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/zfs
      - siderolabs/nut-client
      - siderolabs/smartmontools

Checking Extension Status

# List installed extensions
talosctl -n <IP> get extensions

# Check extension services
talosctl -n <IP> get extensionserviceconfigs

Static Pod Configuration for Bare Metal

Local Storage Static Pods

# machine.pods takes complete Pod manifests
machine:
  pods:
    - apiVersion: v1
      kind: Pod
      metadata:
        name: local-storage-provisioner
        namespace: kube-system
      spec:
        containers:
          - name: local-storage-provisioner
            image: rancher/local-path-provisioner:v0.0.24
            args:
              - --config-path=/etc/config/config.json
            env:
              - name: POD_NAMESPACE
                value: kube-system
            volumeMounts:
              - name: config
                mountPath: /etc/config
              - name: local-storage
                mountPath: /opt/local-path-provisioner
        volumes:
          - name: config
            hostPath:
              path: /etc/local-storage
          - name: local-storage
            hostPath:
              path: /var/lib/local-storage

Hardware Monitoring Static Pods

# machine.pods takes complete Pod manifests
machine:
  pods:
    - apiVersion: v1
      kind: Pod
      metadata:
        name: node-exporter
        namespace: monitoring
      spec:
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
        containers:
          - name: node-exporter
            image: prom/node-exporter:latest
            args:
              - --path.rootfs=/host
              - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
            volumeMounts:
              - name: proc
                mountPath: /host/proc
                readOnly: true
              - name: sys
                mountPath: /host/sys
                readOnly: true
              - name: rootfs
                mountPath: /host
                readOnly: true
        volumes:
          - name: proc
            hostPath:
              path: /proc
          - name: sys
            hostPath:
              path: /sys
          - name: rootfs
            hostPath:
              path: /

Bare Metal Boot Asset Management

PXE Boot Configuration

For network booting, configure DHCP/TFTP with appropriate boot assets:

# Download kernel and initramfs for PXE
curl -LO https://github.com/siderolabs/talos/releases/download/v1.11.0/vmlinuz-amd64
curl -LO https://github.com/siderolabs/talos/releases/download/v1.11.0/initramfs-amd64.xz
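Once downloaded, the two files can be referenced from a boot script. This is a minimal sketch, assuming iPXE is the boot loader and the assets are served over HTTP from a base URL you control (`http://boot.example/talos` is a placeholder); `ipxe_script` is a hypothetical helper name. Only `talos.platform=metal` is set here, and any other kernel arguments your environment needs are omitted.

```shell
# Print an iPXE script that boots Talos from assets under a base URL.
# ipxe_script and the base URL are illustrative assumptions, not Talos tooling.
ipxe_script() {
  base_url=$1
  cat <<EOF
#!ipxe
kernel ${base_url}/vmlinuz-amd64 talos.platform=metal initrd=initramfs-amd64.xz
initrd ${base_url}/initramfs-amd64.xz
boot
EOF
}

# Usage: write the script next to the boot assets
#   ipxe_script http://boot.example/talos > boot.ipxe
```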

USB Boot Asset Creation

# Write installer image to USB
sudo dd if=metal-amd64.iso of=/dev/sdX bs=4M status=progress

Image Factory Integration

For custom bare metal images:

# Generate schematic for bare metal with extensions
curl -X POST --data-binary @schematic.yaml \
  https://factory.talos.dev/schematics

# Download the custom ISO for the returned schematic ID
curl -LO https://factory.talos.dev/image/<schematic-id>/v1.11.0/metal-amd64.iso
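The two steps can be chained by capturing the schematic ID from the POST response. The sketch below assumes the response is a JSON object with a hexadecimal `id` field; `schematic_id` and `factory_iso_url` are hypothetical helper names.

```shell
# Extract the "id" field from an Image Factory JSON response on stdin.
# Assumes a response shaped like {"id":"<hex digest>"}.
schematic_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([0-9a-f]*\)".*/\1/p'
}

# Build the ISO download URL for a schematic ID and Talos version.
factory_iso_url() {
  printf 'https://factory.talos.dev/image/%s/%s/metal-amd64.iso\n' "$1" "$2"
}

# Usage (requires network access):
#   id=$(curl -sS -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics | schematic_id)
#   curl -LO "$(factory_iso_url "$id" v1.11.0)"
```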

Hardware Compatibility and Drivers

Check Hardware Support

# Check PCI devices
talosctl -n <IP> read /proc/bus/pci/devices

# Check USB devices (usbfs under /proc/bus/usb is not present on current kernels)
talosctl -n <IP> list /sys/bus/usb/devices

# Check loaded kernel modules
talosctl -n <IP> read /proc/modules

# Check hardware information
talosctl -n <IP> read /proc/cpuinfo
talosctl -n <IP> read /proc/meminfo

Common Hardware Issues

Network Interface Issues

# Check interface status
talosctl -n <IP> list /sys/class/net/

# Check driver information (the DRIVER= line names the bound driver)
talosctl -n <IP> read /sys/class/net/eth0/device/uevent

# Check firmware loading
talosctl -n <IP> dmesg | grep firmware

Storage Controller Issues

# Check block devices
talosctl -n <IP> get disks

# List stable disk identifiers (useful for matching disks to SMART reports)
talosctl -n <IP> list /dev/disk/by-id/

Bare Metal Monitoring and Maintenance

Hardware Health Monitoring

# Check system temperatures (if available)
talosctl -n <IP> read /sys/class/thermal/thermal_zone0/temp

# List power supply devices, then read <device>/status for each
# (shell wildcards are expanded locally, not on the node)
talosctl -n <IP> list /sys/class/power_supply

# Monitor system events for hardware issues
talosctl -n <IP> dmesg | grep -i error
talosctl -n <IP> dmesg | grep -i "machine check"
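The thermal zone file reports millidegrees Celsius, so a raw value such as 45000 means 45.0 °C. A small conversion sketch follows; `millideg_to_celsius` is a hypothetical helper name.

```shell
# Convert a millidegree Celsius reading on stdin to degrees with one decimal.
# millideg_to_celsius is a hypothetical helper.
millideg_to_celsius() {
  awk '{ printf "%.1f\n", $1 / 1000 }'
}

# Usage against a node (requires cluster access):
#   talosctl -n <IP> read /sys/class/thermal/thermal_zone0/temp | millideg_to_celsius
```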

Performance Monitoring

# Check CPU performance
talosctl -n <IP> read /proc/cpuinfo | grep MHz
talosctl -n <IP> cgroups --preset cpu

# Check memory performance
talosctl -n <IP> memory
talosctl -n <IP> cgroups --preset memory

# Check I/O performance
talosctl -n <IP> read /proc/diskstats
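Raw `/proc/diskstats` output is hard to read directly. The sketch below reduces it to cumulative data read and written per device since boot, relying on the kernel convention that the sector columns count 512-byte units; `disk_io_summary` is a hypothetical helper name.

```shell
# Summarize cumulative read/write volume per block device from /proc/diskstats on stdin.
# Field 3 is the device name, field 6 sectors read, field 10 sectors written (512-byte units).
disk_io_summary() {
  awk '{ printf "%s read_mib=%d written_mib=%d\n", $3, $6 * 512 / 1048576, $10 * 512 / 1048576 }'
}

# Usage against a node (requires cluster access):
#   talosctl -n <IP> read /proc/diskstats | disk_io_summary
```

Running it twice a few seconds apart and comparing the totals gives a rough throughput figure.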

Security Hardening for Bare Metal

BIOS/UEFI Security

  • Enable SecureBoot
  • Disable unused boot devices
  • Set administrator passwords
  • Enable TPM 2.0
  • Disable legacy boot modes

Physical Security

  • Secure physical access to servers
  • Use chassis intrusion detection
  • Implement network port security
  • Consider hardware-based attestation

Network Security

apiVersion: v1alpha1
kind: NetworkDefaultActionConfig
ingress: block
---
# Only allow necessary services
apiVersion: v1alpha1
kind: NetworkRuleConfig
name: allow-cluster-traffic
portSelector:
  ports:
    - 6443   # Kubernetes API
    - 2379   # etcd client
    - 2380   # etcd peer
    - 10250  # kubelet API
    - 50000  # Talos API
  protocol: tcp
ingress:
  - subnet: 192.168.1.0/24

This guide covers hardware-specific configuration, performance tuning, security hardening, and operational practices for Talos Linux on physical servers.