wild-cloud-dev/docs/future/longhorn-disk-configuration.md
2026-05-16 22:24:30 +00:00

Longhorn Storage Disk Configuration

Current Problem

Wild Cloud currently doesn't properly configure additional storage disks for Longhorn during node setup, causing Longhorn to use the OS disk instead of dedicated storage disks. This leads to:

  • Insufficient storage capacity - OS disks are typically small (100-200GB)
  • Performance issues - OS and storage I/O compete for the same disk
  • Disk pressure warnings - Longhorn marks nodes as unschedulable when the OS disk fills up

Example Case

Worker-1 in production has three disks:

  • /dev/sdb: 117GB (OS disk) - Previously used by Longhorn (now removed)
  • /dev/nvme0n1: 976GB NVMe - Now configured as Longhorn storage via ExistingVolumeConfig
  • /dev/sda: 1.9TB SATA (unused)

Root Cause

1. Talos Doesn't Auto-Mount Additional Disks

Talos Linux requires explicit configuration for additional disks. They are not mounted or made available for use automatically.

2. Wild Cloud's Incomplete Configuration

The current worker patch template (api/internal/setup/cluster-nodes/patch.templates/worker.yaml) only configures a self-referencing bind mount:

machine:
  kubelet:
    extraMounts:
      - destination: /var/lib/longhorn
        type: bind
        source: /var/lib/longhorn  # This just binds to itself!
        options:
          - bind
          - rshared
          - rw

This doesn't actually mount a different disk - it just creates a bind mount from /var/lib/longhorn to itself, which remains on the OS disk.

3. No Disk Detection or Configuration

Wild Cloud doesn't:

  • Detect available storage disks during node configuration
  • Configure the disk specified in config.yaml (e.g., disk: /dev/sda)
  • Provide UI/CLI options for selecting which disk to use for storage

Implemented Solution (Worker-1)

Worker-1 has been manually configured as a reference implementation. The approach uses Talos ExistingVolumeConfig (v1.11+) combined with Longhorn DaemonSet hostPath volumes.

How It Works

1. Talos ExistingVolumeConfig

Mounts an existing partition by UUID at /var/mnt/longhorn-nvme without reformatting:

---
apiVersion: v1alpha1
kind: ExistingVolumeConfig
name: longhorn-nvme
discovery:
  volumeSelector:
    match: volume.partition_uuid == "54e9771a-74d6-4242-bdcb-9c2398ef5d91"
mount: {}

This is applied as a second config document alongside the main machine config via talosctl apply-config. The NVMe mounts at /var/mnt/longhorn-nvme (Talos convention for ExistingVolumeConfig).

Key details:

  • Requires Talos v1.11+ (ExistingVolumeConfig not available in v1.10)
  • The partition must already exist with a filesystem (XFS in our case)
  • Get the partition UUID with: talosctl get discoveredvolumes <partition> -o yaml
  • Validate with: talosctl validate -m metal -c <config-file>
  • Apply with: talosctl apply-config -f <config-file> --mode no-reboot
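
As a minimal sketch of how this document could be generated rather than hand-written, the Go snippet below renders the same ExistingVolumeConfig YAML from a volume name and a partition UUID. The helper name renderExistingVolumeConfig and the UUID format check are assumptions made for illustration; they are not part of Wild Cloud or Talos.

```go
package main

import (
	"fmt"
	"regexp"
)

// uuidPattern matches a canonical 8-4-4-4-12 hexadecimal UUID.
var uuidPattern = regexp.MustCompile(`^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$`)

// renderExistingVolumeConfig is a hypothetical helper that returns a Talos
// ExistingVolumeConfig document selecting a partition by its UUID.
func renderExistingVolumeConfig(name, partitionUUID string) (string, error) {
	if name == "" {
		return "", fmt.Errorf("volume name must not be empty")
	}
	if !uuidPattern.MatchString(partitionUUID) {
		return "", fmt.Errorf("invalid partition UUID %q", partitionUUID)
	}
	doc := "---\n" +
		"apiVersion: v1alpha1\n" +
		"kind: ExistingVolumeConfig\n" +
		"name: " + name + "\n" +
		"discovery:\n" +
		"  volumeSelector:\n" +
		"    match: volume.partition_uuid == \"" + partitionUUID + "\"\n" +
		"mount: {}\n"
	return doc, nil
}

func main() {
	// Values taken from the worker-1 example above.
	doc, err := renderExistingVolumeConfig("longhorn-nvme", "54e9771a-74d6-4242-bdcb-9c2398ef5d91")
	if err != nil {
		panic(err)
	}
	fmt.Print(doc)
}
```

The rendered text can be written to a file and passed to the talosctl validate and apply-config commands listed above.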

2. Longhorn DaemonSet hostPath Volume

The longhorn-manager DaemonSet needs a hostPath volume to access the NVMe:

# Volume definition
volumes:
- hostPath:
    path: /var/mnt/longhorn-nvme    # Host path (where Talos mounts the NVMe)
  name: longhorn-nvme

# Volume mount in container
volumeMounts:
- mountPath: /var/mnt/longhorn-nvme  # Container path (MUST match host path)
  mountPropagation: Bidirectional
  name: longhorn-nvme

3. Longhorn Node Disk Configuration

The Longhorn node spec points to the mount path:

spec:
  disks:
    nvme-disk:
      allowScheduling: true
      diskType: filesystem
      path: /var/mnt/longhorn-nvme/
      storageReserved: 0
      tags:
      - nvme

The longhorn-disk.cfg file at the root of the NVMe filesystem stores the disk identity:

{"diskName":"nvme-disk","diskUUID":"3dd490e4-5c5f-4422-bcd8-f11d18580431","diskDriver":""}

4. No kubelet extraMounts Needed

The kubelet extraMounts are NOT needed for this approach. The DaemonSet hostPath volume handles the mount directly. kubelet extraMounts only affect the kubelet container's mount namespace and do not propagate to pod hostPath volumes.

Critical Lessons Learned

Path Alignment Between Pods

Longhorn uses multiple pod types that access the disk:

  • longhorn-manager: Accesses the disk via its DaemonSet hostPath volume
  • instance-manager: Accesses the host filesystem via /host/ mount (host root)

The disk path in the Longhorn node spec must work from BOTH perspectives:

  • longhorn-manager sees it via the DaemonSet volumeMount
  • instance-manager sees it via /host/<path>

The container mountPath MUST equal the host path. If the DaemonSet maps host /var/mnt/longhorn-nvme to container /var/lib/longhorn-nvme, longhorn-manager sees the NVMe at /var/lib/longhorn-nvme, but instance-manager resolves the same path as /host/var/lib/longhorn-nvme, which is a directory on the OS disk. Longhorn then reports the wrong storage capacity for the disk.

The fix: use the same path everywhere (/var/mnt/longhorn-nvme).
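
The alignment rule can be checked mechanically before a DaemonSet change is applied. The sketch below is one way to do that in Go. The diskMount type and checkPathAlignment function are hypothetical names, and requiring the Longhorn disk path to equal the mount path simply mirrors the worker-1 setup; it is not a Longhorn rule.

```go
package main

import (
	"fmt"
	"path"
)

// hostRootMount is where instance-manager sees the host root filesystem,
// as described above.
const hostRootMount = "/host"

// diskMount pairs the two paths from the longhorn-manager DaemonSet.
type diskMount struct {
	HostPath      string // volumes[].hostPath.path
	ContainerPath string // volumeMounts[].mountPath
}

// checkPathAlignment returns the path instance-manager will use for the
// disk, or an error if the paths would resolve to different filesystems.
func checkPathAlignment(m diskMount, diskPath string) (string, error) {
	host := path.Clean(m.HostPath)
	container := path.Clean(m.ContainerPath)
	disk := path.Clean(diskPath) // also drops a trailing slash
	if host != container {
		return "", fmt.Errorf("mountPath %q differs from hostPath %q", container, host)
	}
	if disk != container {
		return "", fmt.Errorf("disk path %q is not the mounted path %q", disk, container)
	}
	return path.Join(hostRootMount, disk), nil
}

func main() {
	m := diskMount{HostPath: "/var/mnt/longhorn-nvme", ContainerPath: "/var/mnt/longhorn-nvme"}
	seen, err := checkPathAlignment(m, "/var/mnt/longhorn-nvme/")
	if err != nil {
		panic(err)
	}
	fmt.Println("instance-manager path:", seen)
}
```

With the misaligned mapping described above (host /var/mnt/longhorn-nvme, container /var/lib/longhorn-nvme), the function returns an error instead of a path.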

kubelet extraMounts Don't Affect hostPath Volumes

Talos machine.kubelet.extraMounts add bind mounts to the kubelet container's CRI sandbox. They do NOT affect pod hostPath volume resolution. Pods with hostPath volumes always resolve from the actual host filesystem. Don't use extraMounts for Longhorn disk mounting.

longhorn-disk.cfg Regeneration

If the NVMe is not mounted (e.g., after a reboot before ExistingVolumeConfig is applied), longhorn-manager may write a new longhorn-disk.cfg with a fresh UUID into the empty mount point directory, which lives on the EPHEMERAL partition. Once the NVMe is mounted again, the original longhorn-disk.cfg on the NVMe still holds the correct UUID. If the file on the NVMe is ever overwritten with a wrong UUID, restore it:

# Inside the longhorn-manager pod
echo '{"diskName":"nvme-disk","diskUUID":"<correct-uuid>","diskDriver":""}' > /var/mnt/longhorn-nvme/longhorn-disk.cfg
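
Before overwriting the file, it helps to confirm that the UUID really is wrong. The Go sketch below parses the contents of longhorn-disk.cfg and compares the UUID with an expected value supplied by the caller. The function name verifyDiskConfig is hypothetical, and where the expected UUID comes from is left as an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// diskConfig mirrors the three fields shown in longhorn-disk.cfg above.
type diskConfig struct {
	DiskName   string `json:"diskName"`
	DiskUUID   string `json:"diskUUID"`
	DiskDriver string `json:"diskDriver"`
}

// verifyDiskConfig returns an error when the file content cannot be parsed
// or when its diskUUID differs from the expected UUID.
func verifyDiskConfig(raw []byte, wantUUID string) error {
	var cfg diskConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return fmt.Errorf("parse longhorn-disk.cfg: %w", err)
	}
	if cfg.DiskUUID != wantUUID {
		return fmt.Errorf("disk %q has UUID %s, expected %s", cfg.DiskName, cfg.DiskUUID, wantUUID)
	}
	return nil
}

func main() {
	raw := []byte(`{"diskName":"nvme-disk","diskUUID":"3dd490e4-5c5f-4422-bcd8-f11d18580431","diskDriver":""}`)
	if err := verifyDiskConfig(raw, "3dd490e4-5c5f-4422-bcd8-f11d18580431"); err != nil {
		panic(err)
	}
	fmt.Println("longhorn-disk.cfg UUID matches")
}
```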

Talos Version Requirements

  • ExistingVolumeConfig requires Talos v1.11+
  • Upgrade path: v1.10 -> v1.11 (one minor version at a time)
  • After upgrading a worker, the talosctl endpoint must use the node's actual current IP (check kubectl get nodes -o wide), not the config's targetIp
  • Control plane nodes on v1.10 can still manage workers on v1.11 via the VIP proxy
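
The version gate and the one-minor-at-a-time upgrade rule can be expressed as a small pre-flight check. The Go sketch below is illustrative only. The function names are hypothetical, and it assumes version strings of the form v1.10.6 with upgrades staying within one major version.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts the major and minor numbers from a version
// string such as "v1.10.6".
func parseMajorMinor(version string) (int, int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed major in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("malformed minor in %q: %w", version, err)
	}
	return major, minor, nil
}

// supportsExistingVolumeConfig reports whether the version is v1.11 or newer.
func supportsExistingVolumeConfig(version string) (bool, error) {
	major, minor, err := parseMajorMinor(version)
	if err != nil {
		return false, err
	}
	return major > 1 || (major == 1 && minor >= 11), nil
}

// upgradeSteps lists the minor releases to pass through, one at a time.
func upgradeSteps(from, to string) ([]string, error) {
	fromMajor, fromMinor, err := parseMajorMinor(from)
	if err != nil {
		return nil, err
	}
	toMajor, toMinor, err := parseMajorMinor(to)
	if err != nil {
		return nil, err
	}
	if fromMajor != toMajor || toMinor < fromMinor {
		return nil, fmt.Errorf("unsupported upgrade from %s to %s", from, to)
	}
	var steps []string
	for minor := fromMinor + 1; minor <= toMinor; minor++ {
		steps = append(steps, fmt.Sprintf("v%d.%d", fromMajor, minor))
	}
	return steps, nil
}

func main() {
	ok, _ := supportsExistingVolumeConfig("v1.10.6")
	fmt.Println("v1.10.6 supports ExistingVolumeConfig:", ok)
	steps, _ := upgradeSteps("v1.9.5", "v1.11.0")
	fmt.Println("upgrade steps:", steps)
}
```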

Replica Recovery After Path Changes

When the Longhorn disk path changes, some replicas may be left with stale dataDirectoryName values that point to directories that no longer exist on disk. Delete these stopped replicas so that Longhorn creates fresh replacements and rebuilds them from healthy replicas on other nodes.

Proposed Automation

1. Automatic Storage Disk Detection

During node configuration, Wild Cloud should:

// Detect available disks on the node
disks := detectAvailableDisks(nodeIP)

// Filter out OS disk and find suitable storage disks
storageDisk := selectBestStorageDisk(disks, config.Disk)

// Generate ExistingVolumeConfig for the storage disk partition
if storageDisk != "" {
    partitionUUID := getPartitionUUID(storageDisk)
    // Add ExistingVolumeConfig document to the node's Talos config
}
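
The selection step above is left abstract. One possible policy, sketched below in Go, is to skip the OS disk and pick the largest remaining disk. The disk type and its fields are assumptions, and "largest wins" is only one choice. On worker-1 the NVMe was chosen by hand over the larger SATA disk, so a real policy may also need to weigh disk type or accept an explicit override.

```go
package main

import "fmt"

// disk describes a block device reported for a node (illustrative fields).
type disk struct {
	Path      string
	SizeBytes uint64
}

// selectBestStorageDisk returns the path of the largest disk that is not
// the OS disk. It returns "" when no candidate with a non-zero size exists.
func selectBestStorageDisk(disks []disk, osDisk string) string {
	best := disk{}
	for _, d := range disks {
		if d.Path == osDisk {
			continue // never reuse the OS installation disk
		}
		if d.SizeBytes > best.SizeBytes {
			best = d
		}
	}
	return best.Path
}

func main() {
	// Approximate sizes from the worker-1 example case.
	disks := []disk{
		{Path: "/dev/sdb", SizeBytes: 117_000_000_000},
		{Path: "/dev/nvme0n1", SizeBytes: 976_000_000_000},
		{Path: "/dev/sda", SizeBytes: 1_900_000_000_000},
	}
	fmt.Println(selectBestStorageDisk(disks, "/dev/sdb"))
}
```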

2. Configuration Schema Updates

Add storage disk configuration to the node configuration:

cluster:
  nodes:
    active:
      worker-1:
        role: worker
        disk: /dev/sdb          # OS installation disk
        storageDisk: /dev/nvme0n1  # Dedicated storage disk
        storagePartitionUUID: 54e9771a-74d6-4242-bdcb-9c2398ef5d91
        currentIp: 192.168.8.158
        targetIp: 192.168.8.158

3. Longhorn DaemonSet Management

When a storage disk is configured, Wild Cloud should:

  1. Add a hostPath volume to the longhorn-manager DaemonSet for the mount path
  2. Configure the Longhorn node spec with the disk path and tags
  3. Write the longhorn-disk.cfg to the disk if not present

4. Web UI Enhancements

Add storage disk selection during node configuration:

  • Show available disks when configuring a node
  • Allow selection of OS disk and storage disk separately
  • Validate disk selections (ensure they're different)
  • Show disk sizes to help users make informed choices

5. Migration Path for Existing Clusters

For clusters already using the wrong disk:

  1. Add new disk to Longhorn via ExistingVolumeConfig + DaemonSet update
  2. Evict replicas from OS disk to storage disk (disable scheduling, request eviction)
  3. Remove OS disk from Longhorn node spec

Implementation Steps

Phase 1: Detection and Configuration (Priority: High)

  1. Add disk detection to node configuration API
  2. Update node configuration to include storage disk selection
  3. Generate ExistingVolumeConfig documents for worker nodes with storage disks
  4. Manage longhorn-manager DaemonSet hostPath volumes
  5. Update Web UI to show disk options during node setup

Phase 2: Validation and Safety (Priority: Medium)

  1. Validate disk isn't already in use
  2. Check disk size meets minimum requirements (>100GB)
  3. Prevent selection of OS disk as storage disk
  4. Add warnings when storage disk isn't configured
  5. Validate longhorn-disk.cfg UUID consistency
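
The first four checks in this phase could be combined into a single validation function. The Go sketch below shows one possible shape. The diskSelection type, its fields, and the reading of ">100GB" as strictly more than 100 * 10^9 bytes are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// minStorageBytes encodes the ">100GB" requirement using decimal gigabytes.
const minStorageBytes uint64 = 100 * 1000 * 1000 * 1000

// diskSelection is a hypothetical view of one node's disk choices.
type diskSelection struct {
	OSDisk           string
	StorageDisk      string
	StorageSizeBytes uint64
	StorageInUse     bool
}

// validateSelection returns a warning when no storage disk is configured
// and an error when the selected storage disk is unsuitable.
func validateSelection(s diskSelection) (warning string, err error) {
	if s.StorageDisk == "" {
		return "no storage disk configured; Longhorn will use the OS disk", nil
	}
	if s.StorageDisk == s.OSDisk {
		return "", errors.New("storage disk must differ from the OS disk")
	}
	if s.StorageInUse {
		return "", fmt.Errorf("disk %s is already in use", s.StorageDisk)
	}
	if s.StorageSizeBytes <= minStorageBytes {
		return "", fmt.Errorf("disk %s is too small: %d bytes", s.StorageDisk, s.StorageSizeBytes)
	}
	return "", nil
}

func main() {
	warning, err := validateSelection(diskSelection{
		OSDisk:           "/dev/sdb",
		StorageDisk:      "/dev/nvme0n1",
		StorageSizeBytes: 976_000_000_000,
	})
	fmt.Printf("warning=%q err=%v\n", warning, err)
}
```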

Phase 3: Migration Tools (Priority: Low)

  1. Create tools to migrate existing Longhorn data between disks
  2. Add disk reconfiguration workflow for existing nodes
  3. Provide backup/restore path for disk changes

Testing Requirements

  1. New Installation: Verify storage disk is properly configured during initial setup
  2. Upgrade Path: Ensure existing clusters continue working without breaking changes
  3. Multi-Disk Scenarios: Test with various disk configurations (NVMe, SATA, mixed)
  4. Failure Cases: Test behavior when storage disk fails or is removed
  5. Reboot Persistence: Verify NVMe mount survives node reboots via ExistingVolumeConfig

Timeline

  • Done: Manual fix for worker-1 (ExistingVolumeConfig + DaemonSet + Longhorn node spec)
  • Next: Apply same pattern to worker-2 and worker-3
  • v0.2.0: Implement basic storage disk selection in UI
  • v0.3.0: Add automatic disk detection and validation
  • v0.4.0: Provide migration tools for existing clusters