7.7 KiB
Talos Architecture and Components Guide
This guide provides deep understanding of Talos Linux architecture and system components for effective cluster administration.
Core Architecture Principles
Talos is designed to be:
- Atomic: Distributed as a single, versioned, signed, immutable image
- Modular: Composed of separate components with defined gRPC interfaces
- Minimal: Focused init system that runs only services necessary for Kubernetes
File System Architecture
Partition Layout
- EFI: Stores EFI boot data
- BIOS: Used for GRUB's second stage boot
- BOOT: Contains boot loader, initramfs, and kernel data
- META: Stores node metadata (node IDs, etc.)
- STATE: Stores machine configuration, node identity, cluster discovery, KubeSpan data
- EPHEMERAL: Stores ephemeral state, mounted at
/var
Root File System Structure
Three-layer design:
- Base Layer: Read-only squashfs mounted as loop device (immutable base)
- Runtime Layer: tmpfs filesystems for runtime needs (
/dev,/proc,/run,/sys,/tmp,/system) - Overlay Layer: overlayfs for persistent data backed by XFS at
/var
Special Directories
/system: Internal files that need to be writable (recreated each boot)- Example:
/system/etc/hostsbind-mounted over/etc/hosts
- Example:
/var: Owned by Kubernetes, contains persistent data:- etcd data (control plane nodes)
- kubelet data
- containerd data
- Survives reboots and upgrades, wiped on reset
Core Components
machined (PID 1)
Role: Talos replacement for traditional init process Functions:
- Machine configuration management
- API handling
- Resource and controller management
- Service lifecycle management
Managed Services:
- containerd
- etcd (control plane nodes)
- kubelet
- networkd
- trustd
- udevd
Architecture: Uses controller-runtime pattern similar to Kubernetes controllers
apid (API Gateway)
Role: gRPC API endpoint for all Talos interactions Functions:
- Routes requests to appropriate components
- Provides proxy capabilities for multi-node operations
- Handles authentication and authorization
Usage Patterns:
# Direct node communication
talosctl -e <node-ip> <command>
# Proxy through endpoint to specific nodes
talosctl -e <endpoint> -n <target-nodes> <command>
# Multi-node operations
talosctl -e <endpoint> -n <node1>,<node2>,<node3> <command>
trustd (Trust Management)
Role: Establishes and maintains trust within the system Functions:
- Root of Trust implementation
- PKI data distribution for control plane bootstrap
- Certificate management
- Secure file placement operations
containerd (Container Runtime)
Role: Industry-standard container runtime Namespaces:
system: Talos servicesk8s.io: Kubernetes services
udevd (Device Management)
Role: Device file manager (eudev implementation) Functions:
- Kernel device notification handling
- Device node management in
/dev - Hardware discovery and setup
Control Plane Architecture
etcd Cluster Design
Critical Concepts:
- Quorum: Majority of members must agree on leader
- Membership: Formal etcd cluster membership required
- Consensus: Uses Raft protocol for distributed consensus
Quorum Requirements:
- 3 nodes: Requires 2 for quorum (tolerates 1 failure)
- 5 nodes: Requires 3 for quorum (tolerates 2 failures)
- Even numbers are worse than odd (4 nodes still only tolerates 1 failure)
Control Plane Components
Running as Static Pods on Control Plane Nodes:
kube-apiserver
- Kubernetes API endpoint
- Connects to local etcd instance
- Handles all API operations
kube-controller-manager
- Runs control loops
- Manages cluster state reconciliation
- Handles node lifecycle, replication, etc.
kube-scheduler
- Pod placement decisions
- Resource-aware scheduling
- Constraint satisfaction
Bootstrap Process
- etcd Bootstrap: One node chosen as bootstrap node, initializes etcd cluster
- Static Pods: Control plane components start as static pods via kubelet
- API Availability: Control plane endpoint becomes available
- Manifest Injection: Bootstrap manifests (join tokens, RBAC, etc.) injected
- Cluster Formation: Other control plane nodes join etcd cluster
- HA Control Plane: All control plane nodes run full component set
Resource System Architecture
Controller-Runtime Pattern
Talos uses Kubernetes-style controller pattern:
- Resources: Typed configuration and state objects
- Controllers: Reconcile desired vs actual state
- Events: Reactive architecture for state changes
Resource Namespaces
config: Machine configuration resourcescluster: Cluster membership and discoverycontrolplane: Control plane component configurationssecrets: Certificate and key managementnetwork: Network configuration and state
Key Resources
# Machine configuration
talosctl get machineconfig
talosctl get machinetype
# Cluster membership
talosctl get members
talosctl get affiliates
talosctl get identities
# Control plane
talosctl get apiserverconfig
talosctl get controllermanagerconfig
talosctl get schedulerconfig
# Network
talosctl get addresses
talosctl get routes
talosctl get nodeaddresses
Network Architecture
Network Stack
- CNI: Container Network Interface for pod networking
- Host Networking: Node-to-node communication
- Service Discovery: Built-in cluster member discovery
- KubeSpan: Optional WireGuard mesh networking
Discovery Service Integration
- Service Registry: External discovery service (default: discovery.talos.dev)
- Kubernetes Registry: Deprecated, uses Kubernetes Node resources
- Encrypted Communication: All discovery data encrypted before transmission
Security Architecture
Immutable Base
- Read-only root filesystem
- Signed and verified boot process
- Atomic updates with rollback capability
Process Isolation
- Minimal attack surface
- No shell access
- No arbitrary user services
- Container-based workload isolation
Network Security
- Mutual TLS (mTLS) for all API communication
- Certificate-based node authentication
- Optional WireGuard mesh networking (KubeSpan)
- Encrypted service discovery
Kernel Hardening
Configured according to Kernel Self Protection Project (KSPP) recommendations:
- Stack protection
- Control flow integrity
- Memory protection features
- Attack surface reduction
Extension Points
Machine Configuration
- Declarative configuration management
- Patch-based configuration updates
- Runtime configuration validation
System Extensions
- Kernel modules
- System services (limited)
- Network configuration
- Storage configuration
Kubernetes Integration
- Automatic kubelet configuration
- Bootstrap manifest management
- Certificate lifecycle management
- Node lifecycle automation
Performance Characteristics
etcd Performance
- Performance decreases with cluster size
- Network latency affects consensus performance
- Storage I/O directly impacts etcd performance
Resource Requirements
- Control Plane Nodes: Higher memory for etcd, CPU for control plane
- Worker Nodes: Resources scale with workload requirements
- Network: Low latency crucial for etcd performance
Scaling Patterns
- Horizontal Scaling: Add worker nodes for capacity
- Vertical Scaling: Increase control plane node resources for performance
- Control Plane Scaling: Odd numbers (3, 5) for availability
This architecture enables Talos to provide a secure, minimal, and operationally simple platform for running Kubernetes clusters while maintaining the reliability and performance characteristics needed for production workloads.