`docs/APPS.md`

# Deploying Applications

Once you have your personal cloud infrastructure up and running, you'll want to start deploying applications. This guide explains how to deploy and manage applications on your infrastructure.

## Application Charts

The `/charts` directory contains curated Helm charts for common applications that are ready to deploy on your personal cloud.

### Available Charts

| Chart | Description | Internal/Public |
|-------|-------------|-----------------|
| mariadb | MariaDB database for applications | Internal |
| postgres | PostgreSQL database for applications | Internal |

### Installing Charts

Use the `bin/helm-install` script to deploy charts with the right configuration:

```bash
# Install PostgreSQL
./bin/helm-install postgres

# Install MariaDB
./bin/helm-install mariadb
```

The script automatically:

- Uses values from your environment variables
- Creates the necessary namespace
- Configures storage and networking
- Sets up appropriate secrets

### Customizing Chart Values

Each chart can be customized by:

1. Editing environment variables in your `.env` file
2. Creating a custom values file:

```bash
# Create a custom values file
cp charts/postgres/values.yaml my-postgres-values.yaml
nano my-postgres-values.yaml

# Install with custom values
./bin/helm-install postgres --values my-postgres-values.yaml
```

### Creating Your Own Charts

You can add your own applications to the charts directory:

1. Create a new directory: `mkdir -p charts/my-application`
2. Add the necessary templates and values
3. Document any required environment variables

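A new chart needs at minimum a `Chart.yaml` describing it. A minimal sketch (the name, description, and version here are placeholders for your own application):

```yaml
# charts/my-application/Chart.yaml -- minimal chart metadata (illustrative)
apiVersion: v2
name: my-application
description: My self-hosted application
version: 0.1.0
```

Templates then go under `charts/my-application/templates/`, with default values in `charts/my-application/values.yaml`.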
## Deploying Custom Services

For simpler applications or services without existing charts, use the `deploy-service` script to quickly deploy from templates.

### Service Types

The system supports four types of services:

1. **Public** - Accessible from the internet
2. **Internal** - Only accessible within your local network
3. **Database** - Internal database services
4. **Microservice** - Services accessible only to other services

### Deployment Examples

```bash
# Deploy a public blog using Ghost
./bin/deploy-service --type public --name blog --image ghost:4.12 --port 2368

# Deploy an internal admin dashboard
./bin/deploy-service --type internal --name admin --image my-admin:v1 --port 8080

# Deploy a database service
./bin/deploy-service --type database --name postgres --image postgres:15 --port 5432

# Deploy a microservice
./bin/deploy-service --type microservice --name auth --image auth-service:v1 --port 9000
```

### Service Structure

When you deploy a service, a directory is created at `services/[service-name]/` containing:

- `service.yaml` - The Kubernetes manifest for your service

You can modify this file directly and reapply it with `kubectl apply -f services/[service-name]/service.yaml` to update your service.

## Accessing Services

Services are automatically configured with proper URLs and TLS certificates.

### URL Patterns

- **Public services**: `https://[service-name].[domain]`
- **Internal services**: `https://[service-name].internal.[domain]`
- **Microservices**: `https://[service-name].svc.[domain]`
- **Databases**: `[service-name].[namespace].svc.cluster.local:[port]`

### Dashboard Access

Access the Kubernetes Dashboard at `https://dashboard.internal.[domain]`:

```bash
# Get the dashboard token
./bin/dashboard-token
```

### Service Management

Monitor your running services with:

```bash
# List all services
kubectl get services -A

# View detailed information about a service
kubectl describe service [service-name] -n [namespace]

# Check pods for a service
kubectl get pods -n [namespace] -l app=[service-name]

# View logs for a service
kubectl logs -n [namespace] -l app=[service-name]
```

## Advanced Configuration

### Scaling Services

Scale a service by adjusting its deployment's replica count:

```bash
kubectl scale deployment [service-name] --replicas=3 -n [namespace]
```

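For load-driven scaling, a HorizontalPodAutoscaler can adjust the replica count automatically. A sketch, assuming the metrics server is available in the cluster and using placeholder names:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service        # placeholder deployment name
  namespace: my-namespace # placeholder namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80  # scale up above 80% average CPU
```

Apply it with `kubectl apply -f`, then watch it with `kubectl get hpa -n my-namespace`.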
### Adding Environment Variables

Add environment variables to your service by editing the service YAML file and adding entries to the `env` section:

```yaml
env:
  - name: DATABASE_URL
    value: "postgres://user:password@postgres:5432/db"
```

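Embedding credentials inline works, but referencing a Secret keeps them out of the manifest. A sketch, assuming a Secret named `db-credentials` with a `url` key has been created beforehand (both names are hypothetical):

```yaml
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: db-credentials  # hypothetical Secret, created separately
        key: url
```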
### Persistent Storage

For services that need persistent storage, add a PersistentVolumeClaim to your service YAML.

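A minimal claim might look like the following (name, namespace, and size are placeholders; `local-path` is the storage class K3s provisions by default):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-service-data   # placeholder
  namespace: my-namespace # placeholder
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path  # K3s default local-path provisioner
  resources:
    requests:
      storage: 5Gi
```

Mount the claim in your deployment through matching `volumes` and `volumeMounts` entries.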
## Troubleshooting

If a service isn't working correctly:

1. Check pod status: `kubectl get pods -n [namespace]`
2. View logs: `kubectl logs [pod-name] -n [namespace]`
3. Describe the pod: `kubectl describe pod [pod-name] -n [namespace]`
4. Verify the service: `kubectl get svc [service-name] -n [namespace]`
5. Check the ingress: `kubectl get ingress [service-name] -n [namespace]`


`docs/MAINTENANCE.md`

# Maintenance Guide

This guide covers essential maintenance tasks for your personal cloud infrastructure, including troubleshooting, backups, updates, and security best practices.

## Troubleshooting

### General Troubleshooting Steps

1. **Check Component Status**:
   ```bash
   # Check all pods across all namespaces
   kubectl get pods -A

   # Look for pods that aren't Running or Ready
   kubectl get pods -A | grep -v "Running\|Completed"
   ```

2. **View Detailed Pod Information**:
   ```bash
   # Get detailed info about problematic pods
   kubectl describe pod <pod-name> -n <namespace>

   # Check pod logs
   kubectl logs <pod-name> -n <namespace>
   ```

3. **Run Validation Script**:
   ```bash
   ./infrastructure_setup/validate_setup.sh
   ```

4. **Check Node Status**:
   ```bash
   kubectl get nodes
   kubectl describe node <node-name>
   ```

### Common Issues

#### Certificate Problems

If services show invalid certificates:

1. Check certificate status:
   ```bash
   kubectl get certificates -A
   ```

2. Examine certificate details:
   ```bash
   kubectl describe certificate <cert-name> -n <namespace>
   ```

3. Check for cert-manager issues:
   ```bash
   kubectl get pods -n cert-manager
   kubectl logs -l app=cert-manager -n cert-manager
   ```

4. Verify the Cloudflare API token is correctly set up:
   ```bash
   kubectl get secret cloudflare-api-token -n internal
   ```

#### DNS Issues

If DNS resolution isn't working properly:

1. Check CoreDNS status:
   ```bash
   kubectl get pods -n kube-system -l k8s-app=kube-dns
   kubectl logs -l k8s-app=kube-dns -n kube-system
   ```

2. Verify CoreDNS configuration:
   ```bash
   kubectl get configmap -n kube-system coredns -o yaml
   ```

3. Test DNS resolution from inside the cluster:
   ```bash
   kubectl run -i --tty --rm debug --image=busybox --restart=Never -- nslookup kubernetes.default
   ```

#### Service Connectivity

If services can't communicate:

1. Check network policies:
   ```bash
   kubectl get networkpolicies -A
   ```

2. Verify service endpoints:
   ```bash
   kubectl get endpoints -n <namespace>
   ```

3. Test connectivity from within the cluster:
   ```bash
   kubectl run -i --tty --rm debug --image=busybox --restart=Never -- wget -O- <service-name>.<namespace>
   ```

## Backup and Restore

### What to Back Up

1. **Persistent Data**:
   - Database volumes
   - Application storage
   - Configuration files

2. **Kubernetes Resources**:
   - Custom Resource Definitions (CRDs)
   - Deployments, Services, Ingresses
   - Secrets and ConfigMaps

### Backup Methods

#### Simple Backup Script

Create a backup script at `bin/backup.sh` (to be implemented):

```bash
#!/bin/bash
# Simple backup script for your personal cloud
# This is a placeholder for future implementation

BACKUP_DIR="/path/to/backups/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"

# Back up Kubernetes resources
kubectl get all -A -o yaml > "$BACKUP_DIR/all-resources.yaml"
kubectl get secrets -A -o yaml > "$BACKUP_DIR/secrets.yaml"
kubectl get configmaps -A -o yaml > "$BACKUP_DIR/configmaps.yaml"

# Back up persistent volumes
# TODO: Add logic to back up persistent volume data

echo "Backup completed: $BACKUP_DIR"
```

#### Using Velero (Recommended for Future)

[Velero](https://velero.io/) is a powerful backup solution for Kubernetes:

```bash
# Install Velero (future implementation)
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm install velero vmware-tanzu/velero --namespace velero --create-namespace

# Create a backup
velero backup create my-backup --include-namespaces default,internal

# Restore from backup
velero restore create --from-backup my-backup
```

### Database Backups

For database services, set up regular dumps:

```bash
# PostgreSQL backup (placeholder)
kubectl exec <postgres-pod> -n <namespace> -- pg_dump -U <username> <database> > backup.sql

# MariaDB/MySQL backup (placeholder)
kubectl exec <mariadb-pod> -n <namespace> -- mysqldump -u root -p<password> <database> > backup.sql
```

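To run such dumps on a schedule inside the cluster, a Kubernetes CronJob can be used. A sketch, assuming a `postgres` service in a `postgres` namespace and a `db-backups` PersistentVolumeClaim for the output (all names here are hypothetical):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: postgres   # hypothetical namespace
spec:
  schedule: "0 3 * * *"  # daily at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:15
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump -h postgres -U postgres mydb > /backups/mydb-$(date +%F).sql
              volumeMounts:
                - name: backups
                  mountPath: /backups
          volumes:
            - name: backups
              persistentVolumeClaim:
                claimName: db-backups  # hypothetical PVC
```

In practice you would also supply credentials, for example a `PGPASSWORD` environment variable drawn from a Secret, rather than relying on trust-based authentication.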
## Updates

### Updating Kubernetes (K3s)

1. Check current version:
   ```bash
   k3s --version
   ```

2. Update K3s:
   ```bash
   curl -sfL https://get.k3s.io | sh -
   ```

3. Verify the update:
   ```bash
   k3s --version
   kubectl get nodes
   ```

### Updating Infrastructure Components

1. Update the repository:
   ```bash
   git pull
   ```

2. Re-run the setup script:
   ```bash
   ./infrastructure_setup/setup-all.sh
   ```

3. Or update specific components:
   ```bash
   ./infrastructure_setup/setup-cert-manager.sh
   ./infrastructure_setup/setup-dashboard.sh
   # etc.
   ```

### Updating Applications

For Helm chart applications:

```bash
# Update Helm repositories
helm repo update

# Upgrade a specific application
./bin/helm-install <chart-name> --upgrade
```

For services deployed with `deploy-service`:

```bash
# Edit the service YAML
nano services/<service-name>/service.yaml

# Apply changes
kubectl apply -f services/<service-name>/service.yaml
```

## Security

### Best Practices

1. **Keep Everything Updated**:
   - Regularly update K3s
   - Update all infrastructure components
   - Keep application images up to date

2. **Network Security**:
   - Use internal services whenever possible
   - Limit exposed services to only what's necessary
   - Configure your home router's firewall properly

3. **Access Control**:
   - Use strong passwords for all services
   - Implement a secrets management strategy
   - Rotate API tokens and keys regularly

4. **Regular Audits**:
   - Review running services periodically
   - Check for unused or outdated deployments
   - Monitor resource usage for anomalies

### Security Scanning (Future Implementation)

Tools to consider implementing:

1. **Trivy** for image scanning:
   ```bash
   # Example Trivy usage (placeholder)
   trivy image <your-image>
   ```

2. **kube-bench** for Kubernetes security checks:
   ```bash
   # Example kube-bench usage (placeholder)
   kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
   ```

3. **Falco** for runtime security monitoring:
   ```bash
   # Example Falco installation (placeholder)
   helm repo add falcosecurity https://falcosecurity.github.io/charts
   helm install falco falcosecurity/falco --namespace falco --create-namespace
   ```

## System Health Monitoring

### Basic Monitoring

Check system health with:

```bash
# Node resource usage
kubectl top nodes

# Pod resource usage
kubectl top pods -A

# Persistent volume claims
kubectl get pvc -A
```

### Advanced Monitoring (Future Implementation)

Consider implementing:

1. **Prometheus + Grafana** for comprehensive monitoring:
   ```bash
   # Placeholder for future implementation
   helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
   helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
   ```

2. **Loki** for log aggregation:
   ```bash
   # Placeholder for future implementation
   helm repo add grafana https://grafana.github.io/helm-charts
   helm install loki grafana/loki-stack --namespace logging --create-namespace
   ```

## Additional Resources

This document will be expanded in the future with:

- Detailed backup and restore procedures
- Monitoring setup instructions
- Comprehensive security hardening guide
- Automated maintenance scripts

For now, refer to the following external resources:

- [K3s Documentation](https://docs.k3s.io/)
- [Kubernetes Troubleshooting Guide](https://kubernetes.io/docs/tasks/debug/)
- [Velero Backup Documentation](https://velero.io/docs/latest/)
- [Kubernetes Security Best Practices](https://kubernetes.io/docs/concepts/security/)


`docs/SETUP.md`

# Setting Up Your Personal Cloud

Welcome to your journey toward digital independence! This guide will walk you through setting up your own personal cloud infrastructure using Kubernetes, providing you with privacy, control, and flexibility.

## Hardware Recommendations

For a pleasant experience, we recommend:

- A dedicated mini PC, NUC, or old laptop with at least:
  - 4 CPU cores
  - 8GB RAM (16GB recommended)
  - 128GB SSD (256GB or more recommended)
- A stable internet connection
- Optional: additional nodes for high availability

## Initial Setup

### 1. Prepare Environment Variables

First, create your environment configuration:

```bash
# Copy the example file and edit with your details
cp .env.example .env
nano .env

# Then load the environment variables
source load-env.sh
```

Important variables to set in your `.env` file:

- `DOMAIN`: Your domain name (e.g., `cloud.example.com`)
- `EMAIL`: Your email for Let's Encrypt certificates
- `CLOUDFLARE_API_TOKEN`: If using Cloudflare for DNS

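Put together, a filled-in `.env` might look like this (all values are illustrative):

```bash
# .env -- example values only; replace with your own
DOMAIN=cloud.example.com
EMAIL=admin@example.com
CLOUDFLARE_API_TOKEN=your-cloudflare-token
```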
### 2. Install K3s (Lightweight Kubernetes)

K3s provides a fully compliant Kubernetes distribution in a small footprint:

```bash
# Install K3s without the default load balancer (we'll use MetalLB)
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode=644 --disable servicelb

# Set up kubectl configuration
mkdir -p ~/.kube
sudo cat /etc/rancher/k3s/k3s.yaml > ~/.kube/config
chmod 600 ~/.kube/config
```

### 3. Install Infrastructure Components

One command sets up your entire cloud infrastructure:

```bash
./infrastructure_setup/setup-all.sh
```

This installs and configures:

- **MetalLB**: Provides IP addresses for services
- **Traefik**: Handles ingress (routing) with automatic HTTPS
- **cert-manager**: Manages TLS certificates automatically
- **CoreDNS**: Provides internal DNS resolution
- **ExternalDNS**: Updates DNS records automatically
- **Kubernetes Dashboard**: Web UI for managing your cluster

## Adding Additional Nodes (Optional)

For larger workloads or high availability, you can add more nodes:

```bash
# On your master node, get the node token
sudo cat /var/lib/rancher/k3s/server/node-token

# On each new node, join the cluster
curl -sfL https://get.k3s.io | K3S_URL=https://MASTER_IP:6443 K3S_TOKEN=NODE_TOKEN sh -
```

## Next Steps

Now that your infrastructure is set up, you can:

1. **Deploy Applications**: See the [Applications Guide](./APPS.md) for deploying services and applications
2. **Access Dashboard**: Visit `https://dashboard.internal.yourdomain.com` and use the token from `./bin/dashboard-token`
3. **Validate Setup**: Run `./infrastructure_setup/validate_setup.sh` to ensure everything is working

## Validation and Troubleshooting

Run the validation script to ensure everything is working correctly:

```bash
./infrastructure_setup/validate_setup.sh
```

This script checks:

- All infrastructure components
- DNS resolution
- Service connectivity
- Certificate issuance
- Network configuration

If issues are found, the script provides specific remediation steps.

## What's Next?

Now that your personal cloud is running, consider:

- Setting up backups with [Velero](https://velero.io/)
- Adding monitoring with Prometheus and Grafana
- Deploying applications like Nextcloud, Home Assistant, or Gitea
- Exploring the Kubernetes Dashboard to monitor your services

Welcome to your personal cloud journey! You now have the foundation for hosting your own services and taking control of your digital life.


`docs/learning/visibility.md`

# Understanding Network Visibility in Kubernetes

This guide explains how applications deployed on our Kubernetes cluster become accessible from both internal and external networks. Whether you're deploying a public-facing website or an internal admin panel, this document will help you understand the journey from deployment to accessibility.

## The Visibility Pipeline

When you deploy an application to the cluster, making it accessible involves several coordinated components working together:

1. **Kubernetes Services** - Direct traffic to your application pods
2. **Ingress Controllers** - Route external HTTP/HTTPS traffic to services
3. **Load Balancers** - Assign external IPs to services
4. **DNS Management** - Map domain names to IPs
5. **TLS Certificates** - Secure connections with HTTPS

Let's walk through how each part works and how they interconnect.

## From Deployment to Visibility

### 1. Application Deployment

Your journey begins with deploying your application on Kubernetes. This typically involves:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: myapp:latest
          ports:
            - containerPort: 80
```

This creates pods running your application, but they have no stable, discoverable endpoint yet and aren't reachable from outside the cluster.

### 2. Kubernetes Service: Internal Connectivity

A Kubernetes Service provides a stable endpoint to access your pods:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 80
  type: ClusterIP
```

With this `ClusterIP` service, your application is accessible within the cluster at `my-app.my-namespace.svc.cluster.local`, but not from outside.

### 3. Ingress: Defining HTTP Routes

For HTTP/HTTPS traffic, an Ingress resource defines routing rules:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-namespace
  annotations:
    kubernetes.io/ingress.class: "traefik"
    external-dns.alpha.kubernetes.io/target: "CLOUD_DOMAIN"
    external-dns.alpha.kubernetes.io/ttl: "60"
spec:
  rules:
    - host: my-app.CLOUD_DOMAIN
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
  tls:
    - hosts:
        - my-app.CLOUD_DOMAIN
      secretName: wildcard-sovereign-cloud-tls
```

This Ingress tells the cluster to route requests for `my-app.CLOUD_DOMAIN` to your service. The annotations provide hints to other systems like ExternalDNS.

### 4. Traefik: The Ingress Controller

Our cluster uses Traefik as the ingress controller. Traefik watches for Ingress resources and configures itself to handle the routing rules. It acts as a reverse proxy and edge router, handling:

- HTTP/HTTPS routing
- TLS termination
- Load balancing
- Path-based routing
- Host-based routing

Traefik runs as a service in the cluster with its own external IP (provided by MetalLB).

### 5. MetalLB: Assigning External IPs

Since we're running on-premises (not in a cloud that provides load balancers), we use MetalLB to assign external IPs to services. MetalLB manages a pool of IP addresses from our local network:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb-system
spec:
  addresses:
    - 192.168.8.240-192.168.8.250
```

This allows Traefik and any other LoadBalancer services to receive a real IP address from our network.

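Note that in recent MetalLB releases (0.13 and later), an address pool is only announced on the network once a matching advertisement resource exists; in Layer 2 mode that means an `L2Advertisement` alongside the pool. A sketch referencing the pool above:

```yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
    - default  # the IPAddressPool defined above
```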
### 6. ExternalDNS: Automated DNS Management

ExternalDNS automatically creates and updates DNS records in our Cloudflare DNS zone:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
  namespace: externaldns
spec:
  # ...
  template:
    spec:
      containers:
        - name: external-dns
          image: registry.k8s.io/external-dns/external-dns
          args:
            - --source=service
            - --source=ingress
            - --provider=cloudflare
            - --txt-owner-id=sovereign-cloud
```

ExternalDNS watches Kubernetes Services and Ingresses with appropriate annotations, then creates corresponding DNS records in Cloudflare, making your applications discoverable by domain name.

### 7. Cert-Manager: TLS Certificate Automation

To secure connections with HTTPS, we use cert-manager to automatically obtain and renew TLS certificates:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-sovereign-cloud-io
  namespace: default
spec:
  secretName: wildcard-sovereign-cloud-tls
  dnsNames:
    - "*.CLOUD_DOMAIN"
    - "CLOUD_DOMAIN"
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
```

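The `issuerRef` above points at a ClusterIssuer named `letsencrypt-prod`. A minimal sketch of such an issuer using Let's Encrypt with Cloudflare DNS-01 validation, which is what wildcard certificates require (the email and the secret key name are assumptions; the secret itself would hold your Cloudflare API token):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com  # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key  # ACME account key storage
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token  # assumed key name within the secret
```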
Cert-manager handles:

- Certificate request and issuance
- DNS validation (for wildcard certificates)
- Automatic renewal
- Secret storage of certificates

## The Two Visibility Paths

In our infrastructure, we support two primary visibility paths:

### Public Services (External Access)

Public services are those meant to be accessible from the public internet:

1. **Service**: Kubernetes ClusterIP service (internal)
2. **Ingress**: Defines routing with a hostname like `service-name.CLOUD_DOMAIN`
3. **DNS**: ExternalDNS creates a CNAME record pointing to `CLOUD_DOMAIN`
4. **TLS**: Uses the wildcard certificate for `*.CLOUD_DOMAIN`
5. **IP Addressing**: Traffic reaches the MetalLB-assigned IP for Traefik
6. **Network**: Traffic flows from external internet → router → MetalLB IP → Traefik → Kubernetes Service → Application Pods

**Deploy a public service with:**

```bash
./bin/deploy-service --type public --name myservice
```

### Internal Services (Private Access)

Internal services are restricted to the internal network:

1. **Service**: Kubernetes ClusterIP service (internal)
2. **Ingress**: Defines routing with a hostname like `service-name.internal.CLOUD_DOMAIN`
3. **DNS**: ExternalDNS creates an A record pointing to the internal load balancer IP
4. **TLS**: Uses the wildcard certificate for `*.internal.CLOUD_DOMAIN`
5. **IP Addressing**: Traffic reaches the MetalLB-assigned IP for Traefik
6. **Network**: Traffic flows from internal network → MetalLB IP → Traefik → Service → Pods
7. **Security**: Traefik middleware restricts access to internal network IPs

**Deploy an internal service with:**

```bash
./bin/deploy-service --type internal --name adminpanel
```

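The IP restriction mentioned for internal services is typically implemented with a Traefik `Middleware` attached to the Ingress. A sketch, assuming Traefik's CRDs are installed and using placeholder address ranges (the field name `ipWhiteList` follows Traefik v2; v3 renames it `ipAllowList`):

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: internal-only
  namespace: my-namespace  # placeholder
spec:
  ipWhiteList:
    sourceRange:
      - 192.168.8.0/24  # LAN range (placeholder)
      - 10.42.0.0/16    # default K3s pod CIDR
```

The middleware is then referenced from the Ingress via the `traefik.ingress.kubernetes.io/router.middlewares` annotation.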
## How It All Works Together

1. **You deploy** an application using our deploy-service script
2. **Kubernetes** schedules and runs your application pods
3. **Services** provide a stable endpoint for your pods
4. **Traefik** configures routing based on Ingress definitions
5. **MetalLB** assigns real network IPs to LoadBalancer services
6. **ExternalDNS** creates DNS records for your services
7. **Cert-Manager** ensures valid TLS certificates for HTTPS

### Network Flow Diagram

```mermaid
flowchart TD
    subgraph Internet["Internet"]
        User("User Browser")
        CloudDNS("CloudFlare DNS")
    end
    subgraph Cluster["Cluster"]
        Router("Router")
        MetalLB("MetalLB")
        Traefik("Traefik Ingress")
        IngSvc("Service")
        IngPods("Application Pods")
        Ingress("Ingress")
        CertManager("cert-manager")
        WildcardCert("Wildcard Certificate")
        ExtDNS("ExternalDNS")
    end
    User -- "1\. DNS Query" --> CloudDNS
    CloudDNS -- "2\. IP Address" --> User
    User -- "3\. HTTPS Request" --> Router
    Router -- "4\. Forward" --> MetalLB
    MetalLB -- "5\. Route" --> Traefik
    Traefik -- "6\. Route" --> Ingress
    Ingress -- "7\. Forward" --> IngSvc
    IngSvc -- "8\. Balance" --> IngPods
    ExtDNS -- "A. Update DNS" --> CloudDNS
    Ingress -- "B. Configure" --> ExtDNS
    CertManager -- "C. Issue Cert" --> WildcardCert
    Ingress -- "D. Use" --> WildcardCert

    User:::internet
    CloudDNS:::internet
    Router:::cluster
    MetalLB:::cluster
    Traefik:::cluster
    IngSvc:::cluster
    IngPods:::cluster
    Ingress:::cluster
    CertManager:::cluster
    WildcardCert:::cluster
    ExtDNS:::cluster
    classDef internet fill:#fcfcfc,stroke:#333
    classDef cluster fill:#a6f3ff,stroke:#333
    style User fill:#C8E6C9
    style CloudDNS fill:#C8E6C9
    style Router fill:#C8E6C9
    style MetalLB fill:#C8E6C9
    style Traefik fill:#C8E6C9
    style IngSvc fill:#C8E6C9
    style IngPods fill:#C8E6C9
    style Ingress fill:#C8E6C9
    style CertManager fill:#C8E6C9
    style WildcardCert fill:#C8E6C9
    style ExtDNS fill:#C8E6C9
```

|
||||
A successful deployment creates a chain of connections:
|
||||
|
||||
```
|
||||
Internet → DNS (domain name) → External IP → Traefik → Kubernetes Service → Application Pod
|
||||
```
|
||||
|
||||
## Behind the Scenes: The Technical Magic
|
||||
|
||||
When you use our `deploy-service` script, several things happen:
|
||||
|
||||
1. **Template Processing**: The script processes a YAML template for your service type, using environment variables to customize it
|
||||
2. **Namespace Management**: Creates or uses your service's namespace
|
||||
3. **Resource Application**: Applies the generated YAML to create/update all Kubernetes resources
|
||||
4. **DNS Configuration**: ExternalDNS detects the new resources and creates DNS records
|
||||
5. **Certificate Management**: Cert-manager ensures TLS certificates exist or creates new ones
|
||||
6. **Secret Distribution**: For internal services, certificates are copied to the appropriate namespaces
|
||||
|
||||
## Troubleshooting Visibility Issues
|
||||
|
||||
When services aren't accessible, the issue usually lies in one of these areas:
|
||||
|
||||
1. **DNS Resolution**: Domain not resolving to the correct IP
|
||||
2. **Certificate Problems**: Invalid, expired, or missing TLS certificates
|
||||
3. **Ingress Configuration**: Incorrect routing rules or annotations
|
||||
4. **Network Issues**: Firewall rules or internal/external network segregation
|
||||
|
||||
Our [Visibility Troubleshooting Guide](/docs/troubleshooting/VISIBILITY.md) provides detailed steps for diagnosing these issues.
|
||||
|
||||
## Conclusion
|
||||
|
||||
The visibility layer in our infrastructure represents a sophisticated interplay of multiple systems working together. While complex under the hood, it provides a streamlined experience for developers to deploy applications with proper networking, DNS, and security.
|
||||
|
||||
By understanding these components and their relationships, you'll be better equipped to deploy applications and diagnose any visibility issues that arise.
|
||||
|
||||
## Further Reading
|
||||
|
||||
- [Traefik Documentation](https://doc.traefik.io/traefik/)
|
||||
- [ExternalDNS Project](https://github.com/kubernetes-sigs/external-dns)
|
||||
- [Cert-Manager Documentation](https://cert-manager.io/docs/)
|
||||
- [MetalLB Project](https://metallb.universe.tf/)
|
||||
246
docs/troubleshooting/VISIBILITY.md
Normal file
# Troubleshooting Service Visibility

This guide covers common issues with accessing services from outside the cluster and how to diagnose and fix them.

## Common Issues

External access to your services might fail for several reasons:

1. **DNS Resolution Issues** - Domain names not resolving to the correct IP address
2. **Network Connectivity Issues** - Traffic can't reach the cluster's external IP
3. **TLS Certificate Issues** - Invalid or missing certificates
4. **Ingress/Service Configuration Issues** - Incorrectly configured routing

## Diagnostic Steps

### 1. Check DNS Resolution

**Symptoms:**

- Browser shows "site cannot be reached" or "server IP address could not be found"
- `ping` or `nslookup` commands fail for your domain
- Your service DNS records don't appear in CloudFlare or your DNS provider

**Checks:**

```bash
# Check if your domain resolves (from outside the cluster)
nslookup yourservice.yourdomain.com

# Check if ExternalDNS is running
kubectl get pods -n externaldns

# Check ExternalDNS logs for errors
kubectl logs -n externaldns -l app=external-dns | grep -i error
kubectl logs -n externaldns -l app=external-dns | grep -i "your-service-name"

# Check if the CloudFlare API token is configured correctly
kubectl get secret cloudflare-api-token -n externaldns
```

**Common Issues:**

a) **ExternalDNS Not Running**: The ExternalDNS pod is not running or has errors.

b) **CloudFlare API Token Issues**: The API token is invalid, expired, or doesn't have the right permissions.

c) **Domain Filter Mismatch**: ExternalDNS is configured with a `--domain-filter` that doesn't match your domain.

d) **Annotations Missing**: The Service or Ingress is missing the required ExternalDNS annotations.

**Solutions:**

```bash
# 1. Recreate the CloudFlare API token secret
kubectl create secret generic cloudflare-api-token \
  --namespace externaldns \
  --from-literal=api-token="your-api-token" \
  --dry-run=client -o yaml | kubectl apply -f -

# 2. Set the proper annotation on your Ingress
kubectl annotate ingress your-ingress -n your-namespace \
  external-dns.alpha.kubernetes.io/hostname=your-service.your-domain.com

# 3. Restart ExternalDNS
kubectl rollout restart deployment -n externaldns external-dns
```

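For comparison, a minimal Ingress carrying the ExternalDNS hostname annotation might look like the following sketch (the names, domain, and ingress class here are placeholders; adjust them to your setup):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: your-ingress
  namespace: your-namespace
  annotations:
    external-dns.alpha.kubernetes.io/hostname: your-service.your-domain.com
spec:
  ingressClassName: traefik
  rules:
    - host: your-service.your-domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: your-service
                port:
                  number: 80
  tls:
    - hosts:
        - your-service.your-domain.com
      secretName: your-service-tls
```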
### 2. Check Network Connectivity

**Symptoms:**

- DNS resolves to the correct IP but the service is still unreachable
- Only some services are unreachable while others work
- Network timeout errors

**Checks:**

```bash
# Check if MetalLB is running
kubectl get pods -n metallb-system

# Check the MetalLB IP address pool
kubectl get ipaddresspools.metallb.io -n metallb-system

# Verify the service has an external IP
kubectl get svc -n your-namespace your-service
```

**Common Issues:**

a) **MetalLB Configuration**: The IP pool doesn't match your network or is exhausted.

b) **Firewall Issues**: A firewall is blocking traffic to your cluster's external IP.

c) **Router Configuration**: NAT or port forwarding issues if using a router.

**Solutions:**

```bash
# 1. Check and update the MetalLB configuration
kubectl apply -f infrastructure_setup/metallb/metallb-pool.yaml

# 2. Check the service's external IP assignment
kubectl describe svc -n your-namespace your-service
```

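The pool manifest referenced above is not reproduced in this guide; a typical MetalLB Layer 2 configuration looks roughly like this (the address range is a placeholder for your own network):

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-advert
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
```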
### 3. Check TLS Certificates

**Symptoms:**

- Browser shows certificate errors
- "Your connection is not private" warnings
- Cert-manager logs show errors

**Checks:**

```bash
# Check certificate status
kubectl get certificates -A

# Check cert-manager logs
kubectl logs -n cert-manager -l app=cert-manager

# Check if your ingress is using the correct certificate
kubectl get ingress -n your-namespace your-ingress -o yaml
```

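To pick out stuck certificates quickly, you can filter the `kubectl get certificates -A` listing for rows where READY is not `True`. A small sketch with sample output inlined so the filtering logic is visible (the namespaces and certificate names are made up; in practice you would pipe kubectl straight into awk):

```shell
# Filter a certificate listing down to not-ready certificates.
# SAMPLE stands in for the output of: kubectl get certificates -A
SAMPLE='NAMESPACE   NAME            READY   SECRET              AGE
default     wildcard-cert   True    wildcard-cert-tls   30d
myapp       myapp-cert      False   myapp-cert-tls      2m'

# Skip the header row, print namespace/name for anything not Ready
NOT_READY=$(printf '%s\n' "$SAMPLE" | awk 'NR > 1 && $3 != "True" { print $1 "/" $2 }')
printf '%s\n' "$NOT_READY"   # → myapp/myapp-cert
```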
**Common Issues:**

a) **Certificate Issuance Failures**: DNS validation or HTTP validation failing.

b) **Wrong Secret Referenced**: The Ingress is referencing a non-existent certificate secret.

c) **Expired Certificate**: The certificate has expired and wasn't renewed.

**Solutions:**

```bash
# 1. Check and recreate certificates
kubectl apply -f infrastructure_setup/cert-manager/wildcard-certificate.yaml

# 2. Update the ingress to use the correct secret
kubectl patch ingress your-ingress -n your-namespace --type=json \
  -p='[{"op": "replace", "path": "/spec/tls/0/secretName", "value": "correct-secret-name"}]'
```

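The wildcard certificate file applied above isn't reproduced here; a cert-manager `Certificate` of that shape might look like this sketch (the issuer name, namespace, and domain are placeholders):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-cert
  namespace: default
spec:
  secretName: wildcard-cert-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - "*.your-domain.com"
    - "your-domain.com"
```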
### 4. Check Ingress Configuration

**Symptoms:**

- HTTP 404, 503, or other error codes
- Service accessible from inside the cluster but not outside
- Traffic routed to the wrong service

**Checks:**

```bash
# Check ingress status
kubectl get ingress -n your-namespace

# Check Traefik logs
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik

# Check ingress configuration
kubectl describe ingress -n your-namespace your-ingress
```

**Common Issues:**

a) **Incorrect Service Targeting**: The Ingress is pointing to the wrong service or port.

b) **Traefik Configuration**: IngressClass or middleware issues.

c) **Path Configuration**: Incorrect path prefixes or regex.

**Solutions:**

```bash
# 1. Verify the ingress configuration
kubectl edit ingress -n your-namespace your-ingress

# 2. Check that the referenced service exists
kubectl get svc -n your-namespace

# 3. Restart Traefik if needed
kubectl rollout restart deployment -n kube-system traefik
```

## Advanced Diagnostics

For more complex issues, you can use port-forwarding to test services directly:

```bash
# Port-forward the service directly
kubectl port-forward -n your-namespace svc/your-service 8080:80

# Then test locally
curl http://localhost:8080
```

You can also deploy a debug pod to test connectivity from inside the cluster:

```bash
# Start a debug pod
kubectl run -i --tty --rm debug --image=busybox --restart=Never -- sh

# Inside the pod, test DNS and connectivity
nslookup your-service.your-namespace.svc.cluster.local
wget -O- http://your-service.your-namespace.svc.cluster.local
```

## ExternalDNS Specifics

ExternalDNS can be particularly troublesome. Here are specific debugging steps:

1. **Check Log Level**: Set `--log-level=debug` for more detailed logs
2. **Check Domain Filter**: Ensure `--domain-filter` includes your domain
3. **Check Provider**: Ensure `--provider=cloudflare` (or your DNS provider)
4. **Verify API Permissions**: The CloudFlare token needs Zone:Zone:Read and Zone:DNS:Edit permissions
5. **Check TXT Records**: ExternalDNS uses TXT records for ownership tracking

```bash
# Enable verbose logging by adding --log-level=debug to the container args
kubectl edit deployment external-dns -n externaldns

# Check for specific domain errors
kubectl logs -n externaldns -l app=external-dns | grep -i yourservice.yourdomain.com
```

## CloudFlare Specific Issues

When using CloudFlare, additional issues may arise:

1. **API Rate Limiting**: CloudFlare may rate-limit frequent API calls
2. **DNS Propagation**: Changes may take time to propagate through CloudFlare's CDN
3. **Proxied Records**: The `external-dns.alpha.kubernetes.io/cloudflare-proxied` annotation controls whether CloudFlare proxies traffic
4. **Access Restrictions**: CloudFlare Access or Page Rules may restrict access
5. **API Token Permissions**: The token must have Zone:Zone:Read and Zone:DNS:Edit permissions
6. **Zone Detection**: If using subdomains, ensure the parent domain is included in the domain filter

Check the CloudFlare dashboard for:

- DNS record existence
- API access logs
- DNS settings, including proxy status
- Any error messages or rate-limit warnings
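For example, to keep a record DNS-only rather than proxied through CloudFlare, the annotation can be set explicitly on the Ingress metadata (a fragment sketch; the hostname is a placeholder):

```yaml
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/hostname: your-service.your-domain.com
    external-dns.alpha.kubernetes.io/cloudflare-proxied: "false"
```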