Paul Payne 1aa9f1050d Update docs.

2025-08-31 14:30:09 -07:00

Troubleshoot Service Visibility

This guide covers common issues with accessing services from outside the cluster and how to diagnose and fix them.

Common Issues

External access to your services might fail for several reasons:

DNS Resolution Issues - Domain names not resolving to the correct IP address
Network Connectivity Issues - Traffic can't reach the cluster's external IP
TLS Certificate Issues - Invalid or missing certificates
Ingress/Service Configuration Issues - Incorrectly configured routing

Diagnostic Steps

1. Check DNS Resolution

Symptoms:

Browser shows "site cannot be reached" or "server IP address could not be found"
ping or nslookup commands fail for your domain
Your service DNS records don't appear in CloudFlare or your DNS provider

Checks:

# Check if your domain resolves (from outside the cluster)
nslookup yourservice.yourdomain.com

# Check if ExternalDNS is running
kubectl get pods -n externaldns

# Check ExternalDNS logs for errors
kubectl logs -n externaldns -l app=external-dns | grep -i error
kubectl logs -n externaldns -l app=external-dns | grep -i "your-service-name"

# Check that the CloudFlare API token secret exists (this does not verify the token itself)
kubectl get secret cloudflare-api-token -n externaldns
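A quick way to separate a DNS problem from everything downstream is to compare what a public resolver returns with the address your Ingress actually received. This is a minimal sketch. It assumes dig is installed and that Traefik publishes status.loadBalancer on the Ingress. The function names (filter_ipv4, dns_matches, check_dns) are hypothetical helpers and are not part of this repository. If you deliberately publish your router's public IP (NAT) or proxy the record through CloudFlare, a mismatch is expected, so compare against that address instead.

```shell
#!/bin/sh
# Sketch: compare public DNS answers for a hostname with the Ingress's IP.
# Assumes dig is installed and the ingress controller fills in
# status.loadBalancer on the Ingress (hypothetical helper names).

# Keep only IPv4 addresses from `dig +short` output (CNAME targets are dropped)
filter_ipv4() {
  grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Succeed if the expected IP is among the addresses read on stdin
dns_matches() {
  filter_ipv4 | grep -qxF "$1"
}

# check_dns HOST NAMESPACE INGRESS
# Queries a public resolver (1.1.1.1) so split-horizon DNS or your local
# resolver's cache does not hide the public answer.
check_dns() {
  host=$1; ns=$2; ing=$3
  lb_ip=$(kubectl get ingress -n "$ns" "$ing" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -z "$lb_ip" ]; then
    echo "NO IP: ingress $ns/$ing has no load balancer address yet"
  elif dig +short "$host" A @1.1.1.1 | dns_matches "$lb_ip"; then
    echo "OK: $host resolves to $lb_ip"
  else
    echo "MISMATCH: $host does not resolve to $lb_ip"
  fi
}

# Example (placeholders): check_dns yourservice.yourdomain.com your-namespace your-ingress
```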

Common Issues:

a) ExternalDNS Not Running: The ExternalDNS pod is not running or has errors.

b) Cloudflare API Token Issues: The API token is invalid, expired, or doesn't have the right permissions.

c) Domain Filter Mismatch: ExternalDNS is configured with a --domain-filter that doesn't match your domain.

d) Annotations Missing: Service or Ingress is missing the required ExternalDNS annotations.
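For the domain filter mismatch (issue c), you can check the hostname against the flags the running ExternalDNS was started with. This sketch assumes the Deployment is named external-dns in the externaldns namespace, as in the restart command in this section, and that ExternalDNS is its first container. The suffix match only approximates how ExternalDNS applies --domain-filter. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: does HOST pass ExternalDNS's --domain-filter? (approximation)

# Read ExternalDNS arguments, one per line, on stdin. Succeed if HOST equals or
# is a subdomain of any --domain-filter value, or if no filter is set at all.
# Handles both "--domain-filter=x" and "--domain-filter x" forms.
matches_domain_filter() {
  awk -v host="$1" '
    /^--domain-filter=/ { sub(/^--domain-filter=/, ""); filters[++n] = $0; next }
    prev == "--domain-filter" { filters[++n] = $0 }
    { prev = $0 }
    END {
      if (n == 0) exit 0            # no filter: every domain is considered
      h = tolower(host)
      for (i = 1; i <= n; i++) {
        f = tolower(filters[i])
        if (h == f) exit 0
        if (length(h) > length(f) && substr(h, length(h) - length(f)) == "." f) exit 0
      }
      exit 1
    }'
}

# Print the live args of the ExternalDNS container, one per line
external_dns_args() {
  kubectl get deployment external-dns -n externaldns \
    -o jsonpath='{.spec.template.spec.containers[0].args[*]}' | tr ' ' '\n'
}

# Example (placeholder host):
#   external_dns_args | matches_domain_filter yourservice.yourdomain.com \
#     && echo "covered by domain filter" || echo "filtered out"
```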

Solutions:

# 1. Recreate CloudFlare API token secret
kubectl create secret generic cloudflare-api-token \
  --namespace externaldns \
  --from-literal=api-token="your-api-token" \
  --dry-run=client -o yaml | kubectl apply -f -

# 2. Check and set proper annotations on your Ingress
#    (--overwrite replaces an existing value instead of failing)
kubectl annotate ingress your-ingress -n your-namespace --overwrite \
  external-dns.alpha.kubernetes.io/hostname=your-service.your-domain.com

# 3. Restart ExternalDNS
kubectl rollout restart deployment -n externaldns external-dns

2. Check Network Connectivity

Symptoms:

DNS resolves to the correct IP but the service is still unreachable
Only some services are unreachable while others work
Network timeout errors

Checks:

# Check if MetalLB is running
kubectl get pods -n metallb-system

# Check MetalLB IP address pool
kubectl get ipaddresspools.metallb.io -n metallb-system

# Verify the service has an external IP
kubectl get svc -n your-namespace your-service

Common Issues:

a) MetalLB Configuration: The IP pool doesn't match your network or is exhausted.

b) Firewall Issues: Firewall is blocking traffic to your cluster's external IP.

c) Router Configuration: NAT or port forwarding issues if using a router.
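For issue a, a useful check is whether the external IP the service received actually falls inside one of the configured IPAddressPools. This is a minimal sketch that handles IPv4 CIDR entries ("10.0.0.0/24") and range entries ("start-end") only. The helper names are hypothetical. An exhausted pool looks different: the service stays at EXTERNAL-IP <pending>, which check_service_pool reports as no external IP assigned.

```shell
#!/bin/sh
# Sketch: is a LoadBalancer IP inside a MetalLB pool entry? (IPv4 only)

# Convert a dotted-quad IPv4 address to an integer
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_pool IP ENTRY: succeed if IP lies in a CIDR, a start-end range, or equals a single IP
ip_in_pool() {
  ip=$(ip_to_int "$1")
  case $2 in
    */*)
      net=$(ip_to_int "${2%/*}"); bits=${2#*/}
      mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
      [ $(( ip & mask )) -eq $(( net & mask )) ] ;;
    *-*)
      start=$(ip_to_int "${2%-*}"); end=$(ip_to_int "${2#*-}")
      [ "$ip" -ge "$start" ] && [ "$ip" -le "$end" ] ;;
    *)
      [ "$ip" -eq "$(ip_to_int "$2")" ] ;;
  esac
}

# check_service_pool NAMESPACE SERVICE
check_service_pool() {
  svc_ip=$(kubectl get svc -n "$1" "$2" -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -z "$svc_ip" ]; then
    echo "no external IP assigned"; return 1
  fi
  for entry in $(kubectl get ipaddresspools.metallb.io -n metallb-system \
      -o jsonpath='{.items[*].spec.addresses[*]}'); do
    if ip_in_pool "$svc_ip" "$entry"; then
      echo "$svc_ip is in pool entry $entry"; return 0
    fi
  done
  echo "$svc_ip is not in any MetalLB pool"; return 1
}
```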

Solutions:

# 1. Check and update MetalLB configuration
kubectl apply -f infrastructure_setup/metallb/metallb-pool.yaml

# 2. Check service external IP assignment
kubectl describe svc -n your-namespace your-service

3. Check TLS Certificates

Symptoms:

Browser shows certificate errors
"Your connection is not private" warnings
Cert-manager logs show errors

Checks:

# Check certificate status
kubectl get certificates -A

# Check cert-manager logs
kubectl logs -n cert-manager -l app=cert-manager

# Check if your ingress is using the correct certificate
kubectl get ingress -n your-namespace your-ingress -o yaml

Common Issues:

a) Certificate Issuance Failures: DNS validation or HTTP validation failing.

b) Wrong Secret Referenced: Ingress is referencing a non-existent certificate secret.

c) Expired Certificate: Certificate has expired and wasn't renewed.
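The quickest way to distinguish issues b and c is often to look at the certificate a client actually receives. When an Ingress references a secret that does not exist, Traefik falls back to its own self-signed certificate, whose subject is CN "TRAEFIK DEFAULT CERT". This is a sketch that assumes openssl is installed. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: inspect the certificate clients actually receive. Requires openssl.

# Print the PEM certificate served on HOST:443. -servername sends SNI, which
# Traefik uses to pick the matching TLS secret.
fetch_cert() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509
}

# Read `openssl x509 -noout -subject` output on stdin; succeed if it is
# Traefik's built-in fallback certificate
is_traefik_default() {
  grep -q 'TRAEFIK DEFAULT CERT'
}

# Read a PEM certificate on stdin; succeed if it is still valid DAYS from now
valid_for_days() {
  openssl x509 -noout -checkend $(( $1 * 86400 )) >/dev/null
}

# Example (placeholder host):
#   cert=$(fetch_cert yourservice.yourdomain.com)
#   printf '%s\n' "$cert" | openssl x509 -noout -subject -issuer -enddate
#   printf '%s\n' "$cert" | openssl x509 -noout -subject | is_traefik_default && echo "fallback cert served"
#   printf '%s\n' "$cert" | valid_for_days 14 || echo "expires within 14 days"
```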

Solutions:

# 1. Check and recreate certificates
kubectl apply -f infrastructure_setup/cert-manager/wildcard-certificate.yaml

# 2. Update ingress to use correct secret
kubectl patch ingress your-ingress -n your-namespace --type=json \
  -p='[{"op": "replace", "path": "/spec/tls/0/secretName", "value": "correct-secret-name"}]'
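Before or after patching, you can confirm that every secret named under spec.tls actually exists. Ingress TLS secrets are looked up in the Ingress's own namespace, so a wildcard certificate issued into a different namespace also shows up as missing here. This is a sketch. The KUBECTL override is a hypothetical convenience for dry runs with a stub and defaults to kubectl.

```shell
#!/bin/sh
# Sketch: report whether each TLS secret referenced by an Ingress exists.

# check_tls_secrets NAMESPACE INGRESS
check_tls_secrets() {
  k=${KUBECTL:-kubectl}
  for secret in $($k get ingress -n "$1" "$2" -o jsonpath='{.spec.tls[*].secretName}'); do
    if $k get secret -n "$1" "$secret" >/dev/null 2>&1; then
      echo "ok: $secret"
    else
      echo "missing: $secret"
    fi
  done
}

# Example (placeholders):
#   check_tls_secrets your-namespace your-ingress
#   # after re-applying a Certificate, wait for cert-manager to mark it Ready
#   kubectl wait --for=condition=Ready certificate/your-certificate -n your-namespace --timeout=120s
```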

4. Check Ingress Configuration

Symptoms:

HTTP 404, 503, or other error codes
Service accessible from inside cluster but not outside
Traffic routed to wrong service

Checks:

# Check ingress status
kubectl get ingress -n your-namespace

# Check Traefik logs
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik

# Check ingress configuration
kubectl describe ingress -n your-namespace your-ingress

Common Issues:

a) Incorrect Service Targeting: Ingress is pointing to wrong service or port.

b) Traefik Configuration: IngressClass or middleware issues.

c) Path Configuration: Incorrect path prefixes or regex.
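For incorrect service targeting (issue a), it helps to walk every backend the Ingress routes to. For each one, check that the Service exists and has ready endpoints. Depending on Traefik settings, a backend with no ready endpoints surfaces as a 404 or a 503. This sketch reads the networking.k8s.io/v1 backend fields (service.name plus service.port.number or service.port.name). It uses the Endpoints API because that API lists only ready addresses. The function name and the KUBECTL stub override are hypothetical.

```shell
#!/bin/sh
# Sketch: check every Service an Ingress routes to.

# check_ingress_backends NAMESPACE INGRESS
check_ingress_backends() {
  ns=$1; ing=$2; k=${KUBECTL:-kubectl}
  # One line per path: "<service> <port number or port name>"
  $k get ingress -n "$ns" "$ing" -o jsonpath='{range .spec.rules[*].http.paths[*]}{.backend.service.name}{" "}{.backend.service.port.number}{.backend.service.port.name}{"\n"}{end}' |
  while read -r svc port; do
    [ -n "$svc" ] || continue
    if ! $k get svc -n "$ns" "$svc" >/dev/null 2>&1; then
      echo "missing service: $svc"
    elif [ -z "$($k get endpoints -n "$ns" "$svc" -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)" ]; then
      echo "no ready endpoints: $svc (port $port)"
    else
      echo "ok: $svc (port $port)"
    fi
  done
}

# Example (placeholders): check_ingress_backends your-namespace your-ingress
```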

Solutions:

# 1. Verify ingress configuration
kubectl edit ingress -n your-namespace your-ingress

# 2. Check that the referenced service exists
kubectl get svc -n your-namespace

# 3. Restart Traefik if needed
kubectl rollout restart deployment -n kube-system traefik

Advanced Diagnostics

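For more complex issues, you can use port-forwarding to test a service directly. This bypasses DNS, MetalLB, and the Ingress, so if the service responds here, the problem lies in one of those layers: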

# Port-forward the service directly
kubectl port-forward -n your-namespace svc/your-service 8080:80

# Then test locally
curl http://localhost:8080

You can also deploy a debug pod to test connectivity from inside the cluster:

# Start a debug pod
kubectl run -i --tty --rm debug --image=busybox --restart=Never -- sh

# Inside the pod, test DNS and connectivity
nslookup your-service.your-namespace.svc.cluster.local
wget -O- http://your-service.your-namespace.svc.cluster.local
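Port-forwarding skips the Ingress, while a normal external request depends on DNS. Pinning the hostname to the external IP with curl's --resolve option tests the remaining path (MetalLB, Traefik's host and path routing, and the backend) without involving DNS. This is a sketch: probe_ingress and the CURL stub override are hypothetical. The -k flag skips certificate verification so a TLS problem does not hide the routing result. A 404 typically means no Ingress rule matched the host or path, while a 5xx points at the backend.

```shell
#!/bin/sh
# Sketch: request HOST over HTTPS but connect to IP directly, bypassing DNS.

# probe_ingress HOST IP [PATH]: print the HTTP status code returned
probe_ingress() {
  host=$1; ip=$2; path=${3:-/}
  # -s quiet, -k ignore certificate errors, -o discard body, -w print status
  ${CURL:-curl} -sk -o /dev/null -w '%{http_code}\n' \
    --resolve "$host:443:$ip" "https://$host$path"
}

# Example (placeholders; EXTERNAL_IP is the service's external IP):
#   probe_ingress yourservice.yourdomain.com EXTERNAL_IP
```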

ExternalDNS Specifics

ExternalDNS can be particularly troublesome. Here are specific debugging steps:

Check Log Level: Set --log-level=debug for more detailed logs
Check Domain Filter: Ensure --domain-filter includes your domain
Check Provider: Ensure --provider=cloudflare (or your DNS provider)
Verify API Permissions: The CloudFlare token needs Zone:Zone:Read and Zone:DNS:Edit permissions
Check TXT Records: ExternalDNS uses TXT records for ownership tracking

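# Enable verbose logging: edit the Deployment and add (or change) --log-level=debug
# in the external-dns container args; saving the edit rolls out new pods
kubectl edit deployment external-dns -n externaldns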

# Check for specific domain errors
kubectl logs -n externaldns -l app=external-dns | grep -i yourservice.yourdomain.com
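ExternalDNS only updates records whose TXT ownership record carries its own owner ID, which is set with --txt-owner-id and defaults to "default". If another cluster or an older install created the record with a different owner, ExternalDNS leaves it alone, and this can look like a silent failure. This sketch reads the owner field from the public TXT records. The record location is an assumption: depending on the ExternalDNS version and any --txt-prefix, the TXT record may sit at the hostname itself or at a type-prefixed name such as a-yourservice.yourdomain.com, so both names are queried. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: show which ExternalDNS owner claims the TXT records for a hostname.

# Read a TXT record value on stdin and print its external-dns/owner field
txt_owner() {
  tr -d '"' | tr ',' '\n' | sed -n 's|^external-dns/owner=||p'
}

# show_txt_owners HOST
# Assumption: the TXT record may be at HOST or at a-HOST (version/prefix dependent)
show_txt_owners() {
  for name in "$1" "a-$1"; do
    dig +short TXT "$name" @1.1.1.1 | while read -r rec; do
      owner=$(printf '%s\n' "$rec" | txt_owner)
      if [ -n "$owner" ]; then echo "$name: owner=$owner"; fi
    done
  done
}

# Example (placeholder host): show_txt_owners yourservice.yourdomain.com
```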

CloudFlare Specific Issues

When using CloudFlare, additional issues may arise:

API Rate Limiting: CloudFlare may rate limit frequent API calls
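DNS Propagation: Changes may take a while to become visible, because resolvers keep serving cached answers until the old record's TTL expires
Proxied Records: The external-dns.alpha.kubernetes.io/cloudflare-proxied annotation controls whether CloudFlare proxies traffic; a proxied record resolves to CloudFlare's edge IPs rather than your cluster's IP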
Access Restrictions: CloudFlare Access or Page Rules may restrict access
API Token Permissions: The token must have Zone:Zone:Read and Zone:DNS:Edit permissions
Zone Detection: If using subdomains, ensure the parent domain is included in the domain filter
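To rule out token problems without going through ExternalDNS, you can query the CloudFlare API directly with the same token. This sketch assumes the secret and key created earlier in this guide (cloudflare-api-token with key api-token in the externaldns namespace), curl, and a base64 that accepts -d. The function names are hypothetical. It uses CloudFlare's token verification endpoint and its zone lookup by name.

```shell
#!/bin/sh
# Sketch: verify the CloudFlare token ExternalDNS uses, from your workstation.

CF_API=https://api.cloudflare.com/client/v4

# Decode the token stored in the cluster secret
cf_token() {
  kubectl get secret cloudflare-api-token -n externaldns \
    -o jsonpath='{.data.api-token}' | base64 -d
}

# Ask CloudFlare whether the token is valid and active
cf_verify_token() {
  curl -s -H "Authorization: Bearer $1" "$CF_API/user/tokens/verify"
}

# Look up a zone by name; an empty "result" list usually means the token
# lacks Zone:Zone:Read for that zone
cf_zone_lookup() {
  curl -s -H "Authorization: Bearer $1" "$CF_API/zones?name=$2"
}

# Read a CloudFlare API response on stdin; succeed if it reports status "active"
is_active() {
  grep -Eq '"status" *: *"active"'
}

# Example (placeholders):
#   token=$(cf_token)
#   cf_verify_token "$token" | is_active && echo "token is active"
#   cf_zone_lookup "$token" yourdomain.com
```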

Check CloudFlare dashboard for:

DNS record existence
API access logs
DNS settings including proxy status
Any error messages or rate limit warnings
