Troubleshoot Service Visibility
This guide covers common issues with accessing services from outside the cluster and how to diagnose and fix them.
Common Issues
External access to your services might fail for several reasons:
- DNS Resolution Issues - Domain names not resolving to the correct IP address
- Network Connectivity Issues - Traffic can't reach the cluster's external IP
- TLS Certificate Issues - Invalid or missing certificates
- Ingress/Service Configuration Issues - Incorrectly configured routing
Diagnostic Steps
1. Check DNS Resolution
Symptoms:
- Browser shows "site cannot be reached" or "server IP address could not be found"
- ping or nslookup commands fail for your domain
- Your service DNS records don't appear in CloudFlare or your DNS provider
Checks:
# Check if your domain resolves (from outside the cluster)
nslookup yourservice.yourdomain.com
# Check if ExternalDNS is running
kubectl get pods -n externaldns
# Check ExternalDNS logs for errors
kubectl logs -n externaldns -l app=external-dns | grep -i error
kubectl logs -n externaldns -l app=external-dns | grep -i "your-service-name"
# Check if CloudFlare API token is configured correctly
kubectl get secret cloudflare-api-token -n externaldns
Common Issues:
a) ExternalDNS Not Running: The ExternalDNS pod is not running or has errors.
b) Cloudflare API Token Issues: The API token is invalid, expired, or doesn't have the right permissions.
c) Domain Filter Mismatch: ExternalDNS is configured with a --domain-filter that doesn't match your domain.
d) Annotations Missing: Service or Ingress is missing the required ExternalDNS annotations.
Solutions:
# 1. Recreate CloudFlare API token secret
kubectl create secret generic cloudflare-api-token \
--namespace externaldns \
--from-literal=api-token="your-api-token" \
--dry-run=client -o yaml | kubectl apply -f -
# 2. Check and set proper annotations on your Ingress:
kubectl annotate ingress your-ingress -n your-namespace \
external-dns.alpha.kubernetes.io/hostname=your-service.your-domain.com
# 3. Restart ExternalDNS
kubectl rollout restart deployment -n externaldns external-dns
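For reference, an Ingress that ExternalDNS can pick up looks roughly like the sketch below. The names, namespace, and domain are placeholders, and the ingressClassName assumes the Traefik class used elsewhere in this guide:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: your-ingress
  namespace: your-namespace
  annotations:
    # Tells ExternalDNS which DNS record to create for this Ingress
    external-dns.alpha.kubernetes.io/hostname: your-service.your-domain.com
spec:
  ingressClassName: traefik
  rules:
    - host: your-service.your-domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: your-service
                port:
                  number: 80
ExternalDNS also derives hostnames from the host field under spec.rules, so the annotation and the rule host should agree.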
2. Check Network Connectivity
Symptoms:
- DNS resolves to the correct IP but the service is still unreachable
- Only some services are unreachable while others work
- Network timeout errors
Checks:
# Check if MetalLB is running
kubectl get pods -n metallb-system
# Check MetalLB IP address pool
kubectl get ipaddresspools.metallb.io -n metallb-system
# Verify the service has an external IP
kubectl get svc -n your-namespace your-service
Common Issues:
a) MetalLB Configuration: The IP pool doesn't match your network or is exhausted.
b) Firewall Issues: Firewall is blocking traffic to your cluster's external IP.
c) Router Configuration: NAT or port forwarding issues if using a router.
Solutions:
# 1. Check and update MetalLB configuration
kubectl apply -f infrastructure_setup/metallb/metallb-pool.yaml
# 2. Check service external IP assignment
kubectl describe svc -n your-namespace your-service
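If the pool needs adjusting, the manifest referenced above typically looks something like this sketch (pool name and address range are placeholders; assumes MetalLB in layer-2 mode):
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  # Must be unused IPs on the same subnet as your nodes
  addresses:
    - 192.168.1.240-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
The address range must not overlap with your DHCP pool, otherwise MetalLB and your router can hand out the same IP twice.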
3. Check TLS Certificates
Symptoms:
- Browser shows certificate errors
- "Your connection is not private" warnings
- Cert-manager logs show errors
Checks:
# Check certificate status
kubectl get certificates -A
# Check cert-manager logs
kubectl logs -n cert-manager -l app=cert-manager
# Check if your ingress is using the correct certificate
kubectl get ingress -n your-namespace your-ingress -o yaml
Common Issues:
a) Certificate Issuance Failures: The DNS-01 or HTTP-01 challenge is failing.
b) Wrong Secret Referenced: Ingress is referencing a non-existent certificate secret.
c) Expired Certificate: Certificate has expired and wasn't renewed.
Solutions:
# 1. Check and recreate certificates
kubectl apply -f infrastructure_setup/cert-manager/wildcard-certificate.yaml
# 2. Update ingress to use correct secret
kubectl patch ingress your-ingress -n your-namespace --type=json \
-p='[{"op": "replace", "path": "/spec/tls/0/secretName", "value": "correct-secret-name"}]'
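As a rough sketch, the wildcard Certificate referenced above usually looks like this (names and the issuer are placeholders; a wildcard certificate requires an issuer that can solve DNS-01 challenges):
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-cert                # placeholder name
  namespace: your-namespace
spec:
  secretName: wildcard-cert-tls      # the secret your Ingress tls section must reference
  dnsNames:
    - "*.your-domain.com"
  issuerRef:
    name: letsencrypt-prod           # placeholder; must be a DNS-01 capable (Cluster)Issuer
    kind: ClusterIssuer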
4. Check Ingress Configuration
Symptoms:
- HTTP 404, 503, or other error codes
- Service accessible from inside cluster but not outside
- Traffic routed to wrong service
Checks:
# Check ingress status
kubectl get ingress -n your-namespace
# Check Traefik logs
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik
# Check ingress configuration
kubectl describe ingress -n your-namespace your-ingress
Common Issues:
a) Incorrect Service Targeting: Ingress is pointing to wrong service or port.
b) Traefik Configuration: IngressClass or middleware issues.
c) Path Configuration: Incorrect path prefixes or regex.
Solutions:
# 1. Verify ingress configuration
kubectl edit ingress -n your-namespace your-ingress
# 2. Check that the referenced service exists
kubectl get svc -n your-namespace
# 3. Restart Traefik if needed
kubectl rollout restart deployment -n kube-system traefik
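When verifying targeting, keep in mind that the Ingress backend must reference a port exposed by the Service (its port number or name), not the container's port. A sketch of a matching Service, with placeholder names:
# Service exposes port 80 and forwards to the pod's 8080
apiVersion: v1
kind: Service
metadata:
  name: your-service
  namespace: your-namespace
spec:
  selector:
    app: your-app
  ports:
    - name: http
      port: 80          # the Ingress backend must reference 80 (or the name "http")
      targetPort: 8080  # the container port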
Advanced Diagnostics
For more complex issues, you can use port-forwarding to test services directly:
# Port-forward the service directly
kubectl port-forward -n your-namespace svc/your-service 8080:80
# Then test locally
curl http://localhost:8080
You can also deploy a debug pod to test connectivity from inside the cluster:
# Start a debug pod
kubectl run -i --tty --rm debug --image=busybox --restart=Never -- sh
# Inside the pod, test DNS and connectivity
nslookup your-service.your-namespace.svc.cluster.local
wget -O- http://your-service.your-namespace.svc.cluster.local
ExternalDNS Specifics
ExternalDNS can be particularly troublesome. Here are specific debugging steps:
- Check Log Level: Set --log-level=debug for more detailed logs
- Check Domain Filter: Ensure --domain-filter includes your domain
- Check Provider: Ensure --provider=cloudflare (or your DNS provider)
- Verify API Permissions: CloudFlare token needs Zone.Zone and Zone.DNS permissions
- Check TXT Records: ExternalDNS uses TXT records for ownership tracking (a sketch of the typical container args appears after the commands below)
# Enable verbose logging by adding --log-level=debug to the container args (saving triggers a rollout)
kubectl edit deployment external-dns -n externaldns
# Check for specific domain errors
kubectl logs -n externaldns -l app=external-dns | grep -i yourservice.yourdomain.com
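For context, these flags live in the ExternalDNS container args. A sketch of what a working CloudFlare setup roughly looks like; the owner ID and domain are placeholders, and the secret name matches the one created earlier in this guide:
# Excerpt from the external-dns Deployment (container spec)
args:
  - --source=ingress
  - --source=service
  - --provider=cloudflare
  - --domain-filter=your-domain.com   # must cover every domain you expect records for
  - --registry=txt                    # TXT records track which entries ExternalDNS owns
  - --txt-owner-id=your-cluster-id    # placeholder; any stable identifier for this cluster
  - --log-level=debug                 # verbose logging while troubleshooting
env:
  - name: CF_API_TOKEN                # the CloudFlare provider reads the API token from this variable
    valueFrom:
      secretKeyRef:
        name: cloudflare-api-token
        key: api-token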
CloudFlare Specific Issues
When using CloudFlare, additional issues may arise:
- API Rate Limiting: CloudFlare may rate limit frequent API calls
- DNS Propagation: Changes may take time to propagate through CloudFlare's CDN
- Proxied Records: The external-dns.alpha.kubernetes.io/cloudflare-proxied annotation controls whether CloudFlare proxies traffic (see the example after this list)
- Access Restrictions: CloudFlare Access or Page Rules may restrict access
- API Token Permissions: The token must have Zone:Zone:Read and Zone:DNS:Edit permissions
- Zone Detection: If using subdomains, ensure the parent domain is included in the domain filter
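For example, to disable proxying for a single record so CloudFlare serves it DNS-only (a sketch; the ingress name and namespace are placeholders):
kubectl annotate ingress your-ingress -n your-namespace \
  external-dns.alpha.kubernetes.io/cloudflare-proxied="false" --overwrite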
Check CloudFlare dashboard for:
- DNS record existence
- API access logs
- DNS settings including proxy status
- Any error messages or rate limit warnings