# Troubleshooting Service Visibility This guide covers common issues with accessing services from outside the cluster and how to diagnose and fix them. ## Common Issues External access to your services might fail for several reasons: 1. **DNS Resolution Issues** - Domain names not resolving to the correct IP address 2. **Network Connectivity Issues** - Traffic can't reach the cluster's external IP 3. **TLS Certificate Issues** - Invalid or missing certificates 4. **Ingress/Service Configuration Issues** - Incorrectly configured routing ## Diagnostic Steps ### 1. Check DNS Resolution **Symptoms:** - Browser shows "site cannot be reached" or "server IP address could not be found" - `ping` or `nslookup` commands fail for your domain - Your service DNS records don't appear in CloudFlare or your DNS provider **Checks:** ```bash # Check if your domain resolves (from outside the cluster) nslookup yourservice.yourdomain.com # Check if ExternalDNS is running kubectl get pods -n externaldns # Check ExternalDNS logs for errors kubectl logs -n externaldns -l app=external-dns < /dev/null | grep -i error kubectl logs -n externaldns -l app=external-dns | grep -i "your-service-name" # Check if CloudFlare API token is configured correctly kubectl get secret cloudflare-api-token -n externaldns ``` **Common Issues:** a) **ExternalDNS Not Running**: The ExternalDNS pod is not running or has errors. b) **Cloudflare API Token Issues**: The API token is invalid, expired, or doesn't have the right permissions. c) **Domain Filter Mismatch**: ExternalDNS is configured with a `--domain-filter` that doesn't match your domain. d) **Annotations Missing**: Service or Ingress is missing the required ExternalDNS annotations. **Solutions:** ```bash # 1. Recreate CloudFlare API token secret kubectl create secret generic cloudflare-api-token \ --namespace externaldns \ --from-literal=api-token="your-api-token" \ --dry-run=client -o yaml | kubectl apply -f - # 2. Check and set proper annotations on your Ingress: kubectl annotate ingress your-ingress -n your-namespace \ external-dns.alpha.kubernetes.io/hostname=your-service.your-domain.com # 3. Restart ExternalDNS kubectl rollout restart deployment -n externaldns external-dns ``` ### 2. Check Network Connectivity **Symptoms:** - DNS resolves to the correct IP but the service is still unreachable - Only some services are unreachable while others work - Network timeout errors **Checks:** ```bash # Check if MetalLB is running kubectl get pods -n metallb-system # Check MetalLB IP address pool kubectl get ipaddresspools.metallb.io -n metallb-system # Verify the service has an external IP kubectl get svc -n your-namespace your-service ``` **Common Issues:** a) **MetalLB Configuration**: The IP pool doesn't match your network or is exhausted. b) **Firewall Issues**: Firewall is blocking traffic to your cluster's external IP. c) **Router Configuration**: NAT or port forwarding issues if using a router. **Solutions:** ```bash # 1. Check and update MetalLB configuration kubectl apply -f infrastructure_setup/metallb/metallb-pool.yaml # 2. Check service external IP assignment kubectl describe svc -n your-namespace your-service ``` ### 3. Check TLS Certificates **Symptoms:** - Browser shows certificate errors - "Your connection is not private" warnings - Cert-manager logs show errors **Checks:** ```bash # Check certificate status kubectl get certificates -A # Check cert-manager logs kubectl logs -n cert-manager -l app=cert-manager # Check if your ingress is using the correct certificate kubectl get ingress -n your-namespace your-ingress -o yaml ``` **Common Issues:** a) **Certificate Issuance Failures**: DNS validation or HTTP validation failing. b) **Wrong Secret Referenced**: Ingress is referencing a non-existent certificate secret. c) **Expired Certificate**: Certificate has expired and wasn't renewed. **Solutions:** ```bash # 1. Check and recreate certificates kubectl apply -f infrastructure_setup/cert-manager/wildcard-certificate.yaml # 2. Update ingress to use correct secret kubectl patch ingress your-ingress -n your-namespace --type=json \ -p='[{"op": "replace", "path": "/spec/tls/0/secretName", "value": "correct-secret-name"}]' ``` ### 4. Check Ingress Configuration **Symptoms:** - HTTP 404, 503, or other error codes - Service accessible from inside cluster but not outside - Traffic routed to wrong service **Checks:** ```bash # Check ingress status kubectl get ingress -n your-namespace # Check Traefik logs kubectl logs -n kube-system -l app.kubernetes.io/name=traefik # Check ingress configuration kubectl describe ingress -n your-namespace your-ingress ``` **Common Issues:** a) **Incorrect Service Targeting**: Ingress is pointing to wrong service or port. b) **Traefik Configuration**: IngressClass or middleware issues. c) **Path Configuration**: Incorrect path prefixes or regex. **Solutions:** ```bash # 1. Verify ingress configuration kubectl edit ingress -n your-namespace your-ingress # 2. Check that the referenced service exists kubectl get svc -n your-namespace # 3. Restart Traefik if needed kubectl rollout restart deployment -n kube-system traefik ``` ## Advanced Diagnostics For more complex issues, you can use port-forwarding to test services directly: ```bash # Port-forward the service directly kubectl port-forward -n your-namespace svc/your-service 8080:80 # Then test locally curl http://localhost:8080 ``` You can also deploy a debug pod to test connectivity from inside the cluster: ```bash # Start a debug pod kubectl run -i --tty --rm debug --image=busybox --restart=Never -- sh # Inside the pod, test DNS and connectivity nslookup your-service.your-namespace.svc.cluster.local wget -O- http://your-service.your-namespace.svc.cluster.local ``` ## ExternalDNS Specifics ExternalDNS can be particularly troublesome. Here are specific debugging steps: 1. **Check Log Level**: Set `--log-level=debug` for more detailed logs 2. **Check Domain Filter**: Ensure `--domain-filter` includes your domain 3. **Check Provider**: Ensure `--provider=cloudflare` (or your DNS provider) 4. **Verify API Permissions**: CloudFlare token needs Zone.Zone and Zone.DNS permissions 5. **Check TXT Records**: ExternalDNS uses TXT records for ownership tracking ```bash # Restart with verbose logging kubectl set env deployment/external-dns -n externaldns -- --log-level=debug # Check for specific domain errors kubectl logs -n externaldns -l app=external-dns | grep -i yourservice.yourdomain.com ``` ## CloudFlare Specific Issues When using CloudFlare, additional issues may arise: 1. **API Rate Limiting**: CloudFlare may rate limit frequent API calls 2. **DNS Propagation**: Changes may take time to propagate through CloudFlare's CDN 3. **Proxied Records**: The `external-dns.alpha.kubernetes.io/cloudflare-proxied` annotation controls whether CloudFlare proxies traffic 4. **Access Restrictions**: CloudFlare Access or Page Rules may restrict access 5. **API Token Permissions**: The token must have Zone:Zone:Read and Zone:DNS:Edit permissions 6. **Zone Detection**: If using subdomains, ensure the parent domain is included in the domain filter Check CloudFlare dashboard for: - DNS record existence - API access logs - DNS settings including proxy status - Any error messages or rate limit warnings