cert-manager needs a kick sometimes #3

Open
opened 2025-07-25 00:49:06 +00:00 by payneio · 0 comments
Owner

Every time wild-setup-services is run, cert-manager has trouble completing successfully.

It can eventually be fixed by deleting the stuck challenges and certificate requests with the commands below. Claude Code's summary after the cleanup: "The cleanup worked - cert-manager successfully retried and completed the ACME challenges. Your Cloudflare API token permissions were correct, the issue was just stuck challenges that needed to be cleared."

kubectl delete challenges --all -n cert-manager
kubectl delete certificaterequests --all -n cert-manager 
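
A minimal sketch of how the manual cleanup above could be wrapped into one step that also waits for the certificates to come back. The function name `kick_cert_manager`, the default namespace, and the 300s timeout are assumptions and not part of wild-setup-services; the final step relies on the `Ready` condition that cert-manager sets on Certificate resources.

```shell
#!/usr/bin/env bash
# Sketch: clear stuck ACME resources, then wait for certificates to become Ready.
# kick_cert_manager is a hypothetical helper; namespace and timeout defaults are assumptions.

kick_cert_manager() {
  local ns=${1:-cert-manager} timeout=${2:-300s}
  # Show what is stuck before deleting, so the state is captured in the log.
  kubectl get challenges -n "$ns" -o wide
  kubectl delete challenges --all -n "$ns"
  kubectl delete certificaterequests --all -n "$ns"
  # Block until every Certificate in the namespace reports Ready, or time out.
  kubectl wait --for=condition=Ready certificate --all -n "$ns" --timeout="$timeout"
}
```

Calling `kick_cert_manager` with no arguments targets the `cert-manager` namespace. A non-zero exit status from the final `kubectl wait` means at least one certificate was still not Ready when the timeout expired, which would suggest the problem is something other than stuck challenges.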

Suggestions from Claude Code:

Most Common Fix - Missing Zone ID:

Get your zone ID from the Cloudflare dashboard, then patch the ClusterIssuer:

kubectl patch clusterissuer letsencrypt-prod --type='json' -p='[{"op": "add", "path": "/spec/acme/solvers/0/dns01/cloudflare/zoneID", "value": "your-zone-id-here"}]'

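The suggestions in this issue have not been checked against the installed cert-manager CRDs, so it may be worth confirming that a field such as `zoneID` exists in the schema before patching. The sketch below greps `kubectl explain` output for a field entry. `crd_has_field` is a hypothetical helper, and it assumes each field appears in that output as an indented name followed by a `<type>` marker.

```shell
#!/usr/bin/env bash
# Sketch: check whether a CRD schema path exposes a given field.
# crd_has_field is a hypothetical helper, not part of kubectl or cert-manager.

crd_has_field() {
  local path=$1 field=$2
  # Assumes `kubectl explain` lists each field as "<indent>name <type>".
  kubectl explain "$path" | grep -Eq "^[[:space:]]+${field}[[:space:]]+<"
}

# Example (run against a live cluster):
#   crd_has_field clusterissuer.spec.acme.solvers.dns01.cloudflare zoneID \
#     && echo "zoneID is in the schema" || echo "zoneID is not in the schema"
```

The same check can be pointed at the `cleanupPolicy` suggestion further down by changing the path and field name. If a field is not in the schema, the API server may reject or silently drop it, so the patch would not have the intended effect.
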
Other preventive measures:

  1. Set explicit cleanup policy:

    Add to your certificates:

    spec:
      acme:
        config:
        - dns01:
            cloudflare:
              cleanupPolicy: "delete"  # or "retain"
    
  2. Use scoped API token (you're already doing this correctly)

  3. Add a rate limit to prevent cert-manager from overwhelming the Cloudflare API:

    In the cert-manager deployment:

    env:
      - name: CLOUDFLARE_API_RATE_LIMIT
        value: "1200"  # requests per hour
  4. Monitor and auto-cleanup script:

    #!/bin/bash
    # Add to cron to run every 10 minutes (needs GNU date for `date -d`)
    cutoff=$(date -d "15 minutes ago" +%s)
    # Fetch name and creation time in one call instead of one kubectl call per challenge
    kubectl get challenges -n cert-manager --no-headers \
      -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp |
    while read -r name created; do
      # Delete challenges older than 15 minutes
      if [[ $(date -d "$created" +%s) -lt $cutoff ]]; then
        kubectl delete challenge "$name" -n cert-manager
      fi
    done
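
One way to make the cleanup script safer to run from cron is to separate selecting stale challenges from deleting them, so the selection can be reviewed as a dry run first. A minimal sketch, assuming GNU `date` and input lines of the form `name creationTimestamp`; `stale_challenges` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the names of challenges older than a given age.
# Reads "name creationTimestamp" lines on stdin; requires GNU date for `date -d`.
# stale_challenges is a hypothetical helper, not part of cert-manager.

stale_challenges() {
  local max_age_seconds=$1 now name created created_epoch
  now=$(date +%s)
  # The "|| [[ -n $name ]]" keeps a final line that lacks a trailing newline.
  while read -r name created || [[ -n "$name" ]]; do
    [[ -z "$name" ]] && continue
    # Skip lines whose timestamp cannot be parsed instead of aborting.
    created_epoch=$(date -d "$created" +%s 2>/dev/null) || continue
    if (( now - created_epoch > max_age_seconds )); then
      printf '%s\n' "$name"
    fi
  done
}
```

Fed from `kubectl get challenges -n cert-manager --no-headers -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp`, `stale_challenges 900` prints the names of challenges older than 15 minutes. That list can be inspected before it is passed on to `kubectl delete challenge`.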

Try the zone ID fix first - that's usually what resolves the cleanup errors permanently.

payneio added this to the Ready for Early Adopters milestone 2025-07-25 15:52:33 +00:00
payneio added the feature label 2025-07-25 16:23:48 +00:00
Reference: CSTF/wild-cloud#3