cert-manager needs a kick sometimes #3


payneio · 2025-07-25T00:49:06Z


Every time wild-setup-services is run, cert-manager has trouble completing successfully.

It can eventually be fixed by clearing the stuck challenges and certificate requests with the commands below. The diagnosis at the time was: "The cleanup worked - cert-manager successfully retried and completed the ACME challenges. Your Cloudflare API token permissions were correct, the issue was just stuck challenges that needed to be cleared."

kubectl delete challenges --all -n cert-manager
kubectl delete certificaterequests --all -n cert-manager
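
Before deleting everything, it can help to see which challenges are actually stuck and why. The following is a minimal sketch, assuming the challenges live in the cert-manager namespace as in the commands above; `list_challenge_states` and `stuck_challenges` are hypothetical helper names, and `status.state` / `status.reason` are fields of the cert-manager Challenge resource.

```shell
#!/usr/bin/env bash
# Sketch: inspect challenge state before deleting anything.
# Assumption: challenges are in the cert-manager namespace (override with NAMESPACE).

NAMESPACE="${NAMESPACE:-cert-manager}"

# Print one line per challenge: name, state, and the reason reported by cert-manager.
list_challenge_states() {
  kubectl get challenges -n "$NAMESPACE" --no-headers \
    -o custom-columns='NAME:.metadata.name,STATE:.status.state,REASON:.status.reason'
}

# Print only the names of challenges whose state is anything other than "valid".
stuck_challenges() {
  list_challenge_states | awk '$2 != "valid" { print $1 }'
}
```

Running `stuck_challenges` and then `kubectl describe challenge <name> -n cert-manager` on one of the results shows the events behind the failure, which may point at the root cause rather than only clearing the symptom.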

Suggestions from Claude Code:

Most Common Fix - Missing Zone ID:

Get your zone ID from Cloudflare dashboard, then patch:

kubectl patch clusterissuer letsencrypt-prod --type='json' \
  -p='[{"op": "add", "path": "/spec/acme/solvers/0/dns01/cloudflare/zoneID", "value": "your-zone-id-here"}]'

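The cert-manager documentation for the Cloudflare DNS01 solver describes `email`, `apiKeySecretRef`, and `apiTokenSecretRef`, so it may be worth confirming that the installed ClusterIssuer schema accepts `zoneID` before patching. A minimal sketch, assuming `kubectl explain` can read the cert-manager CRD schema in this cluster; `cloudflare_solver_has_field` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: check whether the installed ClusterIssuer schema lists a given field
# under the Cloudflare DNS01 solver.

# Return 0 if `kubectl explain` shows FIELD (followed by its <type>) for the
# Cloudflare solver, non-zero otherwise.
cloudflare_solver_has_field() {
  local field="$1"
  kubectl explain clusterissuer.spec.acme.solvers.dns01.cloudflare \
    | grep -qE "^[[:space:]]+${field}[[:space:]]+<"
}
```

If `cloudflare_solver_has_field zoneID` returns non-zero, the schema does not list that field and the patch is unlikely to have the intended effect. The same kind of check with `kubectl explain certificate.spec` applies to the `cleanupPolicy` suggestion further down.
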
Other preventive measures:

1. Set explicit cleanup policy:

Add to your certificates:

spec:
  acme:
    config:
    - dns01:
        cloudflare:
          cleanupPolicy: "delete"  # or "retain"

2. Use a scoped API token (you're already doing this correctly)

3. Add a rate limit to prevent cert-manager from overwhelming the Cloudflare API:

In the cert-manager deployment:

env:
  - name: CLOUDFLARE_API_RATE_LIMIT
    value: "1200"  # requests per hour

4. Monitor and auto-cleanup script:

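#!/usr/bin/env bash
# Add to cron to run every 10 minutes. Requires GNU date (for `date -d`).
set -euo pipefail

# Challenges created before this moment (epoch seconds) count as stuck.
cutoff=$(date -d "15 minutes ago" +%s)

# One API call returns every challenge's name and creation timestamp.
kubectl get challenges -n cert-manager --no-headers \
  -o custom-columns='NAME:.metadata.name,CREATED:.metadata.creationTimestamp' |
while read -r name created; do
  # Delete challenges older than 15 minutes
  if [[ $(date -d "$created" +%s) -lt $cutoff ]]; then
    kubectl delete challenge "$name" -n cert-manager
  fi
done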
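
Since the script deletes resources unattended, a dry run that only prints what it would remove may be useful before adding it to cron. The following is a sketch under the same assumptions (GNU `date`, challenges in the cert-manager namespace); `older_than` is a hypothetical helper, not part of cert-manager or kubectl.

```shell
#!/usr/bin/env bash
# Sketch: print (not delete) the challenges that exceed a given age.
# Needs GNU date for `date -d`.

# Read "NAME CREATION_TIMESTAMP" lines on stdin and print the names whose
# timestamp lies more than $1 seconds in the past.
older_than() {
  local max_age="$1" now name created
  now=$(date +%s)
  while read -r name created; do
    # Skip blank lines, e.g. when there are no challenges at all.
    [[ -z "$name" ]] && continue
    if (( now - $(date -d "$created" +%s) > max_age )); then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, `kubectl get challenges -n cert-manager --no-headers -o custom-columns='NAME:.metadata.name,CREATED:.metadata.creationTimestamp' | older_than 900` lists the challenges older than 15 minutes without touching them.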

Try the zone ID fix first - that's usually what resolves the cleanup errors permanently.


payneio added this to the Ready for Early Adopters milestone 2025-07-25 15:52:33 +00:00

payneio added the feature label 2025-07-25 16:23:48 +00:00


Reference: CSTF/wild-cloud#3