Cloudflare cutover runbook

Use this runbook to coordinate the production traffic move to Cloudflare, including pre-checks, live checks, stakeholder updates, rollback triggers, and post-cutover review.

Topics: cloudflare, resource, cutover, runbook

Step by step

Runbook procedure

  1. Set the cutover window and freeze unrelated changes, then confirm every prerequisite is already done: records built, proxy status set, SSL/TLS mode confirmed, certificates issued, and policies in their intended state.

  2. Assign explicit roles for the window: who executes the traffic move, who runs validation, who watches monitoring, who communicates status, and who alone can call rollback.

  3. Define go/no-go criteria and rollback triggers in measurable terms (error rate, origin health, certificate validity, key transaction success) so the decision is not a judgment call under pressure.

  4. Run pre-checks immediately before the move: lowered TTLs confirmed, origin reachable through Cloudflare, monitoring and alerts live, and the rollback path tested or at least rehearsed.

  5. Execute the traffic move (flip proxy status, change the record or nameservers, or shift the canary share) and start the clock on the validation pass.

  6. Run live checks against the criteria: page loads over TLS, critical user journeys and APIs, cache status, origin request volume, Security Events, and error rates, comparing to the pre-cutover baseline.

  7. Hold the window: if a rollback trigger fires, execute the documented rollback immediately; if checks pass, send the all-clear and keep heightened monitoring for an agreed soak period.

  8. After the soak period, run a post-cutover review: confirm metrics are stable, capture anything that surprised the team, and hand monitoring and ownership to operations.
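Steps 3 and 7 depend on rollback triggers that can be evaluated mechanically. The following is a minimal sketch of that idea, not a prescribed implementation: the `Thresholds` and `Snapshot` types, the `fired_triggers` function, and every numeric limit are hypothetical placeholders to be replaced with the values your application, security, and infrastructure owners agree on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Rollback limits. Every default below is an illustrative assumption."""
    max_error_rate: float = 0.02            # fraction of requests failing
    min_healthy_origins: int = 1            # origins passing health checks
    min_cert_days_remaining: int = 7        # certificate validity margin
    min_transaction_success: float = 0.98   # key transaction success rate


@dataclass(frozen=True)
class Snapshot:
    """One observation taken during the validation pass or soak period."""
    error_rate: float
    healthy_origins: int
    cert_days_remaining: int
    transaction_success: float


def fired_triggers(snapshot: Snapshot, limits: Thresholds) -> list[str]:
    """Return the name of every rollback trigger the snapshot has crossed."""
    fired = []
    if snapshot.error_rate > limits.max_error_rate:
        fired.append("error_rate")
    if snapshot.healthy_origins < limits.min_healthy_origins:
        fired.append("origin_health")
    if snapshot.cert_days_remaining < limits.min_cert_days_remaining:
        fired.append("certificate_validity")
    if snapshot.transaction_success < limits.min_transaction_success:
        fired.append("transaction_success")
    return fired
```

With the default limits, `fired_triggers(Snapshot(0.05, 2, 60, 0.90), Thresholds())` returns `['error_rate', 'transaction_success']`, while a snapshot inside every limit returns an empty list. A non-empty result is the signal handed to the rollback owner; the code only reports, it does not decide.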

Risk register

Risks to control

Rollback triggers are vague, so the team debates whether to revert while the incident grows.

Write triggers as measurable thresholds (error rate, origin health, transaction success) and give one named owner sole authority to call rollback.

TTLs were never lowered, so the cutover and any rollback are both slow.

Confirm lowered TTLs as a pre-check and do not start the window until they have actually propagated.
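One way to make "actually propagated" concrete is to compute the earliest safe start time. The sketch below assumes resolvers honour TTLs, in which case a resolver that cached the record just before the change can keep serving it for up to the previous TTL. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def earliest_safe_start(ttl_lowered_at: datetime, previous_ttl_seconds: int) -> datetime:
    """Time after which TTL-honouring resolvers should no longer hold the old TTL."""
    return ttl_lowered_at + timedelta(seconds=previous_ttl_seconds)


def ttl_change_propagated(
    ttl_lowered_at: datetime,
    previous_ttl_seconds: int,
    now: Optional[datetime] = None,
) -> bool:
    """True once the previous TTL has fully elapsed since the TTL was lowered."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= earliest_safe_start(ttl_lowered_at, previous_ttl_seconds)
```

For example, if a record with a previous TTL of 86400 seconds was lowered at 00:00 UTC, the pre-check should not pass until 00:00 UTC the following day. Resolvers that ignore TTLs are outside what this calculation can cover.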

Validation only checks that the homepage loads, missing broken APIs or authenticated journeys.

Script a validation pass covering critical user journeys, APIs, TLS, cache status, and error rates against the baseline, not just a smoke test.
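A scripted pass can be as small as a list of named checks run through one function. The sketch below uses only the Python standard library. The `Check` fields, the `run_checks` function, and any URLs you feed it are hypothetical. The optional cache comparison reads the `CF-Cache-Status` response header and is skipped when no expected value is given. Error-rate comparison against the baseline is out of scope here.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Check:
    name: str
    url: str
    expected_status: int = 200
    expected_cache_status: Optional[str] = None  # e.g. "HIT"; None skips the check


@dataclass(frozen=True)
class Response:
    status: int
    headers: dict


def fetch(url: str, timeout: float = 10.0) -> Response:
    """Fetch a URL; HTTP error statuses are returned rather than raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return Response(resp.status, {k.lower(): v for k, v in resp.headers.items()})
    except urllib.error.HTTPError as err:
        return Response(err.code, {k.lower(): v for k, v in err.headers.items()})


def run_checks(checks: list[Check], fetcher: Callable[[str], Response] = fetch) -> list[str]:
    """Run every check and return failure descriptions; an empty list means a pass."""
    failures = []
    for check in checks:
        if not check.url.startswith("https://"):
            failures.append(f"{check.name}: not requested over TLS")
            continue
        try:
            response = fetcher(check.url)
        except OSError as err:  # covers connection, DNS, and TLS failures
            failures.append(f"{check.name}: request failed ({err})")
            continue
        if response.status != check.expected_status:
            failures.append(
                f"{check.name}: status {response.status}, expected {check.expected_status}"
            )
        if check.expected_cache_status is not None:
            actual = response.headers.get("cf-cache-status")
            if actual != check.expected_cache_status:
                failures.append(
                    f"{check.name}: cache status {actual}, expected {check.expected_cache_status}"
                )
    return failures
```

Passing the fetcher as a parameter lets the team rehearse the script against canned responses before the window. Authenticated journeys would need a fetcher that carries session credentials, which this sketch leaves out.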

Monitoring and alerting are not live when traffic moves, so a regression is noticed by users first.

Bring dashboards, Security Events, and alerts up and confirm they are receiving data before executing the move.
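"Receiving data" can be checked by looking at how recently each source last reported. The sketch below assumes you can obtain a last-seen timestamp per source from your monitoring stack. The source names, the `stale_sources` function, and the two-minute default are hypothetical.

```python
from datetime import datetime, timedelta


def stale_sources(
    last_seen: dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(minutes=2),
) -> list[str]:
    """Return, in sorted order, every source whose latest data point is older than max_age."""
    return sorted(name for name, seen in last_seen.items() if now - seen > max_age)
```

An empty result means every listed source has reported within the allowed age. Any name returned is a reason to hold the move until that feed is confirmed live.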

Concurrent unrelated changes during the window make it impossible to tell what caused a problem.

Freeze unrelated changes for the window so any regression is attributable to the cutover.

The window closes the moment checks pass, and a delayed regression appears with nobody watching.

Keep heightened monitoring through an agreed soak period before declaring the cutover complete and handing over to operations.
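The soak period can be tracked as a small state check so that nobody closes the window early. This is a sketch under assumptions: the 24-hour default, the state names, and the `soak_state` function are hypothetical, and what to do when a trigger fires during the soak is whatever the documented rollback procedure says.

```python
from datetime import datetime, timedelta


def soak_state(
    all_clear_at: datetime,
    now: datetime,
    trigger_times: list[datetime],
    soak: timedelta = timedelta(hours=24),
) -> str:
    """Classify the window as 'trigger_fired', 'soaking', or 'complete'."""
    # Only triggers that fired after the all-clear count against the soak.
    if any(all_clear_at <= fired <= now for fired in trigger_times):
        return "trigger_fired"
    if now - all_clear_at < soak:
        return "soaking"
    return "complete"
```

Only a `complete` result should lead to the post-cutover review and the handover to operations.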

Output

Useful deliverables

  • Cutover window plan with the change freeze and the list of confirmed prerequisites.
  • Role and ownership matrix for execution, validation, monitoring, communication, and the rollback decision.
  • Go/no-go criteria and measurable rollback triggers agreed by application, security, and infrastructure owners.
  • Pre-check list covering TTLs, origin reachability through Cloudflare, live monitoring, and a rehearsed rollback path.
  • Live validation script covering critical journeys, APIs, TLS, cache status, Security Events, and error rates against the baseline.
  • Documented rollback procedure with the exact steps, owner, and timing constraints.
  • Post-cutover review notes and the operations handover for monitoring and ownership.
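The role and ownership matrix can be kept as data and checked before the window opens. The sketch below assumes five role keys taken from the list above and enforces a single owner for the rollback decision. The key names and the `role_matrix_problems` function are hypothetical.

```python
REQUIRED_ROLES = (
    "execution",
    "validation",
    "monitoring",
    "communication",
    "rollback_decision",
)


def role_matrix_problems(matrix: dict[str, list[str]]) -> list[str]:
    """Return a description of every gap in the matrix; empty means it is usable."""
    problems = []
    for role in REQUIRED_ROLES:
        if not matrix.get(role):
            problems.append(f"{role}: no owner assigned")
    # Rollback authority must sit with exactly one named person.
    if len(matrix.get("rollback_decision", [])) > 1:
        problems.append("rollback_decision: more than one owner")
    return problems
```

Other roles may have several owners, for example two people sharing validation, but the rollback decision is deliberately limited to one name so the call is never a debate.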

FAQ

Frequently asked questions

Common questions teams ask when putting this resource into practice.

What actually goes in a cutover runbook versus the migration plan?

The migration plan covers everything built and validated beforehand. The runbook is the choreography of the live window itself: the timeline, who does what, the pre-checks, the exact traffic move, the validation pass, the measurable rollback triggers, and the soak period afterward. It assumes the build is already done and tested.

How do we decide go/no-go during the window?

Decide against criteria agreed in advance and expressed as measurable thresholds (error rate, origin health, certificate validity, and success of key transactions), so the call is not a debate under pressure. One named owner holds the go/no-go and rollback authority.

How long should we monitor after the traffic moves?

Beyond the validation pass, hold a soak period with heightened monitoring before declaring the cutover complete. Some regressions only appear under sustained or peak traffic, so the window should not close the instant the first checks pass.

What makes a rollback fast and clean?

Lowered TTLs that have actually propagated, a rollback path that was rehearsed rather than improvised, a single owner empowered to trigger it, and a change freeze so the cutover is the only variable. Those four things turn rollback into one decision instead of a scramble.
