Service Production Readiness Review
Pre-launch review to verify that a service is ready for production. It covers monitoring, alerting, runbooks, on-call ownership, scaling, and security controls, all of which are checked before go-live.
Review Details
-
Service name and version identified
Record the service name, release version, and environment being reviewed.
-
Review date and launch target documented
Capture the readiness review date and planned production go-live date.
-
Review owner assigned
Identify the person responsible for coordinating the readiness review and closure of deficiencies.
Monitoring and Alerting
-
Production metrics are defined and visible
Confirm key service health metrics are available in dashboards, including latency, error rate, traffic, saturation, and availability.
-
Alert thresholds are documented and tuned
Verify alert thresholds are based on meaningful service signals and do not produce excessive false positives that lead to alert fatigue.
-
Alert routing reaches the correct on-call channel
Confirm alerts route to the intended paging, chat, or ticketing destination for the owning team.
-
Synthetic or health checks are in place
Verify automated health checks or synthetic transactions exist to detect service degradation quickly.
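As an illustration of the threshold items above, the following is a minimal sketch of how an error-rate and latency check might be evaluated against documented thresholds. The function names, the 1% error-rate limit, and the 500 ms p95 limit are assumptions made for the example, not values taken from this template; real thresholds should come from the service's own targets.

```python
import math


def error_rate(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    if total_requests == 0:
        return 0.0
    return failed_requests / total_requests


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile (pct is an integer from 1 to 100)."""
    if not values:
        raise ValueError("percentile requires at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def alert_reasons(
    total_requests: int,
    failed_requests: int,
    latencies_ms: list[float],
    max_error_rate: float = 0.01,  # hypothetical threshold
    max_p95_ms: float = 500.0,     # hypothetical threshold
) -> list[str]:
    """Return the list of breached thresholds; an empty list means no alert."""
    reasons = []
    if error_rate(total_requests, failed_requests) > max_error_rate:
        reasons.append("error_rate")
    if latencies_ms and percentile(latencies_ms, 95) > max_p95_ms:
        reasons.append("p95_latency")
    return reasons
```

Returning the breached signals rather than a single boolean makes it easier to route or annotate the alert, which is one possible design choice among several.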
Runbooks and Incident Response
-
Operational runbook is published and current
Confirm the runbook includes startup, shutdown, common failure modes, escalation steps, and recovery actions.
-
Incident escalation path is documented
Verify the escalation path identifies who to contact, when to escalate, and how to engage additional support.
-
Rollback or mitigation procedure is tested
Confirm a rollback, feature flag disablement, or mitigation plan has been validated for launch failure scenarios.
-
Known launch risks and open defects are tracked
Document any accepted risks, unresolved defects, or temporary workarounds that remain before go-live.
Ownership and Support Readiness
-
Primary on-call owner is assigned
Record the primary on-call owner or team responsible for production support at launch.
-
Secondary escalation contact is assigned
Record the backup responder or escalation contact for after-hours or high-severity incidents.
-
Support handoff completed
Confirm operations, support, and engineering teams have completed the launch handoff and understand support boundaries.
-
On-call schedule covers launch window
Verify the on-call schedule provides coverage for the launch period, including weekends or holidays if applicable.
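The launch-window coverage check above can be automated in a few lines. The sketch below assumes shifts are available as (start, end) datetime pairs; that representation and the function name `coverage_gaps` are hypothetical, and a real on-call tool would expose its schedule through its own API.

```python
from datetime import datetime

Interval = tuple[datetime, datetime]


def coverage_gaps(
    shifts: list[Interval], window_start: datetime, window_end: datetime
) -> list[Interval]:
    """Return the parts of the launch window not covered by any shift."""
    gaps = []
    cursor = window_start  # everything before the cursor is already covered
    for start, end in sorted(shifts):
        if end <= cursor:
            continue  # shift ends before the uncovered region begins
        if start >= window_end:
            break  # remaining shifts start after the window closes
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
        if cursor >= window_end:
            break
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps


# Example with made-up dates: a weekend launch window and three shifts.
launch_start = datetime(2024, 1, 5, 18, 0)
launch_end = datetime(2024, 1, 8, 9, 0)
example_shifts = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 6, 9, 0)),
    (datetime(2024, 1, 6, 9, 0), datetime(2024, 1, 7, 9, 0)),
    (datetime(2024, 1, 7, 21, 0), datetime(2024, 1, 8, 12, 0)),
]
example_gaps = coverage_gaps(example_shifts, launch_start, launch_end)
```

An empty result means the window is fully covered; any returned interval is a period with no assigned responder.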
Scaling, Capacity, and Resilience
-
Load or capacity test completed
Confirm the service has been tested at expected peak load and the results meet performance targets.
-
Autoscaling or capacity controls configured
Verify scaling rules, resource limits, or capacity reservations are configured for expected demand.
-
Single points of failure reviewed
Confirm major dependencies, regions, queues, databases, or third-party services have been reviewed for resilience risks.
-
Recovery time and recovery point objectives documented
Record the approved RTO and RPO for the service, if applicable.
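To make the capacity items above concrete, here is a rough sizing sketch. It assumes a per-replica throughput measured in the load test and a target utilization that leaves headroom; the parameter names and the default values (60% utilization, a floor of two replicas) are illustrative assumptions rather than recommendations from this template.

```python
import math


def required_replicas(
    peak_rps: int,
    per_replica_rps: int,
    target_utilization_pct: int = 60,  # hypothetical headroom target
    min_replicas: int = 2,             # hypothetical floor for redundancy
) -> int:
    """Replicas needed to serve peak load at the target utilization."""
    if per_replica_rps <= 0 or not 0 < target_utilization_pct <= 100:
        raise ValueError("per_replica_rps and utilization must be positive")
    needed = math.ceil(peak_rps * 100 / (per_replica_rps * target_utilization_pct))
    return max(needed, min_replicas)


def fits_capacity(
    peak_rps: int, per_replica_rps: int, max_replicas: int, **kwargs: int
) -> bool:
    """True when the autoscaling ceiling can absorb the expected peak."""
    return required_replicas(peak_rps, per_replica_rps, **kwargs) <= max_replicas
```

For instance, a peak of 1200 requests per second with replicas that each sustain 100 requests per second needs 20 replicas at 60% utilization, so an autoscaling ceiling below 20 would fail this check.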
Security and Compliance
-
Access controls follow least privilege
Confirm production access is limited to approved users, roles, and service accounts.
-
Secrets and credentials are stored securely
Verify secrets are not hardcoded and are managed through approved secret storage or key management controls.
-
Security review or sign-off completed
Confirm the service has completed the required security assessment, vulnerability review, or approval gate before launch.
-
Open security findings documented
Capture any remaining security findings that have been accepted for launch, each with an approved remediation plan.
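For the hardcoded-secrets item above, a quick heuristic scan can serve as a first pass. The sketch below is an assumption-laden example: the regular expression and the function name `find_hardcoded_secrets` are invented for illustration, it only catches simple quoted assignments, and it is not a substitute for an approved secret-scanning tool.

```python
import re

# Heuristic: a secret-looking name assigned a quoted literal of 4+ characters.
SECRET_ASSIGNMENT = re.compile(
    r"""(?:password|passwd|secret|api_?key|token)\w*\s*[:=]\s*["'][^"']{4,}["']""",
    re.IGNORECASE,
)


def find_hardcoded_secrets(source: str) -> list[int]:
    """Return 1-based line numbers that look like hardcoded credentials."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if SECRET_ASSIGNMENT.search(line)
    ]


# Made-up sample: line 2 hardcodes a value, line 3 reads it from the environment.
sample = (
    "import os\n"
    'DB_PASSWORD = "hunter2hunter2"\n'
    'api_key = os.environ["API_KEY"]\n'
)
flagged_lines = find_hardcoded_secrets(sample)
```

Lines that read values from the environment or a secret manager are not flagged, because the pattern requires a quoted literal directly after the assignment.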