DORA Metrics in Practice: Instrument Your Pipeline and Actually Move the Numbers 200OK Solutions Blog

The short answer: most teams track DORA metrics wrong because they report numbers without instrumenting the right data sources. This guide covers exactly what to instrument in your CI/CD pipeline, how to pull MTTR from PagerDuty, and includes a Grafana dashboard JSON you can deploy today. This is a Platform Engineering implementation guide, not another definition post.

Why Your DORA Numbers Are Lying to You

If your DORA metrics live in a spreadsheet or someone manually updates them in a weekly standup, they’re already wrong. Real DORA measurement requires automated instrumentation at four specific points. Here’s what actually matters and where to wire it in.

The Four Metrics and What to Actually Instrument

1. Deployment Frequency

What most teams do: Count merged PRs. What you should do: Count successful production deployments.

In GitHub Actions, emit a deployment event only when your production job completes with success status:

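yaml
- name: Track Deployment
  if: success()
  run: |
    # Emit a deployment event only after the production job has succeeded
    curl -X POST "$METRICS_ENDPOINT" \
      -H "Content-Type: application/json" \
      -d '{"event":"deployment","status":"success","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}'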

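Push this event to Prometheus through a Pushgateway. The Pushgateway expects the Prometheus text format rather than JSON, so whatever receives the event has to translate it. Label by service and environment, not just team, so you can drill down in Grafana.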
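One way to push a deployment metric with service and environment labels is sketched below, using only the Python standard library. The metric name `deployment_last_success_timestamp_seconds`, the job name `deployments`, and the helper names are illustrative assumptions, not part of any existing setup. The sketch also assumes label values are simple strings without slashes.

```python
import time
import urllib.request
from urllib.parse import quote


def pushgateway_url(base_url: str, job: str, labels: dict[str, str]) -> str:
    """Build a Pushgateway URL of the form /metrics/job/<job>/<label>/<value>/..."""
    path = f"/metrics/job/{quote(job, safe='')}"
    for name, value in sorted(labels.items()):
        path += f"/{quote(name, safe='')}/{quote(value, safe='')}"
    return base_url.rstrip("/") + path


def deployment_payload(timestamp: float) -> bytes:
    """Render one gauge in the Prometheus text exposition format."""
    lines = [
        "# TYPE deployment_last_success_timestamp_seconds gauge",  # hypothetical metric name
        f"deployment_last_success_timestamp_seconds {timestamp:.0f}",
        "",  # the body has to end with a newline
    ]
    return "\n".join(lines).encode("utf-8")


def push_deployment(base_url: str, service: str, environment: str) -> int:
    """POST the gauge to the Pushgateway and return the HTTP status code."""
    url = pushgateway_url(
        base_url, "deployments", {"service": service, "environment": environment}
    )
    request = urllib.request.Request(
        url, data=deployment_payload(time.time()), method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

The Pushgateway keeps only the latest pushed value per job and label group, so this sketch pushes a timestamp gauge rather than trying to increment a counter from a short-lived CI job.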

2. Lead Time for Changes

What to measure: Time from first commit on a branch to that commit running in production.

Pull commit.author.date from GitHub API when the PR is opened
Record deployment.completed_at from your pipeline
Delta = Lead Time

Store both timestamps in a time-series database. A common mistake is measuring from PR merge, which hides slow review cycles, exactly the bottleneck you need to see.
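The delta itself is simple to compute once both timestamps are captured. Here is a minimal sketch, assuming both values arrive as ISO 8601 UTC strings (the format the GitHub API uses for commit dates). The function names are illustrative.

```python
from datetime import datetime, timedelta


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-03-01T09:15:00Z."""
    # Older Python versions do not accept a trailing "Z", so normalise it first
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def lead_time(first_commit_at: str, deployed_at: str) -> timedelta:
    """Lead time for changes: first commit to running in production."""
    delta = parse_utc(deployed_at) - parse_utc(first_commit_at)
    if delta < timedelta(0):
        raise ValueError("deployment precedes the commit; check the timestamps")
    return delta


# A commit authored on 1 March at 09:15 UTC and deployed the next day at 11:45 UTC
hours = lead_time("2024-03-01T09:15:00Z", "2024-03-02T11:45:00Z").total_seconds() / 3600
print(f"Lead time: {hours:.1f} hours")
```

Raising on a negative delta is a deliberate choice in this sketch: it usually points to clock skew or a mismatched commit, and silently recording it would pull the average down.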

3. Change Failure Rate

Formula: Failed deployments ÷ Total deployments

Flag a deployment as failed when:

A rollback is triggered within 1 hour of deploy
A PagerDuty incident is created and linked to the deployment window

Link PagerDuty incidents to deployments using deployment markers. If an incident opens within your deployment window (configurable, start with 60 minutes), mark that deployment as a failure automatically.

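python

from datetime import timedelta

def is_change_failure(deploy_time, incidents):
    """Return True if any incident opened within 60 minutes after the deploy."""
    # deploy_time and each incident's 'created_at' must be datetime objects
    window = timedelta(minutes=60)
    return any(
        deploy_time <= i['created_at'] <= deploy_time + window
        for i in incidents
    )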
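Combining both failure signals with the formula above might look like the following sketch. The record shapes are assumptions for illustration: each deployment is a dict with a `deployed_at` datetime and a `rolled_back` flag (set by your pipeline when a rollback fires inside the window), and each incident is a dict with a `created_at` datetime.

```python
from datetime import datetime, timedelta

FAILURE_WINDOW = timedelta(minutes=60)  # configurable; 60 minutes is the starting point


def is_failed(deploy: dict, incidents: list[dict]) -> bool:
    """A deploy failed if it was rolled back or an incident opened in its window."""
    if deploy.get("rolled_back", False):
        return True
    start = deploy["deployed_at"]
    return any(start <= i["created_at"] <= start + FAILURE_WINDOW for i in incidents)


def change_failure_rate(deployments: list[dict], incidents: list[dict]) -> float:
    """Failed deployments divided by total deployments (0.0 when there are none)."""
    if not deployments:
        return 0.0
    failed = sum(is_failed(d, incidents) for d in deployments)
    return failed / len(deployments)


t0 = datetime(2024, 3, 4, 10, 0)
deployments = [
    {"deployed_at": t0, "rolled_back": False},                       # incident 20 min later
    {"deployed_at": t0 + timedelta(hours=3), "rolled_back": True},   # rolled back
    {"deployed_at": t0 + timedelta(hours=6), "rolled_back": False},  # clean
    {"deployed_at": t0 + timedelta(hours=9), "rolled_back": False},  # clean
]
incidents = [{"created_at": t0 + timedelta(minutes=20)}]
print(change_failure_rate(deployments, incidents))  # 2 of 4 failed -> 0.5
```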

4. MTTR — Pull This From PagerDuty, Not From Memory

MTTR (Mean Time to Restore) is where most teams have the worst data quality. The fix: use PagerDuty’s API directly.

Steps to automate MTTR tracking:

Connect PagerDuty to your metrics pipeline via webhook or scheduled API pull
For each resolved incident, calculate: resolved_at - created_at
Filter to P1/P2 incidents only (noise from P3/P4 distorts your elite vs high performer classification)
Push to Prometheus with service labels

bash

# Pull resolved incidents from the PagerDuty API
# --globoff stops curl from treating the [] in statuses[] as a glob pattern
curl --globoff -H "Authorization: Token token=$PD_API_KEY" \
  "https://api.pagerduty.com/incidents?statuses[]=resolved&since=2024-01-01"

Compute the gap between created_at and resolved_at for each incident. Average it weekly, not monthly: monthly averages hide regression patterns.
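The weekly averaging could be sketched as follows. It assumes the API response has already been flattened into dicts with `created_at`, `resolved_at`, and a `priority` string such as "P1". That flattened shape and the function names are assumptions for illustration, not PagerDuty's response format.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-03-04T10:00:00Z."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def weekly_mttr_hours(incidents: list[dict], priorities=("P1", "P2")) -> dict[str, float]:
    """Mean time to restore in hours, keyed by the ISO week of resolution."""
    buckets = defaultdict(list)
    for incident in incidents:
        if incident.get("priority") not in priorities:
            continue  # drop P3/P4 noise
        created = parse_utc(incident["created_at"])
        resolved = parse_utc(incident["resolved_at"])
        year, week, _ = resolved.isocalendar()
        hours = (resolved - created).total_seconds() / 3600
        buckets[f"{year}-W{week:02d}"].append(hours)
    return {week: mean(values) for week, values in sorted(buckets.items())}


incidents = [
    {"priority": "P1", "created_at": "2024-03-04T10:00:00Z", "resolved_at": "2024-03-04T11:00:00Z"},
    {"priority": "P2", "created_at": "2024-03-06T08:00:00Z", "resolved_at": "2024-03-06T11:00:00Z"},
    {"priority": "P4", "created_at": "2024-03-05T00:00:00Z", "resolved_at": "2024-03-06T16:00:00Z"},
    {"priority": "P1", "created_at": "2024-03-12T00:00:00Z", "resolved_at": "2024-03-12T00:30:00Z"},
]
print(weekly_mttr_hours(incidents))
```

Bucketing by resolution week (rather than creation week) is one possible choice; it keeps long-running incidents out of a week's average until they are actually closed.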

What to Fix First (Priority Order)

Instrument deployment events: everything downstream depends on this being accurate
Wire PagerDuty MTTR: highest ROI for leadership visibility
Add lead time tracking: exposes review bottlenecks most teams ignore
Calculate CFR last: needs both deployment and incident data to be clean first

Common Mistakes That Kill Your Data Quality

Counting PR merges as deployments: only production deployments count
Including all PagerDuty incidents in MTTR: filter to production and to P1/P2 severity
Measuring monthly averages: use weekly; monthly hides regressions
No deployment markers in your APM/incident tools: without these, you can’t link incidents to specific deploys
Manual data entry anywhere in the chain: automate or the data becomes political, not factual

The Outcome You’re Actually After

DORA metrics are not a reporting exercise. They’re a feedback loop. When your pipeline emits deployment events automatically, when PagerDuty MTTR flows into Grafana without human intervention, and when your Grafana dashboard shows real-time state, you stop debating whether you’re improving and start seeing exactly where the constraint is.

That’s the difference between tracking DORA and using DORA.

Need help instrumenting your CI/CD pipeline and building a Platform Engineering practice that actually moves these numbers? See how 200OK Solutions approaches Platform Engineering.

FAQ

Q. How often should I review DORA metrics?

A. Weekly at the team level, monthly at the leadership level. Weekly cadence surfaces regressions before they compound.

Q. Which DORA metric should I fix first?

A. Deployment frequency. It’s the leading indicator: low deployment frequency almost always goes hand in hand with poor scores across the other three.

Q. Can I track DORA metrics without PagerDuty?

A. Yes. Any incident management tool with an API works (Opsgenie, VictorOps, even Slack-based on-call workflows). The logic is the same: capture incident_start and incident_resolved timestamps automatically.

Q. What’s a realistic MTTR target for a team starting out?

A. Under 24 hours is “high performer” by DORA standards. Start there before chasing the elite threshold of under 1 hour.

Q. Does DORA apply to non-SaaS products?

A. Yes, with adjustments. Deployment frequency maps to release cadence. The instrumentation approach is the same; the thresholds for “elite” may differ based on your deployment model.
