
CSAT and QA Score Correlation Review

Review how QA scorecard results line up with post-call CSAT for a specific period, team, or queue. Use it to spot scorecard gaps, evaluator drift, and the few changes most likely to improve customer satisfaction.


Built for: Customer Support · Financial Services · Healthcare · Telecommunications · SaaS

Overview

This template is a structured review for comparing internal QA evaluation scores with post-call CSAT ratings on the same set of interactions. It helps you validate whether your scorecard is actually measuring the behaviors that drive customer satisfaction, or whether it is rewarding process compliance that customers do not feel.

Use it when you need to review a specific period, team, queue, or business unit and turn QA and CSAT data into a decision-ready readout. The template covers review context, aggregate score alignment, scorecard criteria validity, evaluator calibration, detractor and outlier analysis, and action planning. It is especially useful after a scorecard change, a coaching push, or a period of unusual CSAT movement.

Do not use it as a generic performance summary or as a substitute for a full QA program review. If you do not have linked QA and CSAT data for the same interactions, the correlation analysis will be weak. It is also not the right tool for very small samples where one or two ratings can distort the pattern. The best output is a clear explanation of where QA and CSAT agree, where they diverge, and what should change next in the scorecard, calibration process, or coaching plan.

Standards & compliance context

  • If CSAT comments or QA notes contain customer or employee personal data, keep the review limited to authorized users and follow your organization’s data retention rules.
  • When the template is used in regulated environments such as financial services or healthcare, ensure the reviewed interactions follow the applicable recording, disclosure, and privacy requirements.
  • If evaluator notes are used for coaching or performance management, apply the same access controls and documentation standards used for other employee performance records.
  • Keep the review focused on linked interaction data and avoid adding unnecessary demographic fields that could create privacy risk or bias the analysis.

General regulatory context for orientation only — verify current requirements with counsel or the relevant agency before relying on this template for compliance.

What's inside this template

Review Context & Attribution

This section establishes the exact period, scope, sample size, response rate, and reviewer so the correlation analysis can be interpreted correctly.

  • What is the review period covered by this correlation analysis? (required)

    Select the period this review covers and its cadence (e.g., monthly, quarterly).

  • Which team, queue, or business unit is being reviewed? (required)

    Specify the team, skill group, or product line (e.g., ‘Tier 1 Support – EMEA’).

  • What is the total number of interactions included in this analysis? (required)

    Enter the sample size of QA-evaluated interactions that also have a matched CSAT response.

  • What was the CSAT response rate for this period?

    Enter as a percentage (e.g., 18%). Low response rates (<10%) reduce correlation reliability.

  • Who conducted this correlation review? (required)

    Name and role of the QA analyst or program manager completing this review.
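
The matched sample size and response rate asked for above can be computed directly from an interaction export. The sketch below is a minimal illustration, assuming a hypothetical `Interaction` record with a QA percentage, a 1-5 CSAT rating, and a flag for whether a survey was sent; your QA platform's export will use its own field names. It also assumes response rate means responses divided by surveys sent. The 10% warning threshold comes from the response-rate guidance in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    # Hypothetical export record; rename fields to match your QA and survey data.
    interaction_id: str
    qa_score: Optional[float]  # QA percentage 0-100, None if not evaluated
    csat: Optional[int]        # CSAT rating 1-5, None if no survey response
    surveyed: bool             # True if a CSAT survey was sent


def review_context(interactions):
    surveyed = [i for i in interactions if i.surveyed]
    responded = [i for i in surveyed if i.csat is not None]
    # Only interactions with BOTH a QA score and a CSAT rating enter the analysis.
    matched = [i for i in interactions if i.qa_score is not None and i.csat is not None]
    # Assumed definition: responses divided by surveys sent.
    rate = len(responded) / len(surveyed) if surveyed else 0.0
    return {
        "matched_sample": len(matched),
        "response_rate_pct": round(100 * rate, 1),
        "low_response_warning": rate < 0.10,  # <10% reduces correlation reliability
    }


sample = [
    Interaction("a1", 92.0, 5, True),
    Interaction("a2", 65.0, None, True),
    Interaction("a3", None, 4, True),
    Interaction("a4", 78.0, 3, True),
    Interaction("a5", 88.0, None, False),
]
print(review_context(sample))
```

In this toy sample only a1 and a4 have both scores, so the matched sample is 2 even though four surveys were sent.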

Aggregate Score Alignment

This section shows whether QA and CSAT move together at the overall level or whether the scorecard is missing the customer experience signal.

  • The average QA score for this period accurately reflects the level of service customers reported receiving. (required)

    Rate alignment: 1 = Strongly disagree (large gap between QA avg and CSAT avg) → 5 = Strongly agree (scores closely mirror each other).

  • The distribution of QA scores (high / mid / low) mirrors the distribution of CSAT ratings (promoter / passive / detractor) for the same interactions. (required)

    1 = Strongly disagree → 5 = Strongly agree.

  • If aggregate alignment is rated 3 or below, describe the nature of the gap (e.g., QA scores high but CSAT low, or vice versa).

    Provide directional detail: which metric is inflated or deflated relative to the other, and by how much.

  • Interactions scored 90%+ on QA consistently receive a CSAT rating of 4 or 5 out of 5. (required)

    1 = Strongly disagree → 5 = Strongly agree. This tests whether top QA scores predict promoter-level satisfaction.

  • Interactions scored below 70% on QA consistently receive a CSAT rating of 1 or 2 out of 5. (required)

    1 = Strongly disagree → 5 = Strongly agree. This tests whether low QA scores predict detractor-level satisfaction.
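
To answer the alignment statements with evidence rather than impressions, you can band each matched interaction and cross-tabulate the bands. This sketch assumes QA bands of high (90% and above), mid, and low (below 70%), matching the thresholds in the statements above, and treats CSAT 4-5 as promoter, 3 as passive, and 1-2 as detractor; the passive cutoff and all function names are assumptions. The two averages are on different scales (percent versus 1-5), so compare them directionally rather than numerically.

```python
from collections import Counter


def qa_band(score, high=90.0, low=70.0):
    # Thresholds mirror the 90%+ and <70% statements above.
    if score >= high:
        return "high"
    if score < low:
        return "low"
    return "mid"


def csat_group(rating):
    # Assumed grouping: 4-5 promoter, 3 passive, 1-2 detractor.
    if rating >= 4:
        return "promoter"
    if rating <= 2:
        return "detractor"
    return "passive"


def share(ratings, test):
    """Fraction of ratings passing `test`, or None if the band is empty."""
    return sum(1 for r in ratings if test(r)) / len(ratings) if ratings else None


def alignment_summary(pairs):
    """pairs: (qa_pct, csat_1_to_5) for matched interactions only."""
    top = [c for q, c in pairs if qa_band(q) == "high"]
    bottom = [c for q, c in pairs if qa_band(q) == "low"]
    return {
        "avg_qa_pct": sum(q for q, _ in pairs) / len(pairs),
        "avg_csat": sum(c for _, c in pairs) / len(pairs),
        "crosstab": Counter((qa_band(q), csat_group(c)) for q, c in pairs),
        "high_qa_with_csat_4_5": share(top, lambda r: r >= 4),
        "low_qa_with_csat_1_2": share(bottom, lambda r: r <= 2),
    }


pairs = [(95, 5), (92, 4), (91, 2), (80, 3), (75, 4), (65, 1), (60, 4)]
summary = alignment_summary(pairs)
print(summary["crosstab"])
print(summary["high_qa_with_csat_4_5"], summary["low_qa_with_csat_1_2"])
```

In this toy sample, one of the three 90%+ interactions is a detractor and only one of the two sub-70% interactions is, which would argue against rating either threshold statement a 5.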

Scorecard Criteria Validity

This section tests which criteria actually track customer satisfaction and which ones may be over-weighted, under-weighted, or missing entirely.

  • Which QA scorecard section or criterion shows the STRONGEST positive correlation with high CSAT scores? (required)

    Name the specific criterion (e.g., ‘Empathy & Tone’, ‘First Contact Resolution’, ‘Accurate Information Provided’).

  • Which QA scorecard section or criterion shows the WEAKEST or NEGATIVE correlation with CSAT scores? (required)

    Identify criteria where high QA scores do not translate to high customer satisfaction — these are calibration candidates.

  • The current scorecard weighting (point values per criterion) reflects what customers actually care about most. (required)

    1 = Strongly disagree → 5 = Strongly agree. Misaligned weighting is a leading cause of QA-CSAT divergence.

  • Describe any scorecard criteria you believe are over-weighted relative to their impact on customer satisfaction.

    Include the criterion name, current weight, and your recommended adjustment with rationale.

  • Are there customer satisfaction drivers surfaced in CSAT verbatim comments that are NOT currently captured by the QA scorecard? (required)

    Select Yes or No. If Yes, detail the missing drivers in the follow-up field.

  • If yes, describe the unscored satisfaction drivers identified in CSAT verbatim feedback.

    Examples: ‘Customers frequently cite hold time as a dissatisfier — not currently scored’ or ‘Customers value proactive follow-up, which has no scorecard item’.
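
A rank correlation per criterion is one reasonable way to find the strongest and weakest criteria. The sketch below uses Spearman's rank correlation, chosen here because CSAT is an ordinal 1-5 rating; this is an illustrative choice, not a method the template prescribes. Criterion names and scores are hypothetical, and each criterion's score list must be aligned with the CSAT list interaction by interaction.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))


def rank_criteria(criterion_scores, csat):
    """Sort criteria from strongest to weakest rank correlation with CSAT."""
    rho = {name: spearman(s, csat) for name, s in criterion_scores.items()}
    valid = {k: v for k, v in rho.items() if v == v}  # drop NaN (constant criteria)
    return sorted(valid.items(), key=lambda kv: kv[1], reverse=True)


csat = [5, 4, 2, 3, 1, 5]
criteria = {  # hypothetical criterion scores, one per interaction, aligned with csat
    "Empathy & Tone": [9, 8, 4, 6, 2, 10],
    "Script Adherence": [10, 9, 10, 8, 10, 7],
}
for name, rho in rank_criteria(criteria, csat):
    print(f"{name}: {rho:+.2f}")
```

Here "Empathy & Tone" ranks first and "Script Adherence" comes out negative, the kind of result that would make it a calibration candidate. With small samples, treat the ranking as a lead to investigate, not a finding.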

Evaluator Calibration & Consistency

This section checks whether the QA team is applying the scorecard consistently enough for the correlation results to be trusted.

  • QA evaluators are applying scoring criteria consistently across agents and interaction types. (required)

    1 = Strongly disagree → 5 = Strongly agree. Inconsistent calibration inflates or deflates QA scores independently of actual performance.

  • Inter-rater reliability (IRR) sessions have been conducted this period to align evaluator standards. (required)

    Select Yes, No, or Partially. Lack of calibration sessions is a common root cause of QA-CSAT divergence.

  • Evaluator halo or recency bias appears to be inflating QA scores for certain agents or time periods. (required)

    1 = Strongly disagree (no bias detected) → 5 = Strongly agree (clear bias pattern observed).

  • If evaluator bias or inconsistency is suspected, describe the pattern and the agents or evaluators involved.

    Be specific: e.g., ‘Evaluator A scores Agent X 8-10 points higher than the team average on soft-skills criteria with no CSAT support’.
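
An evaluator-versus-team comparison like the example above can be sketched as follows. It assumes a hypothetical flat list of evaluation records (evaluator, agent, criterion, score on a 0-100 scale) and an illustrative 8-point flag threshold. The team average here includes the flagged evaluator's own scores, which slightly understates the gap. A flag only means the pattern is worth checking against CSAT for the same interactions; it is not evidence of bias on its own.

```python
from collections import defaultdict
from statistics import mean


def evaluator_offsets(evaluations, criterion, threshold=8.0):
    """Flag (evaluator, agent) pairs whose average on `criterion` differs from the
    team average by at least `threshold` points."""
    rows = [e for e in evaluations if e["criterion"] == criterion]
    team_avg = mean(e["score"] for e in rows)  # includes every evaluator
    by_pair = defaultdict(list)
    for e in rows:
        by_pair[(e["evaluator"], e["agent"])].append(e["score"])
    flagged = []
    for (evaluator, agent), scores in sorted(by_pair.items()):
        offset = mean(scores) - team_avg
        if abs(offset) >= threshold:
            flagged.append((evaluator, agent, round(offset, 1)))
    return flagged


evaluations = [  # hypothetical records, scores on a 0-100 scale
    {"evaluator": "A", "agent": "X", "criterion": "Empathy & Tone", "score": 95},
    {"evaluator": "A", "agent": "X", "criterion": "Empathy & Tone", "score": 97},
    {"evaluator": "B", "agent": "X", "criterion": "Empathy & Tone", "score": 80},
    {"evaluator": "B", "agent": "Y", "criterion": "Empathy & Tone", "score": 82},
    {"evaluator": "C", "agent": "Y", "criterion": "Empathy & Tone", "score": 78},
    {"evaluator": "C", "agent": "Z", "criterion": "Empathy & Tone", "score": 80},
]
print(evaluator_offsets(evaluations, "Empathy & Tone"))
```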

Detractor & Outlier Analysis

This section isolates the mismatched interactions that reveal the most useful root causes for scorecard or coaching changes.

  • How many interactions received a high QA score (≥85%) but a low CSAT rating (1-2 out of 5) this period? (required)

    Enter the count. These ‘false positive’ interactions are the most valuable for scorecard recalibration.

  • How many interactions received a low QA score (<70%) but a high CSAT rating (4-5 out of 5) this period? (required)

    Enter the count. These ‘false negative’ interactions may indicate over-penalization of criteria customers don’t value.

  • Root causes have been identified for the majority of high-QA / low-CSAT outlier interactions. (required)

    1 = Strongly disagree → 5 = Strongly agree.

  • Summarize the most common root causes identified in high-QA / low-CSAT outlier interactions.

    Examples: ‘Resolution was technically correct but agent tone was perceived as dismissive’, ‘Policy constraints frustrated customers despite agent compliance’.

  • Summarize the most common root causes identified in low-QA / high-CSAT outlier interactions.

    Examples: ‘Agent deviated from script but customer appreciated the personalized approach’, ‘Scored down for hold time but customer was satisfied with outcome’.
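
Both outlier counts, plus the interaction IDs you need for root-cause review, can be pulled with simple filters. This sketch uses the thresholds stated in the two count questions above (QA of 85% or higher with CSAT 1-2, and QA below 70% with CSAT 4-5) and assumes matched records shaped as hypothetical (interaction ID, QA percent, CSAT) tuples.

```python
def find_outliers(records, high_qa=85.0, low_qa=70.0):
    """records: (interaction_id, qa_pct, csat_1_to_5) for matched interactions."""
    # High QA but low CSAT: candidates for scorecard recalibration.
    high_qa_low_csat = [iid for iid, qa, c in records if qa >= high_qa and c <= 2]
    # Low QA but high CSAT: possible over-penalization of criteria customers don't value.
    low_qa_high_csat = [iid for iid, qa, c in records if qa < low_qa and c >= 4]
    return high_qa_low_csat, low_qa_high_csat


records = [
    ("c1", 92.0, 1), ("c2", 88.0, 5), ("c3", 64.0, 5),
    ("c4", 85.0, 2), ("c5", 69.9, 3), ("c6", 70.0, 4),
]
fp, fn = find_outliers(records)
print(len(fp), fp)
print(len(fn), fn)
```

Note the boundaries: c4 counts because the high threshold is inclusive (85% or higher), while c6 does not because the low threshold is strict (below 70%).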

Action Planning & Program Recommendations

This section converts the analysis into concrete scorecard updates, calibration actions, and a follow-up review plan.

  • Based on this review, the QA scorecard requires updates to better predict customer satisfaction. (required)

    1 = Strongly disagree (scorecard is well-calibrated) → 5 = Strongly agree (significant revision needed).

  • What specific scorecard changes are recommended as a result of this correlation review?

    Include: criteria to add, remove, or reweight; suggested new point values; and the CSAT evidence supporting each change.

  • What coaching or calibration actions will be taken with evaluators or agents based on this review?

    Examples: ‘Schedule IRR session for all evaluators in Q3’, ‘Coach Agent cohort on empathy language linked to detractor verbatims’.

  • A follow-up correlation review is scheduled to measure the impact of changes made this period. (required)

    Select Yes or No. Recurring follow-up reviews are essential to confirm that scorecard changes actually improve predictive accuracy.

  • Is there anything else about the QA-CSAT relationship this period that should be documented for program leadership?

    Include any contextual factors (e.g., product outages, policy changes, seasonal volume spikes) that may have distorted the correlation this period.

How to use this template

  1. Define the review period, team or queue, interaction count, CSAT response rate, and reviewer so the analysis has a clear attribution frame.
  2. Pull the linked interaction set with both QA scores and CSAT ratings, then separate the results into high, mid, and low QA bands and promoter, passive, and detractor CSAT groups.
  3. Compare the aggregate alignment and identify whether the scorecard is over-scoring, under-scoring, or missing the customer experience reflected in verbatim comments.
  4. Review each scorecard criterion to find the strongest positive correlation, the weakest or negative correlation, and any unscored satisfaction drivers that appear repeatedly in CSAT feedback.
  5. Check evaluator consistency by looking for halo effects, recency bias, or IRR gaps, then document the outlier patterns and assign scorecard, coaching, or calibration actions.
  6. Record the follow-up plan, including scorecard changes, evaluator coaching, and the date of the next correlation review to confirm whether the changes improved alignment.

Best practices

  • Use only interactions that have both a QA score and a CSAT response, because comparing QA and CSAT across unmatched records can make the two look more (or less) aligned than they really are.
  • Treat CSAT verbatims as evidence of satisfaction drivers, not as anecdotal color, and look for repeated themes across outlier interactions.
  • Separate analysis by queue, interaction type, or business unit when the customer journey differs enough that one blended scorecard would hide the real pattern.
  • Flag high QA / low CSAT and low QA / high CSAT outliers separately, because each pattern points to a different scorecard or coaching problem.
  • Check evaluator consistency before rewriting the scorecard, since scoring drift can create a false mismatch between QA and CSAT.
  • Reweight or rewrite only the criteria that repeatedly fail to track customer satisfaction, and leave stable criteria alone to avoid unnecessary churn.
  • Document whether the issue is scorecard validity, agent execution, or customer expectation mismatch so leadership can act on the right problem.

What this template typically catches

Issues teams running this template most often surface in practice:

  • QA rewards script adherence while CSAT reflects whether the customer felt understood and helped.
  • A scorecard criterion is heavily weighted even though customers rarely mention it in positive or negative comments.
  • High QA / low CSAT outliers cluster around unresolved issues, weak ownership, or poor handoff behavior.
  • Low QA / high CSAT outliers show that the agent recovered well despite missing a process step.
  • Evaluator scoring varies by agent, time period, or interaction type, creating inconsistent QA results.
  • CSAT comments reveal an unscored satisfaction driver such as speed, clarity, empathy, or follow-through.
  • The CSAT response rate is too low to support strong conclusions, so the sample needs to be interpreted cautiously.

Common use cases

Contact Center QA Lead Review
A QA lead reviews one month of billing queue interactions to see whether the current scorecard predicts customer satisfaction. The output is used to decide whether a few criteria should be reweighted before the next coaching cycle.
CX Operations Scorecard Validation
A CX operations manager compares post-call CSAT with QA results after a scorecard refresh. The review helps confirm whether the new rubric captures the satisfaction drivers customers mention most often in verbatim comments.
Evaluator Calibration Session
A QA program owner uses the template before an IRR session to identify where evaluators are drifting. The team then calibrates on the criteria that show the weakest correlation with customer outcomes.
Healthcare Patient Support Queue
A patient support team reviews calls where compliance was perfect but patient satisfaction was mixed. The template helps separate necessary regulatory steps from the service behaviors that most affect trust and intent to stay.
SaaS Technical Support Review
A support manager examines cases where agents scored well on troubleshooting but customers still rated the interaction poorly. The review surfaces missing ownership, unclear next steps, or weak expectation-setting that the scorecard should capture.

Frequently asked questions

What does this correlation review template actually measure?

It compares internal QA scores against post-call CSAT ratings for the same interactions. The goal is to see whether the scorecard predicts customer satisfaction, not just whether agents followed process. It also captures outliers, such as high QA with low CSAT, so you can identify where the scorecard misses real customer experience drivers.

When should we run a CSAT and QA score correlation review?

Run it on a regular cadence such as monthly or quarterly, depending on call volume and how often your scorecard changes. Monthly works well for active coaching programs and fast-moving queues, while quarterly is better when volume is lower or changes are more stable. If you launch a new scorecard or major policy change, run an extra review after the rollout.

Who should complete this review?

A QA lead, CX operations manager, contact center analyst, or team manager usually owns it. The best reviewer has access to both QA results and CSAT verbatims, plus enough context to interpret outliers by queue, agent, and interaction type. If evaluator calibration is part of the issue, include the QA program owner or a calibration facilitator.

What sample size or scope should we include?

Use a defined review period and include the interactions that have both a QA score and a CSAT response. The template is designed for a specific team, queue, or business unit, so avoid mixing very different interaction types unless you plan to compare them separately. If response rate is low, note that limitation because weak CSAT coverage can distort the correlation.
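
One way to judge whether a sample is large enough is to put a confidence interval around the correlation. The sketch below uses the standard Fisher z approximation for a Pearson correlation; applying it to a rank correlation is only approximate. The function name and the 1.96 critical value (roughly 95%) are illustrative choices.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson correlation r from n matched
    pairs using the Fisher z transformation (z_crit=1.96 gives roughly 95%)."""
    if n <= 3:
        raise ValueError("need more than 3 matched interactions")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    z = math.atanh(r)             # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for n in (30, 300):
    low, high = correlation_ci(0.4, n)
    print(f"n={n}: r=0.40, 95% CI {low:.2f} to {high:.2f}")
```

For the same correlation of 0.4, the interval from 30 matched interactions is far wider than the one from 300, which is why small samples and low response rates call for cautious interpretation.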

How do we interpret high QA but low CSAT results?

That pattern usually means the scorecard is rewarding behaviors that do not fully reflect customer satisfaction, or that some customer pain points are not being scored. Common causes include missed ownership, weak empathy, unresolved issues, or policy compliance that feels rigid to the customer. The template asks you to document the root causes so the scorecard can be adjusted instead of only coaching the agent.

How do we interpret low QA but high CSAT results?

That pattern often means the scorecard is over-weighting process details that customers do not care about as much as the final outcome. It can also show that the agent recovered well despite a technical miss, or that the customer valued speed, clarity, or resolution more than perfect script adherence. Use the review to decide whether the criterion should be reweighted, rewritten, or removed.

Should we change the scorecard every time we find a mismatch?

No. One review should not automatically trigger a redesign unless the pattern is consistent across enough interactions to be meaningful. The template is built to separate isolated outliers from repeated gaps in scorecard validity. Use the findings to decide whether the issue is the scorecard, evaluator calibration, or a coaching opportunity.

Can this template be used with other systems or reporting tools?

Yes. It works well alongside QA platforms, speech analytics, CRM notes, and CSAT survey exports. The key is to keep the interaction-level linkage intact so you can compare the same call or case across QA and customer feedback. If you use BI dashboards, this template can serve as the narrative review layer for the numbers.

What is the biggest mistake teams make with this review?

The most common mistake is treating QA and CSAT as separate scorecards instead of testing whether they align. Another pitfall is ignoring the open-text CSAT comments, which often reveal the satisfaction driver or service failure behind the rating. Teams also sometimes skip evaluator consistency checks, which can make the correlation look worse or better than it really is.


Ready to use this template?

Get started with MangoApps and use CSAT and QA Score Correlation Review with your team — pricing built for small business.
