Updated 1 April 2026

Accuracy Metrics — Live Report

Current AI accuracy rates across all question categories, confidence level distribution, flagging rates, and quarter-on-quarter improvement trends.

Live metrics

  • Overall accuracy (Q1 2026): 93.0%
  • Flags received (Q1 2026): 847
  • Confirmed error rate from flags: 36.8%
  • Benchmark test cases: 2,400+

Q1 2026 Accuracy Summary

These figures are measured against our internal benchmark suite of 2,400+ test queries, which is updated quarterly. They reflect the performance of the currently deployed model version.

By question category:

  • Calculation questions: 98.2% (↑ from 97.8% in Q4 2025)
  • Pattern questions: 94.7% (↑ from 93.1% in Q4 2025)
  • Factual questions: 91.3% (→ stable from Q4 2025)
  • Recommendation questions: 87.6% (↑ from 85.2% in Q4 2025)
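
The arrows in the category list can be reproduced from the published figures. The following is a minimal Python sketch, not a description of the actual reporting pipeline: the dictionary names and the `delta_points` and `arrow` helpers are illustrative assumptions, and the Q4 2025 value for factual questions is assumed to equal the Q1 2026 value because that category is listed as stable.

```python
# Published category accuracy (percent), taken from the list above.
# ASSUMPTION: "Factual" is reported as stable, so its Q4 2025 value
# is taken to be the same as its Q1 2026 value.
Q4_2025 = {"Calculation": 97.8, "Pattern": 93.1, "Factual": 91.3, "Recommendation": 85.2}
Q1_2026 = {"Calculation": 98.2, "Pattern": 94.7, "Factual": 91.3, "Recommendation": 87.6}


def delta_points(previous: dict, current: dict) -> dict:
    """Return the change per category in percentage points, rounded to one decimal."""
    return {name: round(current[name] - previous[name], 1) for name in current}


def arrow(delta: float) -> str:
    """Map a percentage-point change to the arrow notation used in the report."""
    if delta > 0:
        return "↑"
    if delta < 0:
        return "↓"
    return "→"


if __name__ == "__main__":
    for name, delta in delta_points(Q4_2025, Q1_2026).items():
        print(f"{name}: {arrow(delta)} {delta:+.1f} points")
```

Changes are expressed in percentage points rather than relative percent, which matches how the category figures are compared quarter to quarter.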

Error type distribution:

  • Hallucination/fabrication: 1.2% of responses
  • Data misinterpretation: 2.8% of responses
  • Overconfidence: 3.1% of responses
  • Under-specificity: 4.2% of responses
  • Currency/unit errors: 0.9% of responses

Confidence level distribution (all queries, April 2026):

  • High confidence: 71%
  • Medium confidence: 21%
  • Low confidence: 6%
  • Estimate: 2%

User-Flagged Errors

Q1 2026 flagging summary:

  • Total flags received: 847
  • Confirmed errors: 312 (36.8%)
  • Not confirmed (original answer was correct): 398 (47.0%)
  • Ambiguous / additional context needed: 137 (16.2%)
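
The percentages in the flagging summary follow directly from the counts. As a rough sketch of that arithmetic (the `FLAG_OUTCOMES` name and the `outcome_shares` helper are hypothetical and used only for illustration):

```python
# Q1 2026 flag outcomes (counts), taken from the summary above.
FLAG_OUTCOMES = {
    "confirmed": 312,
    "not_confirmed": 398,
    "ambiguous": 137,
}


def outcome_shares(counts: dict) -> dict:
    """Return each outcome's share of all flags, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * count / total, 1) for name, count in counts.items()}


if __name__ == "__main__":
    print("Total flags:", sum(FLAG_OUTCOMES.values()))
    for name, share in outcome_shares(FLAG_OUTCOMES).items():
        print(f"{name}: {share}%")
```

The three counts add up to the 847 flags received, and the confirmed share is the 36.8% reported as the confirmed error rate from flags.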

Most common confirmed error types flagged by users:

1. Incorrect date range interpretation (23% of confirmed errors)
2. Incorrect currency conversion (18%)
3. Seasonality misattribution (15%)
4. Missing data source not flagged (14%)
5. Regulatory information outdated (11%)
6. Other / miscellaneous (19%)

Actions taken:

  • System prompt updates addressing top error types: 6
  • New benchmark test cases added: 89
  • Escalated to Anthropic for model-level review: 3

Improvement Trend

Overall accuracy has improved every quarter since launch:

  • Q1 2025 (launch): 84.2% overall accuracy
  • Q2 2025: 86.7%
  • Q3 2025: 88.4%
  • Q4 2025: 90.1%
  • Q1 2026: 93.0% (weighted average across all categories)
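
To make the trend concrete, the sketch below computes the quarter-on-quarter change from the figures above and shows how a weighted average across categories can be formed. It is illustrative only: the category weights behind the published 93.0% are not stated in this report, so `EQUAL_WEIGHTS` is a hypothetical stand-in, and all function and variable names are assumptions.

```python
# Overall accuracy per quarter (percent), as listed above.
QUARTERLY_ACCURACY = [
    ("Q1 2025", 84.2),
    ("Q2 2025", 86.7),
    ("Q3 2025", 88.4),
    ("Q4 2025", 90.1),
    ("Q1 2026", 93.0),
]

# Q1 2026 accuracy by question category (percent), from the summary section.
CATEGORY_ACCURACY = {
    "Calculation": 98.2,
    "Pattern": 94.7,
    "Factual": 91.3,
    "Recommendation": 87.6,
}

# HYPOTHETICAL weights: the real category mix is not published here.
EQUAL_WEIGHTS = {name: 1.0 for name in CATEGORY_ACCURACY}


def quarter_on_quarter(series: list) -> list:
    """Return (quarter, change in percentage points) for every quarter after the first."""
    return [
        (label, round(value - series[i][1], 1))
        for i, (label, value) in enumerate(series[1:])
    ]


def weighted_accuracy(accuracies: dict, weights: dict) -> float:
    """Weighted mean of category accuracies; weights do not need to sum to 1."""
    total_weight = sum(weights[name] for name in accuracies)
    return sum(accuracies[name] * weights[name] for name in accuracies) / total_weight


if __name__ == "__main__":
    for label, change in quarter_on_quarter(QUARTERLY_ACCURACY):
        print(f"{label}: {change:+.1f} points")
    average = weighted_accuracy(CATEGORY_ACCURACY, EQUAL_WEIGHTS)
    print(f"Equal-weight category average: {average:.2f}%")
```

With equal weights the category average comes to about 92.95%, close to the reported 93.0%; the published figure would differ slightly depending on the actual share of queries in each category.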

The most significant single improvement was the deployment of our custom financial domain fine-tuning layer in Q3 2025, which reduced currency and calculation errors by 34%.