Data Quality Assessment
Data Quality Assessment (DQA) checks are implemented at two levels to ensure data completeness, consistency, and accuracy for the MOH Quality Assurance Register: Initial Test (CM-QAI EN v1.2).
Part 1 — Client-Side Data Validation
Validation rules run on the ScanForm device in real time as the field worker photographs the form. These checks fire before the record is submitted — catching errors at the point of collection, not after upload.
Key properties and constraints of client-side validation:
- Validates OCR-extracted bubble selections and digit-box readings immediately after scanning.
- Field workers can skip checks after 3 failed attempts (min_tries_before_skipping = 3); client-side checks are advisory, not blocking.
- Does not have access to the full dataset, so cross-record and longitudinal checks must be done server-side.
Because field workers can bypass checks, server-side validation is always required as a second layer.
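The advisory behaviour described above can be sketched roughly as follows. This is an illustration only, not the ScanForm implementation: the `CheckState` class and the `resolve` function are hypothetical names, and only the threshold value (3) comes from the text.

```python
from dataclasses import dataclass

# Threshold taken from the text: min_tries_before_skipping = 3.
MIN_TRIES_BEFORE_SKIPPING = 3


@dataclass
class CheckState:
    """Failed-attempt counter for one validation check (hypothetical structure)."""
    failed_attempts: int = 0

    def record_failure(self) -> None:
        self.failed_attempts += 1

    def can_skip(self) -> bool:
        # Skipping is only allowed once the failure threshold has been reached.
        return self.failed_attempts >= MIN_TRIES_BEFORE_SKIPPING


def resolve(passed: bool, state: CheckState, wants_skip: bool = False) -> str:
    """Return 'passed', 'retry', or 'skipped' for one scan attempt."""
    if passed:
        return "passed"
    state.record_failure()
    if wants_skip and state.can_skip():
        # The record goes through unvalidated, so the server must re-check it.
        return "skipped"
    return "retry"
```

Any record that ends in the "skipped" state reaches the server without having passed the check, which is why the second validation layer cannot be optional.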
Page-Level Eligibility
Before any per-row checks run, a page-level eligibility check determines whether the page as a whole should be validated:
A page is considered active (i.e. row-level validation rules apply) when discardPage is not marked. If the Discard Page bubble is filled, no further checks fire for any row on that page.
This avoids spurious alerts on pages that supervisors have deliberately voided.
Two page-level fields are also checked for completeness regardless of row content:
- screeningTest1Test1ExpDate: must have ≥ 1 filled box (eligibility threshold: 2 boxes).
- visitDate: must have ≥ 1 filled box.
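A minimal sketch of the page-level logic, under stated assumptions: the page is represented as a plain dictionary, digit boxes are a list of strings or `None`, and a discarded page suppresses the page-level completeness checks as well as the row checks. The function names are hypothetical, and the "eligibility threshold: 2 boxes" noted for the expiry date is not modelled because the text does not say how it is applied.

```python
from typing import Dict, List, Optional, Sequence

# Field names taken from the text; both need at least one filled digit box.
PAGE_LEVEL_FIELDS = ("screeningTest1Test1ExpDate", "visitDate")


def filled_boxes(boxes: Sequence[Optional[str]]) -> int:
    """Count digit boxes that contain a reading."""
    return sum(1 for box in boxes if box not in (None, ""))


def page_level_alerts(page: Dict[str, object]) -> List[str]:
    """Return the page-level fields that fail the completeness check."""
    if page.get("discardPage"):
        # Assumption: a voided page fires no checks at all.
        return []
    alerts = []
    for field in PAGE_LEVEL_FIELDS:
        boxes = page.get(field) or []
        if filled_boxes(boxes) < 1:
            alerts.append(field)
    return alerts
```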
Row-Level Eligibility
A patient row is considered active (i.e. row validation rules apply) when:
At least 2 of the following fields are non-empty AND none of discardPage, qualityControl, or discardRow are marked for that row: visitDay, age, sex, entryPoint, screeningStrategy, riskCategory, lastTestNc, screeningTest1Result, knownHivPos, providerInitials.
This threshold (≥ 2 active fields) prevents false alerts on genuinely blank rows — the form holds 15 rows per page and not all slots are always used.
Discard criteria: If discardPage, qualityControl, or discardRow are marked for a row, that row is treated as non-active and no checks fire. Quality control sample rows and discarded rows are intentionally excluded from validation.
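The row-level eligibility rule can be expressed compactly. The sketch below assumes each row is a dictionary keyed by the field names listed above, with the three discard flags copied onto the row; `is_row_active` is a hypothetical name, not a function from the ScanForm configuration.

```python
from typing import Dict

# Field names and threshold taken from the text.
ELIGIBILITY_FIELDS = (
    "visitDay", "age", "sex", "entryPoint", "screeningStrategy",
    "riskCategory", "lastTestNc", "screeningTest1Result",
    "knownHivPos", "providerInitials",
)
DISCARD_FLAGS = ("discardPage", "qualityControl", "discardRow")
MIN_NON_EMPTY_FIELDS = 2


def is_row_active(row: Dict[str, object]) -> bool:
    """True when row-level validation rules should fire for this row."""
    # Any discard or quality-control mark makes the row non-active.
    if any(row.get(flag) for flag in DISCARD_FLAGS):
        return False
    non_empty = sum(
        1 for field in ELIGIBILITY_FIELDS
        if row.get(field) not in (None, "", [])
    )
    return non_empty >= MIN_NON_EMPTY_FIELDS
```

For example, a row with only `age` filled is treated as blank, while a row with `age` and `sex` filled is validated unless one of the discard flags is set.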
Page-Level Checks
Per-Row Active Checks
Check Types Explained
Validation Check Types:

| Check Type | Description |
|---|---|
| enough_filled | At least one digit box in the group is filled — prevents blank mandatory entry fields. |
| exactly_one (enough_answers + not_too_many) | Exactly one bubble is marked — enforces single-select fields. Implemented as two paired checks: one catching zero selections, one catching multiple selections. |
| not_too_many | No more than one bubble is marked — allows the field to be left blank (optional) but prevents multi-selection on a single-select field. |
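The three check types reduce to simple counting predicates. The following sketch assumes digit boxes arrive as a sequence of strings or `None` and bubbles as a sequence of booleans; the function names mirror the check-type names in the table but are illustrative, not the actual rule engine.

```python
from typing import Optional, Sequence


def enough_filled(boxes: Sequence[Optional[str]]) -> bool:
    """At least one digit box in the group is filled."""
    return any(box not in (None, "") for box in boxes)


def enough_answers(bubbles: Sequence[bool]) -> bool:
    """At least one bubble is marked (catches zero selections)."""
    return sum(bool(b) for b in bubbles) >= 1


def not_too_many(bubbles: Sequence[bool]) -> bool:
    """At most one bubble is marked (catches multiple selections)."""
    return sum(bool(b) for b in bubbles) <= 1


def exactly_one(bubbles: Sequence[bool]) -> bool:
    """Both paired checks pass: exactly one bubble is marked."""
    return enough_answers(bubbles) and not_too_many(bubbles)
```

Splitting `exactly_one` into two predicates lets an optional field reuse `not_too_many` on its own, and lets the two failure modes (nothing marked, too many marked) be reported separately.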
Fields Not Validated Client-Side
The following fields exist on the form but have no client-side validation rule. They are present on the physical paper and captured by OCR, but errors in these fields are caught by server-side DQA only — or are intentionally left unvalidated.
Part 2 — Server-Side Data Validation
Server-side DQA checks run automatically each time the dbt pipeline executes against the full submitted dataset. Unlike client-side validation, these checks can compare across records and across forms — catching errors visible only at the population level.
Capabilities beyond client-side:
- Validates that OCR-extracted text values can be parsed as integers or dates.
- Detects multi-bubble selections that slipped through client-side checks.
- Cross-references records against other forms (linking Initial ↔︎ Confirmatory registers).
- Checks for future visit dates relative to submission date.
- Applies to all submitted records, including those where client-side warnings were dismissed.
The SQL checks documented here are drawn from cm_hts_clean__confirmatory_checks.sql, which covers the Confirmatory register pipeline. They share the same field names and conventions as the Initial register and illustrate the server-side DQA architecture applied across all CM-HTS forms.
Record Exclusion Logic
Records are excluded from the clean dataset — not deleted. Excluded records remain visible in the DQA dashboard and can be corrected and re-entered.
A record is excluded from the clean layer when:
- It has one or more checks with severity Error.
- Its Page_ID is associated with a record where qualityControlSample, invalidTestResult, or implementationErrorSample is true; these pages are excluded wholesale from the checks output.
Check catalogue for this pipeline: 5 check categories, of which 3 contain Warning-level checks and 2 contain Error-level checks (Future Date and Cross-Record Linking).
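The two exclusion conditions can be illustrated in Python. This is a sketch of the logic only: the real rules live in the dbt SQL models, and the `record_id` key and both function names are assumptions made for the example. `Page_ID` and the three flag names come from the text.

```python
from typing import Dict, Iterable, List, Set

# Flag names taken from the text.
PAGE_FLAGS = ("qualityControlSample", "invalidTestResult", "implementationErrorSample")


def flagged_pages(records: Iterable[Dict[str, object]]) -> Set[object]:
    """Page_IDs that contain at least one flagged record."""
    return {r["Page_ID"] for r in records if any(r.get(f) for f in PAGE_FLAGS)}


def excluded_record_ids(
    records: List[Dict[str, object]],
    checks: List[Dict[str, object]],
) -> Set[object]:
    """Records kept out of the clean layer (they are excluded, not deleted)."""
    pages = flagged_pages(records)
    with_errors = {c["record_id"] for c in checks if c["severity"] == "Error"}
    return {
        r["record_id"]
        for r in records
        if r["record_id"] in with_errors or r["Page_ID"] in pages
    }
```

Note that a Warning alone never excludes a record, and that one flagged record removes every record sharing its Page_ID.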
Error Classification
Severity Levels:

| Severity | Meaning | Action |
|---|---|---|
| Error | The record violates a hard rule — typically a failed linking check or a future date. Records with ≥1 Error are excluded from the clean dataset and surfaced in the DQA dashboard for correction. | Exclude from analysis · Flag for re-entry |
| Warning | The record is unusual but not necessarily wrong — e.g. an unparseable number or a multi-bubble selection. Kept in the clean dataset but flagged for review. | Keep in analysis · Flag for review |
Checks by Category
- 🔢 Invalid Numbers: checks that OCR-extracted digit-box values can be parsed as integers.
- 📅 Invalid Dates: checks that date fields contain valid, parseable calendar dates and are not in the future.
- 🔘 Multiple Bubbles Selected: checks that single-select bubble fields have at most one option marked.
- 🔗 Cross-Record Linking: checks that records link correctly between the Initial and Confirmatory registers via clientCode.
- 🚫 Future Date: checks that visit dates are not later than the submission date.
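To make the parsing and future-date categories concrete, here is a rough Python equivalent of three of the checks. The production checks are SQL run by dbt; this sketch assumes dates are extracted as ISO `YYYY-MM-DD` text, uses `age` and `visitDate` purely as example fields, and takes the severities from the classification used in this document. All function names are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple


def parse_int(text: Optional[str]) -> Optional[int]:
    """Return the integer value, or None if the text is not plain ASCII digits."""
    if text is None:
        return None
    stripped = text.strip()
    return int(stripped) if stripped.isascii() and stripped.isdigit() else None


def parse_date(text: Optional[str]) -> Optional[date]:
    """Return the date, or None if the text is not a valid ISO calendar date."""
    try:
        return date.fromisoformat(text.strip())
    except (ValueError, AttributeError):
        return None


def check_record(age: Optional[str], visit_date: Optional[str],
                 submitted_on: date) -> List[Tuple[str, str]]:
    """Return (category, severity) pairs for every failed check."""
    failures = []
    if parse_int(age) is None:
        failures.append(("Invalid Numbers", "Warning"))
    parsed = parse_date(visit_date)
    if parsed is None:
        failures.append(("Invalid Dates", "Warning"))
    elif parsed > submitted_on:
        failures.append(("Future Date", "Error"))
    return failures
```

The future-date comparison only runs when the date parses, so an unreadable date yields a single Warning rather than a Warning plus a spurious Error.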
Full Check Catalogue
Summary
Server-Side Checks by Category (CM-HTS Pipeline):

| Category | ⚠️ Warnings | ❌ Errors |
|---|---|---|
| Invalid Numbers | 1 | 0 |
| Invalid Dates | 3 | 0 |
| Future Date | 0 | 1 |
| Multiple Bubbles Selected | 14 | 0 |
| Cross-Record Linking | 0 | 2 |
| **Total** | **18** | **3** |
Pipeline exclusion filter: Records whose Page_ID is associated with qualityControlSample = true, invalidTestResult = true, or implementationErrorSample = true are excluded from the checks output entirely. These pages represent known non-standard data collection events (e.g. QC runs) and are not expected to meet standard validation rules.
Generated automatically from CM-QAI EN v1.2 source files. Last updated: 2026-06-30.