...

What is OCR in ID and Document Scanning?

OCR in remote and in-person ID and passport scanning

Optical Character Recognition (OCR) is the automated conversion of printed or handwritten text in an image, such as a photo of an ID or passport, into machine-readable text. OCR is a key part of modern ID verification workflows, especially when paired with MRZ and barcode parsing.

How OCR works

OCR is completed in several rapid steps.

Image capture & preprocessing

A document is first captured with a camera or ID scanner. Preprocessing steps such as de-skewing, despeckling, binarization, and contrast enhancement then clean up the image so that the text is as clear as possible for the recognition steps that follow. The text must be legible for OCR to work. Many documents that are frequently read by OCR use fonts designed for machine readability.
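As a rough illustration of two of these preprocessing steps, the sketch below applies contrast enhancement and then binarization to a tiny grayscale image held as a list of rows. The function names, the fixed threshold of 128, and the sample pixel values are assumptions made for the example. Production pipelines typically use an imaging library and adaptive thresholds.

```python
def stretch_contrast(gray):
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in gray]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in gray]


def binarize(gray, threshold=128):
    """Map each pixel to 1 (ink) if darker than the threshold, else 0 (background)."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


# Hypothetical washed-out capture: ink is 150, background is about 190.
sample = [
    [190, 150, 190],
    [190, 150, 190],
    [190, 188, 190],
]

# Without contrast enhancement, every pixel is above the threshold: no ink found.
raw_binary = binarize(sample)

# After stretching, the faint vertical stroke becomes clearly separable.
clean_binary = binarize(stretch_contrast(sample))
```

In this toy case the stroke is invisible to a fixed threshold until the contrast is stretched, which is the kind of cleanup the preprocessing stage is there to provide.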

Text detection & recognition

OCR engines, like the one built into our proprietary ID authentication software, detect text regions and apply either pattern-matching or feature-extraction algorithms to identify characters.
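To make the pattern-matching idea concrete, here is a toy sketch that compares a binarized glyph against stored character templates and picks the closest one by counting differing pixels. The 3x5 bitmaps, the `TEMPLATES` table, and the function names are invented for illustration and do not describe how any particular OCR engine works internally.

```python
# Hypothetical 3x5 bitmaps for a few characters ("1" = ink, "0" = background).
TEMPLATES = {
    "0": ("111", "101", "101", "101", "111"),
    "1": ("010", "110", "010", "010", "111"),
    "7": ("111", "001", "010", "010", "010"),
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph):
    """Return (best matching character, number of differing pixels)."""
    best = min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))
    return best, hamming(glyph, TEMPLATES[best])


# A "7" with one stray ink pixel, as might survive imperfect despeckling.
noisy_seven = ("111", "001", "010", "011", "010")
result = recognize(noisy_seven)
```

The returned distance can double as a crude confidence signal: a large distance to even the best template suggests the glyph should be flagged rather than trusted.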

Post-processing & validation

Recognized text then undergoes normalization, dictionary matching, or error correction to improve accuracy; contextual analysis can also help, especially for structured documents such as ID cards and driver’s licenses.
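A minimal sketch of normalization and context-aware error correction might look like the following. The look-alike table and function names are assumptions for the example. The key idea is that letter-to-digit substitutions are only applied to fields known to be numeric, which is the kind of contextual knowledge a structured document provides.

```python
import re

# Hypothetical table of letters that OCR commonly confuses with digits.
DIGIT_LOOKALIKES = str.maketrans(
    {"O": "0", "Q": "0", "I": "1", "L": "1", "S": "5", "B": "8"}
)


def normalize_text(raw):
    """Collapse runs of whitespace, trim the ends, and upper-case the text."""
    return re.sub(r"\s+", " ", raw).strip().upper()


def fix_numeric_field(raw):
    """Correct look-alike letters in a field that should contain only digits."""
    return normalize_text(raw).replace(" ", "").translate(DIGIT_LOOKALIKES)


name = normalize_text("  john   q.  public ")
postal_code = fix_numeric_field("9O2l0")  # letter O and lowercase L misread
```

Applying the same substitutions to a name field would corrupt it, so the correction step needs to know which field it is working on.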

Data mapping

OCR tailored for ID and passport scanning uses “templating”: the text regions found on the document are mapped to the known fields for that document type, and each value is compared against the expected format for its field.
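The format-checking half of templating can be sketched with one regular expression per field. The field names and formats below are hypothetical and stand in for whatever a real template for a given document type would define.

```python
import re

# Hypothetical template for one document type: field name -> expected format.
TEMPLATE = {
    "document_number": re.compile(r"[A-Z]\d{7}"),
    "date_of_birth": re.compile(r"\d{2}/\d{2}/\d{4}"),
    "postal_code": re.compile(r"\d{5}(-\d{4})?"),
}


def map_fields(ocr_fields):
    """Split OCR output into values that fit the template and values that do not."""
    mapped, rejected = {}, {}
    for field_name, pattern in TEMPLATE.items():
        value = ocr_fields.get(field_name, "")
        if pattern.fullmatch(value):
            mapped[field_name] = value
        else:
            rejected[field_name] = value
    return mapped, rejected


mapped, rejected = map_fields({
    "document_number": "D1234567",
    "date_of_birth": "01/02/199O",  # trailing letter O instead of zero
    "postal_code": "90210",
})
```

Rejected values can then be routed to error correction or manual review instead of being passed downstream as if they were clean.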

OCR vs. Barcode Parsing / MRZ Parsing

Document Type | Primary Data Extraction | Role of OCR
Passport (MRZ page) | MRZ parsing | Not used for MRZ itself
Driver’s License / ID Card | PDF417 barcode parsing | Used optionally for front/back checks
When barcode/MRZ fails | OCR fallback | Reads printed front text
ID Authentication | Cross-verification + trust scoring | Integral for layered validation
Digital/remote ID verification | Cross-verification + trust scoring | Integral for layered validation
IDs which do not contain a barcode or MRZ | OCR only | Critical for all parsing and data analysis

OCR, barcode parsing, and MRZ parsing differ in a few ways that determine which scanning method should be used for a given document.

Passports use a Machine Readable Zone (MRZ), a standardized block of text made up of letters, digits, and filler characters that passport scanners can read. Passports often do not rely on general OCR scanning for data extraction because MRZ parsing is quicker and typically more accurate. Passport validation typically uses OCR to confirm that the data in the MRZ matches the printed text elsewhere on the passport.
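Part of what makes MRZ data easy to validate is that the ICAO Doc 9303 format attaches a check digit to key fields. The sketch below computes that check digit using the standard repeating weights 7, 3, 1. The function names are chosen for the example, and the sample values are illustrative specimen-style data, not real documents.

```python
def mrz_char_value(ch):
    """Numeric value of an MRZ character: 0-9 as is, A-Z as 10-35, filler '<' as 0."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field):
    """Weighted sum of character values (weights 7, 3, 1 repeating), modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def field_is_valid(field, check_char):
    """True when the check digit read from the MRZ matches the computed one."""
    return check_char.isdigit() and int(check_char) == mrz_check_digit(field)


document_number_ok = field_is_valid("L898902C3", "6")
birth_date_ok = field_is_valid("740812", "2")
```

A single misread character usually changes the computed digit, so a failed check is a cheap signal that the zone should be re-read before its data is trusted.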

Driver’s licenses and ID cards use PDF417 barcodes, a type of 2D barcode. Scanning and parsing this barcode is the most reliable method and is preferred over OCR because it is faster and more accurate.

OCR is typically used when the printed text on the front of a document needs to be read, either because the barcode is unavailable or to cross-match against barcode data for further ID verification. OCR is also used for many non-US documents that do not carry a 2D barcode or MRZ.

IDScan.net software also checks front/back data consistency by comparing the OCR data to the information encoded in the 2D barcode. A mismatch typically hints at tampering or a fraudulent document.
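A front/back consistency check of this kind can be sketched as a field-by-field comparison after light normalization. This is an illustrative outline only, not IDScan.net's implementation. The field names, the sample data, and the normalization rule (ignore case, spacing, and punctuation) are assumptions.

```python
def normalize(value):
    """Upper-case and keep only letters and digits so formatting differences are ignored."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def find_mismatches(ocr_front, barcode_back, fields):
    """Return {field: (front value, back value)} for every field that disagrees."""
    mismatches = {}
    for field_name in fields:
        front = ocr_front.get(field_name, "")
        back = barcode_back.get(field_name, "")
        if normalize(front) != normalize(back):
            mismatches[field_name] = (front, back)
    return mismatches


# Hypothetical data: OCR of the printed front vs. fields parsed from the barcode.
ocr_front = {"last_name": "Public", "date_of_birth": "01/02/1990", "id_number": "D123-4567"}
barcode_back = {"last_name": "PUBLIC", "date_of_birth": "01/02/1985", "id_number": "D1234567"}

mismatches = find_mismatches(ocr_front, barcode_back, ["last_name", "date_of_birth", "id_number"])
```

Here the name and ID number agree once formatting is ignored, while the differing birth dates are surfaced as a mismatch that would warrant closer inspection.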

How OCR is incorporated into ID scanning

OCR is used to read the front of a driver’s license, ID, or passport either when the barcode or MRZ cannot be read or when front/back matching is performed.

OCR is also used in ID authentication to cross-match the text on the front of the ID against the data in the barcode.

OCR can also be incorporated into the remote identity verification process. When verifying an ID remotely, users are asked to take a picture of the front and back of the ID, then take a selfie or follow a series of prompts. OCR reads the text on the front of the ID and cross-checks it against the data parsed from the barcode on the back. Once that step is complete, AI compares the user’s selfie or liveness capture to the photo on the ID to confirm it is the same face, while also running liveness and anti-spoofing checks.

This multi-layered approach, both in person and online, provides higher assurance of authenticity and identity than any single method (OCR or barcode/MRZ alone) could.

Overview of how OCR is used in ID scanning

OCR bridges a gap: it captures the printed data on the front of a document and validates it against the data encoded in the barcode or MRZ, enabling the detection of subtle signs of tampering. It is also useful for reading documents that carry no barcode or MRZ, and for performing authentication checks.

Start verifying identity today

We work with businesses of all sizes to provide scalable solutions to their identity verification challenges.