Data extraction from PDF – CoreView

November 21, 2022

By Sachin Kalaskar

Customer Challenges:

Enable a leading test automation platform to treat semi-structured PDF data as a queryable, addressable data source for reading text, tables, and images.

Result:

~90% accuracy at the end of the POC period
Seamless PDF data extraction in the production rollout for the first customer
Easy, Automated Training of new PDF formats

CoreView Solution:

NLP- and ML-based data extraction solution built on pre-trained Google models
Confidence score for the PDF data detection and extraction process
UI-driven and automated training of new PDF formats
Future extensibility to Scanned PDFs
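
As a rough illustration of how a per-field confidence score might be consumed downstream, the sketch below routes extracted values into "accepted" and "needs review" groups and looks a field up by name. It is only a minimal sketch: the `ExtractedField` structure, the `split_by_confidence` and `lookup` helpers, the 0.9 threshold, and the sample values are all hypothetical and are not taken from the CoreView solution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    """One value pulled from a PDF (hypothetical structure, illustration only)."""
    name: str          # logical address of the field, e.g. "invoice_number"
    value: str         # extracted text
    confidence: float  # model confidence in [0.0, 1.0]
    page: int          # 1-based page number where the value was found


def split_by_confidence(fields, threshold=0.9):
    """Separate fields into accepted and needs-review lists.

    The default threshold is an arbitrary assumption, not a documented setting.
    """
    accepted = [f for f in fields if f.confidence >= threshold]
    review = [f for f in fields if f.confidence < threshold]
    return accepted, review


def lookup(fields, name):
    """Return the highest-confidence field with the given name, or None."""
    matches = [f for f in fields if f.name == name]
    return max(matches, key=lambda f: f.confidence) if matches else None


# Made-up sample data standing in for the output of a real extraction run.
SAMPLE_FIELDS = [
    ExtractedField("invoice_number", "INV-1001", 0.97, 1),
    ExtractedField("total_amount", "1,250.00", 0.82, 2),
    ExtractedField("invoice_number", "INV-100I", 0.55, 2),
]

if __name__ == "__main__":
    accepted, review = split_by_confidence(SAMPLE_FIELDS)
    print("accepted:", [f.name for f in accepted])
    print("needs review:", [f.name for f in review])
    print("best invoice_number:", lookup(SAMPLE_FIELDS, "invoice_number"))
```

Keeping the score attached to each field, rather than to the document as a whole, would let a test automation tool assert on high-confidence values directly and flag only the uncertain ones for manual checking.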

Other Considerations:

Works with any PDF layout
Computer-generated and scanned PDFs
Easily learns variations and new layouts
