Data extraction from PDF

Customer Challenges:

  • Enable a leading Test Automation Platform to use semi-structured PDF data as a queryable, addressable data source for reading text, tables, and images
  • ~90% accuracy by the end of the POC Period
  • Seamless PDF Data Extraction in the Production rollout for the first Customer
  • Easy, Automated Training on new PDF formats

CoreView Solution:

  • NLP- and ML-based Data Extraction solution built on pre-trained Google models
  • Confidence Score for the PDF Data Detection and Extraction process
  • UI-Driven & Automated Training of new PDF Formats
  • Future extensibility to Scanned PDFs

Other Considerations:

  • Work with any PDF layout
  • Handle both computer-generated and scanned PDFs
  • Easily learn variations & new layouts
