Document AI Tools Research

Potential technologies for capturing document citation information

Minimum Requirements

  •  

Tools

Tool

Site

Status

Notes

Google

https://cloud.google.com/document-ai

Rejected

  • Can extract text, but cannot separate a citation reference from its associated citation text while preserving hierarchy, i.e. it returns {“1. text”} rather than {{“1.”}, {“text”}}.

  • Emits JSON API responses, which require programming to present the information.

  • Previously supported OCR of image files only. It now supports OCR of images and PDFs, but not Microsoft file formats.

Microsoft

https://azure.microsoft.com/en-us/products/ai-services/ai-document-intelligence

Rejected

  • Previously supported OCR of image files only. It now supports OCR of images, PDFs, and Microsoft Word documents.

AWS

https://aws.amazon.com/textract/

Rejected

  • Previously supported OCR of image files and PDFs only, and the maximum file size was too small for the majority of our documents. The maximum file size issue has since been resolved.
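
Illustration of the hierarchy requirement

The Google notes above distinguish between getting {“1. text”} as one string and getting the citation marker and its text as separate items. The sketch below shows one way that split could be done as a post-processing step on plain extracted lines. It is a minimal illustration only: the function name split_citation and the marker pattern are assumptions made for this example, they are not part of any vendor API, and the pattern covers only simple numbered and lettered markers.

```python
import re

# Assumed marker shapes for this sketch: "1.", "1.2.", "a.", "a)", "(a)".
# Real documents will likely need a broader pattern.
MARKER = re.compile(r"^\s*(\(?(?:\d+(?:\.\d+)*|[A-Za-z])[.)])\s+(.*)$")


def split_citation(line: str) -> tuple[str, str]:
    """Split an extracted line into (marker, text).

    Returns an empty marker when the line does not start with a
    recognized citation marker.
    """
    match = MARKER.match(line)
    if match is None:
        return ("", line.strip())
    return (match.group(1), match.group(2).strip())


if __name__ == "__main__":
    for raw in ["1. text", "1.2. Scope of work", "(a) first item", "No marker here"]:
        print(split_citation(raw))
```

A pattern this simple will misread some lines (for example, a sentence beginning with an initial such as "A. Smith"), so any real use would need rules tuned to our documents.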