Large Language Models (Part 2 of 3): How can Intelligent Document Processing Leverage the ChatGPT Revolution?
Blog Part 2: Tactical and Strategic Usages of GPT-X in the IDP AI Tech Stack
By Dr Lewis Z Liu – Co-founder and CEO at Eigen Technologies
In the second of this three-part blog series, we look in more detail at the opportunities large language models (LLMs) present for intelligent document processing (IDP), including the stages in the end-to-end process they can support and some specific use cases within financial services.
If you missed part one, you can view it here.
Opportunities with LLM/GPT-X in end-to-end IDP

In the table below, we list the various stages and components of IDP (as shown in diagram 1 above) and how LLMs and GPT-X can potentially improve each one. Of the 24 components, we identified six where LLMs/GPT-X have a high potential of improving results, five where there’s potential for them to deliver some improvements and 13 where there is no potential or expected improvement from the addition of LLMs/GPT-X.
| Stage | Task/IDP Component | Description | Today | GPT/LLM Potential | 2023 Eigen Roadmap |
| --- | --- | --- | --- | --- | --- |
Pre-processing | Machine Vision (MV): OCR (Typed) | Transforming scanned documents into machine-readable format (e.g., JPEG to searchable PDF) | Multiple options: Commercial OEM (ABBYY, Kofax), Open Source (Tesseract) | None | Continuous upgrade as new commercial or open-source OCR components improve over time |
Pre-processing | MV: OCR (Handwriting) | Transforming handwriting images into searchable text | 2023 Roadmap Item | None | 2023 Roadmap Item via commercial OEM partner |
Pre-processing | Natural Language Processing (NLP): Classification | Classifying documents into categories (e.g., is this a mortgage application vs a passport vs a bank statement?) | Eigen Proprietary | High – may increase accuracy/speed | 2023 R&D item |
Pre-processing | NLP: De-blobbing | Splitting up documents that may have been scanned into a single PDF | Eigen Proprietary | High – Improved classification will result in a knock-on effect on de-blobbing | 2023 R&D item |
Pre-processing | MV: Check-boxes | Reading check-boxes (usually hand-written) into machine-readable data | Commercial OEM Partner | None | Continuous upgrade as new commercial or open-source check-box detection components improve over time |
Pre-processing | MV: Tables Detection & Reconstruction | Detecting and transforming table images (e.g., in scanned PDF) or PDF tables (e.g., messy XML) into clean, consistent machine-readable HTML or JSON table data structures | Eigen Proprietary Table Foundational Model | None | Continuous upgrades (including table normalisation) on performance and adding new sources of training data to the foundational model |
Post-processing | Platform: Parent/Child Mapping (post classification and discovery) | Linking documents into groups of related families (e.g., master contract doc with amendment docs and schedule docs) | Rules-based approach (post extraction) to link docs | None | Continuous improvements to performance |
Single Document Intelligence | NLP: Point Extraction (Regulated) | Extracting specific data points (a value, an entity, a short phrase, a date) from a document | Eigen Proprietary (including Eigen domain specific topic and language models) | None – the approach today serves a very specific data compliance and model governance purpose | Continuous upgrades on performance and feature engineering |
Single Document Intelligence | NLP: Section Extraction (Regulated) | Extracting longer phrases, clauses, sections | Eigen Proprietary (including Eigen domain specific topic and language models) | None – the approach today serves a very specific data compliance and model governance purpose | Continuous upgrades on performance and feature engineering |
Single Document Intelligence | NLP: Instant Answers or Question-Answering Point/Section Extraction | Ability to ask questions and get an answer back (‘single shot extraction’ or ‘question answering’, sometimes called ‘chatting with your docs’) | Eigen Proprietary Domain Specific Information Retrieval (IR) with Eigen Modified Domain Specific BERT (an LLM) | High – GPT-X may perform significantly better than BERT (or other LLMs) on many documents | By Mid-April clients will have the ability to switch between GPT-X and Eigen Domain Specific BERT (an LLM) |
Single Document Intelligence | Multi-modal: Table Extraction | Extracting a specific table in a document and outputting that into an easily digestible format (CSV/XLS/JSON/HTML) | Eigen Proprietary Table Foundational Model in conjunction with an Eigen Proprietary table extraction model | Potential to improve performance of extraction based on text inside the table (currently using a non-LLM approach); potential multi-modal GPT release | Improvements in model governance and user workflows |
Single Document Intelligence | Multi-modal: Cell-level tables extraction | Extracting specific cells out of specific tables even when tables can be heterogeneous across documents (e.g., ‘extract the Group EBITDA for 2022 across all annual reports’) | Eigen Proprietary Table Foundational Model in conjunction with an Eigen Proprietary table extraction model | Potential to improve performance of extraction based on text inside the table (currently using a non-LLM approach); potential multi-modal GPT release | Improvements in model governance and user workflows |
Single Document Intelligence | Multi-modal: pure multi-modal point extraction | Extracting information from highly visual, heterogeneous documents (e.g., PowerPoint, complex invoices, certificates, etc.) | Eigen Proprietary incorporating both Eigen’s MV and NLP capabilities | High – similar to instant answers, LLMs/GPT can be used to improve results as a base language model; potential multi-modal GPT release | Transition from an API-only service to a no-code service |
Single Document Intelligence | Correlated Answers | Correlating different answers to different variables (e.g., interest rate for a loan may differ for different facilities, need to correlate the exact interest rate for each facility) | Eigen Platform’s rules-based approach via API plug-ins | Medium – GPT’s natural language understanding can potentially speed this up significantly and enable the extraction to move away from rule-based correlations | 2023 R&D item |
Post-processing | No-Code Business and Legal Logic Builder | Interpreting extracted results based on business rules set by user | Eigen Platform | None | Improvements in user workflow |
Post-processing | Data transformation and advanced logic | Leveraging either commercial no-code partners or python plug-in scripts to undertake data transformations or more advanced business logic tasks | Eigen Platform (python plug-ins) or commercial partner (e.g., Unqork, Xceptor, Microsoft) | High – Potential to automate python scripts using ChatGPT or GitHub Copilot | Eigen’s own solutions team is experimenting with this today for plug-in writing |
Post-processing | NLP: Clause and Section Comparison | Ability to compare clauses or sections against a ‘gold standard clause’ or across each other to ascertain risk, deviations, etc. | Eigen Proprietary | High – potential for LLMs to improve comparisons in a more organic way | Q1 Release |
Post-processing | Platform: Parent/Child Roll-up Logic | Understanding linkage logic (e.g., if an amendment supersedes the original master contract) | Rules-based approach to determine roll-up logic | Potential for LLM to understand roll-up linkage logic innately without rules (likely not possible with GPT-4 but potentially future LLMs) | Future R&D Item |
Post-processing | Human-in-the-loop review | Workflow enabling humans to review machine results | Eigen Platform | None | Improvements in user workflow |
Ops: Machine Learning (ML) Ops | Model Governance | Providing automated cross-validation services pre-production; managing model drift in production; managing high/low confidence scoring to determine exception handling; reporting for model risk management | Eigen Platform with Eigen open-sourced cross-validation techniques | None | Improvements in user workflow |
Ops: ML Ops | Annotator | UX and backend system to enable human users to easily train ML models | Eigen Platform | None | Improvements in user workflow |
Ops: Workflow | Workflow Orchestrator | Platform to orchestrate tasks and document flow | Eigen Integration Pipeline (EIP), Unqork, Xceptor, or Microsoft | None | Addition of more no-code components |
Ops: Workflow | Document and Answer Server | UX and API endpoints to serve documents (especially large documents) and answers (in a variety of ways) | Eigen Platform | None | Better API endpoints for partners |
Solutions | Pre-built Eigen or Partner Industry Solutions on Eigen Platform | - | - | LLMs/GPT can improve solutions insofar as each task module improves performance | - |
Table 1: Opportunities for improvement within IDP for LLMs/GPT-X
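One pattern implied by the Instant Answers row in table 1 is letting clients switch the question-answering backend between a domain-specific BERT model and GPT-X. A minimal sketch of how such a switch could be structured behind a common interface is below; the class and function names are illustrative, not Eigen's actual API, and the backends are crude stand-ins for real models.

```python
from abc import ABC, abstractmethod

class ExtractionBackend(ABC):
    """Common interface so the question-answering layer stays backend-agnostic."""
    @abstractmethod
    def answer(self, question: str, passage: str) -> str:
        ...

class BertStyleBackend(ExtractionBackend):
    def answer(self, question: str, passage: str) -> str:
        # An extractive model returns a span copied verbatim from the passage;
        # here the first sentence crudely stands in for that behaviour.
        return passage.split(". ")[0]

class GptStyleBackend(ExtractionBackend):
    def answer(self, question: str, passage: str) -> str:
        # A generative model composes new text, so its output must later be
        # validated against the source document.
        return f"Generated: {passage[:30]}..."

def instant_answer(question: str, passage: str, backend: ExtractionBackend) -> str:
    # The calling workflow is unchanged whichever backend is plugged in.
    return backend.answer(question, passage)

doc = "The interest rate is 4.5%. It resets annually."
print(instant_answer("What is the rate?", doc, BertStyleBackend()))
```

The design point is that the surrounding workflow (review UI, model governance, confidence scoring) does not need to change when the underlying language model does.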
What does this mean for key financial services-specific IDP use cases?
We know from experience that customer requirements, and the documents they need to process, are often highly complex and domain specific. In table 2 below, we look at some key banking and financial services use cases where LLMs could be applied to document-to-data transformation.
| Document Type | Typical Document Format | Eigen Point + Section Extraction* | BERT (alone)* | Eigen w/ BERT (with Eigen Instant Answers)* | GPT-4 (alone)* | Eigen w/ ChatGPT (with Eigen Instant Answers)* |
| --- | --- | --- | --- | --- | --- | --- |
ISDA + Schedule | 40,000 words; includes semi-heterogeneous tables | 90%+ F1 (PE+SE) | N/A – too long | 90%+ F1 (PE+SE) | N/A – too long | 60%+ F1 (PE); 90%+ F1 (SE)** |
LSTA Loan Agreement | 100,000 words; includes mostly heterogeneous tables | 90%+ F1 (PE+SE) | N/A – too long | 90%+ F1 (PE+SE) | N/A – too long | 60%+ F1 (PE); 90%+ F1 (SE)** |
Collateralized Loan Obligation (CLO) | 500,000 words; includes semi-heterogeneous tables across docs | 90%+ F1 (PE+SE) | N/A – too long | 90%+ F1 (PE+SE) | N/A – too long | 60%+ F1 (PE); 90%+ F1 (SE)** |
Brokerage Account Statement | 1,000 words; includes completely heterogeneous tables across docs | N/A – needs machine vision (e.g., Eigen’s tables) | N/A – needs machine vision (e.g., Eigen’s tables) | N/A – needs machine vision (e.g., Eigen’s tables) | N/A – needs machine vision (e.g., Eigen’s tables) | N/A – needs machine vision (e.g., Eigen’s tables) |
Money Transfer Email | 50 words; pure string | 80%+ F1 (PE+SE) | 90%+ F1 (PE+SE) | N/A – information retrieval (IR) not needed | 99%+ F1 (PE+SE) | N/A – IR not needed |
Bloomberg Chat | 25 words; pure string | N/A – not suitable | 80%+ F1 (PE+SE) | N/A – IR not needed | 99%+ F1 (PE+SE) | N/A – IR not needed |
Table 2: Financial services specific use cases where LLMs/GPT-X could be applied.
* F1 score is an evaluation metric that measures a model’s accuracy. It combines the precision and recall scores of a model.
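The F1 definition above can be made concrete with a short calculation; the precision and recall values here are illustrative, not taken from the benchmarks in table 2.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: a model whose extracted fields are 90% correct (precision 0.9)
# but which finds only 75% of the true fields (recall 0.75):
print(round(f1_score(0.9, 0.75), 3))  # → 0.818
```

Because F1 is a harmonic mean, a model cannot hide poor recall behind high precision (or vice versa), which is why it is the standard metric for extraction tasks.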
** Eigen with ChatGPT excels at replicating results for section extraction, but required significant post-processing to replicate point extraction because of the generative, non-deterministic nature of its outputs; the current Eigen point extraction model therefore retains a distinct advantage, and the exact F1 score achieved with ChatGPT was much lower for point extraction tasks than for section extraction tasks. Over the coming weeks, we plan to improve Eigen GPT point extraction Q&A and re-benchmark.
In conclusion, and as expected, ChatGPT specifically is much better suited than previous information extraction techniques to ‘regular human language’ such as chat or email. However, architectural work is required to get any LLM (including BERT or GPT-X) working on longer, more complex, and domain-specific document types. Specifically, the information retrieval or section extraction layer must be accurate, and it benefits significantly from domain-specific fine-tuning.
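The retrieval layer described above follows a 'retrieve, then ask' pattern: split the long document into chunks, score each chunk against the question, and pass only the most relevant chunk to the LLM. A minimal sketch is below; the keyword-overlap scoring is deliberately naive (a stand-in for a domain-fine-tuned IR model like the one described in this post), and all function names are illustrative.

```python
def chunk(text: str, size: int = 200) -> list[str]:
    # Split a long document into fixed-size word windows.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def overlap_score(question: str, passage: str) -> float:
    # Toy relevance score: fraction of question words present in the passage.
    q = set(question.lower().split())
    return len(q & set(passage.lower().split())) / max(len(q), 1)

def retrieve(question: str, document: str, size: int = 200) -> str:
    # Return the single most relevant chunk to hand to the LLM,
    # keeping the prompt within the model's context window.
    return max(chunk(document, size), key=lambda c: overlap_score(question, c))

# A 100,000-word loan agreement won't fit in a prompt, but the one
# relevant chunk will:
document = ("boilerplate " * 300) + "the applicable interest rate is 4.5 percent"
best = retrieve("what is the interest rate", document, size=50)
print("interest" in best)  # → True
```

The accuracy of this retrieval step caps the accuracy of the whole pipeline: if the wrong chunk is retrieved, even a perfect LLM answers from the wrong text, which is why domain-specific fine-tuning of the IR layer matters so much.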
From an Eigen perspective, we have embraced the power of LLMs since BERT’s release, and we are excited to offer clients a GPT-X option (as an alternative to our existing domain-specific BERT offering) in the coming weeks. I include the example of brokerage account statements, which are heavily heterogeneous and tabular, as a reminder that IDP is a heavily multi-modal and domain-specific affair.
Looking at table 2 above: for longer, more complex, finance-specific documents, and from a pure accuracy perspective, as long as there is domain-specific fine-tuning of the information retrieval (IR) layer (such as Eigen’s IR model pre-processing for BERT or GPT-X), accuracy from Eigen’s traditional point extraction, driven by probabilistic graphical models, is in line with Instant Answers (question answering) leveraging either BERT or GPT-4. As such, deciding which technology to use will depend on:
- Cost (traditional Eigen point extraction and BERT are orders of magnitude cheaper than GPT-4, but GPT-4 is much more accurate for shorter, more ‘natural human’ text, where cost is less of an issue).
- Model Governance (LLMs make model governance more complex and therefore significantly more costly).
- IP/Privacy (a potentially huge issue with GPT that needs to be mitigated).
- Model Output Parameters (BERT is well suited to data extraction due to its bi-directional nature; GPT-X requires significant pre- and post-processing due to its generative nature, and these issues need to be better addressed before larger-scale production use).
In the next blog in this series, we will look in detail at the real-world technical challenges and key risks of using LLMs for IDP.
Get in touch to find out more about Eigen's intelligent document processing capabilities or request a platform demo.