
Large Language Models (Part 2 of 3): How can Intelligent Document Processing Leverage the ChatGPT Revolution?

Blog Part 2: Tactical and Strategic Usages of GPT-X in the IDP AI Tech Stack

By Dr Lewis Z Liu – Co-founder and CEO at Eigen Technologies

In the second of this three-part blog series, we look in more detail at the opportunities large language models (LLMs) present for intelligent document processing (IDP), including the stages in the end-to-end process they can support and some specific use cases within financial services.

If you missed part one, you can view it here.

Opportunities with LLM/GPT-X in end-to-end IDP

Diagram 1: Intelligent document processing components


In the table below, we list the various stages and components of IDP (as shown in diagram 1 above) and how LLMs and GPT-X can potentially improve each one. Of the 24 components, we identified six where LLMs/GPT-X have high potential to improve results, five where there is potential for them to deliver some improvement, and 13 where no improvement is expected from the addition of LLMs/GPT-X.

| Stage | Task/IDP Component | Description | Today | GPT/LLM Potential | 2023 Eigen Roadmap |
| --- | --- | --- | --- | --- | --- |
| Pre-processing | Machine Vision (MV): OCR (Typed) | Transforming scanned documents into machine-readable format (e.g., JPEG to searchable PDF) | Multiple options: Commercial OEM (ABBYY, Kofax), Open Source (Tesseract) | None | Continuous upgrade as new commercial or open-source OCR components improve over time |
| Pre-processing | MV: OCR (Handwriting) | Transforming handwriting images into searchable text | 2023 Roadmap Item | None | 2023 Roadmap Item via commercial OEM partner |
| Pre-processing | Natural Language Processing (NLP): Classification | Classifying documents into categories (e.g., is this a mortgage application vs a passport vs a bank statement?) | Eigen Proprietary | High – may increase accuracy/speed | 2023 R&D item |
| Pre-processing | NLP: De-blobbing | Splitting up documents that may have been scanned into a single PDF | Eigen Proprietary | High – improved classification will have a knock-on effect on de-blobbing | 2023 R&D item |
| Pre-processing | MV: Check-boxes | Reading check-boxes (usually hand-written) into machine-readable data | Commercial OEM Partner | None | Continuous upgrade as new commercial or open-source check-box detection components improve over time |
| Pre-processing | MV: Tables Detection & Reconstruction | Detecting and transforming table images (e.g., in scanned PDFs) or PDF tables (e.g., messy XML) into clean, machine-consistent HTML or JSON table data structures | Eigen Proprietary Table Foundational Model | None | Continuous upgrades (including table normalisation) on performance and adding new sources of training data to the foundational model |
| Post-processing | Platform: Parent/Child Mapping (post classification and discovery) | Linking documents into groups of related families (e.g., a master contract doc with amendment docs and schedule docs) | Rules-based approach (post extraction) to link docs | None | Continuous improvements to performance |
| Single Document Intelligence | NLP: Point Extraction (Regulated) | Extracting specific data points (a value, an entity, a short phrase, a date) from a document | Eigen Proprietary (including Eigen domain-specific topic and language models) | None – the approach today serves a very specific data compliance and model governance purpose | Continuous upgrades on performance and feature engineering |
| Single Document Intelligence | NLP: Section Extraction (Regulated) | Extracting longer phrases, clauses and sections | Eigen Proprietary (including Eigen domain-specific topic and language models) | None – the approach today serves a very specific data compliance and model governance purpose | Continuous upgrades on performance and feature engineering |
| Single Document Intelligence | NLP: Instant Answers or Question-Answering Point/Section Extraction | Ability to ask questions and get an answer back ('single-shot extraction' or 'question answering', sometimes called 'chatting with your docs') | Eigen Proprietary Domain-Specific Information Retrieval (IR) with Eigen Modified Domain-Specific BERT (an LLM) | High – GPT-X may perform significantly better than BERT (or other LLMs) on many documents | By mid-April, clients will have the ability to switch between GPT-X and Eigen Domain-Specific BERT (an LLM) |
| Single Document Intelligence | Multi-modal: Table Extraction | Extracting a specific table from a document and outputting it in an easily digestible format (CSV/XLS/JSON/HTML) | Eigen Proprietary Table Foundational Model in conjunction with an Eigen Proprietary table extraction model | Potential to improve extraction performance based on text inside the table (currently a non-LLM approach); potential multi-modal GPT release | Improvements in model governance and user workflows |
| Single Document Intelligence | Multi-modal: Cell-level Table Extraction | Extracting specific cells from specific tables, even when tables are heterogeneous across documents (e.g., 'extract the Group EBITDA for 2022 across all annual reports') | Eigen Proprietary Table Foundational Model in conjunction with an Eigen Proprietary table extraction model | Potential to improve extraction performance based on text inside the table (currently a non-LLM approach); potential multi-modal GPT release | Improvements in model governance and user workflows |
| Single Document Intelligence | Multi-modal: Pure Multi-modal Point Extraction | Extracting information from highly visual, heterogeneous documents (e.g., PowerPoint, complex invoices, certificates, etc.) | Eigen Proprietary, incorporating both Eigen's MV and NLP capabilities | High – similar to Instant Answers, LLMs/GPT can be used as a base language model to improve results; potential multi-modal GPT release | Transition from an API-only service to a no-code service |
| Single Document Intelligence | Correlated Answers | Correlating different answers to different variables (e.g., the interest rate for a loan may differ by facility, so the exact interest rate needs to be correlated with each facility) | Eigen Platform's rules-based approach via API plug-ins | Medium – GPT's natural language understanding can potentially speed this up significantly and enable extraction to move away from rule-based correlations | 2023 R&D item |
| Post-processing | No-Code Business and Legal Logic Builder | Interpreting extracted results based on business rules set by the user | Eigen Platform | None | Improvements in user workflow |
| Post-processing | Data Transformation and Advanced Logic | Leveraging either commercial no-code partners or Python plug-in scripts to undertake data transformations or more advanced business logic tasks | Eigen Platform (Python plug-ins) or commercial partner (e.g., Unqork, Xceptor, Microsoft) | High – potential to automate Python scripts using ChatGPT or Copilot | Eigen's own solutions team is experimenting with this today for plug-in writing |
| Post-processing | NLP: Clause and Section Comparison | Ability to compare clauses or sections against a 'gold standard clause', or against each other, to ascertain risk, deviations, etc. | Eigen Proprietary | High – potential for LLMs to improve comparisons in a more organic way | Q1 Release |
| Post-processing | Platform: Parent/Child Roll-up Logic | Understanding linkage logic (e.g., whether an amendment supersedes the original master contract) | Rules-based approach to determine roll-up logic | Potential for LLMs to understand roll-up linkage logic innately, without rules (likely not possible with GPT-4 but potentially with future LLMs) | Future R&D Item |
| Post-processing | Human-in-the-Loop Review | Workflow enabling humans to review machine results | Eigen Platform | None | Improvements in user workflow |
| Ops: Machine Learning (ML) Ops | Model Governance | Providing automated cross-validation services pre-production; managing model drift in production; managing high/low confidence scoring to determine exception handling; reporting for model risk management | Eigen Platform with Eigen open-sourced cross-validation techniques | None | Improvements in user workflow |
| Ops: ML Ops | Annotator | UX and backend system to enable human users to easily train ML models | Eigen Platform | None | Improvements in user workflow |
| Ops: Workflow | Workflow Orchestrator | Platform to orchestrate tasks and document flow | Eigen Integration Pipeline (EIP), Unqork, Xceptor, or Microsoft | None | Addition of more no-code components |
| Ops: Workflow | Document and Answer Server | UX and API endpoints to serve documents (especially large documents) and answers (in a variety of ways) | Eigen Platform | None | Better API endpoints for partners |
| Solutions | Pre-built Eigen or Partner Industry Solutions on Eigen Platform | - | - | LLMs/GPT can improve solutions insofar as each task module improves performance | - |

Table 1: Opportunities for improvement within IDP for LLMs/GPT-X

What does this mean for key financial services-specific IDP use cases?

We know from experience that customer requirements, and the documents they need to process, are often highly complex and domain-specific. In table 2 below, we take a look at some key banking and financial services use cases where LLMs could be applied for document-to-data transformation.

| Document Type | Typical Document Format | Eigen Point + Section Extraction* | BERT (alone)* | Eigen w/ BERT (with Eigen Instant Answers)* | GPT-4 (alone)* | Eigen w/ ChatGPT (with Eigen Instant Answers)* |
| --- | --- | --- | --- | --- | --- | --- |
| ISDA + Schedule | 40,000 words; includes semi-heterogeneous tables | 90%+ F1 (PE+SE) | N/A – too long | 90%+ F1 (PE+SE) | N/A – too long | 60%+ F1 (PE); 90%+ F1 (SE)** |
| LSTA Loan Agreement | 100,000 words; includes mostly heterogeneous tables | 90%+ F1 (PE+SE) | N/A – too long | 90%+ F1 (PE+SE) | N/A – too long | 60%+ F1 (PE); 90%+ F1 (SE)** |
| Collateralized Loan Obligation (CLO) | 500,000 words; includes semi-heterogeneous tables across docs | 90%+ F1 (PE+SE) | N/A – too long | 90%+ F1 (PE+SE) | N/A – too long | 60%+ F1 (PE); 90%+ F1 (SE)** |
| Brokerage Account Statement | 1,000 words; includes completely heterogeneous tables across docs | N/A – needs machine vision (e.g., Eigen's tables) | N/A – needs machine vision (e.g., Eigen's tables) | N/A – needs machine vision (e.g., Eigen's tables) | N/A – needs machine vision (e.g., Eigen's tables) | N/A – needs machine vision (e.g., Eigen's tables) |
| Money Transfer Email | 50 words; pure string | 80%+ F1 (PE+SE) | 90%+ F1 (PE+SE) | N/A – information retrieval (IR) not needed | 99%+ F1 (PE+SE) | N/A – IR not needed |
| Bloomberg Chat | 25 words; pure string | N/A – not suitable | 80%+ F1 (PE+SE) | N/A – IR not needed | 99%+ F1 (PE+SE) | N/A – IR not needed |

Table 2: Financial services-specific use cases where LLMs/GPT-X could be applied (PE = point extraction; SE = section extraction).

* F1 score is an evaluation metric that measures a model’s accuracy. It combines the precision and recall scores of a model.
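For reference, F1 is the harmonic mean of the two: F1 = 2 × (precision × recall) / (precision + recall), so a model only scores highly when precision and recall are both high.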

** Eigen with ChatGPT excels at replicating results for section extraction, but it required significant post-processing to replicate point extraction because of the generative, non-deterministic nature of its outputs, so the current Eigen point extraction model still holds a distinct advantage. As a result, the exact F1 score achieved with ChatGPT was much lower for point extraction tasks than for section extraction tasks; over the coming weeks, we plan to improve Eigen GPT point extraction Q&A and re-benchmark.

In conclusion, and as expected, ChatGPT specifically is much better suited than previous information extraction techniques to 'regular human language' such as chat or email. However, architectural work is required to get any LLM (including BERT or GPT-X) to work on longer, more complex, domain-specific document types. Specifically, the information retrieval or section extraction layer must be accurate, and it benefits significantly from domain-specific fine-tuning.
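
To make that concrete, below is a minimal sketch of the retrieve-then-ask pattern: chunk a long document, keep only the sections most relevant to the question, and hand those to an LLM. It is an illustration only, assuming a naive word-overlap retriever and a placeholder `ask_llm` function; a production system, such as the domain-specific IR layer described above, would replace both with fine-tuned components.

```python
import re

def split_into_sections(document, max_words=300):
    """Naively chunk a long document into fixed-size word windows."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def score_section(question, section):
    """Toy lexical-overlap retriever; a production system would use a
    domain-fine-tuned retrieval model for this step."""
    q_terms = set(re.findall(r"\w+", question.lower()))
    s_terms = set(re.findall(r"\w+", section.lower()))
    return len(q_terms & s_terms) / max(len(q_terms), 1)

def ask_llm(prompt):
    """Placeholder: swap in a call to GPT-X, a BERT QA head, etc."""
    raise NotImplementedError

def answer_question(document, question, top_k=3):
    sections = split_into_sections(document)
    # Keep only the most relevant sections so the prompt fits the model's context window.
    best = sorted(sections, key=lambda s: score_section(question, s), reverse=True)[:top_k]
    prompt = (
        "Answer the question using only the excerpts below.\n\n"
        + "\n---\n".join(best)
        + "\n\nQuestion: " + question + "\nAnswer:"
    )
    return ask_llm(prompt)
```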

From an Eigen perspective, we have embraced the power of LLMs ever since BERT came out, and we are excited to offer clients a GPT-X option (as an alternative to our existing domain-specific BERT offering) in the coming weeks. I include the example of brokerage account statements, which are heavily heterogeneous and tabular, as a reminder that IDP is a heavily multi-modal and domain-specific affair.

Looking at table 2 above, for longer, more complex, finance-specific documents, and from a pure accuracy perspective, as long as there is domain-specific fine-tuning with information retrieval (IR) (such as Eigen's IR model pre-processing for BERT or GPT-X), the accuracy of Eigen's traditional point extraction, driven by probabilistic graphical models, is in line with Instant Answers (question answering) leveraging either BERT or GPT-4. As such, deciding which technology to use will depend on:

  1. Cost (traditional Eigen point extraction and BERT are orders of magnitude cheaper than GPT-4, although GPT-4 is much more accurate for shorter, more 'natural human' text, where cost is less of an issue)
  2. Model Governance (LLMs make model governance more complex and therefore significantly more costly)
  3. IP/Privacy (a potentially huge issue with GPT that needs to be mitigated)
  4. Model Output Parameters (BERT is well suited to data extraction thanks to its bi-directional nature, whereas GPT-X requires significant pre- and post-processing due to its generative nature; these issues need to be better addressed before GPT is used in larger-scale production, and the sketch after this list illustrates the kind of post-processing involved)
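
As a concrete illustration of point 4, here is a minimal sketch, assuming the goal is simply to force a free-form generative answer back onto a verifiable span of the source document before accepting it. The helper names (`ground_answer`, `extract_point`) are illustrative placeholders, not Eigen's actual pipeline.

```python
import re

def ground_answer(raw_answer, source_text):
    """Accept a generative answer only if it appears verbatim (case-insensitive,
    whitespace-normalised) in the source document; otherwise return None."""
    candidate = " ".join(raw_answer.strip().strip('"').split())
    haystack = " ".join(source_text.split())
    match = re.search(re.escape(candidate), haystack, flags=re.IGNORECASE)
    return haystack[match.start():match.end()] if match else None

def extract_point(llm_answer, document):
    """Wrap a generative answer in the structure a deterministic extractor would return."""
    grounded = ground_answer(llm_answer, document)
    return {
        "value": grounded,                 # verbatim span from the document, or None
        "needs_review": grounded is None,  # route unverifiable answers to human review
    }
```

In practice this would be only one of several checks (type validation, date normalisation, confidence thresholds), but it shows why generative output needs more scaffolding for point extraction than a span-predicting model such as BERT.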

In the next blog in this series, we will look in detail at the real-world technical challenges and key risks of using LLMs for IDP.

Get in touch to find out more about Eigen's intelligent document processing capabilities or request a platform demo.