
The costly mistakes companies make when dabbling in do-it-yourself data extraction

Date: 29 Jun 2023

What are the hidden dangers of dabbling in data extraction automation? To find out, we spoke to one of our leading data extraction experts about the key mistakes we see companies make and how to avoid them.

Doing intelligent document processing (IDP) well can be incredibly complex, with a number of technologies and processes needing to work in unison. Each company also has its own unique set of workflow challenges, legacy systems and other priorities to tackle. It's therefore not surprising to see costly mistakes when companies try to develop and operationalize data extraction capabilities in-house.

Tim Crowe, Director of Insurance Solutions at Eigen Technologies, understands this complexity better than most.

“I’ve had the unique experience of being on both the vendor and insurance customer sides, having been an SVP of Underwriting before joining Eigen. At Eigen, we’ve been helping global banks, financial services and insurance firms to successfully implement IDP solutions for years. As a result, we’ve learned a few lessons along the way about how to get it right,” Tim says.

We spoke to Tim about the common errors to avoid when building and implementing data extraction solutions.

Don’t underestimate the complexity of data extraction

Data extraction, and the automation initiatives built around it, require deep expertise in several areas: natural language processing, machine learning and computer vision, to name a few. Multiple technologies and processes must work in harmony to automate the end-to-end journey from document input to structured, usable data output, as shown in the diagram below.

Diagram 1: Intelligent document processing components
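To make the idea of components working in unison a little more concrete, here is a minimal Python sketch of a staged pipeline that turns raw document text into structured output. The stage names (classify, extract, validate), the Document type and the keyword rules are hypothetical illustrations, not Eigen's components or the exact stages in the diagram; a production system would replace each rule with OCR, trained classifiers and NLP-based extraction.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    text: str
    doc_type: str = "unknown"
    fields: dict = field(default_factory=dict)
    needs_review: bool = False


def classify(doc: Document) -> Document:
    # Hypothetical keyword rule standing in for a trained document classifier.
    doc.doc_type = "policy" if "policy" in doc.text.lower() else "other"
    return doc


def extract(doc: Document) -> Document:
    # Hypothetical "Key: value" parser standing in for NLP-based extraction.
    for line in doc.text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "policy number":
            doc.fields["policy_number"] = value.strip()
    return doc


def validate(doc: Document) -> Document:
    # Flag the document for human review if a required field is missing.
    doc.needs_review = "policy_number" not in doc.fields
    return doc


# Each stage takes and returns a Document, so stages can be chained in order.
PIPELINE: List[Callable[[Document], Document]] = [classify, extract, validate]


def run(text: str) -> Document:
    doc = Document(text)
    for stage in PIPELINE:
        doc = stage(doc)
    return doc


result = run("Policy Number: ABC-123\nInsured: Acme Ltd")
print(result.doc_type, result.fields, result.needs_review)
```

Because every stage shares the same input and output type, a step such as OCR could be added before classification without changing the runner, and documents the pipeline cannot handle confidently are routed to a person rather than passed through silently.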

“The complexity of human language presents a fascinating challenge, and this is something all language models are still grappling with to varying degrees of success,” Tim says.

Insurance policies are often long, convoluted documents filled with exclusions, indented phrases and coverage write-backs. They can be daunting even for insurance experts to comprehend, let alone a machine.

Most language models already find ‘normal, everyday’ language difficult to process, and, as Tim puts it, “the insurance industry poses a unique challenge as it has its own vocabulary, filled with bespoke wordings and unfamiliar verbiage.”

But the insurance industry isn't alone when it comes to specialist terminology and jargon; the same is true of other industries such as finance and legal services.

Solving this problem requires specialist knowledge of the nuances and contextual meaning of the language, so that your data extraction systems interpret and extract the right information accurately. Solution development and implementation teams need to understand both the document inputs and the data outputs to build and deliver a solution that works as required and delivers value.

“As a vendor, achieving the right balance between being large enough to solve complex problems and agile enough to innovate requires constant attention. Very few companies have the capacity to tackle these document ingestion challenges successfully, and those that are not solely focused on this challenge are unlikely to be market leaders in the space.”

“What I’ve seen is that extracting data from multiple complex document types is not something you can successfully ‘dabble’ with. Having dedicated experts who understand the nuances of the task at hand is vital,” Tim says.

Don’t overlook the cost benefits of outsourcing vs building in-house

Few companies have the luxury of the right mix of people with the necessary skills and subject matter expertise, along with enough time to dedicate to a project of this nature.

“One of the biggest issues I saw coming from the world of insurance was that the people building or buying the technology were not the same people who were going to be using it. Often, so-called ‘solutions’ ended up causing more issues than they solved. With a platform like Eigen, the actual business users can easily build models that have a positive impact on their day-to-day work,” Tim says.

“In the case of AI, it’s not simply a case of calling the latest LLMs and trusting the output implicitly. In every task you need to ask yourself what the objective is, what costs are acceptable, and what the data privacy and provenance requirements are. Using models and approaches properly is paramount, and that only comes with expertise in data science and the market, and from years of dealing with and learning from the issues surrounding data extraction.”

For an insurance company looking to build out its automation capabilities, it’s imperative to work with vendors who know the space and can deliver as promised. Equally critical is an in-house team of experts who can bridge the technical and business operations worlds to provide input and feedback. This shapes any solution into something that works as intended and adds value in production.

What adds to the difficulty is that technical and domain expertise are expensive to acquire and take a long time to build. Nevertheless, both are critical to the success of your implementation effort.

Having all the key stakeholders and subject matter experts involved in the planning process is vital. “There are companies that ran into problems trying to DIY a solution previously because they didn’t have quite the right skillsets and inputs, and maybe misunderstood the scope of the challenge,” Tim says.

Don’t settle for long-term projects with no end in sight

This is closely linked to the point above: the less dedicated expert time you have, the longer these projects tend to drag on. Time to value is vital. The goal is to reach the right solution fast, ideally the first time. Without the right expertise and focus, a successful implementation can take months, if not years.

“We have a team of in-house experts focused on quick time-to-value for our customers,” Tim says. “We can deliver production-quality models in a very short amount of time in contrast to most other vendors.”

The other thing to consider is the flexibility and scalability of your solution. The scope of your initial implementation needs to be narrow enough for the team to deliver something workable relatively quickly, but it also needs to include core capabilities that let it scale across document types, use cases and workflows. Models that work for one business line may not work as intended for others, which makes ease of iteration a key criterion when selecting any data extraction vendor.

You also need to anticipate that business and data needs will change, and your solution and models will need to be adjustable to meet any new requirements.

Don’t get left behind: AI is constantly developing

“Trying to keep up with new AI/ML developments in general is challenging for anyone, expert and non-expert alike,” Tim says.

Most companies building their first intelligent document processing solution are unlikely to have a well-tested model risk management framework in place to ensure data accuracy and integrity. Nor will they have previous projects and challenges to learn from. It takes years to amass this experience, and huge amounts of data to perfect it.

“As a bleeding-edge tech company, we’re always striving to stay on top of the latest developments in natural language processing, including transformer-based models such as GPT-4,” Tim says. “Guiding customers on when such approaches are appropriate is one of our key value propositions. With each customer engagement and project delivered, we learn, iterate and perfect our solutions.”

Eigen’s AI-powered, intelligent document processing capabilities provide insurance, financial services, and enterprise customers with the ability to quickly and accurately extract, classify and interpret virtually any information from any document to make smarter business decisions, eliminate manual processing, and optimize the flow and use of data.

Find out about Underwriter Assistant, our solution purpose-built to power operational efficiencies and improve outcomes for underwriters.