INX Programme – Data Extraction from Tables
Over a series of projects for a defence and security client, Bricolage has worked with Elemendar Ltd and other partners to develop and test software tools for extracting text and technical data from tables held in complex document formats such as PDF.
As well as helping to develop the ideas and functional specifications for an end-to-end automated extraction system, Bricolage researched the latest methods and tools for extracting data from tables and designed the trial scenarios. We then carried out the tests. These applied the parameters, measures, processes and systems we had defined to assess and analyse the outputs of the software tools.
Our work contributed to the successful development of a prototype tool, with proven performance, that extracts data from technical tables within PDF files. We also created a standardised scoring and analysis system for table extraction. It measures performance in terms of both the quantity and the quality of the extracted data.
The latest project in the INX programme has built on the prototype tool's success by investigating whether it can map the extracted data to an advanced information model, the High Quality Data Model (HQDM). Our work has covered design, assessment and testing. This included specifying methods and measures for judging the reliability of ontology mappings.