Information triage
How do you extract data without telling the software where to look for it? Is it even possible for a machine to accurately identify data by itself across a variety of forms and other documents containing text? Bricolage was engaged by the Laboratory for Analytical Sciences to find the answer.
The Laboratory for Analytical Sciences (LAS) at North Carolina State University (NCSU) needed an assessment of how well current technology can extract data without a predetermined guide to where the data lies in a structured form (a process known as template-free data extraction). LAS was seeking to automate the identification and extraction of data by building a system that would automatically recognise forms from among hundreds of other documents, then read their content and identify and categorise the data to be extracted without external assistance.
Bricolage led the management and delivery of the project. We evaluated and tested technologies for extracting data from forms without the use of templates. We designed the assessment, specifying the methods to be used and defining the metrics to be measured, and then assessed these technologies using our own scoring system. From the outcome of this work, we developed ideas and specifications for an end-to-end automated data extraction system for LAS.
The concept of automated data extraction from structured forms was proven to be viable, and specifications and functions for an end-to-end system were developed. The next stage of the project is pending.