Speaker: Dr. Eric Schoen
Title: Using AI to Extract and Structure Data from Documents
Abstract:
The critical strategic and tactical decisions that E&P companies take are based upon written interpretations of heterogeneous quantitative data, from proprietary internal and published external reports and slide presentations. Similarly, much non-technical content upon which E&P companies depend, such as contracts, lease agreements and regulatory filings, is naturally unstructured. In both cases, these documents have immediate value upon initial use; however, they quickly fade into the clutter of a company’s document management practices, and their information, knowledge, and learnings—the implicit value that the documents embody—are lost.
While the industry has used platforms that manage, search and analyze structured data for many years, platforms that manage text documents are still maturing, and largely provide capabilities centered upon indexing, versioning, searching, annotating (i.e., with metadata), and enforcing document retention and disposal policies. A key missing feature in most content management systems is the ability to enrich text by “reverse engineering” latent information from it.
We describe an AI Platform, an artificial intelligence application that uses natural language analysis, computer vision, and domain knowledge to structure and classify text, recognize entities, and extract their properties. The platform, which implements the SPE’s Research Portal, adopts a knowledge-centric approach to automated text enrichment, augmented by input from subject matter experts and from statistical machine learning. In recent work, we have extended the platform to detect document structure and to recognize, interpret and extract data points from text and tables. Our long-term goal is to be able to answer questions by referencing the knowledge that we can find and extrapolate from a large corpus of documents.
Bio:
Eric Schoen is the Chief Technical Officer of i2k Connect Inc. Before joining i2k Connect, Eric spent over thirty years at Schlumberger, in both research and engineering functions, most recently as its Chief Software Architect. At Schlumberger, he contributed to a broad range of software, including the company’s early pioneering efforts to leverage knowledge-based systems, its GeoFrame and Ocean platforms for reservoir characterization, its software quality processes, and its strategies for enterprise-scale architecture for data acquisition, transmission, processing, and delivery. Eric holds a Ph.D. in Computer Science (Artificial Intelligence) from Stanford University.