As oil and gas companies undergo their digital transformations, they typically focus their efforts first on analytics and machine learning solutions, expecting immediate advances in automated operations and predictive failure detection. These solutions are generally validated through small-scale proof-of-concept exercises. Far too often, these exercises rely on manually collected datasets, which consume most of the project's time and resources; as a result, the projects fall short of their promise to yield significant financial returns.
To give data scientists access to all available data, organizations must centralize their information in a corporate data lake. Centralization raises new information governance and data management challenges, because the data originates from different business units, each with its own goals and concerns. Rather than viewing a data lake simply as a storage repository for information, it is more useful to view it as a manufacturing facility that produces analytical insights and enhanced capabilities.
Just as a manufacturing facility is organized around specific processes to deliver finished goods, a data lake should provide all the capabilities necessary to transform raw data into valuable assets for oil and gas organizations. The data lake must therefore feature several analogous capabilities and properties: a receiving dock, quality assurance/quality control (QA/QC) stations, warehousing, and tooling and engineering, as well as flexible, lean assembly lines for building new products and shipping capacity for delivering finished goods to customers. By applying proven manufacturing techniques to data lake design, oil and gas companies can efficiently develop and maintain assembly lines for manufacturing analytical insights.
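To make the analogy concrete, the following is a minimal, hypothetical Python sketch that models a few of the stations named above (receiving dock, QA/QC station, warehouse, assembly line, and shipping) as plain functions and a class. All names, the record fields (`well`, `pressure_psi`), and the simple "required fields present" QA/QC rule are illustrative assumptions, not the architecture or implementation described in this paper.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Batch:
    """Hypothetical shipment of raw records arriving from one business unit."""
    source: str
    records: list
    rejected: list = field(default_factory=list)


def receiving_dock(source: str, records: list) -> Batch:
    # Accept raw data as delivered and tag it with its origin.
    return Batch(source=source, records=list(records))


def qa_qc_station(batch: Batch, required: tuple) -> Batch:
    # Assumed QA/QC rule: a record passes only if every required field is non-None.
    # Failing records are set aside for review rather than silently dropped.
    passed, rejected = [], []
    for rec in batch.records:
        if all(rec.get(k) is not None for k in required):
            passed.append(rec)
        else:
            rejected.append(rec)
    return Batch(batch.source, passed, batch.rejected + rejected)


class Warehouse:
    """Stores inspected batches so any assembly line can draw on them."""

    def __init__(self) -> None:
        self._stock: dict = {}

    def store(self, batch: Batch) -> None:
        self._stock.setdefault(batch.source, []).extend(batch.records)

    def pick(self, source: str) -> list:
        return list(self._stock.get(source, []))


def assembly_line(warehouse: Warehouse, source: str,
                  build: Callable[[list], dict]) -> dict:
    # An assembly line turns warehoused "parts" into one analytical product.
    return build(warehouse.pick(source))


def ship(product: dict, customer: str) -> dict:
    # Deliver the finished product to its consumer.
    return {"customer": customer, **product}


def mean_pressure(records: list) -> dict:
    # Illustrative "product": average pressure across inspected records.
    return {"mean_pressure_psi": sum(r["pressure_psi"] for r in records) / len(records)}


# Usage with made-up example records.
raw = [
    {"well": "A-1", "pressure_psi": 2100},
    {"well": "A-2", "pressure_psi": None},  # fails QA/QC
    {"well": "A-3", "pressure_psi": 1900},
]
batch = qa_qc_station(receiving_dock("production", raw), required=("well", "pressure_psi"))
wh = Warehouse()
wh.store(batch)
report = ship(assembly_line(wh, "production", mean_pressure), customer="reservoir team")
print(report)  # {'customer': 'reservoir team', 'mean_pressure_psi': 2000.0}
```

Keeping each station as a separate, composable step mirrors the lean-manufacturing idea: a new analytical product only needs a new `build` function, while receiving, inspection, and warehousing are shared across every assembly line.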
This paper explains the similarities between delivering analytics and manufacturing and describes the corresponding data lake functionality. Each stage of the process contributes a critical component to the analytical results and can be managed like its manufacturing counterpart, yielding lean processes that make data science more efficient.