2019
DOI: 10.3390/molecules24010179

Scalable Extraction of Big Macromolecular Data in Azure Data Lake Environment

Abstract: Calculation of structural features of proteins, nucleic acids, and nucleic acid-protein complexes on the basis of their geometries and studying various interactions within these macromolecules, for which high-resolution structures are stored in Protein Data Bank (PDB), require parsing and extraction of suitable data stored in text files. To perform these operations on large scale in the face of the growing amount of macromolecular data in public repositories, we propose to perform them in the distributed envir…

Cited by 9 publications
(2 citation statements)
References 41 publications
“…For instance, Stanley Curtis carried out an experiment related to pig memory. He placed a ball, a frisbee, and a sinker in front of several pigs, and was able to teach them to jump, sit next to, or retrieve any of these items [22][23][24]. Based on the performed experiments, he claimed that pigs are wiser than dogs.…”
Section: Introduction To Pig Herd Optimization (mentioning)
confidence: 99%
“…As technology advancement enterprise data warehouse is not suitable for data storage for current market demand. Enterprise data warehouse works on the concept of schema-on-write architecture, to get data in data warehouse an extraction, transformation, and loading (ETL) process is required (Cha, Park, Kim, Pan, & Shin, 2018; Khine & Wang, 2018). With this architecture, organization design a data model and prepare an analytic plan before loading data. In other words, organization must know in starting, before loading data, how they are planning to use that data, and this is very limiting. Big data analytics want data storage who works on schema-on-read concept in which data is stored in raw format as data generated or in other words, there is no need to prepare an analytic plan before loading data, and no need to know ahead of time how they plan to use that data. Enterprise data warehouse store data that has been modeled or structured but Big Data analytics in the market need storage who store raw data and store all kind of data such as structured, unstructured, semi-structured and quasi-structured data at one place. To fulfill market demand researchers, work on new data repository system for Big Data storage known as data lake. The idea of data lake is to enhance enterprise data warehouse environment (Mrozek, Dabek, & Małysiak-Mrozek, 2019; Nogueira, Romdhane, & Darmont, 2018). The data lake is defined as a data landing area for the raw data from many and always increases number of data source in organization. Data from data lake can be transformed and distributed to the downstream system as they required. Now it's clear that data lake supports Big Data initiates and data lake approach can reduce data silos (Sawadogo, Scholly, Favre, & Ferey, 2019; Shepherd, Kesa, Cooper, Onema, & Kovacs, 2018; Singh, 2019). The data lake is the requirement of the industry for data storage but there are some confusion and question which must be answered about data lake. For example, how to design & deploy data lake? How to govern and secure data lake? What kind of data that can be managed in data lake? Why do organizations need data lake? The objective of this survey paper is to reduce the confusion and addressing the above mention question with the help of data lake architecture.…”
Section: Introduction (mentioning)
confidence: 99%