Murat M. Tanik scite author profile

A sound epistemological foundation for biological inquiry comes, in part, from application of valid statistical procedures. This tenet is widely appreciated by scientists studying the new realm of high-dimensional biology, or 'omic' research, which involves multiplicity at unprecedented scales. Many papers aimed at the high-dimensional biology community describe the development or application of statistical techniques. The validity of many of these is questionable, and a shared understanding about the epistemological foundations of the statistical methods themselves seems to be lacking. Here we offer a framework in which the epistemological foundation of proposed statistical methods can be evaluated.

The challenge we face

High-dimensional biology (HDB) encompasses the 'omic' technologies (ref. 1) and can involve thousands of genetic polymorphisms, sequences, expression levels, protein measurements or combinations thereof. How do we derive knowledge about the validity of statistical methods for HDB? A shared understanding regarding this second-order epistemological question seems to be lacking in the HDB community. Although our comments are applicable to HDB overall, we emphasize microarrays, where the need is acute. "The field of expression data analysis is particularly active with novel analysis strategies and tools being published weekly" (ref. 2; Fig. 1), and the value of many of these methods is questionable (ref. 3). Some results produced by using these methods are so anomalous that a breed of 'forensic' statisticians (refs. 4,5), who doggedly detect and correct other HDB investigators' prominent mistakes, has been created.

Here we offer a 'meta-methodology' and framework in which to evaluate epistemological foundations of proposed statistical methods. On the basis of this framework, we consider that many statistical methods offered to the HDB community do not have an adequate epistemological foundation. We hope the framework will help methodologists to develop robust methods and help applied investigators to evaluate whether statistical methods are valid.


Transformation of higher education: the transdisciplinary approach in engineering

Ertaş, Maxwell, Rainey, et al. 2003. IEEE Trans. Educ.


A self-updating road map of The Cancer Genome Atlas

Robbins, Grüneberg, Deus, et al. 2013


Motivation: Since 2011, The Cancer Genome Atlas’ (TCGA) files have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that can be fully traced to and reproduced using their source data. However, to realize this possibility, a continually updated road map of files in the TCGA is required. Creation of such a road map represents a significant data modeling challenge, due to the size and fluidity of this resource: each of the 33 cancer types is instantiated in only partially overlapping sets of analytical platforms, while the number of data files available doubles approximately every 7 months.

Results: We developed an engine to index and annotate the TCGA files, relying exclusively on third-generation web technologies (Web 3.0). Specifically, this engine uses JavaScript in conjunction with the World Wide Web Consortium’s (W3C) Resource Description Framework (RDF), and SPARQL, the query language for RDF, to capture metadata of files in the TCGA open-access HTTP directory. The resulting index may be queried using SPARQL, and enables file-level provenance annotations as well as discovery of arbitrary subsets of files, based on their metadata, using web standard languages. In turn, these abilities enhance the reproducibility and distribution of novel results delivered as elements of a web-based computational ecosystem. The development of the TCGA Roadmap engine was found to provide specific clues about how biomedical big data initiatives should be exposed as public resources for exploratory analysis, data mining and reproducible research. These specific design elements align with the concept of knowledge reengineering and represent a sharp departure from top-down approaches in grid initiatives such as CaBIG. They also present a much more interoperable and reproducible alternative to the still pervasive use of data portals.

Availability: A prepared dashboard, including links to source code and a SPARQL endpoint, is available at http://bit.ly/TCGARoadmap. A video tutorial is available at http://bit.ly/TCGARoadmapTutorial.

Contact: robbinsd@uab.edu



Murat M. Tanik

Towards sound epistemological foundations of statistical methods for high-dimensional biology

Transformation of higher education: the transdisciplinary approach in engineering

A self-updating road map of The Cancer Genome Atlas
