Robert Budden scite author profile

Robert Budden

4 Publications

58 Citation Statements Received

17 Citation Statements Given

Affiliations

Carnegie Mellon University, Pittsburgh Supercomputing Center

Publications

TCGA Expedition: A Data Acquisition and Management System for TCGA Data

et al. 2016

Background: The Cancer Genome Atlas Project (TCGA) is a National Cancer Institute effort to profile at least 500 cases of 20 different tumor types using genomic platforms and to make these data, both raw and processed, available to all researchers. TCGA data are currently over 1.2 petabytes in size and include whole genome sequence (WGS), whole exome sequence, methylation, RNA expression, proteomic, and clinical datasets. Publicly accessible TCGA data are released through public portals, but many challenges exist in navigating and using data obtained from these sites. We developed TCGA Expedition to support the research community focused on computational methods for cancer research. Data obtained, versioned, and archived using TCGA Expedition support command line access at high-performance computing facilities as well as some functionality with third-party tools. For a subset of TCGA data collected at the University of Pittsburgh, we also re-associate TCGA data with de-identified data from the electronic health records. Here we describe the software as well as the architecture of our repository, methods for loading TCGA data to multiple platforms, and security and regulatory controls that conform to federal best practices.

Results: TCGA Expedition software consists of a set of scripts written in Bash, Python, and Java that download, extract, harmonize, version, and store all TCGA data and metadata. The software generates a versioned, participant- and sample-centered, local TCGA data directory with metadata structures that directly reference the local data files as well as the original data files. The software supports flexible searches of the data via a web portal, user-centric data tracking tools, and data provenance tools. Using this software, we created a collaborative repository, the Pittsburgh Genome Resource Repository (PGRR), that enabled investigators at our institution to work with all TCGA data formats and to interrogate these data with analysis pipelines and associated tools. WGS data are especially challenging for individual investigators to use, due to issues with downloading, storage, and processing; having locally accessible WGS BAM files has proven invaluable.

Conclusion: Our open-source, freely available TCGA Expedition software can be used to create a local collaborative infrastructure for acquiring, managing, and analyzing TCGA data and other large public datasets.

Kerberized Lustre 2.0 over the WAN

Palencia

Budden

Sullivan

2010

In this paper, we describe our current implementation of Kerberized Lustre 2.0 over the WAN with partners from the TeraGrid (SDSC), the Naval Research Lab, and the Open Science Grid (University of Florida). After formulating several single Kerberos realms, we enable the distributed OSTs over the WAN, create local OST pools, and perform Kerberized data transfers between local and remote sites. To expand the accessibility of the Lustre filesystem, we also include our efforts towards cross-realm authentication and integration of Lustre 2.0 with the Kerberos-enabled NFS4.

Secure wide area network access to CMS analysis data using the Lustre filesystem

Bourilkov

Avery

Cheng

et al. 2012

J. Phys.: Conf. Ser.

This paper reports the design and implementation of a secure, wide area network, distributed filesystem by the ExTENCI project, based on the Lustre filesystem. The system is used for remote access to analysis data from the CMS experiment at the Large Hadron Collider, and from the Lattice Quantum ChromoDynamics (LQCD) project. Security is provided by Kerberos authentication and authorization with additional fine-grained control based on Lustre ACLs (Access Control Lists) and quotas. We investigate the impact of using various Kerberos security flavors on the I/O rates of CMS applications on client nodes reading and writing data to the Lustre filesystem, and on LQCD benchmarks. The clients can be real or virtual nodes. We are investigating additional options for user authentication based on user certificates. We compare the Lustre performance to that obtained with other distributed storage technologies.

Using kerberized lustre over the WAN for high energy physics data

Palencia

Budden

Benninger

et al. 2012

This paper reports the design and implementation of a secure, wide area network, distributed filesystem by the ExTENCI project (Extending Science Through Enhanced National Cyberinfrastructure) based on Lustre. The filesystem is used for remote access to analysis data from the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC), and from the Lattice Quantum ChromoDynamics (LQCD) project. Security is provided by Kerberos and reinforced with additional fine-grained control using Lustre ACLs and quotas. We show the impact of using Kerberized Lustre on the I/O rates of CMS and LQCD applications on client nodes, both real and virtual. Preconfigured images of Lustre virtual clients containing the complete software stack ease the difficulty of managing these systems.
