Alex Poliakov scite author profile

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

show abstract

The Architecture of SciDB

Stonebraker

Brown

Poliakov

et al. 2011

137

View full text Add to dashboard Cite

Implementing connected component labeling as a user defined operator for SciDB

Oloso

Kuo

Clune

et al. 2016

View full text Add to dashboard Cite

We have implemented a flexible User Defined Operator (UDO) for labeling connected components of a binary mask expressed as an array in SciDB, a parallel distributed database management system based on the array data model. This UDO is able to process very large multidimensional arrays by exploiting SciDB's memory management mechanism that efficiently manipulates arrays whose memory requirements far exceed available physical memory. The UDO takes as primary inputs a binary mask array and a binary stencil array that specifies the connectivity of a given cell to its neighbors. The UDO returns an array of the same shape as the input mask array with each foreground cell containing the label of the component it belongs to. By default, dimensions are treated as non-periodic, but the UDO also accepts optional input parameters to specify periodicity in any of the array dimensions. The UDO requires four stages to completely label connected components. In the first stage, labels are computed for each subarray or chunk of the mask array in parallel across SciDB instances using the weighted quick union (WQU) with half-path compression algorithm. In the second stage, labels around chunk boundaries from the first stage are stored in a temporary SciDB array that is then replicated across all SciDB instances. Equivalences are resolved by again applying the WQU algorithm to these boundary labels. In the third stage, relabeling is done for each chunk using the resolved equivalences. In the fourth stage, the resolved labels, which so far are "flattened" coordinates of the original binary mask array, are renamed with sequential integers for legibility. The UDO is demonstrated on a 3-D mask of O(10 11 ) elements, with O(10 8 ) foreground cells and O(10 6 ) connected components. The operator completes in 19 minutes using 84 SciDB instances.

show abstract

The SciDB codeline: crossing the chasm

Sharma¹,

Poliakov²,

Kinchen³

2018

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Alex Poliakov

Initial sequencing and comparative analysis of the mouse genome

The Architecture of SciDB

Implementing connected component labeling as a user defined operator for SciDB

The SciDB codeline: crossing the chasm

Contact Info

Product

Resources

About