Space management is the activity of monitoring and ensuring adequate free space on all volumes in a clustered storage system. Volumes that exceed used-space limits are typically relieved by migrating part of their data to other underutilized volumes. Without deduplication, space reclamation is simple: one just migrates as much data as the space to be reclaimed. In deduplicated volumes, however, there is no direct relation between the logical size of a file and the physical space it occupies. Optimal space reclamation is therefore hard because (a) migrating a few files may free little or no space yet still incur significant network costs, and (b) migrating a heavily shared file destroys the disk sharing relationships in that volume and increases the physical space consumption of that dataset. In this work, we have designed and built Rangoli, a fast and efficient tool that identifies the optimal set of files for space reclamation in a deduplicated environment. It scales to millions of files and terabytes of data, running in tens of minutes. By experimenting on real-world datasets, we show that alternative strategies, such as those based on finding unique files or using MinHash, impact physical space consumption by a wide margin (up to 35 times) compared to Rangoli.
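The gap between logical size and reclaimed physical space can be illustrated with a minimal sketch. This is not Rangoli's algorithm. It is a toy model that assumes fixed-size blocks, each identified by a fingerprint, and a volume described as a mapping from file name to its list of block fingerprints. The names `reclaimable_bytes`, `logical_bytes`, and `block_size` are hypothetical and chosen only for illustration.

```python
from collections import Counter

def logical_bytes(volume, migrate, block_size=4096):
    """Bytes that must be sent over the network (assumes no dedup on the wire)."""
    return sum(len(volume[name]) for name in migrate) * block_size

def reclaimable_bytes(volume, migrate, block_size=4096):
    """Physical bytes freed on the source volume if the files in `migrate` leave.

    A block is freed only when every file that references it is migrated.
    """
    # Number of distinct files in the whole volume referencing each block.
    refs_all = Counter()
    for blocks in volume.values():
        refs_all.update(set(blocks))

    # Number of distinct migrated files referencing each block.
    refs_moved = Counter()
    for name in migrate:
        refs_moved.update(set(volume[name]))

    freed_blocks = sum(1 for b, n in refs_moved.items() if refs_all[b] == n)
    return freed_blocks * block_size

# Hypothetical volume: files "a" and "b" share all of their blocks.
volume = {
    "a": ["b1", "b2"],
    "b": ["b1", "b2"],
    "c": ["b3", "b4"],
}

moved_a = (logical_bytes(volume, {"a"}), reclaimable_bytes(volume, {"a"}))
moved_c = (logical_bytes(volume, {"c"}), reclaimable_bytes(volume, {"c"}))
```

In this toy volume, migrating `a` transfers 8192 logical bytes but frees nothing, because `b` still references the same blocks. Migrating `c` transfers the same amount and frees 8192 bytes.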
FORS-D is a measure of the contribution of base order to the stem-loop potential of a nucleic acid sequence and can also give information on evolutionary pressures on sequences to move away from secondary structure. Negative FORS-D values in a gene are associated with exons and with nucleotide substitutions such as SNPs. An analysis of P. falciparum genes under selection pressure shows a correlation between negative FORS-D values and SNP density for genes that are drug targets, but not for drug transporters or antigenic variation genes. Analysis of the dhfr gene shows that a majority of rare mutations associated with drug resistance also fall into regions with negative FORS-D values. These data suggest that FORS-D values might be predictors for drug target genes and for drug resistance mutations in these genes.