Long noncoding RNAs (lncRNAs) can promote or repress the cellular hallmarks of cancer. Understanding their molecular roles and realising their therapeutic potential depend on high-quality catalogues of cancer lncRNA genes. Presently, such catalogues depend on labour-intensive curation of heterogeneous data with permissive criteria, resulting in unknown numbers of genes without direct functional evidence. Here, we present an approach for semi-automated curation focused exclusively on pathogenic functionality. The result is Cancer LncRNA Census 2 (CLC2), comprising 492 gene loci in 33 cancer types. To complement manual literature curation, we develop an automated pipeline, CLIO-TIM, to identify novel cancer lncRNAs based on functional evolutionary conservation with mouse. This yields 95 novel lncRNAs, which display characteristics of known cancer genes and include LINC00570 (ncRNA-a5), which we demonstrate experimentally to promote cell proliferation. The clinical importance and curation accuracy of CLC2 lncRNAs are highlighted by a range of features, including evolutionary selection, expression in tumours, and both somatic and germline polymorphisms. The entire dataset is available in a highly curated format facilitating the widest range of downstream applications. In summary, we show how manual and automated methods can be integrated to catalogue known and novel functional cancer lncRNAs with unique genomic and clinical properties.
Long non-coding RNAs (lncRNAs) play key roles in cancer and are at the vanguard of precision therapeutic development. These efforts depend on large and high-confidence collections of cancer lncRNAs. Here, we present the Cancer LncRNA Census 2 (CLC2). With 492 cancer lncRNAs, CLC2 is 4-fold greater in size than its predecessor, without compromising on strict criteria of confident functional/genetic roles and inclusion in the GENCODE annotation scheme. This increase was enabled by leveraging high-throughput transposon insertional mutagenesis screening data, yielding 92 novel cancer lncRNAs. CLC2 makes a valuable addition to existing collections: it is amongst the largest, contains numerous unique genes (not found in other databases) and carries functional labels (oncogene/tumour suppressor). Analysis of this dataset reveals that cancer lncRNAs are impacted by germline variants, somatic mutations and changes in expression consistent with inferred disease functions. Furthermore, we show how clinical/genomic features can be used to vet prospective gene sets from high-throughput sources. The combination of size and quality makes CLC2 a foundation for precision medicine, demonstrating cancer lncRNAs’ evolutionary and clinical significance.
Evolutionary conservation is a measure of gene functionality that is widely used to prioritise long noncoding RNAs (lncRNAs) in cancer research. Intriguingly, while updating our Cancer LncRNA Census (CLC), we observed an inverse relationship between year of discovery and evolutionary conservation. This observation is specific to cancer over other diseases, implying a sampling bias in the selection of lncRNA candidates and casting doubt on the value of evolutionary metrics for the prioritisation of cancer-related lncRNAs.