We aimed to identify genomic markers in hepatitis B virus (HBV) that are associated with hepatocellular carcinoma (HCC) development by comparing the complete genomic sequences of HBVs among patients with HCC and those without. One hundred patients with HBV-related HCC and 100 age-matched HBV-infected non-HCC patients (controls) were studied. HBV DNA from serum was directly sequenced to study the whole viral genome. Data mining and rule learning were employed to develop diagnostic algorithms. An independent cohort of 132 cases (43 HCC and 89 non-HCC) was used to validate the accuracy of these algorithms. Among the 100 cases of HCC, 37 had genotype B (all subgenotype Ba) and 63 had genotype C (16 subgenotype Ce and 47 subgenotype Cs) HBV infection. In the control group, 51 had genotype B and 49 had genotype C (10 subgenotype Ce and 39 subgenotype Cs) HBV infection. Genomic algorithms associated with HCC were derived based on genotype/subgenotype-specific mutations. In genotype B HBV, mutations C1165T, A1762T and G1764A, T2712C/A/G, and A/T2525C were associated with HCC. HCC-related mutations T31C, T53C, and A1499G were associated with HBV subgenotype Ce, and mutations G1613A, G1899A, T2170C/G, and T2441C were associated with HBV subgenotype Cs. Amino acid changes caused by these mutations were found in the X, envelope, and precore/core regions in association with HBV genotype B, Ce, and Cs, respectively. In conclusion, infections with different genotypes of HBV (B, Ce, and Cs) carry different genomic markers for HCC at different parts of the HBV genome. Different HBV genotypes may have different virologic mechanisms of hepatocarcinogenesis.
Background: Inferring gene regulatory networks (GRNs) has been an important topic in bioinformatics. Many computational methods infer the GRN from high-throughput expression data. Because regulatory relationships involve time delays, the High-Order Dynamic Bayesian Network (HO-DBN) is a good model of a GRN. However, previous GRN inference methods assume causal sufficiency, i.e. no unobserved common cause. This assumption is convenient but unrealistic, because relevant factors may not even have been conceived of and are therefore unmeasured. An inference method that also handles hidden common cause(s) is therefore highly desirable. In addition, previous methods for discovering hidden common causes either do not handle multi-step time delays or require that the parents of hidden common causes are not observed genes. Results: We have developed a discrete HO-DBN learning algorithm that can also infer hidden common cause(s) from discrete time series expression data. It makes some assumptions on the conditional distribution but is less restrictive than previous methods. We assume that each hidden variable has only observed variables as children and parents, with at least two children and possibly no parents. We also make the simplifying assumption that children of hidden variable(s) are not linked to each other. Moreover, our proposed algorithm can utilize multiple short time series (not necessarily of the same length), as long time series are difficult to obtain. Conclusions: We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. Experimental results show that our proposed algorithm can recover the causal GRNs adequately given the incomplete data. Using the limited real expression data and small subnetworks of the YEASTRACT network, we have also demonstrated the potential of our algorithm on real data, though more time series expression data are needed.
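As an illustration of how multiple short time series of unequal length can feed a high-order (multi-lag) model, the following minimal sketch pools lagged training instances across series. It is not the authors' learning algorithm: the function name `lagged_instances`, the `max_lag` parameter, and the data layout (`series[t][g]` is the discrete expression level of gene `g` at time `t`) are assumptions made for this example, and hidden common causes are not handled here.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed layout: series[t][g] = discrete expression level of gene g at time t.
Series = Sequence[Sequence[int]]
# One instance: (candidate-parent values keyed by (gene, lag), values at time t).
Instance = Tuple[Dict[Tuple[int, int], int], Tuple[int, ...]]


def lagged_instances(series_list: Sequence[Series], max_lag: int) -> List[Instance]:
    """Pool training instances from several time series of possibly different lengths.

    Each instance pairs the values of every gene at lags 1..max_lag with the
    values of all genes at the current time point. A series with fewer than
    max_lag + 1 time points contributes no instances.
    """
    instances: List[Instance] = []
    for series in series_list:
        for t in range(max_lag, len(series)):
            parents = {
                (gene, lag): series[t - lag][gene]
                for lag in range(1, max_lag + 1)
                for gene in range(len(series[t]))
            }
            instances.append((parents, tuple(series[t])))
    return instances


# Two short series over two genes, with different lengths (toy data).
series_a = [[0, 1], [1, 1], [1, 0], [0, 0]]
series_b = [[1, 0], [0, 0], [0, 1]]
instances = lagged_instances([series_a, series_b], max_lag=2)
```

Each series contributes its own windows independently, so no instance spans the boundary between two experiments; this is one simple way to let several short series stand in for a single long one.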
This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
Chinese spell checking is different from its counterparts for Western languages because Chinese words in texts are not separated by spaces. Chinese spell checking in this article refers to how to identify the misuse of characters in text composition. In other words, it is error correction at the word level rather than at the character level. Before Chinese sentences are spell checked, the text is segmented into semantic units. Error detection can then be carried out on the segmented text based on thesaurus and grammar rules. Segmentation is not a trivial process due to ambiguities in the Chinese language and errors in texts. Because it is not practical to define all Chinese words in a dictionary, words not predefined must also be dealt with. The number of word combinations increases exponentially with the length of the sentence. In this article, a Block‐of‐Combinations (BOC) segmentation method based on frequency of word usage is proposed to reduce the word combinations from exponential growth to linear growth. From experiments carried out on Hong Kong newspapers, BOC can correctly solve 10% more ambiguities than the Maximum Match segmentation method. To make the segmentation more suitable for spell checking, user interaction is also suggested.
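For context, the Maximum Match baseline mentioned above can be sketched as a greedy forward scan that always takes the longest dictionary word at the current position. This is a generic illustration of forward maximum matching, not the BOC method, whose details the abstract does not give; the toy lexicon, the `max_len` limit, and the function names are assumptions for this example. The helper `count_segmentations` shows why exhaustive enumeration is impractical: a sentence of n characters has n - 1 gaps, each of which may or may not be a word boundary.

```python
from typing import List, Set


def count_segmentations(n_chars: int) -> int:
    """Number of ways to split n_chars characters into contiguous units.

    Each of the n_chars - 1 gaps is either a boundary or not, so the count
    is 2 ** (n_chars - 1) for a non-empty string.
    """
    return 2 ** (n_chars - 1) if n_chars > 0 else 0


def max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching.

    At each position, take the longest lexicon word (up to max_len characters);
    fall back to a single character when nothing in the lexicon matches.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy lexicon (an assumption for this example, not from the article).
lexicon = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命起源"
greedy = max_match(sentence, lexicon)
total = count_segmentations(len(sentence))
```

On this sentence the greedy scan commits to 研究生 ("graduate student") and strands 命, whereas the intended reading is 研究 / 生命 / 起源 ("study the origin of life"). Ambiguities of this kind are what a frequency-based method would need to resolve.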