Background: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows an unbiased screening for variation that is suitable for a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits, and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies because of the widespread availability of automatic sequencers based on capillary electrophoresis and because it is still less prone to sequencing errors, which is critical in population genetics studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical editing and genotype calling, and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. Between the use of these tools, there is a set of basic but error-prone tasks to be performed with re-sequencing data.
Results: To assist with these intermediate tasks, we developed a pipeline that facilitates the data handling typical of re-sequencing studies.
Our pipeline: (1) consolidates different outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inferences using the popular software PHASE; and (5) handles PHASE output files that contain only polymorphic sites to reconstruct the inferred haplotypes including polymorphic and monomorphic sites, as required by population genetics software for re-sequencing data such as DNAsp.
Conclusion: We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms and observed that it substantially decreased the time required for sequencing analyses and provided a more controlled process that eliminates several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses.
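Step (5) above can be illustrated with a minimal sketch. This is not the pipeline's actual code: the function name `rebuild_haplotype`, the input layout (a reference string, 1-based site coordinates, and one allele string per inferred haplotype) and the toy data are all assumptions made for illustration, and only single-base substitutions are handled.

```python
def rebuild_haplotype(reference, positions, alleles):
    """Place phased alleles back into a copy of the reference sequence.

    reference: reference sequence as a string
    positions: 1-based coordinates of the segregating sites (assumed input)
    alleles:   one allele per segregating site, in the same order
    """
    if len(positions) != len(alleles):
        raise ValueError("positions and alleles must have the same length")
    seq = list(reference)
    for pos, allele in zip(positions, alleles):
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} is outside the reference")
        # Monomorphic sites keep the reference base; only listed sites change.
        seq[pos - 1] = allele
    return "".join(seq)


# Hypothetical toy data: three segregating sites in a 10 bp reference.
reference = "ACGTACGTAC"
positions = [2, 5, 9]
phased = {"ind1_a": "CAA", "ind1_b": "TAG"}

full = {name: rebuild_haplotype(reference, positions, hap)
        for name, hap in phased.items()}

# Emit FASTA-style records containing polymorphic and monomorphic sites.
for name, seq in full.items():
    print(f">{name}\n{seq}")
```

The full-length sequences can then be written out for downstream population genetics software; indels and missing data would need additional handling not shown here.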
The Binder of SPerm 1 (BSP1) protein is involved in the fertilization and semen cryopreservation processes and is described to be both beneficial and detrimental to sperm. Previously, the relationship of BSP1 with freezability events has not been completely understood. The objective of this work was to determine the differential abundance of the forms of the BSP1 protein in cryopreserved seminal plasma of Bos taurus indicus bulls with different patterns of semen freezability using proteomics. A wide cohort of adult bulls with high genetic value from an artificial insemination center was used as donors of high quality, fresh semen. Nine bulls presenting different patterns of semen freezability were selected. Two-dimensional gel electrophoresis showed differential abundance in a group of seven protein spots in the frozen/thawed seminal plasma from the bulls, ranging from 15 to 17 kDa, with pI values from 4.6 to 5.8. Four of these spots were confirmed to be BSP1 using mass spectrometry, proteomics, biochemical, and computational analysis (Tukey's test at P < 0.05). The protein spot weighing 15.52 ± 0.53 kDa with a pI value of 5.78 ± 0.12 is highlighted by its high abundance in bulls with low semen freezability and its absence in bulls presenting high semen freezability. This is the first report showing that more than two forms of BSP1 are found in the seminal plasma of Nelore adult bulls and not all animals have a similar abundance of each BSP1 form. Different BSP1 forms may be involved in different events of fertilization and the cryopreservation process.
Background: The need to manage large amounts of data is a clear demand for laboratories nowadays. The use of Laboratory Information Management Systems (LIMS) to achieve this is growing each day. A LIMS is a complex computational system used to manage laboratory data with an emphasis on quality assurance. Several LIMS are currently available. However, most of them have proprietary code and are sold at a high cost. Moreover, due to their complexity, LIMS are usually designed to meet the needs of one kind of laboratory, making them very difficult to reuse. In this work we describe the Sistema Integrado de Gerência de Laboratórios (SIGLa), an open source LIMS with a new approach designed to allow it to adapt its activities and processes to various types of laboratories.
Results: SIGLa incorporates a workflow management system, making it possible to create and manage customized workflows. For each new laboratory a workflow is defined with its activities, rules and procedures. During execution, for each workflow created, the values of the attributes defined in an XPDL file (which describes the workflow) are stored in SIGLa’s database, allowing them to be managed and retrieved upon request. These characteristics increase the system’s flexibility and extend its usability to cover the needs of multiple types of laboratories. To construct the main functionalities of SIGLa, a workflow for a proteomics laboratory was first defined. To validate SIGLa’s capability of adapting to multiple laboratories, in this paper we study the process and the needs of a microarray laboratory and define its workflow. This workflow was defined in a period of about two weeks, showing the efficiency and flexibility of the tool.
Conclusions: Using SIGLa it has been possible to construct a microarray LIMS in a few days, illustrating the flexibility and power of the proposed method.
With SIGLa’s development we hope to contribute positively to the management of complex laboratory data by handling large amounts of data, guaranteeing data consistency and increasing laboratory productivity. We also hope to make it possible for laboratories with few resources to afford a high-level system for complex data management.
Background: Proteomics is a research area that has greatly benefited from the development of new and improved analysis technologies, and large amounts of data have been generated by proteomic analysis as a consequence. Previously, the storage, management and analysis of these data were done manually. This is, however, incompatible with the volume of data generated by modern proteomic analysis. Several attempts have been made to automate the tasks of data analysis and management. In this work we propose PRODIS (Proteomics Database Integrated System), a system for proteomic experimental data management. The proposed system enables efficient management of the proteomic experimentation workflow, simplifies the control of experiments and associated data, and establishes links between similar experiments through the experiment tracking function.
Results: PRODIS is fully web based, which simplifies data upload and gives the system the flexibility necessary for use in complex projects. Data from Liquid Chromatography, 2D-PAGE and Mass Spectrometry experiments can be stored in the system. Moreover, it is simple to use: researchers can insert experimental data directly as experiments are performed, without the need to configure the system or change their experiment routine. PRODIS has a number of important features, including a password-protected system in which each screen for data upload and retrieval is validated; users have different levels of clearance, which allow the execution of tasks according to the user's clearance level.
The system allows the upload and parsing of files and the storage and display of experiment results and images in the main formats used in proteomics laboratories. For chromatography, the chromatograms and lists of peaks resulting from separation are stored. For 2D-PAGE, images of gels and the files resulting from the analysis are stored, containing information on the positions of spots as well as their intensity, volume and other values. For Mass Spectrometry, PRODIS provides a function for completing the plate map that allows the user to correlate positions in plates with the samples separated by 2D-PAGE. Furthermore, PRODIS allows the tracking of experiments from the first stage until the final step of identification, enabling efficient management of the complete experimental process.
Conclusions: The construction of data management systems for importing and storing Proteomics data is a relevant subject. PRODIS is a system complementary to other proteomics tools that combines a powerful storage engine (the relational database) and a friendly access interface, aiming to assist Proteomics research directly in data handling and storage.
Key words: Schistosoma mansoni - bioinformatics - expressed sequence tags - clustering analysis - metabolism
Schistosoma mansoni is a dioecious trematode and one of the etiologic agents of schistosomiasis, the second most significant tropical disease in terms of public health. Despite recent efforts undertaken to contain its progress, the disease is still endemic in several countries, with around 200 million people infected by the parasite (http://www.who.int/ctd/schisto/epidemio.htm). The study of S. mansoni is, therefore, very important in human parasitology. Gaining knowledge on the genome of this parasite is essential for a better understanding of its metabolism and biology and will help to elucidate important aspects of the mechanisms of drug resistance and antigenic variation that allow it to escape from the host immune system (Franco et al. 2000).
The size of the S. mansoni genome is estimated at 270 Mb, with the number of expressed genes ranging from 15000 to 20000 (Simpson et al. 1982, Franco & Simpson 2001). Although some genomic sequences of S. mansoni have been produced, the Schistosoma Genome Network (SGN) has chosen as its first priority the sequencing of cDNA using the expressed sequence tags (ESTs) strategy, from which it is possible to obtain relevant information quickly. Although this strategy yields fast and very important information, ESTs available from public databases, such as dbEST, show some degree of redundancy and contain a great number of errors, because they are single-pass sequences (Miller et al. 1999). To overcome these problems and to increase the length of the sequences, facilitating identification by homology searches, clustering procedures are performed (Oliveira & Johnston 2001). In this kind of procedure, sequences that have some region of similarity are joined into a cluster. Therefore, sequences possessing overlapping regions and representing a single gene are joined into the same cluster, decreasing redundancy.
Sequences of each cluster are then aligned to generate a consensus sequence. In this approach, the base (and, if available, the quality value assigned by the base caller program) present at each sequence position is considered in the construction of a high quality consensus (Huang & Madan 1999). The clustering procedure can, therefore, have two outcomes: consensus sequences are generated by aligning the sequences of a cluster, and singlets result from sequences that have not been grouped with any others. Theoretically, each sequence (either a consensus or a singlet) should represent an individual gene, and so these sequences are called uniques. As each sequence is expected to represent a single gene, comparing the number of uniques with the total number of predicted genes makes it possible to estimate, approximately, how many genes have not yet been discovered.
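The bookkeeping described above (consensus sequences plus singlets give the uniques, which are then compared with the predicted gene count) can be sketched as follows. The function names and the toy clusters are hypothetical; the 15000 to 20000 range is the predicted gene count quoted in the text. The result is only the rough approximation the text describes, since in practice several uniques may derive from the same gene.

```python
def count_uniques(clusters):
    """Return (consensus, singlets, uniques) for a list of EST clusters.

    Each cluster is a list of EST identifiers. A cluster with more than one
    member yields one consensus sequence; a cluster of one is a singlet.
    """
    consensus = sum(1 for members in clusters if len(members) > 1)
    singlets = sum(1 for members in clusters if len(members) == 1)
    return consensus, singlets, consensus + singlets


def estimate_undiscovered(n_uniques, predicted_genes):
    """Approximate genes not yet sampled, assuming one unique per gene."""
    return max(predicted_genes - n_uniques, 0)


# Hypothetical toy clustering of seven ESTs.
clusters = [["est1", "est2", "est3"], ["est4"], ["est5", "est6"], ["est7"]]
consensus, singlets, uniques = count_uniques(clusters)

# Predicted gene range taken from the text (15000 to 20000).
low = estimate_undiscovered(uniques, 15000)
high = estimate_undiscovered(uniques, 20000)
print(f"{consensus} consensus + {singlets} singlets = {uniques} uniques")
print(f"approximately {low} to {high} genes not yet represented")
```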