Datasets for recommender systems are few and often inadequate for the contextualized nature of news recommendation. News recommender systems are both time- and location-dependent, make use of implicit signals, and often include both collaborative and content-based components. In this paper, we introduce the Adressa compact news dataset, which supports all these aspects of news recommendation. The dataset comes in two versions: the large 20M dataset of 10 weeks' traffic on Adresseavisen's news portal, and the small 2M dataset of only one week's traffic. We explain the structure of the dataset and discuss how it can be used in advanced news recommender systems.
Glycoproteins are biologically significant macromolecules that participate in numerous cellular activities. To obtain site-specific protein glycosylation information, intact glycopeptides, with the glycan still attached to the peptide sequence, are characterized by tandem mass spectrometry (MS/MS) methods such as collision-induced dissociation (CID) and electron transfer dissociation (ETD). Although several automated tools have emerged, there is no consensus in the field on how best to assess the reliability of these tools or to estimate their false discovery rate (FDR). A common approach to calculating FDRs for glycopeptide analysis, adopted from the target-decoy strategy in proteomics, employs a decoy database created from the target protein sequence database. This approach is nonetheless suboptimal for measuring the confidence of N-linked glycopeptide matches: glycopeptide data sets are considerably smaller than peptide data sets, and the requirement of a consensus sequence for N-glycosylation further limits the number of possible decoy glycopeptides tested in a database search. To address the need to accurately determine FDRs for automated glycopeptide assignments, we developed GlycoPep Evaluator (GPE), a tool that measures FDRs in glycopeptide identification without using a decoy database. GPE generates decoy glycopeptides de novo for every target glycopeptide, at a 1:20 target-to-decoy ratio. The decoys and the target glycopeptides are scored against the ETD data, and FDRs can then be calculated accurately for small data sets from the number of decoy matches and the ratio of the number of targets to decoys. GPE is freely available for download and can work with any search engine that interprets ETD data of N-linked glycopeptides. The software is provided at .
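The FDR arithmetic described above can be illustrated with a minimal sketch. It assumes that the FDR is estimated at a score threshold by counting the decoys that pass it and scaling that count by the 1:20 target-to-decoy ratio; the helper `estimate_fdr`, the threshold, and all scores are hypothetical and do not reproduce GPE's own code or scoring function.

```python
def estimate_fdr(target_scores, decoy_scores, threshold, decoys_per_target=20):
    """Ratio-scaled target-decoy FDR estimate at a score threshold.

    Hypothetical helper (not GPE's code). Assumes decoys are generated at a
    fixed ratio of 1 target : decoys_per_target decoys, and that a wrong
    target assignment is as likely to pass the threshold as any one decoy.
    """
    targets_passing = sum(s >= threshold for s in target_scores)
    decoys_passing = sum(s >= threshold for s in decoy_scores)
    if targets_passing == 0:
        return 0.0
    # Each passing decoy stands for 1/decoys_per_target of a false target match.
    expected_false_targets = decoys_passing / decoys_per_target
    return min(1.0, expected_false_targets / targets_passing)


# Hypothetical scores: 10 targets and 20 decoys per target (200 decoys).
target_scores = [12.1, 11.4, 10.8, 9.9, 9.5, 9.2, 8.7, 8.1, 3.2, 2.5]
decoy_scores = [10.0, 9.1, 8.4, 8.2] + [1.0] * 196
fdr = estimate_fdr(target_scores, decoy_scores, threshold=8.0)
print(f"Estimated FDR: {fdr:.3f}")  # 8 targets and 4 decoys pass -> 0.025
```

Scaling by the ratio is what lets a large decoy pool be used with a small target set: generating many decoys per target gives more decoy observations to estimate from, while each passing decoy contributes only 1/20 of an expected false match.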
Studying protein O-glycosylation remains an analytical challenge. Unlike N-glycosylation sites, O-glycosylation sites do not lie within a known consensus sequence. Additionally, O-glycans are heterogeneous, with numerous potential modification sites. Electron transfer dissociation (ETD) is the method of choice for analyzing these glycopeptides because the glycan side chain remains intact during ETD, and the glycosylation site can be localized on the basis of the c and z fragment ions. Nonetheless, new software is needed to interpret O-glycopeptide ETD spectra and expedite the analysis workflow. To address this need, we studied the fragmentation of O-glycopeptides in ETD and found useful rules that facilitate their identification. We implemented these rules in an algorithm that scores potential assignments against ETD-MS/MS data and applied the method to glycopeptides generated from various O-glycosylated proteins, including mucin, erythropoietin, fetuin, and an HIV envelope protein, 1086.C gp120. The site-specific O-glycopeptide composition was correctly assigned in every case, demonstrating the merits of our method for analyzing glycopeptide ETD data. The algorithm described herein can easily be incorporated into other automated glycomics tools.
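To illustrate why c and z fragment ions can localize an intact glycan, the sketch below computes singly charged c and z• ion m/z values for a peptide carrying a glycan mass on one residue, then counts how many predicted ions match a spectrum for each candidate Ser/Thr site. It is a plain fragment-matching count, not the fragmentation rules or scoring algorithm from the paper. The peptide `GTSAK`, the 0.02 Da tolerance, and the helper names are hypothetical, and the sketch ignores multiply charged ions and the absence of ETD cleavage N-terminal to proline.

```python
# Monoisotopic residue masses (Da) for the residues used in the example.
RESIDUE_MASS = {
    "A": 71.03711, "G": 57.02146, "K": 128.09496, "P": 97.05276,
    "S": 87.03203, "T": 101.04768, "V": 99.06841,
}
PROTON = 1.007276
H_ATOM = 1.007825
NH3 = 17.026549
H2O = 18.010565
HEXNAC = 203.07937  # HexNAc (e.g., GalNAc) residue mass


def c_z_ions(peptide, glycan_site=None, glycan_mass=0.0):
    """Return singly charged (c_ions, z_dot_ions) m/z lists.

    glycan_site is the 0-based index of the glycosylated residue, or None.
    c_ions[i] covers the first i+1 residues; z_dot_ions[i] the last i+1.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    if glycan_site is not None:
        masses[glycan_site] += glycan_mass  # intact glycan stays on its residue
    n = len(masses)
    c_ions, z_ions = [], []
    for i in range(1, n):
        prefix = sum(masses[:i])
        suffix = sum(masses[n - i:])
        c_ions.append(prefix + NH3 + PROTON)                  # c = b + NH3
        z_ions.append(suffix + H2O - NH3 + H_ATOM + PROTON)   # z• = y - NH3 + H
    return c_ions, z_ions


def score_sites(peptide, observed_mz, glycan_mass, tol=0.02):
    """Count predicted c/z• ions matching an observed peak, per S/T site."""
    scores = {}
    for site, aa in enumerate(peptide):
        if aa not in "ST":
            continue
        c_ions, z_ions = c_z_ions(peptide, site, glycan_mass)
        scores[site] = sum(
            any(abs(mz - obs) <= tol for obs in observed_mz)
            for mz in c_ions + z_ions
        )
    return scores


# Idealized spectrum for hypothetical "GTSAK" with one HexNAc on S (index 2).
c, z = c_z_ions("GTSAK", glycan_site=2, glycan_mass=HEXNAC)
scores = score_sites("GTSAK", c + z, HEXNAC)
print(scores, "best site:", max(scores, key=scores.get))  # {1: 6, 2: 8} best site: 2
```

Placing the glycan on T (index 1) instead shifts c2 and z3 by the glycan mass, so those two predicted ions find no partner in the spectrum; ions that span both candidate residues, or neither, cannot distinguish the sites.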