Abstract-YouTube draws large number of users who contribute actively by uploading videos or commenting on existing videos. However, being a crowd sourced and large content pushed onto it, there is limited control over the content. This makes malicious users push content (videos and comments) which is inappropriate (unsafe), particularly when such content is placed around cartoon videos which are typically watched by kids. In this paper, we focus on presence of unsafe content for children and users who promote it. For detection of child unsafe content and its promoters, we perform two approaches, one based on supervised classification which uses an extensive set of videolevel, user-level and comment-level features and another based Convolutional Neural Network using video frames. Detection accuracy of 85.7% is achieved which can be leveraged to build a system to provide a safe YouTube experience for kids. Through detailed characterization studies, we are able to successfully conclude that unsafe content promoters are less popular and engage less as compared with other users. Finally, using a network of unsafe content promoters and other users based on their engagements (likes, subscription and playlist addition) and other factors, we find that unsafe content is present very close to safe content and unsafe content promoters form very close knit communities with other users, thereby further increasing the likelihood of a child getting getting exposed to unsafe content.
With increasing regulation of Big Data, it is becoming essential for organizations to ensure compliance with various data protection standards. The Federal Acquisition Regulations System (FARS) within the Code of Federal Regulations (CFR) includes facts and rules for individuals and organizations seeking to do business with the US Federal government. Parsing and gathering knowledge from such lengthy regulation documents is currently done manually and is time and human intensive. Hence, developing a cognitive assistant for automated analysis of such legal documents has become a necessity. We have developed semantically rich approach to automate the analysis of legal documents and have implemented a system to capture various facts and rules contributing towards building an efficient legal knowledge base that contains details of the relationships between various legal elements, semantically similar terminologies, deontic expressions and cross-referenced legal facts and rules. In this paper, we describe our framework along with the results of automating knowledge extraction from the FARS document (Title 48, CFR). Our approach can be used by Big Data Users to automate knowledge extraction from Large Legal documents.
Federal government agencies and organizations doing business with them have to adhere to the Code of Federal Regulations (CFR). The CFRs are currently available as large text documents that are not machine processable and so require extensive manual effort to parse and comprehend, especially when sections cross-reference topics spread across various titles. We have developed a novel framework to automatically extract knowledge from CFRs and represent it using a semantically rich knowledge graph. The framework captures knowledge in the form of key terms, rules, topic summaries, relationships between various terms, semantically similar terminologies, deontic expressions, and cross-referenced facts and rules. We built our framework using deep learning technologies like TensorFlow for word embeddings and text summarization, Gensim for topic modeling, and Semantic Web technologies for building the knowledge graph. In this article, we describe our framework in detail and present the results of our analysis of the Title 48 CFR knowledge base that we have built using this framework. Our framework and knowledge graph can be adopted by federal agencies and businesses to automate their internal processes that reference the CFR rules and policies. CCS Concepts: • Applied computing → Law; • Computing methodologies → Natural language processing; Knowledge representation and reasoning;
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.