Cyber-attacks cost the global economy over $450 billion annually. To combat this issue, researchers and practitioners have put enormous effort into developing Cyber Threat Intelligence (CTI), the process of identifying emerging threats and key hackers. However, the reliance on internal network data has resulted in inherently reactive intelligence. CTI experts have stressed the importance of proactively studying the large, ever-evolving online hacker community. Despite their CTI value, collecting data from hacker community platforms is a non-trivial task. In this paper, we summarize our efforts in systematically identifying and automatically collecting a large-scale collection of hacker forums, carding shops, Internet Relay Chat channels, and DarkNet Marketplaces. We also present our efforts to provide this data to the larger CTI community via the AZSecure Hacker Assets Portal (www.azsecure-hap.com). With our methodology, we collected 102 platforms for a total of 43,902,913 records. To the best of our knowledge, this compilation of hacker community data is the largest such collection in academia, and it can enable numerous novel and valuable proactive CTI research inquiries.
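The abstract does not describe the collection tooling; purely as an illustration of automated forum collection, a minimal crawler sketch might look like the following. The URL, CSS selectors, and record fields are hypothetical placeholders and do not reflect the infrastructure actually used to gather the 102 platforms.

```python
# Illustrative sketch only: a minimal paginated forum collector. The URL,
# CSS selectors, and record fields are hypothetical placeholders, not the
# paper's actual crawling infrastructure.
import requests
from bs4 import BeautifulSoup


def collect_forum_posts(base_url, max_pages=10):
    """Fetch paginated forum pages and yield simple post records."""
    for page in range(1, max_pages + 1):
        resp = requests.get(f"{base_url}?page={page}", timeout=30)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        # Hypothetical markup: each post sits in a <div class="post"> block.
        for post in soup.select("div.post"):
            yield {
                "author": post.select_one(".author").get_text(strip=True),
                "date": post.select_one(".date").get_text(strip=True),
                "body": post.select_one(".content").get_text(strip=True),
            }


if __name__ == "__main__":
    for record in collect_forum_posts("https://forum.example/board"):
        print(record)
```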
While neural networks produce state-of-the-art performance in many NLP tasks, they generally learn from lexical information, which may transfer poorly between domains. Here, we investigate the importance that a model assigns to various aspects of data while learning and making predictions, specifically, in a recognizing textual entailment (RTE) task. By inspecting the attention weights assigned by the model, we confirm that most of the weights are assigned to noun phrases. To mitigate this dependence on lexicalized information, we experiment with two strategies of masking. First, we replace named entities with their corresponding semantic tags along with a unique identifier to indicate lexical overlap between claim and evidence. Second, we similarly replace other word classes in the sentence (nouns, verbs, adjectives, and adverbs) with their super sense tags (Ciaramita and Johnson, 2003). Our results show that, while performance on the in-domain dataset remains on par with that of the model trained on fully lexicalized data, it improves considerably when tested out of domain. For example, the performance of a state-of-the-art RTE model trained on the masked Fake News Challenge (Pomerleau and Rao, 2017) data and evaluated on Fact Extraction and Verification (Thorne et al., 2018) data improved by over 10% in accuracy score compared to the fully lexicalized model.
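As a rough illustration of the first masking strategy (not the authors' implementation), named entities in a claim-evidence pair can be replaced by their entity label plus an identifier shared across the pair whenever the surface form overlaps. The sketch below assumes spaCy's small English model for NER.

```python
# Illustrative sketch: replace named entities with their semantic tag plus an
# identifier shared across claim and evidence when the surface form overlaps.
# Not the authors' code; spaCy's small English model is assumed for NER.
import spacy

nlp = spacy.load("en_core_web_sm")


def mask_entities(claim, evidence):
    ids = {}  # entity surface form -> shared identifier

    def mask(text):
        doc = nlp(text)
        out, last = [], 0
        for ent in doc.ents:
            out.append(text[last:ent.start_char])
            idx = ids.setdefault(ent.text.lower(), len(ids) + 1)
            out.append(f"{ent.label_}-{idx}")  # e.g. "PERSON-1"
            last = ent.end_char
        out.append(text[last:])
        return "".join(out)

    return mask(claim), mask(evidence)


claim_m, evid_m = mask_entities(
    "Barack Obama was born in Hawaii.",
    "Hawaii is the birthplace of Barack Obama.",
)
print(claim_m)  # e.g. "PERSON-1 was born in GPE-2."
print(evid_m)   # e.g. "GPE-2 is the birthplace of PERSON-1."
```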
We propose an empirical framework for understanding the impact of non-verbal cues across various research contexts. A large share of communication on the Internet uses text-driven non-verbal cues, often referred to as emojis. Our framework proposes two types of factors for understanding the impact of emojis. The first type consists of pictograph, ideogram, and emoji (PIE) factors such as usage, valence, position, and skin tone; the second type consists of contextual factors that depend on the research context, such as fake news, which has high social impact. We discuss how the effects of PIE factors and contextual factors can be used to measure belief, trust, reputation, and intentions across these contexts.
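As one way of operationalizing the PIE factors (an assumption, not the paper's instrument), emoji usage, position, and skin-tone modifiers can be extracted directly from message text; the emoji ranges are a simplified subset and the valence lexicon below is a hypothetical placeholder.

```python
# Illustrative sketch of extracting PIE-style features (usage, position,
# skin tone, valence) from a message. The emoji ranges are a simplified
# subset and the valence lexicon is a hypothetical placeholder.
import re

EMOJI_RE = re.compile(
    "[\U0001F300-\U0001F5FF\U0001F600-\U0001F64F"
    "\U0001F680-\U0001F6FF\U0001F900-\U0001F9FF\u2600-\u27BF]"
)
SKIN_TONE_RE = re.compile("[\U0001F3FB-\U0001F3FF]")
VALENCE = {"😊": 1.0, "😡": -1.0}  # hypothetical valence scores


def pie_features(text):
    """Return simple PIE-factor features; multi-codepoint sequences are ignored."""
    matches = list(EMOJI_RE.finditer(text))
    if not matches:
        return {"usage": 0, "position": None, "skin_tone": False, "mean_valence": None}
    position = "end" if matches[-1].end() >= len(text.rstrip()) else "other"
    valences = [VALENCE.get(m.group(), 0.0) for m in matches]
    return {
        "usage": len(matches),
        "position": position,
        "skin_tone": bool(SKIN_TONE_RE.search(text)),
        "mean_valence": sum(valences) / len(valences),
    }


print(pie_features("Great article, totally agree 😊"))
```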
The intentional and unintentional use of social media platforms, resulting in digital wildfires of misinformation, has increased significantly over the last few years. However, the factors that influence this rapid spread in the online space remain largely unknown. We study how believability and the intention to share information are influenced by multiple factors in addition to confirmation bias. We conducted an experiment in which a mix of true and false articles was evaluated by study participants. Using hierarchical linear modelling to analyze our data, we found that, in addition to confirmation bias, believability is influenced by source endorser credibility and argument quality, both of which are moderated by the type of information (true or false). Source likeability also had a positive main effect on believability. After controlling for belief and confirmation bias, intention to share information was affected by source endorser credibility and information source likeability.

Prior Theory and Research
As fake news has broad definitions and purposes, understanding it is emerging as a significant research challenge. Technical and behavioral scientists are looking at this problem from multiple perspectives. Behavioral scientists have made significant progress in understanding how readability, placement of titles, and similar features affect belief. However, a key aspect of fake news is its ability to persuade readers that it is true. The Elaboration Likelihood Model (ELM) is a key theory for better understanding persuasive communication. Its two key routes, the central and the peripheral, explain how persuasive communication affects both individual belief and the intention to share.
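As a rough illustration of the hierarchical linear modelling analysis described above (not the study's actual analysis script), a mixed-effects model with participants as the grouping factor could be specified as follows; the data file and column names are hypothetical.

```python
# Illustrative sketch only: a hierarchical (mixed-effects) model of the kind
# described, with a random intercept per participant. Column names and the
# data file are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical layout: one row per participant-article rating.
df = pd.read_csv("ratings.csv")

model = smf.mixedlm(
    "believability ~ endorser_credibility * info_type"
    " + argument_quality * info_type"
    " + source_likeability + confirmation_bias",
    data=df,
    groups=df["participant_id"],  # participants as the grouping factor
)
result = model.fit()
print(result.summary())
```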
While neural networks produce state-of-the-art performance in several NLP tasks, they generally depend heavily on lexicalized information, which transfers poorly between domains. We present a combination of two strategies to mitigate this dependence on lexicalized information in fact verification tasks. We present a data distillation technique for delexicalization, which we then combine with a model distillation method to prevent overly aggressive data distillation. We show that with our solution, the performance of an existing state-of-the-art model not only remains on par with that of the model trained on fully lexicalized data, but also exceeds it when tested out of domain. We show that the technique we present encourages models to extract transferable facts from a given fact verification dataset.
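The abstract does not give the distillation recipe; as a rough sketch of the model-distillation component under common assumptions (a student trained on delexicalized inputs regularized toward a teacher trained on lexicalized data via soft labels), the loss might look like the following. The temperature, mixing weight, and label set are illustrative, not the paper's settings.

```python
# Illustrative sketch of a standard soft-label distillation loss: a weighted
# sum of hard-label cross-entropy and a temperature-scaled KL term toward the
# teacher. Hyperparameters and the 3-way label set are assumptions.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Mix hard-label cross-entropy with soft-label KL divergence."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard + (1.0 - alpha) * soft


# Toy usage with random logits for a 3-way fact-verification label set.
student = torch.randn(4, 3, requires_grad=True)
teacher = torch.randn(4, 3)
labels = torch.tensor([0, 1, 2, 1])
print(distillation_loss(student, teacher, labels))
```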