Big Data: Issues and Challenges Moving Forward

Kaisler, Stephen H.; Armour, Frank; Espinosa, J. Alberto; Money, William H.

doi:10.1109/hicss.2013.645

Cited by 753 publications (518 citation statements, 490 of them mentioning). References 6 publications.

“…As the field of Big Data reveals, an increase in the scale of social data available cannot be effectively managed by merely scaling up hardware and software, but creates new challenges which necessitate new methods and, indeed, new areas of expertise (Kaisler, Armour et al 2013). Our project is particularly interested in the study of virtual organizations mediated by social technologies.…”

Section: Introduction (mentioning, confidence: 99%)

Modeling virtual organizations with Latent Dirichlet Allocation: A case for natural language processing

Groß, Murthy (2014), Neural Networks

This paper explores a variety of methods for applying the Latent Dirichlet Allocation (LDA) automated topic modeling algorithm to model the structure and behavior of virtual organizations found within modern social media and social networking environments. As the field of Big Data reveals, an increase in the scale of available social data presents new challenges that are not solved by merely scaling up hardware and software. Rather, they necessitate new methods and, indeed, new areas of expertise. Natural language processing provides one such method. This paper applies LDA to the study of scientific virtual organizations whose members employ social technologies. Because of the vast data footprint of these virtual platforms, we found that natural language processing was needed to 'unlock' and render visible latent, previously unseen conversational connections across large textual corpora (spanning profiles, discussion threads, forums, and other social media incarnations). We introduce variants of LDA and argue that natural language processing is a critical interdisciplinary methodology for making better sense of social 'Big Data'. Using LDA, we successfully modeled nested discussion topics from forums and blog posts. Importantly, we found that LDA can move us beyond the state of the art in conventional Social Network Analysis techniques.
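The abstract names LDA but does not show how a topic model is fitted. As a hedged illustration only (not the authors' implementation or any of their LDA variants), the sketch below fits plain LDA with collapsed Gibbs sampling, one standard inference method. The function name `lda_gibbs`, the hyperparameter defaults, and any sample tokens are assumptions chosen for the example.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of documents, each a list of word tokens (strings).
    Returns (doc_topic, topic_word, vocab), where doc_topic[d][k] and
    topic_word[k][v] are smoothed probability estimates.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    assignments = []

    # Give every token a random initial topic and record it in the counts.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return doc_topic, topic_word, vocab
```

Each row of `doc_topic` gives a post's mixture over topics, which is the kind of latent signal that could link forum posts sharing no explicit reply or follow edges. A production analysis would more likely use an established library than this hand-rolled sampler.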


“…Both the academic and industrial communities are generating and collecting data at an unprecedented rate and scale. Analysis of these datasets represents a huge opportunity to advance domain knowledge and informed decision making [7]. However, despite a few early successes, there remain many significant challenges in realising the full potential of the knowledge buried within these datasets.…”

Section: Introduction, A. Big Data and Healthcare (mentioning, confidence: 99%)

Data quality assessment and anomaly detection via map/reduce and linked data: A case study in the medical domain

Bonner, McGough, Kureshi, et al. (2015), 2015 IEEE International Conference on Big Data (Big Data)

Recent technological advances in modern healthcare have led to the ability to collect a vast wealth of patient monitoring data. This data can be used for patient diagnosis, but it also holds potential for use within medical research. However, these datasets often contain errors that limit their value to medical research, with one study finding error rates ranging from 2.3% to 26.9% in a selection of medical databases. Previous methods for automatically assessing data quality normally rely on threshold rules, which are often unable to correctly identify errors because more complex domain knowledge is required. To combat this, a semantic web based framework has previously been developed to assess the quality of medical data. However, early work based solely on traditional semantic web technologies revealed that these technologies either cannot scale, or scale only inefficiently, to the vast volumes of medical data. In this paper we present a new method for storing and querying medical RDF datasets using Hadoop Map/Reduce. This approach exploits the inherent parallelism found within RDF datasets and queries, allowing us to scale with both dataset and system size. Unlike previous solutions, this framework uses highly optimised SPARQL joining strategies, intelligent data caching, and a super-query to complete eight distinct SPARQL lookups, comprising over eighty distinct joins, in only two Map/Reduce iterations. Results are presented comparing the new method with both Jena and a previous Hadoop implementation, demonstrating its superior performance: the new method is shown to be five times faster than Jena and twice as fast as the previous approach.
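The abstract describes answering SPARQL joins over RDF triples with Map/Reduce but does not show the mechanics. The sketch below is a minimal, hedged illustration of a generic reduce-side join, simulated in plain Python for a two-pattern query joined on one shared variable. It is not the paper's optimised join strategy, caching scheme, or super-query; the function names, the predicates `hasReading` and `value`, and the sample triples are all hypothetical.

```python
from collections import defaultdict


def match(pattern, triple):
    """Return variable bindings if the triple matches the pattern, else None.

    Pattern terms starting with "?" are variables; all others must match exactly.
    """
    bindings = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if bindings.get(p, t) != t:  # same variable bound to two values
                return None
            bindings[p] = t
        elif p != t:
            return None
    return bindings


def map_phase(triples, patterns, join_var):
    """Emit (join key, (pattern index, bindings)) for each matching triple."""
    for triple in triples:
        for idx, pattern in enumerate(patterns):
            b = match(pattern, triple)
            if b is not None and join_var in b:
                yield b[join_var], (idx, b)


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, num_patterns):
    """Within each key group, combine bindings from every pattern that agree."""
    results = []
    for values in groups.values():
        partial = [{}]
        for idx in range(num_patterns):
            side = [b for i, b in values if i == idx]
            partial = [
                {**acc, **b}
                for acc in partial
                for b in side
                if all(acc.get(k, v) == v for k, v in b.items())
            ]
        results.extend(partial)
    return results
```

Because each reduce group holds only the triples that share one value of the join variable, groups can be processed on different reducers in parallel. A query that joins on several different variables generally needs additional iterations unless the joins are planned together, which is the kind of optimisation the abstract reports.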


“…Indeed, the rapid growth in the volume of data intended for processing characterizes not only IT companies and the scientific sphere (including meteorology, genetic research, complex physical simulators, and biology and environmental research), but also a wide range of organizations in various fields. In modern science and technology, a separate direction appeared related to the analysis of large and super-large data sets, known as ‘Big data’ [2].…”

Section: Introduction (mentioning, confidence: 99%)

An Anomaly Detection Based on Optimization

Alguliyev, Alıguliyev, İmamverdiyev, et al. (2017), IJISA

Anomaly detection is currently an important problem in many fields. The rapid growth of data volumes calls for tools that can process and analyze a wide variety of data types. Anomaly detection methods are designed to detect deviations of objects from normal behavior. However, it is difficult to select a single tool for all types of anomalies, owing to increasing computational complexity and the nature of the data. In this paper, an improved optimization approach is proposed for clustering with a previously known number of clusters, in which a weight is assigned to each data point. The aim of this article is to show that weighting each data point improves the clustering solution. Experimental results on three datasets show that the proposed algorithm, compared against the k-means algorithm, detects anomalies more accurately. The quality of the clustering results was estimated using clustering evaluation metrics: the proposed method outperforms k-means on the Australia (credit card applications) dataset according to the Purity, Mirkin and F-measure metrics, and on the heart diseases dataset according to the F-measure and variation of information metrics.
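Purity, one of the evaluation metrics the abstract cites, is simple enough to state in code. As a hedged sketch using the common textbook definition (the paper's exact formulation is not given here), purity assigns each cluster its most frequent true class and reports the fraction of points that class covers: purity = (1/N) times the sum over clusters of the size of that cluster's majority class. The function name `purity` and the sample labels are assumptions.

```python
from collections import Counter, defaultdict


def purity(cluster_labels, true_labels):
    """Fraction of points belonging to the majority true class of their cluster."""
    if len(cluster_labels) != len(true_labels):
        raise ValueError("label sequences must have equal length")
    if not cluster_labels:
        raise ValueError("need at least one labelled point")
    # Collect the true classes of the points assigned to each cluster.
    members = defaultdict(list)
    for c, t in zip(cluster_labels, true_labels):
        members[c].append(t)
    # Sum, over clusters, the count of each cluster's most common true class.
    majority_total = sum(Counter(ts).most_common(1)[0][1] for ts in members.values())
    return majority_total / len(true_labels)
```

For example, clusters `[0, 0, 0, 1, 1, 1]` against true classes `["a", "a", "b", "b", "b", "a"]` give (2 + 2) / 6, about 0.667. Purity rises trivially as the number of clusters grows (one point per cluster scores 1.0), which is one reason it is usually reported alongside other metrics such as F-measure.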
