Jason J. Jung scite author profile

Abstract: Big data has become an important issue for a large number of research areas such as data mining, machine learning, computational intelligence, information fusion, the semantic Web, and social networks. The rise of different big data frameworks such as Apache Hadoop and, more recently, Spark, for massive data processing based on the MapReduce paradigm has allowed for the efficient utilisation of data mining methods and machine learning algorithms in different domains. A number of libraries such as Mahout and Spark MLlib have been designed to develop new efficient applications based on machine learning algorithms. The combination of big data technologies and traditional machine learning algorithms has generated new and interesting challenges in other areas such as social media and social networks. These new challenges are focused mainly on problems such as data processing, data storage, data representation, and how data can be used for pattern mining, analysing user behaviours, and visualizing and tracking data, among others. In this paper, we present a review of the new methodologies that are designed to allow for efficient data mining and information fusion from social media and of the new applications and frameworks that are currently appearing under the "umbrella" of the social networks, social media and big data paradigms. (D. Camacho).

petabytes (and even exabytes) in size, and the massive sizes of these datasets extend beyond the ability of average database software tools to capture, store, manage, and analyse them effectively.

The concept of big data is commonly framed by the 3V model, introduced in 2001 by Laney [5] as: "high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making". More recently, in 2012, Gartner [6] updated the definition as follows: "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization". Both definitions refer to the three basic features of big data: Volume, Variety, and Velocity. Other organisations and big data practitioners (e.g., researchers and engineers) have extended this 3V model to a 4V model by including a new "V": Value [7]. The model can be extended even further to 5Vs if the concept of Veracity is incorporated into the big data definition.

In summary, this set of *V models provides a straightforward and widely accepted definition of what is (and what is not) a big-data-based problem, application, software, or framework. These concepts can be briefly described as follows [5,7]:

• Volume: refers to large amounts of any kind of data from any source, including mobile digital data creation devices and digital devices. The benefit from gathering, processing, and analysing these large amounts of data generates a number http://dx.


Classification of crystal structure using a convolutional neural network

Park, Chung, Jung, et al. 2017. IUCrJ.

182

194


A deep machine-learning technique based on a convolutional neural network (CNN) is introduced. It has been used for the classification of powder X-ray diffraction (XRD) patterns in terms of crystal system, extinction group and space group. About 150 000 powder XRD patterns were collected and used as input for the CNN with no handcrafted engineering involved, and thereby an appropriate CNN architecture was obtained that allowed determination of the crystal system, extinction group and space group. In sharp contrast with the traditional use of powder XRD pattern analysis, the CNN never treats powder XRD patterns as a deconvoluted and discrete peak position or as intensity data, but instead the XRD patterns are regarded as nothing but a pattern similar to a picture. The CNN interprets features that humans cannot recognize in a powder XRD pattern. As a result, accuracy levels of 81.14, 83.83 and 94.99% were achieved for the space-group, extinction-group and crystal-system classifications, respectively. The well-trained CNN was then used for symmetry identification of unknown novel inorganic compounds.


Stochastic dynamic itinerary interception refueling location problem with queue delay for electric taxi charging stations

Jung, Chow, Jayakrishnan, et al. 2014. Transportation Research Part C: Emerging Technologies.

206


Dynamic Shared‐Taxi Dispatch Algorithm with Hybrid‐Simulated Annealing

Jung, Jayakrishnan, Park. 2015. Computer aided Civil Eng.

148


Taxi is certainly the most popular type of on-demand transportation service in urban areas because taxi-dispatching systems offer more and better services in terms of shorter wait times and passenger travel convenience. However, a shortage of taxicabs has always been critical in many urban contexts, especially during peak hours, and taxi has great potential to maximize its efficiency by employing the shared-ride concept. There are recent successes in dynamic ride-sharing projects that are expected to bring substantial benefits arising from energy consumption and operation efficiency and thus, it is essential to develop advanced shared-taxi-dispatch algorithms and investigate the collective benefits of dynamic ride-sharing by maximizing occupancy and minimizing travel times in real time. This article investigates how taxi services can be improved by proposing shared-taxi algorithms and what type of objective functions and constraints could be employed to prevent excessive passenger detours. Hybrid-simulated annealing (HSA) is applied to dynamically assign passenger requests efficiently. A series of simulations are conducted with two different taxi operation strategies. The simulation results reveal that allowing ride-sharing for taxicabs increases productivity over the various demand levels and HSA can be considered as a suitable solution to maximize the system efficiency of dynamic ride-sharing.


Social big data: Recent achievements and new challenges
