Big data analysis in e-commerce system using HadoopMapReduce

Suguna, S.; Vithya, M.; Eunaicy, J. I. Christy

doi:10.1109/inventive.2016.7824798

Cited by 10 publications

(8 citation statements)

References 7 publications


“…This dataset covers reviews of multiple products such as Books, Baby products, Electronics, Kindle store, Movies and TV, Health and During the execution of interest-based queries, it is observed that there is a severe drag in MR performance. In the business forecasting domain [16,17] in particular, to predict future product demand/sales of particular products, the reviews in respective categories alone need to be analysed rather than sweeping through the reviews in all categories. The data relating to the interest domain in the Amazon review data is shown in Table.4.3.…”

Section: Experimental Results And

mentioning

confidence: 99%

“…k-means [13], Hierarchical Agglomerative Clustering (HAC) [14] and Markov Clustering (MCL) [15] in grouping-aware data placement for data-intensive applications with interest locality. It has been proved in a heterogeneous distributed environment for the e-commerce dataset [16,17]. The results show that queries are solved by the domain analyst at the earliest possible time to enable quick decisions, as well as deriving maximum utilisation of resources.…”

mentioning

confidence: 99%


Significance of Hierarchical and Markov Clustering in Grouping Aware Data Placement for Data Intensive Applications Having Interest Locality

Vengadeswaran

Ramakrishnan

2018

SCPE


In this data era, massive volumes of data are being generated every second in a variety of domains such as Geoscience, Social Web, Finance, e-Commerce, Health Care, Climate modelling, Physics, Astronomy, Government sectors etc. Hadoop has been well recognized as the de facto big data processing platform and has been extensively adopted in many application domains that process Big Data. Even though it is considered an efficient solution for such complex query processing, it has its own limitations when the data to be processed exhibit interest locality. The data required for any query execution follows grouping behavior wherein only a part of the Big Data is accessed frequently. In such scenarios, the time taken to execute a query and return results increases exponentially as the amount of data increases, leading to long waiting times for the user. Since the Hadoop default data placement strategy (HDDPS) does not consider such grouping behavior, it does not perform efficiently, resulting in shortcomings such as decreased local map task execution and increased query execution time. Hence an Optimal Data Placement Strategy (ODPS) based on grouping semantics is proposed. In this paper we examine the significance of the two most promising clustering techniques, viz. Hierarchical Agglomerative Clustering (HAC) and Markov Clustering (MCL), in grouping-aware data placement for data-intensive applications having interest locality. Initially, the user access pattern is identified by dynamically analyzing the history log. Then both clustering techniques (HAC and MCL) are separately applied over the access pattern to obtain independent clusters. These clusters are interpreted and validated to extract the Optimal Data Groupings (ODG). Finally, the proposed strategy reorganizes the default data layouts in HDFS based on the ODG to achieve maximum parallel execution per group, subject to Load Balancer and Rack Awareness.
Our proposed strategy is tested on a 10-node cluster placed in a multi-rack setup with Hadoop installed on every node, deployed on a cloud platform. The proposed strategy reduces the query execution time, significantly improves data locality and has proved to be more efficient for processing massive datasets in a heterogeneous distributed environment. Also, MCL shows marginally better performance than HAC for queries exhibiting interest locality.



“…In E commerce the parallel processing is very important because every time the data come in is needed to be computed and resulted then only a recommendation can be powerful in nature. [7] Dr.S.Suguna, M.Vithya, J.I.Christy Eunaicy [8] proposed that how we can use Hadoop MapReduce to analyze the big data. Authors addressed the issue that how it is difficult to apply data mining techniques within the present amount of data.…”

Section: Literature Survey

mentioning

confidence: 99%

“…The file passed as an input is filled with records which are termed as rows in SQL and these records are read first and then parsed into the records and can be delimited according to the user. [8]…”

Section: Fig 2b Content Based Filtering Hadoop Distributed File Sys

mentioning

confidence: 99%

Optimized Recommendation System for E Commerce on Product Features and User Behavior

G.¹,

Singh²

2019

IJRTE


Big data is a lake with huge amounts of information stored in it, and all we need is to dig into it to get the important information out and create a useful system that can help improve the current scenario. There are various applications where big data is being used, and there are a few fields that are learning techniques to work with big data, evaluate their work and make improved decisions. This paper concentrates on the e-commerce system, which is highly trending in the market. [20] E-commerce, also known as electronic commerce, is a marketplace that gives both buyers and sellers a platform to enjoy various services. It offers a wide variety for the consumer to choose from, and the seller gets a platform where he can showcase his product and reach millions of customers at the same time without having to look for a site all the time; the system takes care of it. Big data now plays a vital role in e-commerce, as it learns about user behavior and suggests a suitable product that the user may need according to his behavior and query. There are various machine learning algorithms that work on this and improve the services. [11] In this paper we read the user information and combine it with the product attributes to get a suitable suggestion for the user that is most likely to be purchased by him. The existing system looks at only one side of the case when giving suggestions, but in this paper we look at both sides, that is, the product entities (the attributes and features that a product possesses) and the user behavior (the information given by the user and his previous history), which gives better predictions and improves the system.
Moreover, for optimized working of the system we include an enhanced version of the HPCA scheduling algorithm for the Hadoop Distributed File System, also known as HDFS, which is well suited to heterogeneous systems. The existing algorithm looks at the overall capacity of a node before tasks are assigned, whereas here we consider the health and the leftover capacity of the nodes and arrange the queue accordingly, refreshing it every time a task is completed by any node. [18] The aim of the paper is to provide fast and highly suitable suggestions to users, which can play a vital role in improving the sales of the company and meeting its targets sooner.


“…A number of specialised frameworks were created for offline processing of data, e.g., [16], [15], [13]. However, none of them are suited for processing streaming data.…”

Section: Introduction

mentioning

confidence: 99%

Online and Offline Analysis of Streaming Data

Hoque¹,

Miranskyy²

2018

2018 IEEE International Conference on Software Architecture Companion (ICSA-C)


Online and offline analytics have been traditionally treated separately in software architecture design, and there is no existing general architecture that can support both. Our objective is to go beyond and introduce a scalable and maintainable architecture for performing online as well as offline analysis of streaming data. In this paper, we propose a 7-layered architecture utilising microservices, publish-subscribe pattern, and persistent storage. The architecture ensures high cohesion, low coupling, and asynchronous communication between the layers, thus yielding a scalable and maintainable solution. This design can help practitioners to engage their online and offline use cases in one single architecture, and also is of interest to academics, as it is a building block for a general architecture supporting data analysis.

