We present the design and a first performance evaluation of Thrill, a prototype of a general-purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink, with at least two main differences. First, Thrill is based on C++, which enables performance advantages due to direct native code compilation, a more cache-friendly memory layout, and explicit memory management. In particular, Thrill uses template meta-programming to compile chains of subsequent local operations into a single binary routine without intermediate buffering and with minimal indirections. Second, Thrill uses arrays rather than multisets as its primary data structure, which enables additional operations like sorting, prefix sums, window scans, or combining corresponding fields of several arrays (zipping). We compare Thrill with Apache Spark and Apache Flink using five kernels from the HiBench suite. Thrill is consistently faster and often several times faster than the other frameworks. At the same time, the source codes have a similar level of simplicity and abstraction.
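The array operations and the fused operation chains mentioned above can be illustrated with a minimal single-machine sketch in standard C++. This is not Thrill's API: the helper names `prefix_sum`, `zip_with`, `window_sum`, and `fused_map_filter_sum` are hypothetical, and the code assumes plain `std::vector<int>` inputs rather than a distributed array.

```cpp
// Illustrative single-machine sketch. All helper names are hypothetical
// and are NOT part of Thrill's API.
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Inclusive prefix sum: out[i] = in[0] + ... + in[i].
inline std::vector<int> prefix_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}

// Zip: combine corresponding elements of two equally long arrays with f.
template <typename A, typename B, typename F>
auto zip_with(const std::vector<A>& a, const std::vector<B>& b, F f)
    -> std::vector<decltype(f(a[0], b[0]))> {
    assert(a.size() == b.size());  // only well defined on ordered arrays of equal length
    std::vector<decltype(f(a[0], b[0]))> out;
    out.reserve(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out.push_back(f(a[i], b[i]));
    }
    return out;
}

// Window scan: sum of every run of k consecutive elements.
inline std::vector<int> window_sum(const std::vector<int>& in, std::size_t k) {
    std::vector<int> out;
    if (k == 0 || in.size() < k) return out;
    for (std::size_t i = 0; i + k <= in.size(); ++i) {
        int s = 0;
        for (std::size_t j = 0; j < k; ++j) s += in[i + j];
        out.push_back(s);
    }
    return out;
}

// Fused map -> filter -> sum. Because the functors are template parameters,
// the compiler can inline them into this single loop, so no intermediate
// vector is materialized between the three steps.
template <typename Map, typename Pred>
long fused_map_filter_sum(const std::vector<int>& in, Map map, Pred keep) {
    long total = 0;
    for (int x : in) {
        const auto y = map(x);
        if (keep(y)) total += y;
    }
    return total;
}
```

For example, `prefix_sum` applied to `{1, 2, 3, 4}` yields `{1, 3, 6, 10}`, and `window_sum` with `k = 2` on the same input yields `{3, 5, 7}`. Both results depend on element order, which is why such operations fit an array model but not a multiset model.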
The shipping industry constantly strives to use energy efficiently during sea voyages. Research that takes advantage of both ethnographic studies and big data analytics to understand the factors contributing to fuel consumption, and to seek solutions that support decision making, is rather scarce. This paper first employed ethnographic research on the use of a commercially available fuel-monitoring system. This contextualized the real challenges on ships and informed the need for a big data approach to achieving energy efficiency (EE). The study then constructed two machine-learning models based on the recorded voyage data of five different ferries over a one-year period. The evaluation showed that the models generalize well on different training data sets, and the model outputs indicated a potential for better performance than the existing commercial EE system. How this predictive-analytical approach could impact the design of navigational decision support systems and management practices is also discussed. It is hoped that this interdisciplinary research can offer some insight toward a richer methodological framework for future maritime energy research.
Since Cloud services are billed on a pay-as-you-go basis, organizations can save substantial investment costs. Hence, they want to know what costs will arise from using those services. Cloud providers, in turn, want to offer the best-matching hardware configurations for different use cases. CloudSim, a popular event-based framework by Calheiros et al., was therefore developed to model and simulate the usage of IaaS (Infrastructure-as-a-Service) Clouds. Metrics such as usage costs, resource utilization, and energy consumption can also be investigated using CloudSim. However, this widely used simulation framework does not provide any mechanisms to simulate today's object storage-based Cloud services (STaaS, Storage-as-a-Service). In this paper, we propose a storage extension for CloudSim that enables the simulation of STaaS components. Interactions between users and the modeled STaaS Clouds are inspired by the CDMI (Cloud Data Management Interface) standard. To validate our extension, we evaluated the resource utilization and costs that arise from the usage of STaaS Clouds in different simulation scenarios.