The energy performance certificate (EPC) is a document that certifies the average annual energy consumption of a building under standard conditions and allows the building to be classified within a so-called energy class. At a time when greenhouse gas emissions are a major concern and the objectives are to improve energy security and reduce energy costs in our cities, energy certification has a key role to play. The proposed work aims to model and characterize the energy efficiency of residential buildings by exploring heterogeneous, geo-referenced data with different spatial and temporal granularity. The paper presents TUCANA (TUrin Certificates ANAlysis), an innovative data mining engine that covers the whole analytics workflow for the analysis of energy performance certificates, including cluster analysis and a model generalization step based on a novel spatially constrained K-NN, able to automatically characterize a broad set of buildings distributed across a major city and predict different energy-related features for new, unseen buildings. The energy certificates analyzed in this work have been issued by the Piedmont Region (a northwest region of Italy) as open data. The results obtained on a large dataset are displayed in novel, dynamic, and interactive geospatial maps that can be consulted through a web application integrated into the system. The visualization tool provides transparent, human-readable knowledge to various stakeholders, thus supporting the decision-making process.
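The abstract does not detail how the spatial constraint is enforced, but a minimal sketch can clarify the idea: candidate neighbors are first filtered by a geographic radius around the target building, and only then are the k closest certificates averaged to predict an energy-related feature. The radius, feature units, and function name below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not TUCANA's actual code): K-NN prediction where
# neighbors must lie within a geographic radius of the target building.
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

def predict_energy_feature(coords_rad, values, query_rad, k=5, radius_m=500.0):
    """coords_rad: (n, 2) lat/lon in radians of certified buildings.
    values: (n,) energy feature per building (e.g., kWh/m^2 per year, assumed).
    query_rad: (2,) lat/lon in radians of the new, unseen building."""
    earth_radius_m = 6_371_000.0
    # Haversine distances are returned on the unit sphere; scale to metres.
    d = haversine_distances(coords_rad, query_rad.reshape(1, -1)).ravel() * earth_radius_m
    admissible = np.where(d <= radius_m)[0]        # spatial constraint
    if admissible.size == 0:
        return None                                # no certificate close enough
    nearest = admissible[np.argsort(d[admissible])][:k]
    return float(values[nearest].mean())           # K-NN average among them
```

In practice such a predictor could fall back to a wider radius, or weight neighbors by inverse distance, when few certificates lie near the query building.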
In a context of deep transformation of the entire automotive industry, starting from pervasive and native connectivity, commercial vehicles (heavy, light, and buses) are generating and transmitting much more data than passenger cars, with a much higher expected value, motivated by the higher cost of the vehicles and their value-added related businesses, such as logistics, freight, and transportation management. This paper presents a data-driven, unsupervised methodology that provides a descriptive model for estimating the residual value of heavy trucks subject to buy-back. A huge amount of telematics data characterizing the actual usage of commercial vehicles is analyzed jointly with external conditions affecting a truck's performance (e.g., altimetry) to estimate the devaluation of the vehicle at buy-back. The proposed approach has been evaluated on a large set of real-world heavy trucks, demonstrating its effectiveness in correctly assessing the real state of wear and the residual value at the end of leasing contracts, and in providing a few quantitative insights through an informative, interactive, and user-friendly dashboard that supports decisions on the next business strategies to adopt. The proposed solution has already been deployed by a private company within its data analytics services since (1) it provides an interpretable descriptive model of the main factors/parameters, with the corresponding weights, affecting the residual value, and (2) the experimental results confirmed the promising outcomes of the proposed data-driven methodology.

Index Terms: business- vs. data-driven methodologies, automotive industry, commercial vehicles, residual value estimation.
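One way to realize an unsupervised, descriptive model of this kind is to cluster trucks by normalized telematics usage indicators and read each cluster centroid as an interpretable wear profile. The sketch below is an illustrative assumption in this spirit; the feature set, cluster count, and function name are hypothetical, not the deployed model.

```python
# Hypothetical sketch: cluster trucks by standardized telematics features
# (e.g., mileage, engine hours, altimetry exposure are assumed indicators)
# and treat each centroid as a descriptive wear profile.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def wear_profiles(usage, n_profiles=4, seed=0):
    """usage: (n_trucks, n_features) matrix of telematics indicators."""
    X = StandardScaler().fit_transform(usage)       # put features on one scale
    km = KMeans(n_clusters=n_profiles, n_init=10, random_state=seed).fit(X)
    # Each centroid summarizes a usage regime; a truck's distance to its
    # centroid indicates how typical its usage (and hence devaluation) is.
    return km.labels_, km.cluster_centers_
```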
Today, large amounts of data are collected in various domains, presenting unprecedented economic and societal opportunities. Yet, at present, the exploitation of these datasets through data science methods is primarily dominated by AI-savvy users. From an inclusive perspective, there is a need for solutions that democratise data science by intuitively guiding non-specialists to explore data collections and extract knowledge from them. This paper introduces the vision of a new data science engine, called DS4ALL (Data Science for ALL), that empowers users who are neither computer nor AI experts to perform sophisticated data exploration and analysis tasks. To this end, DS4ALL is based on a conversational and intuitive approach that insulates users from the complexity of AI algorithms. DS4ALL adopts a dialogue-based approach that gives users greater freedom of expression, enabling them to communicate in natural language without requiring a high level of expertise in data-driven algorithms. User requests are interpreted and handled internally by the system in an automated manner, providing the user with the required output while masking the complexity of the data science workflow. The system can also collect feedback on the displayed results, leveraging these comments to personalize data analysis sessions. The benefits of the envisioned system are discussed, and a use case is presented to illustrate its innovative aspects.
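To make the idea of interpreting requests internally concrete, here is a deliberately minimal, hypothetical dispatcher (not DS4ALL's actual implementation) that routes a natural-language utterance to a canned analysis step, hiding the underlying workflow from the user; the keywords and actions are illustrative assumptions.

```python
# Hypothetical illustration only: map a user utterance to a data-exploration
# action on a pandas DataFrame, masking the data science workflow.
def handle_request(text, df):
    text = text.lower()
    if "cluster" in text or "group" in text:
        from sklearn.cluster import KMeans
        # Run a default clustering on numeric columns; the user never
        # chooses the algorithm or its parameters.
        return KMeans(n_clusters=3, n_init=10).fit_predict(df.select_dtypes("number"))
    if "summar" in text or "describe" in text:
        return df.describe()                        # basic statistics
    if "correlat" in text:
        return df.select_dtypes("number").corr()    # pairwise correlations
    return "Sorry, could you rephrase your request?"
```

A full conversational engine would of course replace the keyword matching with natural-language understanding and use feedback to refine subsequent answers, as the abstract describes.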
In recent years, the number and heterogeneity of large scientific datasets have been growing steadily, and analyzing these data collections is not a trivial task. Many algorithms can analyze large datasets, but each of them requires parameters to be set, and larger datasets also mean greater complexity. All this motivates the development of innovative, scalable, and parameter-free solutions. The goal of this research activity is to design and develop an automated data analysis engine that effectively and efficiently analyzes large collections of text data with minimal user intervention. Both parameter-free algorithms and self-assessment strategies have been proposed to suggest algorithms and specific parameter values for each step of the analysis pipeline. The proposed solutions have been tailored to text corpora characterized by variable term distributions and different document lengths. In particular, a new engine called ESCAPE (Enhanced Self-tuning Characterization of document collections After Parameter Evaluation) has been designed and developed. ESCAPE integrates two different solutions for document clustering and topic modeling: a joint approach and a probabilistic approach. Both methods include ad hoc self-optimization strategies to configure the specific algorithm parameters. Moreover, novel visualization techniques and quality metrics have been integrated to analyze the performance of both approaches and to help domain experts interpret the discovered knowledge. Both approaches correctly identify meaningful partitions of a given document corpus by grouping documents according to their topics.
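The abstract does not specify the self-optimization criterion, but a common way to make a probabilistic topic model parameter-free in this sense is to sweep the number of topics and keep the model that generalizes best to held-out documents. The sketch below assumes held-out perplexity as the quality metric and is illustrative, not ESCAPE's actual strategy.

```python
# Hypothetical self-tuning step in the spirit of ESCAPE: choose the number
# of LDA topics automatically by minimizing held-out perplexity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import train_test_split

def auto_topic_model(docs, candidate_k=(5, 10, 20, 40), seed=0):
    """docs: list of raw text documents; returns (model, chosen_k)."""
    X = CountVectorizer(stop_words="english", min_df=2).fit_transform(docs)
    X_train, X_val = train_test_split(X, test_size=0.2, random_state=seed)
    best = None
    for k in candidate_k:
        lda = LatentDirichletAllocation(n_components=k, random_state=seed).fit(X_train)
        score = lda.perplexity(X_val)   # lower is better on unseen documents
        if best is None or score < best[0]:
            best = (score, k, lda)
    return best[2], best[1]             # self-selected model and topic count
```

The same sweep-and-score pattern applies to the joint (clustering-based) approach, with a clustering quality index in place of perplexity.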