Software developers and researchers have faced difficulties with software development effort estimation (SDEE) since the 1960s. Both overestimation and underestimation are problematic for future software development. The software engineering field continually adopts new technologies and development methodologies, so there is a constant need for an accurate SDEE method that can cater to the needs of a continually growing software industry. The main purpose of this state-of-the-art review is to provide additional insight into existing SDEE studies with respect to five points of reference: the techniques used to construct models, the strengths and weaknesses of different models, the availability of benchmark data sets, data set characteristics, and the generalization ability of models. We performed a comprehensive review of SDEE studies published in the period 1981-2016 and defined a new scheme for categorizing existing SDEE models. We found that a majority of available data sets do not include complete project information, which misleads the direction of research. To compare SDEE models, we recommend using the same data sets while focusing on specific aspects of accuracy, as no SDEE study has yet compared all existing models over the same data sets while considering the same aspects of accuracy.
Software engineering researchers have worked along different dimensions to facilitate better software effort estimates, including dataset quality improvement. In this research, we specifically investigated the effectiveness of outlier removal in improving the estimation performance of five machine learning (ML) methods (Support Vector Regression, Random Forest, Ridge Regression, K-Nearest Neighbor, and Gradient Boosting Machines) for software development effort estimation (SDEE). We propose a novel discretization method based on the Golden Section (dubbed Golden Section based Adaptive Discretization, GSAD) to identify the optimal number of outliers for an SDEE dataset. The results signify the importance of removing an optimal number of outliers to improve estimates. Moreover, the results obtained after applying the GSAD technique were compared with IQR and Cook's distance based outlier identification methods over four datasets: ISBSG Release 2021, UCP, NASA93, and China. The empirical results confirm that the performance of ML based SDEE methods generally improves when GSAD is employed and that the proposed GSAD method can compete with other prevalent outlier identification methods.
INDEX TERMS software development effort estimation, machine learning, discretization, outlier identification, golden section method.
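The abstract does not describe how GSAD works, so the sketch below illustrates only the IQR baseline it is compared against. It assumes the conventional Tukey fences with a multiplier of 1.5 (the abstract does not state the multiplier used), and both the function name `iqr_outlier_mask` and the effort values are hypothetical, invented purely for illustration.

```python
import statistics

def iqr_outlier_mask(values, k=1.5):
    """Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional choice and an assumption here,
    not a setting taken from the study.
    """
    # method="inclusive" treats the data as the full population,
    # interpolating linearly between the sorted data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]

# Hypothetical project effort values (person-hours), not from any real dataset.
effort = [120, 150, 160, 170, 180, 200, 210, 5000]
mask = iqr_outlier_mask(effort)

# Keep only the projects that were not flagged as outliers.
cleaned = [v for v, is_outlier in zip(effort, mask) if not is_outlier]
```

In an SDEE pipeline, a filter of this kind would typically be applied to the training projects before fitting a regressor. Whether the fixed multiplier removes too many or too few projects is the question that an adaptive method such as GSAD is meant to address.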
Email: {addhiwal_b05, sgautam_b05, aksingh_b05, vijayk}@iiita.ac.in
Abstract - People use P2P (peer-to-peer) networks to share and transfer digital content such as video, audio, or any other data file over the internet from different parts of the globe. General P2P file sharing protocols were designed to work optimally when all peers have an end node on the internet, i.e. they are connectible. But because of the huge number of computers behind NAT and proxies, this is rarely achievable. As a result, the load is unevenly distributed between connectible and non-connectible peers: non-connectible peers usually suffer from slow download speeds, while connectible peers suffer from too many uploads. If none of the peers is connectible, it is not possible to use P2P at all. In this paper, we present an entirely new P2P protocol that addresses these deficiencies. The small number of available IPv4 addresses, hosts behind NAT, and the asymmetric nature of broadband connections make most P2P protocols inefficient. We propose an email based P2P file sharing protocol that would be a huge improvement over existing P2P networks, since every node would be reachable and it would be possible to send a file to multiple users without uploading it multiple times. Moreover, if systems like Gmail and Yahoo are used, most of the mails would be transferred internally and much more efficiently, thus improving the overall efficiency of the internet.