BACKGROUND - Self-evidently, empirical analyses rely upon the quality of their data. Likewise, replications rely upon accurate reporting and upon using the same, rather than merely similar, versions of data sets. In recent years there has been much interest in using machine learners to classify software modules into defect-prone and not defect-prone categories. The publicly available NASA data sets have been extensively used as part of this research. OBJECTIVE - This short note investigates the extent to which published analyses based on the NASA defect data sets are meaningful and comparable. METHOD - We analyse the five studies published in IEEE Transactions on Software Engineering since 2007 that have utilised these data sets and compare the two versions of the data sets currently in use. RESULTS - We find important differences between the two versions of the data sets, implausible values in one data set and generally insufficient detail documented on data set pre-processing. CONCLUSIONS - It is recommended that researchers (i) indicate the provenance of the data sets they use, (ii) report any pre-processing in sufficient detail to enable meaningful replication and (iii) invest effort in understanding the data prior to applying machine learners.
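As a minimal sketch of the kind of pre-analysis data checks the note recommends, the following assumes a NASA MDP data set exported as CSV; the file name and column names ('loc', 'v(g)', 'defects') are illustrative assumptions and may differ between the two circulating versions of the data.

```python
# Hypothetical sanity checks before applying machine learners to a
# NASA defect data set. Column names are assumptions for illustration.
import pandas as pd

def sanity_check(path):
    df = pd.read_csv(path)
    report = {}
    # Exact duplicate rows inflate the apparent sample size and can
    # leak between training and test partitions.
    report["duplicate_rows"] = int(df.duplicated().sum())
    # Implausible values: a module with zero lines of code should not
    # report positive cyclomatic complexity.
    if {"loc", "v(g)"} <= set(df.columns):
        mask = (df["loc"] == 0) & (df["v(g)"] > 0)
        report["zero_loc_nonzero_vg"] = int(mask.sum())
    # Missing values should be reported, not silently imputed.
    report["missing_values"] = int(df.isna().sum().sum())
    return report

if __name__ == "__main__":
    print(sanity_check("kc1.csv"))  # hypothetical file name
```

Reporting such counts alongside results would let readers judge whether two studies really analysed the same version of a data set.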
Traditionally, researchers have used either off-the-shelf models such as COCOMO, or developed local models using statistical techniques such as stepwise regression, to obtain software effort estimates. More recently, attention has turned to a variety of machine learning methods such as artificial neural networks (ANNs), case-based reasoning (CBR) and rule induction (RI). This paper outlines some comparative research into the use of these three machine learning methods to build software effort prediction systems. We briefly describe each method and then apply the techniques to a dataset of 81 software projects derived from a Canadian software house in the late 1980s. We compare the prediction systems in terms of three factors: accuracy, explanatory value and configurability. We show that ANN methods have superior accuracy and that RI methods are least accurate. However, this view is somewhat counteracted by problems with explanatory value and configurability. For example, we found that considerable effort was required to configure the ANN and that this compared very unfavourably with the other techniques, particularly CBR and least squares regression (LSR). We suggest that further work be carried out, both to further explore interaction between the end-user and the prediction system, and also to facilitate configuration, particularly of ANNs.
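To make the CBR (analogy-based) approach concrete, here is an illustrative sketch: find the k most similar past projects by Euclidean distance over normalised features and average their effort. This is not the authors' exact configuration; the feature set, k, and the usage values are assumptions for the example.

```python
# Illustrative analogy-based (CBR) effort estimation: predict effort
# for a new project from its k nearest neighbours among past projects.
import numpy as np

def cbr_estimate(train_X, train_effort, query, k=3):
    """Predict effort for `query` as the mean effort of the k most
    similar past projects."""
    X = np.asarray(train_X, dtype=float)
    q = np.asarray(query, dtype=float)
    # Min-max normalise each feature so no single scale dominates.
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    Xn, qn = (X - lo) / span, (q - lo) / span
    dist = np.sqrt(((Xn - qn) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    return float(np.mean(np.asarray(train_effort)[nearest]))

# Hypothetical usage: features might be size (KLOC) and team experience;
# efforts are invented person-hour values.
past_projects = [[10, 2], [25, 5], [40, 3], [12, 4]]
past_effort = [120, 300, 520, 150]
print(cbr_estimate(past_projects, past_effort, [15, 3]))
```

Part of CBR's appeal, relative to an ANN, is visible even in this sketch: the retrieved analogues can be shown to the end-user as an explanation, and the only parameter to configure is k.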
OBJECTIVE - To determine the consistency within and between results in empirical studies of software engineering cost estimation. We focus on regression and analogy techniques as these are commonly used. METHOD - We conducted an exhaustive search using predefined inclusion and exclusion criteria and identified 67 journal papers and 104 conference papers. From this sample we identified 11 journal papers and 9 conference papers that used both methods. RESULTS - Our analysis found that about 25% of studies were internally inconclusive. We also found that there is approximately equal evidence in favour of, and against, analogy-based methods. CONCLUSIONS - We confirm the lack of consistency in the findings and argue that this inconsistent pattern from 20 different studies comparing regression and analogy is somewhat disturbing. It suggests that we need to ask more detailed questions than just: "What is the best prediction system?"
OBJECTIVE - To build up a picture of the nature and type of data sets being used to develop and evaluate different software project effort prediction systems. We believe this to be important since there is a growing body of published work that seeks to assess different prediction approaches. METHOD - We performed an exhaustive search, from 1980 onwards, of three software engineering journals for research papers that used project data sets to compare cost prediction systems. RESULTS - This identified a total of 50 papers that used, one or more times, a total of 71 unique project data sets. We observed that some of the better known and easily accessible data sets were used repeatedly, making them potentially disproportionately influential. Such data sets also tend to be amongst the oldest, with potential problems of obsolescence. We also note that only about 60% of all data sets are in the public domain. Finally, extracting relevant information from research papers has been time consuming due to different styles of presentation and levels of contextual information. CONCLUSIONS - First, the community needs to consider the quality and appropriateness of the data sets being utilised; not all data sets are equal. Second, we need to assess the way results are presented, in order to facilitate meta-analysis and to decide whether a standard protocol would be appropriate.
There exists broad agreement on the value of reflective practice for personal and professional development. However, many students in higher education (HE) struggle with the concept of reflection, so they do not engage well with the process, and its full value is seldom realised. An online resource was developed to facilitate and structure the recording, storage and retrieval of reflections, with the focus on facilitating reflective writing, developing metacognitive awareness and, ultimately, enhancing learning. Ten undergraduate students completed a semi-structured questionnaire prior to participating in a focus group designed to elicit a common understanding of reflective practice. They maintained reflective practice online for six weeks and participated in post-study individual interviews. Findings provide evidence for the positive acceptance, efficiency and effectiveness of the intervention. Using a structured approach to online reflective practice is empowering and ultimately enhances undergraduate learning through the development of metacognition.