Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables

Blackard, Jock A.; Dean, Denis J.

doi:10.1016/s0168-1699(99)00046-0

Cited by 401 publications

(215 citation statements)

References 13 publications

Supporting

Mentioning

209

Contrasting

Unclassified


“…This spatial data set contains 581,012 examples with 54 attributes and 7 target classes and represents the forest cover type for 30 x 30 meter cells obtained from US Forest Service (USFS) Region 2 Resource Information System [14]. In Covertype data set, 40 attributes are binary columns representing soil type, 4 attributes are binary columns representing wilderness area, and the remaining 10 are continuous topographical attributes.…”

Section: Results

mentioning

confidence: 99%
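The attribute layout described in the quoted statement above (10 continuous, 4 wilderness-area and 40 soil-type columns, 54 in total) can be illustrated with a small sketch. The counts come from the statement; the column order used here (continuous first, then wilderness, then soil) and the helper name `split_attributes` are assumptions made for illustration, and the example row contains made-up values.

```python
# Attribute counts are taken from the quoted statement; the column order
# (continuous, then wilderness, then soil) is an assumption for illustration.
N_CONTINUOUS = 10
N_WILDERNESS = 4
N_SOIL = 40


def split_attributes(row):
    """Split one 54-value example into its three attribute groups."""
    expected = N_CONTINUOUS + N_WILDERNESS + N_SOIL
    if len(row) != expected:
        raise ValueError(f"expected {expected} attributes, got {len(row)}")
    continuous = row[:N_CONTINUOUS]
    wilderness = row[N_CONTINUOUS:N_CONTINUOUS + N_WILDERNESS]
    soil = row[N_CONTINUOUS + N_WILDERNESS:]
    return continuous, wilderness, soil


# Hypothetical example row: made-up values with one wilderness flag
# and one soil-type flag set.
example = [0.0] * N_CONTINUOUS + [1, 0, 0, 0] + [0] * 39 + [1]
continuous, wilderness, soil = split_attributes(example)
```

Checking the length up front makes a mismatch with the 54-attribute layout fail loudly instead of silently misaligning the binary columns.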

Data Reduction Using Multiple Models Integration

Lazarević

Obradović

2001

Principles of Data Mining and Knowledge Discovery


Abstract. Large amount of available information does not necessarily imply that induction algorithms must use all this information. Samples often provide the same accuracy with less computational cost. We propose several effective techniques based on the idea of progressive sampling when progressively larger samples are used for training as long as model accuracy improves. Our sampling procedures combine all the models constructed on previously considered data samples. In addition to random sampling, controllable sampling based on the boosting algorithm is proposed, where the models are combined using a weighted voting. To improve model accuracy, an effective pruning technique for inaccurate models is also employed. Finally, a novel sampling procedure for spatial data domains is proposed, where the data examples are drawn not only according to the performance of previous models, but also according to the spatial correlation of data. Experiments performed on several data sets showed that the proposed sampling procedures outperformed standard progressive sampling in both the achieved accuracy and the level of data reduction.
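The abstract above uses standard progressive sampling as its baseline: train on progressively larger samples for as long as accuracy improves. A minimal sketch of that baseline loop follows. It is not the authors' method (which also combines, weights and prunes the models built on earlier samples); the function name, the geometric size schedule and the `train`/`evaluate` callables are all assumptions made for illustration.

```python
import random


def progressive_sampling(data, train, evaluate, start=100, factor=2,
                         min_gain=0.0, seed=0):
    """Grow the training sample until accuracy stops improving.

    `train(sample)` returns a model and `evaluate(model)` returns an
    accuracy; both are caller-supplied placeholders (hypothetical).
    """
    rng = random.Random(seed)
    pool = list(data)
    rng.shuffle(pool)  # random sampling: prefixes of a shuffled pool
    size = min(start, len(pool))
    best_model, best_acc, best_size = None, float("-inf"), 0
    while True:
        model = train(pool[:size])
        acc = evaluate(model)
        if acc - best_acc <= min_gain:
            break  # no improvement: keep the previous, smaller sample
        best_model, best_acc, best_size = model, acc, size
        if size == len(pool):
            break  # all data already used
        size = min(size * factor, len(pool))
    return best_model, best_acc, best_size


# Toy stand-ins: the "model" is just the sample size, and accuracy
# saturates once the sample reaches 400 examples.
toy_train = lambda sample: len(sample)
toy_evaluate = lambda n: min(n, 400) / 400
model, acc, size = progressive_sampling(range(1000), toy_train, toy_evaluate)
```

With the toy stand-ins the loop tries samples of 100, 200, 400 and 800 examples and stops at 800 because accuracy no longer improves, so the 400-example model is the one returned.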


“…The RUSBoost (Random Under Sampling) algorithm is designed to classify when one class has many more observations than another and good reference results have been obtained (Seiffert et al 2010). Blackard and Dean (1999) describe an ANN classification of an imbalanced dataset achieving 70.6% accuracy, whereas RUSBoost obtained over 76% classification accuracy. The majority of class-imbalance learning techniques currently implemented, including RUSBoost, have been designed for two-class problems.…”

Section: Ensemble Design and Algorithm Implementation

mentioning

confidence: 99%

Ensemble Decision Tree Models Using RUSBoost for Estimating Risk of Iron Failure in Drinking Water Distribution Systems

Mounce

Ellis

Edwards

et al. 2017

Water Resour Manage


Safe, trusted drinking water is fundamental to society. Discolouration is a key aesthetic indicator visible to customers. Investigations to understand discolouration and iron failures in water supply systems require assessment of large quantities of disparate, inconsistent, multidimensional data from multiple corporate systems. A comprehensive data matrix was assembled for a seven year period across the whole of a UK water company (serving three million people). From this a novel data driven tool for assessment of iron risk was developed based on a yearly update and ranking procedure, for a subset of the best quality data. To avoid a 'black box' output, and provide an element of explanatory (human readable) interpretation, classification decision trees were utilised. Due to the very limited number of iron failures, results from many weak learners were melded into one high-quality ensemble predictor using the RUSBoost algorithm which is designed for class imbalance. Results, exploring simplicity vs predictive power, indicate enough discrimination between variable relationships in the matrix to produce ensemble decision tree classification models with good accuracy for iron failure estimation at District Management Area (DMA) scale. Two model variants were explored: 'Nowcast' (situation at end of calendar year) and 'Futurecast' (predict end of next year situation from this year's data). The Nowcast 2014 model achieved 100% True Positive Rate (TPR) and 95.3% True Negative Rate (TNR), with 3.3% of DMAs classified High Risk for un-sampled instances. The Futurecast 2014 achieved 60.5% TPR and 75.9% TNR, with 25.7% of DMAs classified High Risk for un-sampled instances. The output can be used to focus preventive measures to improve iron compliance.


“…(22)(23)(24)(25), the term sign(x_i × y_i) is common and provides the correlation information between the two vectors x and y. As a result, the computational complexity and power consumption due to signal analysis can be decreased significantly.…”

Section: Similarity Measures Based On Surface Gradients

mentioning

confidence: 99%

“…In the original paper, 58 % success rate is obtained using linear discriminant analysis and 70 % success rate is obtained using neural networks [22]. We use the same setup, which has 11340 training samples.…”

mentioning

confidence: 99%

Energy efficient cosine similarity measures according to a convex cost function

Akbas

Günay

Taşdemir

et al. 2016

SIViP


We propose a new family of vector similarity measures. Each measure is associated with a convex cost function. Given two vectors, we determine the surface normals of the convex function at the vectors. The angle between the two surface normals is the similarity measure. Convex cost function can be the negative entropy function, total variation (TV) function and filtered variation function constructed from wavelets. The convex cost functions need not to be differentiable everywhere. In general, we need to compute the gradient of the cost function to compute the surface normals. If the gradient does not exist at a given vector, it is possible to use the sub-gradients and the normal producing the smallest angle between the two vectors is used to compute the similarity measure. The proposed measures are compared experimentally to other nonlinear similarity measures and the ordinary cosine similarity measure. The TV-based vector product is more energy efficient than the ordinary inner product because it does not require any multiplications.

