2019
DOI: 10.1007/s11004-019-09839-z

Sample Truncation Strategies for Outlier Removal in Geochemical Data: The MCD Robust Distance Approach Versus t-SNE Ensemble Clustering

citations
Cited by 15 publications
(7 citation statements)
references
References 17 publications
“…The results, which contain the concentrations of 33 elements, were analyzed using Excel 2013 and SPSS 26 software. After separating the data of different ore zones (the oxide and sulfide ores of Black Hill and East Ridge ore zones), the outliers were omitted and replaced by adjacent data, the data then being normalized (Bárdossy and Fodor, 2004; Kwak and Kim, 2017; Leung et al, 2021). The correlation coefficients and bivariate regression between the elements were calculated.…”
Section: Methods (mentioning)
confidence: 99%
“…As the merit of temperature modelling results is largely determined by the data set, and the data collection process itself suffers from problems such as inaccurate data and many confounding factors, there is often some anomalous data in the data collected in real time, which leads to increased modelling difficulties and reduced model building accuracy [11]. In particular, the actual temperature acquisition images show large transient temperature variations, where individual temperature values are clearly anomalous and therefore need to be rejected prior to modelling. The pauta criterion (3σ criterion) is used to remove outliers for data cleaning in this paper. σ represents the standard deviation and μ represents the mean. It is generally accepted that the data take values almost exclusively within the (μ-3σ, μ+3σ) interval, and that those outside this range can be considered outliers, with a probability of being outside less than 0.3%. If the sample is X = {x1, x2, …, xn}, then the standard deviation is calculated according to Eq.…”
Section: Exception Data Pre-processing (mentioning)
confidence: 99%
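
The 3σ (pauta) rule quoted above can be sketched in a few lines. This is a minimal illustration and not the cited paper's code: the function name `three_sigma_filter` and the sample readings are hypothetical, and because the quoted equation is truncated, the use of the sample standard deviation (n-1 denominator) is an assumption.

```python
from statistics import mean, stdev


def three_sigma_filter(values):
    """Split values into (kept, rejected) using the 3-sigma (pauta) rule.

    Assumption: sigma is the sample standard deviation (n-1 denominator);
    the quoted statement truncates the equation, so this choice is a guess.
    """
    mu = mean(values)
    sigma = stdev(values)
    lower, upper = mu - 3 * sigma, mu + 3 * sigma
    kept = [v for v in values if lower <= v <= upper]
    rejected = [v for v in values if not (lower <= v <= upper)]
    return kept, rejected


# Hypothetical temperature readings with one transient spike.
readings = [25.0, 25.1, 24.9, 25.2, 24.8] * 4 + [60.0]
kept, rejected = three_sigma_filter(readings)
print(len(kept), rejected)
```

Note that μ and σ are themselves computed from data that include the outlier, so a large spike inflates both. With the sample standard deviation, no value in a sample of 10 or fewer points can lie more than 3σ from the mean, which is one reason robust alternatives are often preferred for small or heavily contaminated samples.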
“…For learning, 48,000+ assay samples were collected from exploration holes from a Pilbara iron-ore deposit located in Western Australia. Each sample consists of (c, g) where g ∈ Z denotes the geozone label, and c ∈ R^10 measures the chemical composition in terms of Fe, SiO2, Al2O3, P, LOI (loss on ignition), TiO2, MgO, Mn, CaO and S. The geochemical characteristics of this data is described in [30]. Samples were randomly split 60:40 to produce training and validation sets.…”
Section: Machine Learning Techniques For Likelihood Estimation (mentioning)
confidence: 99%
“…Following the approach in [30], the chemical data was ilr-transformed. Geozone classification performance associated with p(g | ilr(c)) is reported in Table 6.…”
Section: On the Use Of Isometric Log-ratio Transform (mentioning)
confidence: 99%
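
For readers unfamiliar with the isometric log-ratio (ilr) transform mentioned in this statement, the sketch below shows one common form of it. The quoted text does not say which orthonormal basis the authors used, so the sequential (pivot-style) basis here is an assumption, and the function name `ilr` and the three-part example composition are illustrative only.

```python
import math


def ilr(composition):
    """Map a D-part composition to D-1 ilr coordinates.

    Assumption: a sequential basis, where coordinate i compares the
    geometric mean of the first i parts with part i+1:
        z_i = sqrt(i / (i + 1)) * ln(g(x_1..x_i) / x_{i+1})
    Other orthonormal bases give rotated but equivalent coordinates.
    """
    if any(part <= 0 for part in composition):
        raise ValueError("ilr requires strictly positive parts")
    logs = [math.log(part) for part in composition]
    coords = []
    for i in range(1, len(logs)):
        # Mean of logs equals the log of the geometric mean.
        mean_log_first_i = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_log_first_i - logs[i]))
    return coords


# Hypothetical three-part composition (proportions summing to 1).
print(ilr([0.6, 0.3, 0.1]))
```

Because only ratios between parts enter the formula, rescaling a composition (for example, percentages instead of proportions) leaves the coordinates unchanged. Zero concentrations must be replaced or imputed before the transform, since the logarithm is undefined there.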