Fuzzy Overclustering: Semi-Supervised Classification of Fuzzy Labels with Overclustering and Inverse Cross-Entropy

Schmarje, Lars; Brünger, Johannes; Santarossa, Monty; Schröder, Simon-Martin; Kiko, Rainer; Koch, Reinhard

doi:10.3390/s21196661

Cited by 13 publications

(32 citation statements)

References 33 publications


“…Schmarje et al. reframe the handling of fuzzy labels as a semi-supervised learning problem by using a small set of certain images and a large number of fuzzy images that are treated as unlabeled data [14]. The authors apply the overclustering concept not only to improve classification accuracy on the labeled data, but also to improve the clustering, and therefore the identification, of substructures of fuzzy data (ambiguous, including intra- or interobserver variability). Schmarje et al. show that their alternate inverse cross-entropy loss function and their overclustering allow them to cluster the fuzzy data to discover more meaningful substructures and therefore allow experts to analyze fuzzy images more consistently.…”

Section: Overclustering (mentioning)

confidence: 99%

Iterative K-means clustering for disease subtype discovery

Aubert, Huber, Furst et al. 2023

Medical Imaging 2023: Computer-Aided Diagnosis
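The semi-supervised use of overclustering described in the statement above can be illustrated with a minimal sketch. It assumes that cluster assignments already exist and that only a few images carry a certain label; the function names and the toy data are hypothetical and are not taken from the cited papers. Clusters that contain no certain label stay unassigned, which is one simple way to surface candidate substructures for expert review.

```python
from collections import Counter, defaultdict


def map_overclusters_to_classes(cluster_ids, labels):
    """Assign each overcluster the majority class of its certain members.

    cluster_ids: cluster index per image (more clusters than classes).
    labels: class name per image, or None for fuzzy/unlabeled images.
    """
    votes = defaultdict(Counter)
    for cid, lab in zip(cluster_ids, labels):
        if lab is not None:  # only certain images are allowed to vote
            votes[cid][lab] += 1
    # Clusters without any certain image do not appear in the mapping.
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}


def predict(cluster_ids, mapping):
    """Return the mapped class per image, or None for unassigned clusters."""
    return [mapping.get(cid) for cid in cluster_ids]


# Toy data (hypothetical): 4 overclusters, 2 classes, 3 fuzzy images.
cluster_ids = [0, 0, 1, 1, 2, 2, 3]
labels = ["cat", None, "dog", "dog", None, None, "cat"]

mapping = map_overclusters_to_classes(cluster_ids, labels)
predictions = predict(cluster_ids, mapping)
print(mapping)
print(predictions)
```

In this toy run, cluster 2 contains only fuzzy images, so its members receive no class and would be handed to an expert as a group.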



“…Other work [63,19,70] considers frameworks for learning from fuzzy human labels given possibly ambiguous data. Our dataset differs from these papers in that none of the listed datasets include images that are intentionally ambiguous, or depict more than a single object.…”

Section: Collecting Ambiguous Data In Computer Vision (mentioning)

confidence: 99%

Ambiguous Images With Human Judgments for Robust Visual Event Classification

Sanders, Kriz, Liu et al. 2022

Preprint


Contemporary vision benchmarks predominantly consider tasks on which humans can achieve near-perfect performance. However, humans are frequently presented with visual data that they cannot classify with 100% certainty, and models trained on standard vision benchmarks achieve low performance when evaluated on this data. To address this issue, we introduce a procedure for creating datasets of ambiguous images and use it to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos. All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments. We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models. Experimental results suggest that existing vision models are not sufficiently equipped to provide meaningful outputs for ambiguous images and that datasets of this nature can be used to assess and improve such models through model training and direct evaluation of model calibration. These findings motivate large-scale ambiguous dataset creation and further research focusing on noisy visual data. Dataset and code are available at https://katesanders9.github.io/ambiguous-images. 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks.


“…To overcome this issue, we chose to use the so-called overclustering approach, i.e., we divided the dataset into a large number (𝐾 = 100) of clusters. This method is also applied for data clustering with fuzzy labels [24], which is what our data essentially is. The number of clusters 𝐾 = 100 is chosen following the expert suggestion of expected particle groups (10), with the intention to partition the hidden representation feature space into finely defined clusters containing semantically homogeneous sets of examples.…”

Section: Clustering Approach (mentioning)

confidence: 99%

Visual clustering of marine sediment particles using a combination of unsupervised machine learning methods

Krinitskiy, Golikov, Borisov 2022

Proceedings of the 6th International Workshop on Deep Learning in Computational Physics — PoS(DLCP2022)


Information on past climates and environments is preserved in natural archives such as the marine sediments covering the sea floor. The study of sediment composition in the coarse fraction (>0.063 mm) is a widely used yet time-consuming technique for recognizing ancient environments. Coarse fraction analysis is generally performed visually under a binocular microscope and requires a highly qualified observer. In this study, we propose a method to automate and accelerate this kind of work using a combination of classic computer vision and machine learning algorithms. Using an optical digital microscope with a precise automatic positioning system, we photographed sieved and dried sediment samples composed of particles over 0.1 mm in size. We then applied a clustering pipeline including classical and neural machine learning techniques. We demonstrate that the proposed method is capable of dividing visual representations of marine sediment grains into homogeneous groups suitable for further accurate classification by an experienced specialist. Our method may significantly reduce the time an expert needs to study marine sediments. This will allow further evaluation of sediment composition, main sediment sources, and some important characteristics (proxies/indicators) marking a particular environmental setting in the past. The clustering results obtained using our algorithm may be used to train a more accurate classification algorithm.
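The overclustering choice quoted in this entry (𝐾 = 100 clusters for about 10 expected groups) can be illustrated at toy scale. The sketch below is not the authors' pipeline: it assumes plain k-means with a deterministic farthest-point initialization, synthetic 2-D points standing in for learned features, and 𝐾 = 9 clusters for 3 groups. All names and values are hypothetical.

```python
import random
from collections import Counter


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Pick k initial centers, each the point farthest from those chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=20):
    """Plain k-means; returns the cluster index of every point."""
    centers = farthest_point_init(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for each point.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: sqdist(p, centers[c]))
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def purity(assign, groups):
    """Fraction of points that belong to the majority group of their cluster."""
    per_cluster = {}
    for a, g in zip(assign, groups):
        per_cluster.setdefault(a, Counter())[g] += 1
    return sum(c.most_common(1)[0][1] for c in per_cluster.values()) / len(assign)


# Synthetic stand-in for feature vectors: 3 well-separated groups of 30 points.
rng = random.Random(0)
points, groups = [], []
for g, (cx, cy) in enumerate([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]):
    for _ in range(30):
        points.append((cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)))
        groups.append(g)

K = 9  # overclustering: three times the number of expected groups
assign = kmeans(points, K)
print("non-empty clusters:", len(set(assign)), "purity:", purity(assign, groups))
```

With more clusters than groups, each cluster covers a small, homogeneous region, so a specialist can label whole clusters rather than single examples. Real features are rarely this cleanly separated, so purity would normally be lower than in this toy setting.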

