Francisco J. Valverde-Albacete scite author profile

The most widely used measure of performance, accuracy, suffers from a paradox: predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. Despite optimizing classification error rate, high-accuracy models may fail to capture crucial information transfer in the classification task. We present evidence of this behavior by means of a combinatorial analysis in which every possible contingency matrix of 2-, 3- and 4-class classifiers is depicted on the entropy triangle, a more reliable information-theoretic tool for classification assessment. Motivated by this, we develop from first principles a measure of classification performance that takes into consideration the information learned by classifiers. We are then able to obtain the entropy-modulated accuracy (EMA), a pessimistic estimate of the expected accuracy with the influence of the input distribution factored out, and the normalized information transfer factor (NIT), a measure of how efficiently information is transmitted from the input to the output set of classes. The EMA is a more natural measure of classification performance than accuracy when the heuristic to maximize is the transfer of information through the classifier instead of classification error count. The NIT factor measures the effectiveness of the learning process in classifiers and also makes it harder for them to "cheat" using techniques like specialization, while also promoting the interpretability of results. Their use is demonstrated in a mind-reading task competition that aims at decoding the identity of a video stimulus based on magnetoencephalography recordings. We show how the EMA and the NIT factor reject rankings based on accuracy, choosing more meaningful and interpretable classifiers.
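The accuracy paradox described in this abstract can be illustrated with a small sketch. The abstract gives no formulas, so the definitions used here are assumptions: EMA is taken as 2^(-H(X|Y)) and NIT as 2^(MI)/k, where X is the true class, Y the predicted class, and k the number of classes. The function name `information_measures` and both confusion matrices are hypothetical, chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability cells."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_measures(counts):
    """Accuracy, EMA and NIT for a square confusion matrix.

    Rows are true classes (X), columns are predicted classes (Y).
    Assumed definitions (not stated in the abstract):
    EMA = 2 ** -H(X|Y) and NIT = 2 ** MI / k.
    """
    total = sum(sum(row) for row in counts)
    joint = [[c / total for c in row] for row in counts]
    p_true = [sum(row) for row in joint]
    p_pred = [sum(col) for col in zip(*joint)]
    h_joint = entropy(p for row in joint for p in row)
    mutual_information = entropy(p_true) + entropy(p_pred) - h_joint
    h_true_given_pred = h_joint - entropy(p_pred)
    k = len(counts)
    return {
        "accuracy": sum(joint[i][i] for i in range(k)),
        "ema": 2 ** (-h_true_given_pred),
        "nit": 2 ** mutual_information / k,
    }


# A classifier that always predicts the majority class on 90/10 data.
majority = information_measures([[90, 0], [10, 0]])
# A classifier with lower accuracy that does separate the classes.
informative = information_measures([[70, 20], [2, 8]])

print(majority)
print(informative)
```

Under these assumed definitions the majority-class predictor has the higher accuracy but transfers no information, so its NIT sits at the floor of 1/k, while the less accurate classifier scores a higher NIT.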

Extending conceptualisation modes for generalised Formal Concept Analysis

Valverde-Albacete

Peláez-Moreno

2011

Information Sciences

Formal Concept Analysis (FCA) is an exploratory data analysis technique for boolean relations based on lattice theory. Its main result is the existence of a dual order isomorphism between two set lattices induced by a binary relation between a set of objects and a set of attributes. Pairs of dually isomorphic sets of objects and attributes, called formal concepts, form a concept lattice, but actually model only a conjunctive mode of conceptualisation. In this paper we augment this formalism in two ways. First, we extend FCA to consider different modes of conceptualisation by changing the basic dual isomorphism in a modal-logic motivated way. This creates the three new types of concepts and lattices of extended FCA, viz., the lattice of neighbourhood of objects, that of attributes, and the lattice of unrelatedness. Second, we consider incidences with values in idempotent semirings (concretely, the completed max-plus or schedule algebra $\mathbb{R}_{\max,+}$) and focus on generalising FCA to try to replicate the modes of conceptualisation mentioned above. To provide a concrete example of the use of these techniques, we analyse the performance of multi-class classifiers by conceptually analysing their confusion matrices.
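For readers unfamiliar with the standard (conjunctive) mode of FCA that this abstract takes as its starting point, the following is a minimal brute-force sketch of how formal concepts arise from a boolean context. It covers only classical FCA, not the extended or semiring-valued variants the paper introduces. The function names and the toy context are hypothetical.

```python
from itertools import combinations


def intent(objects, context, attributes):
    """Attributes shared by every object in `objects`."""
    return frozenset(
        a for a in attributes if all(a in context[o] for o in objects)
    )


def extent(attrs, context):
    """Objects that have every attribute in `attrs`."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Exponential in the number of objects, so only suitable for toy contexts.
    """
    attributes = frozenset().union(*context.values())
    objects = list(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            b = intent(subset, context, attributes)
            a = extent(b, context)
            concepts.add((a, b))
    return concepts


# Hypothetical toy context: object -> set of attributes it has.
toy_context = {
    "duck": frozenset({"flies", "swims"}),
    "eagle": frozenset({"flies"}),
    "trout": frozenset({"swims"}),
}

concepts = formal_concepts(toy_context)
for a, b in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(a), sorted(b))
```

Each printed pair is closed in both directions: the extent is exactly the set of objects sharing the intent, and the intent is exactly the set of attributes shared by the extent. Ordering the pairs by extent inclusion gives the concept lattice.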

Two information-theoretic tools to assess the performance of multi-class classifiers

Valverde-Albacete

Peláez-Moreno

2010

Pattern Recognition Letters

We develop two tools to analyze the behavior of multiple-class, or multi-class, classifiers by means of entropic measures on their confusion matrix or contingency table. First we obtain a balance equation on the entropies that captures interesting properties of the classifier. Second, by normalizing this balance equation we first obtain a 2-simplex in a three-dimensional entropy space and then the de Finetti entropy diagram or entropy triangle. We also give examples of the assessment of classifiers with these tools.
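The abstract does not spell out the balance equation, so the sketch below assumes the form H_UX + H_UY = ΔH + 2·MI + VI, where H_UX and H_UY are the entropies of uniform distributions over the input and output classes, ΔH is the total divergence of the marginals from uniformity, MI is the mutual information, and VI = H(X|Y) + H(Y|X). Dividing by H_UX + H_UY gives three coordinates that sum to one, which is what allows them to be plotted in a triangle. The function name is hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability cells."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def entropy_triangle_coordinates(counts):
    """Normalized (delta_h, 2 * mi, vi) for a confusion matrix.

    Rows are true classes (X), columns are predicted classes (Y).
    Assumes at least two classes on each side so the normalizer is nonzero.
    """
    total = sum(sum(row) for row in counts)
    joint = [[c / total for c in row] for row in counts]
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]

    h_ux, h_uy = math.log2(len(p_x)), math.log2(len(p_y))
    h_x, h_y = entropy(p_x), entropy(p_y)
    h_xy = entropy(p for row in joint for p in row)

    mi = h_x + h_y - h_xy                   # mutual information
    vi = 2 * h_xy - h_x - h_y               # H(X|Y) + H(Y|X)
    delta_h = (h_ux - h_x) + (h_uy - h_y)   # distance from uniform marginals

    norm = h_ux + h_uy
    return delta_h / norm, 2 * mi / norm, vi / norm


for name, matrix in [
    ("perfect, balanced", [[50, 0], [0, 50]]),
    ("random guessing", [[25, 25], [25, 25]]),
    ("majority class only", [[90, 0], [10, 0]]),
]:
    print(name, entropy_triangle_coordinates(matrix))
```

With these assumed definitions, a perfect classifier on balanced data puts all of its mass on the mutual information coordinate, random guessing puts it all on the variation of information coordinate, and a majority-class predictor puts most of it on the ΔH coordinate.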

Towards a Generalisation of Formal Concept Analysis for Data Mining Purposes

Valverde-Albacete

Peláez-Moreno

2006

In this paper we justify the need for a generalisation of Formal Concept Analysis for the purpose of data mining and begin the synthesis of such a theory. For that purpose, we first review semirings and semimodules over semirings as the appropriate objects to use in abstracting the Boolean algebra and the notion of extents and intents, respectively. We later bring to bear powerful theorems developed in the field of linear algebra over idempotent semimodules to try to build a Fundamental Theorem for K-Formal Concept Analysis, where K is a type of idempotent semiring. Finally, we try to put Formal Concept Analysis in a new perspective by considering it as a concrete instance of the theory developed.
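As a rough illustration of what an idempotent semiring looks like in practice, here is a minimal sketch of max-plus arithmetic, where "addition" is max and "multiplication" is ordinary +. This is a generic textbook construction, not the paper's K-Formal Concept Analysis. It deliberately omits +∞, so it is not the completed semiring, and all names are hypothetical.

```python
NEG_INF = float("-inf")  # semiring zero: neutral for max, absorbing for +


def oplus(a, b):
    """Semiring addition: max. Idempotent, since max(a, a) == a."""
    return max(a, b)


def otimes(a, b):
    """Semiring multiplication: ordinary addition, with unit 0."""
    return a + b


def maxplus_matmul(A, B):
    """Matrix product over max-plus: C[i][j] = max_k (A[i][k] + B[k][j])."""
    inner = len(B)
    return [
        [
            max(otimes(A[i][k], B[k][j]) for k in range(inner))
            for j in range(len(B[0]))
        ]
        for i in range(len(A))
    ]


# Max-plus identity matrix: unit (0) on the diagonal, zero (-inf) elsewhere.
identity = [[0, NEG_INF], [NEG_INF, 0]]
A = [[1, NEG_INF], [3, 2]]

print(maxplus_matmul(A, identity))  # same entries as A
print(maxplus_matmul(A, A))
```

Vectors and matrices with entries in such a semiring play the role that boolean incidence vectors play in classical FCA, which is the sense in which semimodules abstract extents and intents.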

Galois Connections Between Semimodules and Applications in Data Mining

Valverde-Albacete

Peláez-Moreno

In [1] a generalisation of Formal Concept Analysis was introduced with data mining applications in mind, K-Formal Concept Analysis, where incidences take values in certain kinds of semirings instead of the standard Boolean carrier set. A fundamental result was missing there, namely the second half of the equivalent of the main theorem of Formal Concept Analysis. In this continuation we introduce the structural lattice of such generalised contexts, providing a limited equivalent to the main theorem of K-Formal Concept Analysis, which allows the standard version to be interpreted as a privileged case in yet another direction. We motivate our results by providing instances of their use to analyse the confusion matrices of multiple-input multiple-output classifiers.
