Proceedings of the 25th International Conference on Machine Learning (ICML '08), 2008
DOI: 10.1145/1390156.1390272

Multi-classification by categorical features via clustering

Abstract: We derive a generalization bound for multiclassification schemes based on grid clustering in categorical parameter product spaces. Grid clustering partitions the parameter space in the form of a Cartesian product of partitions for each of the parameters. The derived bound provides a means to evaluate clustering solutions in terms of the generalization power of a built-on classifier. For classification based on a single feature the bound serves to find a globally optimal classification rule. Comparison of the g…
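As a rough, hypothetical sketch of the grid-clustering idea in the abstract (not the authors' algorithm, whose contribution is a bound for evaluating such clusterings): each categorical feature is mapped to a cluster by its own partition, a sample falls into the Cartesian-product cell of its per-feature clusters, and each cell predicts the majority label of the training points it contains. All names below are illustrative, and the per-feature partitions are assumed to be given.

```python
from collections import Counter, defaultdict

def fit_grid_classifier(X, y, partitions):
    """X: list of tuples of categorical values; y: labels;
    partitions: one dict per feature mapping feature value -> cluster id."""
    cell_labels = defaultdict(Counter)
    for x, label in zip(X, y):
        # the grid cell is the Cartesian product of per-feature clusters
        cell = tuple(p[v] for p, v in zip(partitions, x))
        cell_labels[cell][label] += 1
    # each occupied cell predicts its majority label
    return {cell: counts.most_common(1)[0][0] for cell, counts in cell_labels.items()}

def predict(rule, partitions, x, default=None):
    cell = tuple(p[v] for p, v in zip(partitions, x))
    return rule.get(cell, default)

# toy usage: two categorical features, each partitioned into two clusters
partitions = [{"red": 0, "pink": 0, "blue": 1},
              {"cat": 0, "dog": 0, "fish": 1}]
X = [("red", "cat"), ("pink", "dog"), ("blue", "fish"), ("blue", "cat")]
y = ["A", "A", "B", "B"]
rule = fit_grid_classifier(X, y, partitions)
print(predict(rule, partitions, ("red", "dog")))  # -> "A" (cell (0, 0))
```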

Cited by 10 publications (11 citation statements); references 7 publications. Citing publications span 2010–2023.

Citation statements (ordered by relevance):
“…In fact, many state‐of‐the‐art algorithms search for a weighted combination of simpler rules (Germain et al.): bagging (Breiman), boosting (Schapire et al.; Schapire & Singer), and Bayesian approaches (Gelman et al.) or even Kernel methods (Vapnik) and neural networks (Bishop). The major open problem in this scenario is how to weight the different rules in order to obtain good performance (Berend & Kontorovitch; Catoni; Lever et al.; Nitzan & Paroush; Parrado‐Hernández et al.), how these performances can be assessed (Catoni; Donsker & Varadhan; Germain et al.; Lacasse et al.; Langford & Seeger; Laviolette & Marchand; Lever et al.; London et al.; McAllester; Shawe‐Taylor & Williamson; Tolstikhin & Seldin; Van Erven), and how this theoretical framework can be exploited for deriving new learning approaches or for applying it in other contexts (Audibert; Audibert & Bousquet; Bégin et al.; Germain et al.; McAllester; Morvant; Ralaivola et al.; Roy et al.; Seeger; Seldin et al.; Seldin & Tishby; Shawe‐Taylor & Langford). The PAC‐Bayes approach is one of the sharpest analysis frameworks in this context, since it can provide tight bounds on the risk of the Gibbs classifier (GC), also called randomized (or probabilistic) classifier, and the Bayes classifier (BC), also called weighted majority vote classifier (Germain et al.).…”
Section: PAC-Bayes Theory
confidence: 99%
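For orientation, here is a minimal, hypothetical sketch of the two classifiers named in the statement above (not code from any of the cited works): the Gibbs classifier draws a single rule at random according to the posterior weights, while the Bayes classifier takes the weighted majority vote over all rules.

```python
import random
from collections import defaultdict

def gibbs_predict(x, hypotheses, weights, rng=random):
    """Randomized (Gibbs) classifier: draw one rule according to its weight."""
    h = rng.choices(hypotheses, weights=weights, k=1)[0]
    return h(x)

def majority_vote_predict(x, hypotheses, weights):
    """Bayes / weighted majority vote classifier: aggregate all rules."""
    votes = defaultdict(float)
    for h, w in zip(hypotheses, weights):
        votes[h(x)] += w
    return max(votes, key=votes.get)

# toy usage: three simple rules over an integer input
hypotheses = [lambda z: z > 0, lambda z: z > 5, lambda z: True]
weights = [0.5, 0.3, 0.2]
print(majority_vote_predict(3, hypotheses, weights))  # True (weight 0.7 vs 0.3)
print(gibbs_predict(3, hypotheses, weights))          # True or False, at random
```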
“…Furthermore, constraint (*) ensures that the information encoded in spikes is reliable. Applying a PAC-Bayes inspired variant of Ockham's razor [15], it can be shown that the higher the effective information generated by spikes, the smaller the difference between the empirical reward estimate R̂_m and the expected reward R_m: …”
Section: Cooperative Learning In Abstract
confidence: 99%
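The inequality itself is elided in the excerpt above. Purely for orientation, a typical PAC-Bayes-style bound relating an empirical estimate to its expectation takes a form like the following (a generic McAllester/Maurer-type statement, not the result of the cited paper):

```latex
% Generic PAC-Bayes-style bound (illustrative only): with probability at least
% 1 - \delta over an i.i.d. sample of size m, simultaneously for all posteriors
% \rho over the hypothesis class, given a fixed prior \pi,
\bigl| \hat{R}_m(\rho) - R(\rho) \bigr|
  \;\le\; \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{m}}{\delta}}{2m}}
% where \hat{R}_m is the empirical estimate, R the expected value, and KL the
% Kullback-Leibler divergence.
```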
“…The formulation and analysis of graph clustering presented here are based on the analysis of co-clustering suggested in (Seldin and Tishby, 2008; Seldin, 2009), which is reviewed briefly in section 2. In section 3 we adapt the analysis to derive a PAC-Bayesian generalization bound for the graph clustering problem.…”
Section: Introduction
confidence: 99%
“…This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009) to derive a PAC-Bayesian generalization bound for graph clustering. The bound shows that graph clustering should optimize a trade-off between empirical data fit and the mutual information that clusters preserve on the graph nodes.…”
confidence: 99%
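As a hedged illustration of the trade-off described in the last statement (a toy objective under assumed definitions, not the bound of Seldin and Tishby, 2008 or Seldin, 2009): a hard clustering of the nodes can be scored by an empirical-fit term minus a penalty on the information the cluster labels preserve about node identity, which for a hard clustering of uniformly weighted nodes equals the entropy of the cluster-size distribution.

```python
import math
from collections import Counter

def cluster_information(assignment):
    """Information the cluster labels preserve about node identity; for a hard
    clustering of uniformly weighted nodes this is the entropy of cluster sizes."""
    sizes = Counter(assignment.values())
    n = sum(sizes.values())
    return -sum((s / n) * math.log2(s / n) for s in sizes.values())

def empirical_fit(edges, assignment):
    """Crude stand-in for data fit: fraction of edges kept inside a cluster."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if assignment[u] == assignment[v])
    return same / len(edges)

def score(edges, assignment, trade_off=0.1):
    """Higher is better: reward fit, penalize information retained about nodes."""
    return empirical_fit(edges, assignment) - trade_off * cluster_information(assignment)

# toy usage: one coarse and one fine clustering of a 4-node graph
edges = [(0, 1), (2, 3), (0, 2)]
coarse = {0: "a", 1: "a", 2: "a", 3: "a"}
fine = {0: "a", 1: "a", 2: "b", 3: "b"}
print(score(edges, coarse))  # 1.0  (perfect fit, zero information penalty)
print(score(edges, fine))    # ~0.57 (2/3 fit, 1 bit of cluster information)
```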