Topic modeling is a popular technique for clustering large collections of text documents, and a variety of regularization techniques is employed in it. In this paper, we propose a novel approach for analyzing the influence of different regularization types on the results of topic modeling. Based on Rényi entropy, this approach is inspired by concepts from statistical physics, where an inferred topical structure of a collection can be considered an information-statistical system residing in a non-equilibrium state. By testing our approach on four models, namely Probabilistic Latent Semantic Analysis (pLSA), Additive Regularization of Topic Models (BigARTM), Latent Dirichlet Allocation (LDA) with Gibbs sampling, and LDA with variational inference (VLDA), we first show that the minimum of Rényi entropy coincides with the "true" number of topics, as determined in two labelled collections. At the same time, we find that the Hierarchical Dirichlet Process (HDP) model, a well-known approach to topic-number optimization, fails to detect this optimum. Next, we demonstrate that large values of the regularization coefficient in BigARTM significantly shift the entropy minimum away from the optimal topic number, an effect not observed for the hyper-parameters of LDA with Gibbs sampling. We conclude that regularization may introduce unpredictable distortions into topic models, which calls for further research.
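The selection criterion described above can be illustrated with a minimal sketch: compute the Rényi entropy of order q for a probability distribution and pick the candidate topic number at the entropy minimum. This uses only the generic Rényi entropy definition, not the paper's specific statistical-physics formulation (its energy and threshold terms are omitted); `renyi_entropy` is a hypothetical helper name, not an API from any of the cited toolkits.

```python
import numpy as np

def renyi_entropy(p, q=2.0):
    """Rényi entropy of order q (q != 1) for a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()  # normalize so probabilities sum to 1
    return np.log(np.sum(p ** q)) / (1.0 - q)

# A sharply peaked distribution (a well-separated topical structure)
# yields lower Rényi entropy than a near-uniform one, which is why the
# entropy minimum over candidate topic numbers can signal the optimum.
peaked = [0.7, 0.1, 0.1, 0.1]
uniform = [0.25, 0.25, 0.25, 0.25]
assert renyi_entropy(peaked) < renyi_entropy(uniform)
```

In practice one would train a topic model for each candidate topic number T, compute the entropy of the resulting topic-word distributions, and select the T that minimizes it.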
This paper addresses the problem of extracting and segmenting references from PDF documents. The novelty of the presented approach lies in its capability to discover references that vary widely in content, length, and location within the document. Unlike existing works, the proposed method does not follow the classical pipeline of sequential phases. Rather, it learns the different characteristics of references and uses them in a coherent scheme that reduces error accumulation by following a probabilistic approach. Unlike conventional references, source mentions in some publications, such as those in the social sciences, do not follow the same conventions, e.g., being confined to a single reference section. Therefore, the proposed method aims to extract references with highly varying characteristics by relaxing the restrictions of existing methods. Additionally, we present a new challenging dataset of annotated references in German social science publications. The main purpose of this work is to support the indexing of missing references by extracting them from challenging publications such as those of German social science. The effectiveness of the presented methods, in terms of both extraction and segmentation, is evaluated on different datasets, including the German social science dataset.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and indicate whether the citing article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.