Nurul Hashimah Ahamed Hassain Malim scite author profile

Abstract. Turbo similarity searching uses information about the nearest neighbours in a conventional chemical similarity search to increase the effectiveness of virtual screening, with a data fusion approach being used to combine the nearest-neighbour information. A previous paper suggested that the approach was highly effective in operation; this paper further tests the approach using a range of different databases and of structural representations. Searches were carried out on three different databases of chemical structures, using seven different types of fingerprint, as well as molecular holograms, physicochemical properties, topological indices and reduced graphs. The results show that turbo similarity searching can indeed enhance retrieval but that this is normally achieved only if the similarity search that acts as its starting point has already achieved at least some reasonable level of search effectiveness. In other cases, a modified version of TSS that uses the nearest-neighbour information for approximate machine learning can be used effectively. Whilst useful for qualitative (active/inactive) predictions of biological activity, turbo similarity searching does not appear to exhibit any predictive power when quantitative property data is available.
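The turbo similarity searching idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the set-based fingerprints, the Tanimoto coefficient, the MAX fusion rule, the helper names and the toy database are all assumptions chosen for illustration.

```python
# Minimal sketch of turbo similarity searching (TSS) on toy data.
# Assumptions (not from the abstract): fingerprints are sets of bit
# positions, similarity is the Tanimoto coefficient, and scores are
# fused with the MAX rule.

def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints stored as sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_search(reference, database):
    """Score every database molecule against one reference fingerprint."""
    return {mol_id: tanimoto(reference, fp) for mol_id, fp in database.items()}

def turbo_search(reference, database, k=1):
    """Search with the reference, then with its k nearest neighbours,
    and keep the best score each molecule achieves in any search."""
    base = similarity_search(reference, database)
    neighbours = sorted(base, key=base.get, reverse=True)[:k]
    fused = dict(base)
    for nn in neighbours:
        for mol_id, score in similarity_search(database[nn], database).items():
            fused[mol_id] = max(fused[mol_id], score)
    ranking = sorted(fused, key=fused.get, reverse=True)
    return ranking, fused

# Hypothetical toy database: molecule id -> set of "on" bits.
database = {
    "m1": {1, 2, 3, 4, 5},
    "m2": {1, 2, 3, 6},
    "m3": {5, 6, 7, 8},
    "m4": {1, 2, 3, 4, 5, 7, 8},
    "m5": {9, 10},
}
reference = {1, 2, 3, 4}

base_scores = similarity_search(reference, database)
base_ranking = sorted(base_scores, key=base_scores.get, reverse=True)
turbo_ranking, turbo_scores = turbo_search(reference, database, k=1)
print("conventional:", base_ranking)
print("turbo:       ", turbo_ranking)
```

In this toy case the nearest neighbour m1 pulls m4 above m2 in the fused ranking. The sketch also shows why the abstract's caveat matters: if the initial search returns poor neighbours, the follow-up searches inherit that weakness.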


Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision

Holliday

Kanoulas

Malim

et al. 2011

J Cheminform


Background: Data fusion methods are widely used in virtual screening, and make the implicit assumption that the more often a molecule is retrieved in multiple similarity searches, the more likely it is to be active. This paper tests the correctness of this assumption. Results: Sets of 25 searches using either the same reference structure and 25 different similarity measures (similarity fusion) or 25 different reference structures and the same similarity measure (group fusion) show that large numbers of unique molecules are retrieved by just a single search, but that the numbers of unique molecules decrease very rapidly as more searches are considered. This rapid decrease is accompanied by a rapid increase in the fraction of those retrieved molecules that are active. There is an approximately log-log relationship between the numbers of different molecules retrieved and the number of searches carried out, and a rationale for this power-law behaviour is provided. Conclusions: Using multiple searches provides a simple way of increasing the precision of a similarity search, and thus provides a justification for the use of data fusion methods in virtual screening.
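The overlap analysis described in this abstract can be sketched as a simple count of how many searches retrieve each molecule, followed by the precision of the molecules retrieved at least m times. This is an illustrative reading of the abstract, not the paper's code; the function names, the three toy searches and the set of actives are hypothetical.

```python
from collections import Counter

def retrieval_counts(searches):
    """Count, for each molecule, how many of the searches retrieved it."""
    counts = Counter()
    for retrieved in searches:
        counts.update(set(retrieved))  # count each molecule once per search
    return counts

def precision_by_min_hits(searches, actives):
    """For m = 1..number of searches, report how many molecules were
    retrieved by at least m searches and the fraction that are active."""
    counts = retrieval_counts(searches)
    summary = {}
    for m in range(1, len(searches) + 1):
        hits = {mol for mol, c in counts.items() if c >= m}
        if hits:
            summary[m] = (len(hits), len(hits & actives) / len(hits))
    return summary

# Hypothetical toy data: three searches and two known actives.
searches = [
    ["a", "b", "c", "d"],
    ["a", "b", "e"],
    ["a", "c", "f"],
]
actives = {"a", "b"}

summary = precision_by_min_hits(searches, actives)
for m, (n_molecules, precision) in summary.items():
    print(f"retrieved by >= {m} searches: {n_molecules} molecules, "
          f"precision {precision:.2f}")
```

On this toy input the number of molecules falls from 6 to 3 to 1 as m grows while the precision rises from 0.33 to 0.67 to 1.00, which mirrors the qualitative trend the abstract reports.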


Text categorisation in Quran and Hadith: Overcoming the interrelation challenges using machine learning and term weighting

Rostam

Malim

2021

Journal of King Saud University - Computer and Information Scie


The impact of big data market segmentation using data mining and clustering techniques

Yoseph

Malim

Heikkilä

et al. 2020

IFS


Targeted marketing strategy is a prominent topic that has received substantial attention from both industry and academia. Market segmentation is a widely used approach for investigating the heterogeneity of customer buying behavior and profitability. Conventional market segmentation models in the retail industry are predominantly descriptive methods, lack sufficient market insights, and often fail to identify sufficiently small segments. This study also takes advantage of the Hadoop distributed file system for its ability to process vast datasets. Three different market segmentation experiments using modified best fit regression, i.e., Expectation-Maximization (EM) and K-Means++ clustering algorithms, were conducted and subsequently assessed using cluster quality assessment. The results of this research are twofold: i) the insight into customer purchase behavior revealed for each Customer Lifetime Value (CLTV) segment; ii) the performance of the clustering algorithms in producing accurate market segments. The analysis indicated that the average lifetime of the customer was only two years, and the churn rate was 52%. Consequently, a marketing strategy was devised based on these results and implemented on the departmental store sales. The marketing record showed that the sales growth rate increased from 5% to 9%.
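As a rough illustration of the K-Means++ clustering named in this abstract, the sketch below seeds the centres with the k-means++ rule and then runs plain Lloyd iterations. It is a pure-Python toy, not the study's Hadoop pipeline; the two customer features (annual visits, annual spend), the data values and the function names are assumptions made for the example.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeanspp_seeds(points, k, rng):
    """k-means++ seeding: first centre uniformly at random, each later
    centre with probability proportional to its squared distance from
    the nearest centre chosen so far. Assumes the points are not all
    identical, otherwise every weight would be zero."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm started from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeanspp_seeds(points, k, rng)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: move each centre to the mean of its members;
        # an empty cluster keeps its previous centre.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

# Hypothetical customers: (annual visits, annual spend).
# Real features would normally be scaled before clustering.
customers = [
    (2.0, 50.0), (3.0, 60.0), (2.0, 55.0),            # low-value group
    (40.0, 5000.0), (42.0, 5200.0), (38.0, 4900.0),   # high-value group
]

labels, centers = kmeans(customers, k=2)
print(labels)
print(centers)
```

The cluster labels are arbitrary integers, so only the grouping is meaningful, not which segment receives label 0 or 1.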


