A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data

Al-Helali, Baligh; Chen, Qi; Xue, Bing; Zhang, Mengjie

doi:10.1007/s00500-021-05590-y

Cited by 48 publications

(12 citation statements)

References 38 publications


“…Then, it repeatedly creates new individuals with the crossover, mutation and reproduction operators, until some stopping criteria are met. GP is a powerful search mechanism that has been successfully applied to various problems [30], [14]. Since computer heuristics are computer programs in nature, GP can be utilised as a Hyper Heuristics (GPHH) for evolving them automatically.…”

Section: B. Related Work 1) Methods for UCARP (mentioning)

confidence: 99%


Knowledge Transfer Genetic Programming With Auxiliary Population for Solving Uncertain Capacitated Arc Routing Problem

Ardeh

Mei

Zhang

et al. 2023

IEEE Trans. Evol. Computat.

Self Cite


The uncertain capacitated arc routing problem is an NP-hard combinatorial optimisation problem with a wide range of applications in logistics domains. Genetic programming hyperheuristic has been successfully applied to evolve routing policies to effectively handle the uncertain environment in this problem. The real world usually encounters different but related instances due to events like season change and vehicle breakdowns, and it is desirable to transfer knowledge gained from solving one instance to help solve another related one. However, the solutions found by the genetic programming process can lack diversity, and the existing methods use the transferred knowledge mainly during initialisation. Thus, they cannot sufficiently handle the change from the source to the target instance. To address this issue, we develop a novel knowledge transfer genetic programming with an auxiliary population. In addition to the main population for the target instance, we initialise an auxiliary population using the transferred knowledge and evolve it alongside the main population. We develop a novel scheme to carefully exchange the knowledge between the two populations, and a surrogate model to evaluate the auxiliary population efficiently. The experimental results confirm that the proposed method performed significantly better than the state-of-the-art genetic programming approaches for a wide range of uncertain arc routing instances, in terms of both final performance and convergence speed.


“…GP-based transfer learning and optimisation has been applied to a range of problems including symbolic regression [14], [15] and UCARP [16], [17]. However, previous studies have found that the GP process can lose its population diversity [16], [18], [19].…”

Section: Introduction (mentioning)

confidence: 99%

Knowledge Transfer Genetic Programming With Auxiliary Population for Solving Uncertain Capacitated Arc Routing Problem

Ardeh

Mei

Zhang

et al. 2023

IEEE Trans. Evol. Computat.

Self Cite


“…While the other hand, the unsupervised algorithm learns on its own, from the unlabelled data to extract features and patterns for missing data [14]. In some cases, hybrid approaches [15][16][17][18][19], have been utilized to solve the weaknesses of the traditional supervised and unsupervised imputation methods. However, it is important to note that the only suitable solution comes down to a virtuous design and good analysis [20].…”

Section: Introduction (mentioning)

confidence: 99%

A Survey On Missing Data in Machine Learning

Emmanuel

Maupong

Mpoeleng

et al. 2021

Preprint


Machine learning has been the corner stone in analysing and extracting information from data and often a problem of missing values is encountered. Missing values occur as a result of various factors like missing completely at random, missing at random or missing not at random. All these may be as a result of system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data since ignoring or omitting missing values may result in biased or misinformed analysis. In literature there have been several proposals for handling missing values. In this paper we aggregate some of the literature on missing data particularly focusing on machine learning techniques. We also give insight on how the machine learning approaches work by highlighting the key features of the proposed techniques, how they perform, their limitations and the kind of data they are most suitable for. Finally, we experiment on the K nearest neighbor and random forest imputation techniques on novel power plant induced fan data and offer some possible future research direction.


“…GP takes a population of candidate solutions (computer programs) and progressively evolves better solutions by applying operators analogous to natural genetic processes such as mutation and crossover on the population. Typical applications of GP includes classification [218,259,194,32] and regression [10,9,22,96].…”

Section: Genetic Programming (GP) (mentioning)

confidence: 99%

“…In [10], GP is hybridized with KNN to develop an imputation method that estimates the missing values. The KNN method is used to retrieve instances similar to the incomplete data and GP uses those retrieved instances to build regression models that predict the missing values.…”

Section: GP and Symbolic Regression (mentioning)

confidence: 99%
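The statement above describes a two-step imputation: KNN retrieves instances similar to the incomplete one, and a regression model built on those instances predicts the missing value. The following is a minimal sketch of that idea, not the cited authors' method: the GP-evolved regression model is replaced here by a plain inverse-distance weighted average, and the function name `knn_impute`, its parameters, and the toy data are all hypothetical.

```python
import math


def knn_impute(row, complete_rows, k=3, eps=1e-9):
    """Fill None entries of `row` using its k nearest complete rows.

    Assumption: a weighted average stands in for the regression model
    that, per the statement above, GP would build on the neighbours.
    """
    if not complete_rows:
        raise ValueError("need at least one complete row")

    observed = [i for i, v in enumerate(row) if v is not None]
    missing = [i for i, v in enumerate(row) if v is None]
    if not missing:
        return list(row)

    # Euclidean distance over the features observed in `row` only.
    def dist(other):
        return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

    # Step 1: retrieve the k most similar complete instances.
    neighbours = sorted(complete_rows, key=dist)[:k]

    # Inverse-distance weights: closer neighbours count more.
    weights = [1.0 / (dist(n) + eps) for n in neighbours]
    total = sum(weights)

    # Step 2: predict each missing feature from the neighbours.
    filled = list(row)
    for j in missing:
        filled[j] = sum(w * n[j] for w, n in zip(weights, neighbours)) / total
    return filled


# Toy data (hypothetical): two clusters of complete instances.
complete = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [5.0, 6.0, 9.0],
    [5.2, 6.1, 9.1],
]
imputed = knn_impute([1.0, 2.0, None], complete, k=2)
print(imputed)
```

With `k=2`, only the two instances from the nearby cluster contribute, so the estimate is driven by values around 3 rather than around 9. Swapping the weighted average for a model fitted on `neighbours` would bring the sketch closer to the hybrid the statement describes.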

Automating Behavior-based Ransomware Analysis, Detection, and Classification Using Machine Learning

Abbasi


Ransomware is malware that hijacks a victim's data using encryption and demands a ransom in exchange for the decryption key. Ransomware has gained prominence due to its attack vector and the irreversible nature of damage to data. Ransomware has indiscriminately attacked individuals and organizations worldwide, disrupting their businesses and services. The number of successful ransomware attacks across the globe highlights the inadequacy of existing ransomware defense. Static and dynamic analysis are two popular approaches to malware analysis. The former does not require execution of the malware binary, whereas the latter requires executing the binary in a controlled environment. Static analysis-based detection, e.g., signature-matching, is widely adopted by commercial antivirus solutions but can be thwarted by evasion techniques, e.g., polymorphism and code obfuscation, utilized in modern malware. Consequently, dynamic analysis-based or behavior-based detection approaches have gained popularity because malware behavior cannot be changed entirely across its variants. Both signature and behavior-based detection complement each other. Behavior-based ransomware detection comes with certain challenges and problems, such as data high-dimensionality that occurs because a process may execute thousands of API calls per second. Manual inspection of these API calls for feature engineering requires an expert and is a time-intensive task. Another problem with some existing ransomware detection models is the reliance on handcrafted malice scoring functions that assign scores to the processes describing their threat levels. Other challenges to ransomware detection research include the limited availability of ransomware data sets that can be used with Machine Learning (ML) methods and their reuse scope. 
The scope of reuse of available data sets is limited because of their format, e.g., sequential data may be used with recurrent neural networks but not with commonly used ML-based classification algorithms, and focus, e.g., network activity and filesystem activity. For the above-mentioned reasons, ransomware detection research is generally followed by ransomware analysis. However, to the best of our knowledge, not many of the existing ransomware behavior analysis studies discuss the challenges involved in the process. This thesis aims to automate the solutions to the problems related to ransomware behavior detection and classification using evolutionary computation methods, i.e., particle swarm optimization and genetic programming, and deep neural networks, i.e., long short-term memory. This thesis proposes a wrapper feature selection method to address the high dimensionality in ransomware behavior data. The proposed method utilizes particle swarm optimization to automatically select a suitable number of features from each feature group and therefore does not require expert input. This thesis further proposes an automated method of evolving malice scoring models for ransomware detection. The proposed method formulates the problem as a symbolic regression problem and solves it using genetic programming. Unlike existing methods, the proposed method does not require expert knowledge to design the model. Furthermore, this thesis proposes an automated behavior analysis framework for highlighting challenges associated with ransomware behavior analysis and solutions to these challenges. Finally, this thesis proposes a new representation of the API call sequences that combines the API call names and important call arguments. The proposed representation of the API call sequences helped improve ransomware early detection performance. 
All the methods proposed in this thesis either automate the existing manual solutions or achieve comparable or better performance compared to existing methods.
