Optimal training and test sets design for machine learning

Genç, Burkay; Tunç, Hüseyin

doi:10.3906/elk-1807-212

Cited by 29 publications

(9 citation statements)

References 16 publications

“…Finally, each of the five categories was made up of 1349 images, for a total of 6745 images. Of these, 80% were randomly selected for training, 10% for validation and 10% for testing [16].…”
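The 80/10/10 split described in this quote can be sketched in Python. This is a minimal illustration, not the cited authors' code. It assumes the split was done separately within each class (a stratified split), although the quote only says the images were "randomly selected". The class names come from the citing paper's abstract, and the function name `stratified_split` and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split item indices into train/validation/test, class by class.

    Assumption: each class is shuffled independently and cut according to
    `fractions`, so every subset keeps the class balance of the full dataset.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n = len(idxs)
        n_train = round(fractions[0] * n)
        n_val = round(fractions[1] * n)
        # Whatever is left after rounding goes to the test subset.
        train.extend(idxs[:n_train])
        val.extend(idxs[n_train:n_train + n_val])
        test.extend(idxs[n_train + n_val:])
    return train, val, test


# Five balanced categories of 1349 images each (6745 in total), as in the quote.
classes = ["bacterial_blight", "brown_streak", "green_mite", "mosaic", "healthy"]
labels = [c for c in classes for _ in range(1349)]
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 5395 675 675
```

With 1349 images per class, rounding gives 1079/135/135 images per class. That yields 5395 training, 675 validation and 675 test images, which add back up to 6745.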

Section: Methods (mentioning)

confidence: 99%

Convolutional neural networks for the recognition of diseases and pests in Cassava leaves (Manihot esculenta)

Gómez-Pupo, Saucedo, Fennix-Agudelo, et al. (2022)

Proceedings of the 20th LACCEI International Multi-Conference for Engineering, Education and Technology: “Education, Research A

In this work, we recognized cassava diseases and pests by means of convolutional neural networks, as a way to avoid the spread of pathogens, prevent economic losses, and support decision-making for proper disease management. For the development of this system, the VGG16, ResNet50 and Xception models were chosen because they had performed well in previous research on plant disease classification, which we considered very similar to our case study. For training, a transfer learning technique was implemented, using a database categorized by cassava disease (bacterial blight, brown streak, green mite, mosaic disease) as well as healthy leaves. This database was balanced and refined manually, selecting the images that represented the characteristics of each category according to the descriptions found in the existing literature. Finally, the best model was chosen based on its performance as measured by the accuracy metric. The best model obtained, which we propose in this work, was Xception, trained for 35 epochs on 6120 images of cassava leaves, achieving an accuracy of 94.56%. This model provides an option for detecting cassava leaf diseases early, reliably and cost-effectively.

“…To estimate the model, the data set can be divided into training and test data in ratios such as 1:1, 2:1, 70:30, 60:40 [57] or 66:34 [18], according to the user's purpose. Here, it is generally preferred that the training set contain as much data as possible in order to obtain a stronger model [18,57,58]. A certain portion of the data set (20%-30%) is kept as test data, which is called the storage [holdout] procedure, and the remaining portion can then be used for training.…”
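A single holdout split at the ratios listed in this quote can be sketched as follows. The function names and the 1000-item placeholder dataset are hypothetical. The sketch assumes a plain random (unstratified) shuffle and treats the quote's "storage procedure" as what is more commonly called the holdout method.

```python
import random


def parse_ratio(ratio):
    """Turn a 'train:test' string such as '70:30' or '2:1' into a train fraction."""
    train_part, test_part = (float(x) for x in ratio.split(":"))
    return train_part / (train_part + test_part)


def holdout_split(items, train_fraction, seed=0):
    """Shuffle a copy of `items` and cut it once into a training and a test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


data = list(range(1000))  # hypothetical placeholder for 1000 samples
for ratio in ["1:1", "2:1", "70:30", "60:40", "66:34"]:
    train, test = holdout_split(data, parse_ratio(ratio))
    print(f"{ratio:>6}: train={len(train)}, test={len(test)}")
```

For 1000 samples this yields 500/500, 667/333, 700/300, 600/400 and 660/340 training/test items. Keeping 20%-30% of the data for testing corresponds to ratios between 80:20 and 70:30.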

Section: Evaluation of Data (mentioning)

confidence: 99%

Data Mining, Weka Decision Trees

Duran, Akargöl, Doğan (2023)

OPRD

Nowadays, computer technologies are advancing rapidly. Thanks to these developments, large and complex raw data sets can be transformed into useful information with different analysis techniques. Algorithms developed with these technologies can offer different solutions to scientists and users working in many branches of science, especially engineering, mathematics, medicine, industry, finance/economics, marketing, education, multimedia and statistics. These solutions make it easier to achieve the desired goals and objectives. Thus, by correctly managing and analyzing the data in large and complex raw data sets, accurate predictions can be made for use in similar problems in the future. Data sets are analyzed and evaluated using different methods, and the classification of data during the analysis and evaluation stages can significantly affect the decision-making process regarding the work to be done. Classification of data can be done by statistical methods or by data mining methods. Decision trees, which can be used to classify numerical and alphanumeric data, generally give decision makers a great advantage because they are easier to interpret and understand than other classification techniques. For these reasons, this study covers decision trees, one of the most widely used classification techniques in data mining.
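To make the idea of a decision tree split concrete, the sketch below finds the best single threshold on one numeric feature using Gini impurity, the criterion used by CART. It is an illustration only. The abstract does not specify a splitting criterion, Weka's J48 (a C4.5 implementation) uses the information gain ratio instead, and the toy data and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(xs, ys):
    """Find the threshold t on one numeric feature that minimises the
    size-weighted Gini impurity of the split (x <= t) vs (x > t)."""
    best_t, best_score = None, float("inf")
    # Splitting at the largest value would leave the right side empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy data: one numeric feature, two classes.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(xs, ys))  # (3.0, 0.0)
```

A full tree applies this search recursively to each side of the split. Alphanumeric (categorical) features would be split by equality tests on category values rather than by a numeric threshold.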

“…Machine learning (ML) depends on computational statistics; the main idea of ML is making predictions using computers. Machine learning algorithms create a mathematical predictive model based on a sample of data, known as the "training dataset" [1]. Another benefit of machine learning is that predictions or decisions are made without explicit programming.…”

Section: Introduction (mentioning)

confidence: 99%

Machine Learning Algorithms to Classify Water Levels for Smart Irrigation Systems

Ali (2022)

Journal of Engineering Research

Agriculture is the main source of food. Over time, freshwater in the agriculture sector has become increasingly difficult to preserve. Thus, one solution for saving freshwater is improving the use of wastewater. Machine learning (ML) algorithms are used in several applications, such as smart irrigation, to reduce freshwater loss by building high-performance ML models. This paper proposes four algorithms to classify the water levels of sprinklers for smart irrigation: support vector machine (SVM), decision tree (DT), SVM with AdaBoost, and DT with AdaBoost. Five water levels are classified: Max, High, Medium, Low, and Stop. The proposed algorithms are tested to determine which achieves better performance and higher accuracy. Five steps are applied sequentially to the dataset using the Pandas and Scikit-learn frameworks: data preprocessing, feature selection, feature scaling, training, and classification, in order to analyze the performance of the algorithms. The results showed that DT with AdaBoost is the best of the four algorithms. The DT algorithm achieves an accuracy score of 0.912 with a shorter testing time of 2.2 seconds and a mean squared error (MSE) of 0.08.
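Two details of the evaluation in this abstract can be sketched in plain Python. The first is feature scaling that learns its parameters from the training split only, so that no test-set statistics leak into training; this is the role played by scikit-learn's `StandardScaler` when it is fitted on the training data. The second is an accuracy/MSE computation on class labels. The abstract does not say how the five levels were turned into numbers for the MSE. The ordinal encoding `LEVELS` below is an assumption, as are all function names and the toy values.

```python
from statistics import mean, pstdev

# Hypothetical ordinal encoding of the five water levels (an assumption:
# the abstract does not state how labels were encoded for the MSE).
LEVELS = {"Stop": 0, "Low": 1, "Medium": 2, "High": 3, "Max": 4}


def fit_scaler(train_column):
    """Learn z-score parameters (mean, population std) from training data only."""
    mu, sigma = mean(train_column), pstdev(train_column)
    return mu, (sigma if sigma > 0 else 1.0)  # avoid dividing by zero


def transform(column, params):
    """Apply previously fitted z-score parameters to any split."""
    mu, sigma = params
    return [(x - mu) / sigma for x in column]


def accuracy(y_true, y_pred):
    """Fraction of exactly matching labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error between ordinal codes: mistaking 'Max' for 'Stop'
    costs more than mistaking 'Max' for 'High'."""
    return sum((LEVELS[t] - LEVELS[p]) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


params = fit_scaler([1.0, 3.0])          # statistics from the training split
print(transform([2.0, 5.0], params))     # [0.0, 3.0]

y_true = ["Max", "High", "Stop", "Low"]  # toy labels, not the paper's data
y_pred = ["Max", "Medium", "Stop", "Low"]
print(accuracy(y_true, y_pred), mse(y_true, y_pred))  # 0.75 0.25
```

Fitting the scaler on the training split and reusing its parameters on the test split keeps the test set a fair stand-in for unseen data.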
