More than ever, the number of decision support methods and computer-aided diagnosis systems applied to various areas of medicine is increasing. In breast cancer research, many works have sought to reduce false positives by using such systems as a second (double) reading. In this study, we aimed to present a set of data mining techniques applied toward a decision support system for breast cancer diagnosis. The method is intended to assist clinical practice in identifying mammographic findings such as microcalcifications, masses and even normal tissue, in order to avoid misdiagnosis. A reliable database was used, with 410 images from about 115 patients, previously reviewed by radiologists and annotated as microcalcifications, masses or normal tissue. Two feature extraction techniques were used: the gray level co-occurrence matrix (GLCM) and the gray level run length matrix (GLRLM). For classification, we considered various scenarios according to distinct lesion patterns, and several classifiers, in order to identify the best-performing one in each case. The classifiers used were Naïve Bayes, Support Vector Machines, k-Nearest Neighbors and Decision Trees (J48 and Random Forests). The results in distinguishing mammographic findings showed high positive predictive values (PPV) and very good accuracy. Results for the classification of breast density and of the BI-RADS® scale are also presented. The best predictive method for all tested groups was the Random Forest classifier, and the best performance was achieved in distinguishing microcalcifications. The conclusions drawn from the several tested scenarios represent a new perspective on breast cancer diagnosis using data mining techniques.
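The results above are reported as positive predictive value (PPV) and accuracy. As a minimal sketch of how these two metrics relate to classifier output, the code below computes them from one-vs-rest confusion counts, assuming each finding class (for example, microcalcification) is scored against all other classes. The function names and the labels in the usage example are hypothetical and for illustration only; they are not data or code from this study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """One-vs-rest confusion-matrix counts for a single finding class."""
    tp: int  # predicted as the class and truly the class
    fp: int  # predicted as the class but truly another class
    tn: int  # predicted as another class and truly another class
    fn: int  # predicted as another class but truly the class


def counts_from_labels(y_true, y_pred, positive):
    """Tally one-vs-rest counts for the class `positive` from paired label lists."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def ppv(c: ConfusionCounts) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """One-vs-rest accuracy: (TP + TN) / all cases."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical radiologist labels vs. classifier predictions (illustrative only).
    y_true = ["calc", "mass", "normal", "calc", "mass", "calc"]
    y_pred = ["calc", "calc", "normal", "calc", "mass", "normal"]
    c = counts_from_labels(y_true, y_pred, positive="calc")
    print(c, round(ppv(c), 3), round(accuracy(c), 3))
```

PPV penalizes only false positives, which is why it is a natural measure when the system is meant to reduce false-positive second readings; accuracy additionally credits correct rejections.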
Resumo (Summary)

Increasingly, we are witnessing a global rise in the number of decision support and computer-aided diagnosis methods applied to various areas of medicine. In breast cancer research, many works have been developed as a second reading in order to reduce the number of false positives in diagnosis. This study presents a set of data mining techniques that could be applied to a decision support system for breast cancer diagnosis. The approach aims to help clinicians identify mammographic findings such as microcalcifications, masses and even normal tissue, in order to avoid misdiagnosis. To this end, this work uses a reliable database of 410 images corresponding to 115 patients, containing prior assessments by radiologists of microcalcifications, masses and tissue considered normal. Throughout this work, two feature extraction techniques are used: the gray level co-occurrence matrix and the gray level run length matrix.
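To make the two feature extraction techniques concrete, here is a minimal pure-Python sketch that builds a gray level co-occurrence matrix (GLCM) for one pixel offset and a gray level run length matrix (GLRLM) for horizontal runs, then derives one descriptor from each (GLCM contrast and angular second moment, GLRLM short run emphasis). The offsets, number of gray levels, choice of descriptors and the 4×4 test image are assumptions for illustration; the study does not state which parameters or descriptors it used, and real mammogram regions would first be quantized to a small number of gray levels.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized GLCM: p[i][j] = probability that level i has neighbour j at offset (dx, dy)."""
    rows, cols = len(image), len(image[0])
    m = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                m[i][j] += 1
                if symmetric:  # also count the pair in the opposite direction
                    m[j][i] += 1
    total = sum(map(sum, m))
    return [[v / total for v in row] for row in m] if total else m


def glcm_contrast(p):
    """Contrast: sum of p[i][j] * (i - j)^2, large for strong local intensity changes."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def glcm_asm(p):
    """Angular second moment: sum of squared probabilities, large for uniform texture."""
    return sum(v * v for row in p for v in row)


def glrlm_horizontal(image, levels):
    """GLRLM for 0-degree runs: counts[g][k] = number of runs of level g with length k + 1."""
    max_len = len(image[0])
    counts = [[0] * max_len for _ in range(levels)]
    for row in image:
        start = 0
        for k in range(1, len(row) + 1):
            # A run ends at the end of the row or where the gray level changes.
            if k == len(row) or row[k] != row[start]:
                counts[row[start]][k - start - 1] += 1
                start = k
    return counts


def short_run_emphasis(counts):
    """Short run emphasis: runs weighted by 1 / length^2, normalized by the number of runs."""
    total = sum(map(sum, counts))
    weighted = sum(n / (idx + 1) ** 2 for row in counts for idx, n in enumerate(row))
    return weighted / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical 4x4 image already quantized to 4 gray levels (0..3).
    img = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 2, 2, 2],
        [2, 2, 3, 3],
    ]
    p = glcm(img, levels=4)
    runs = glrlm_horizontal(img, levels=4)
    print(glcm_contrast(p), glcm_asm(p), short_run_emphasis(runs))
```

In practice such descriptors are computed for several offsets or run directions and concatenated into one feature vector per image, which is then passed to the classifiers; libraries such as scikit-image provide GLCM routines (`graycomatrix`, `graycoprops`) that would replace the hand-written version.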