Introduction: The hypothesis of this study is that prognostic and predictive markers defined by immunohistochemistry (IHC) and other histopathologic characteristics of a primary breast carcinoma tumour, together with gene expression profiling, DNA methylation profiling and microRNA (miRNA) profiling of the tumour, will identify lymph node status in breast cancer patients and allow selection of patients for sentinel lymph node biopsy (SLNB) or axillary lymph node dissection (ALND), in addition to providing prognostic information.
Methods: A web-based database of breast cancer patients was constructed using Distiller, SlidePath (Leica Microsystems, Germany) software. A MindMap was used to determine the subgroupings and structure of the database. Three patient cohorts were identified from separate sources, and patient records were identified using Systematized Nomenclature of Medicine (SNOMED) coding as a search parameter. Data were extracted into Microsoft Excel and uploaded into Distiller. A total of 3200 patient reports were retrieved from a private pathology database, 1834 from a public hospital and 5200 from a separate public pathology hospital. Haematoxylin and eosin (H&E) slides from 645 patients in the initial dataset were reviewed to confirm the imported data and the reliability of the pathology information. The features reviewed included tumour type, tumour grade and size, presence or absence of lymphatic and vascular permeation, margin type, lymphocytic infiltration, lymph node status and number of involved lymph nodes. Survival data were also obtained from the Queensland Cancer Registry under HREC approval to evaluate applications of the data for prognostic information.
A breast cancer cluster was incorporated into the project and evaluated to ascertain whether there were commonalities between the tumours that would offer potential insight into causes of breast cancer or prognostic information. Tissue microarrays (TMAs) were constructed to enable evaluation of immunohistochemistry markers, and the TMAs were evaluated for correlation with clinical parameters. Epigenetic biomarkers were also evaluated and correlated with different parameters in triple negative breast carcinoma (TNBC).
Results: Biomarkers evaluated on TMAs successfully correlated with clinical outcome in breast carcinoma patients, and an artificial neural network (ANN) model was constructed to predict molecular grade in breast cancer patients. This ANN model enabled the subdivision of Grade 2 breast carcinomas into Grade 1 and Grade 3 carcinomas and correlated with patient survival. Epigenetic models were also able to stratify TNBC patients into prognostic groups. However, an ANN could not be successfully developed to predict lymph node status, and the epigenetic models were likewise unable to predict lymph node status. An analysis of a breast cancer cluster also showed that the cluster tumours were similar to sporadic breast cancers, and a distinctive pattern was not identified. The project has enabled the identification of markers to subclassify breast carcinomas into a molecular grade and ...