A classification system of day 3 human embryos using deep learning

Wu, Chongwei; Yan, Wei; Li, Hongtu; Li, Jiaxin; Wang, Hongkai; Chang, Shijie; Yu, Tao; Jin, Ying; Ma, Chao; Luo, Yahong; Yi, Dongxu; Jiang, Xiran

doi:10.1016/j.bspc.2021.102943

Cited by 19 publications

(8 citation statements)

References 34 publications


“…Ten out of 20 studies (50%) performed the classification with a solely image-based input system ( Rad et al , 2018 , 2019 ; Kragh et al , 2019 ; Chavez-Badiola et al , 2020 ; VerMilyea et al , 2020 ; Bormann et al , 2020a , 2021 ; Coticchio et al , 2021 ; Liao et al , 2021 ; Wu et al , 2021 ). These studies used a CNN as a backbone to create predictions.…”

Section: Results (mentioning)

confidence: 99%

Embryo selection through artificial intelligence versus embryologists: a systematic review

Salih, Austin, Warty et al. 2023

Human Reproduction Open


STUDY QUESTION: What is the present performance of artificial intelligence (AI) decision support during embryo selection compared to the standard embryo selection by embryologists?

SUMMARY ANSWER: AI consistently outperformed the clinical teams in all the studies focused on embryo morphology and clinical outcome prediction during embryo selection assessment.

WHAT IS KNOWN ALREADY: The ART success rate is ∼30%, with a worrying trend of increasing female age correlating with considerably worse results. As such, there have been ongoing efforts to address this low success rate through the development of new technologies. With the advent of AI, there is potential for machine learning to be applied in such a manner that areas limited by human subjectivity, such as embryo selection, can be enhanced through increased objectivity. Given the potential of AI to improve IVF success rates, it remains crucial to review the performance between AI and embryologists during embryo selection.

STUDY DESIGN, SIZE, DURATION: The search was done across PubMed, EMBASE, Ovid Medline, and IEEE Xplore from 1 June 2005 up to and including 7 January 2022. Included articles were also restricted to those written in English. Search terms utilized across all databases for the study were: (‘Artificial intelligence’ OR ‘Machine Learning’ OR ‘Deep learning’ OR ‘Neural network’) AND (‘IVF’ OR ‘in vitro fertili*’ OR ‘assisted reproductive techn*’ OR ‘embryo’), where the character ‘*’ instructs the search engine to include any auto-completion of the search term.

PARTICIPANTS/MATERIALS, SETTING, METHODS: A literature search was conducted for literature relating to AI applications to IVF. Primary outcomes of interest were accuracy, sensitivity, and specificity of the embryo morphology grade assessments and the likelihood of clinical outcomes, such as clinical pregnancy after IVF treatments. Risk of bias was assessed using the modified Downs and Black checklist.

MAIN RESULTS AND THE ROLE OF CHANCE: Twenty articles were included in this review. There was no specific embryo assessment day across the studies: Day 1 until Day 5/6 of embryo development was investigated. The types of input for training AI algorithms were images and time-lapse (10/20), clinical information (6/20), and both images and clinical information (4/20). Each AI model demonstrated promise when compared to an embryologist’s visual assessment. On average, the models predicted the likelihood of successful clinical pregnancy with greater accuracy than clinical embryologists, signifying greater reliability when compared to human prediction. The AI models performed at a median accuracy of 75.5% (range 59–94%) on predicting embryo morphology grade. The correct prediction (ground truth) was defined from embryo images according to the embryologists’ assessment following the respective local guidelines. Using blind test datasets, the embryologists’ prediction accuracy was 65.4% (range 47–75%) against the same ground truth provided by the original local assessment. Similarly, AI models had a median accuracy of 77.8% (range 68–90%) in predicting clinical pregnancy through the use of patient clinical treatment information, compared to 64% (range 58–76%) when performed by embryologists. When both images/time-lapse and clinical information inputs were combined, the median accuracy by the AI models was higher at 81.5% (range 67–98%), while clinical embryologists had a median accuracy of 51% (range 43–59%).

LIMITATIONS, REASONS FOR CAUTION: The findings of this review are based on studies that have not been prospectively evaluated in a clinical setting. Additionally, a fair comparison of all the studies was deemed unfeasible owing to the heterogeneity of the studies, the development of the AI models, the databases employed, and the study design and quality.

WIDER IMPLICATIONS OF THE FINDINGS: AI provides considerable promise to the IVF field and embryo selection. However, there needs to be a shift in developers’ perception of the clinical outcome from successful implantation towards ongoing pregnancy or live birth. Additionally, existing models focus on locally generated databases and many lack external validation.

STUDY FUNDING/COMPETING INTERESTS: This study was funded by Monash Data Future Institute. All authors have no conflicts of interest to declare.

REGISTRATION NUMBER: CRD42021256333



“…Our group has already proposed a deep ensemble learning (EL) model 9 for classifying embryos in a subset of 699 images. The subset did not include the 35 images of the independent dataset with poor quality.…”

Section: Results (mentioning)

confidence: 99%

“…Convolutional neural network (CNN) is a typical method of automated extracting features by use of 2D or 3D convolution in a learning step, and it has achieved great success in computer vision and image processing. [6][7][8] Inspired by the remarkable successes of CNNs, several CNN-based systems [9][10][11][12][13][14][15] have been proposed for classifying and assessing human embryos. Wu et al 9 in our group first employed the DenseNet169, Inception V3, ResNet50, and VGG19 to classify the embryos.…”

Section: Introduction (mentioning)

confidence: 99%

“…[6][7][8] Inspired by the remarkable successes of CNNs, several CNN-based systems [9][10][11][12][13][14][15] have been proposed for classifying and assessing human embryos. Wu et al 9 in our group first employed the DenseNet169, Inception V3, ResNet50, and VGG19 to classify the embryos. Then, they combined the features extracted from all four networks for better classification by using a logistic regression model.…”

Section: Introduction (mentioning)

confidence: 99%
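The citation statement above describes combining the outputs of four CNN backbones with a logistic regression model. A minimal sketch of that stacking idea follows, in pure Python. It is not the cited authors' implementation: the per-image scores, the labels, and the helper names `fit_logistic` and `predict` are all hypothetical, and single scores stand in for the richer feature vectors a real network would produce.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Class 1 when the combined score reaches probability 0.5.
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical scores per image, one column per backbone in the order
# (DenseNet169, Inception V3, ResNet50, VGG19). Labels are invented:
# 1 = good-quality embryo, 0 = otherwise.
FEATURES = [
    [0.9, 0.8, 0.7, 0.6],
    [0.8, 0.9, 0.6, 0.7],
    [0.7, 0.6, 0.9, 0.8],
    [0.2, 0.3, 0.1, 0.4],
    [0.3, 0.1, 0.4, 0.2],
    [0.1, 0.4, 0.2, 0.3],
]
LABELS = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(FEATURES, LABELS)
train_acc = sum(predict(w, b, x) == y for x, y in zip(FEATURES, LABELS)) / len(LABELS)
print("learned weights:", [round(wi, 2) for wi in w], "bias:", round(b, 2))
print("training accuracy:", train_acc)
```

The learned weights show how much the combiner relies on each backbone. In practice the combiner would be fitted on held-out predictions rather than on the data the backbones were trained on.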


Cascaded networks for the embryo classification on microscopic images using the residual external‐attention

Guo, Liu, Gong et al. 2022

Int J Imaging Syst Tech

Self Cite


Embryo assessment and selection are usually based on the visual morphological analysis by expert embryologists. Although the embryologist assessment has been routinely used in clinical practice, it is highly dependent on the embryologist's experience and is very time‐consuming. Therefore, objective and efficient methods for automated embryo evaluation are in high demand. We proposed a framework of cascaded networks to hierarchically extract and integrate the microscopic image features for embryo classification. The cascaded networks consisted of a coarse network and a refined network. The coarse network produced a classification activation mapping (CAM) with the highest classification probability, which indicated the most discriminative regions of embryo classification. The refined network extracted and integrated the image features again by using both the CAMs and the corresponding original images. In addition, the residual external‐attention block (ResEA) was used in the refined network to better capture long‐range dependencies. Our cascaded networks were trained on a dataset of 7728 microscopic images of day 3 embryos from 1800 couples and evaluated on an independent testing dataset of 734 microscopic images. The accuracy, sensitivity, specificity, precision, and F1‐score were employed to evaluate the performance of our cascaded networks. Compared with the coarse network and the refined network, respectively, the cascaded networks without the ResEA improved the classification results of embryos. The ResEA block helped the cascaded networks to further improve all five metrics for better embryo classification. Our proposed cascaded networks also achieved better classification results than a junior embryologist did. The cascaded networks hierarchically make full use of image features for more effective learning, and the ResEA further improves the performance of embryo classification.
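The abstract above evaluates the classifier with accuracy, sensitivity, specificity, precision, and F1-score. As a rough illustration of how these five metrics relate to one another, the sketch below derives them from binary confusion counts. The function name `binary_metrics` and the toy labels are assumptions made for this example and do not come from the cited paper.

```python
def binary_metrics(y_true, y_pred):
    """Return the five metrics for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Report 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall on the positive class
    specificity = ratio(tn, tn + fp)   # recall on the negative class
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Invented labels for eight images: 3 TP, 3 TN, 1 FP, 1 FN.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these toy labels every metric comes out to 0.75, which makes the definitions easy to check by hand.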


“…Most of these models take images as input, for instance, to evaluate sperm motility, concentration, and morphology for selecting high-quality sperm for fertilization [9][10][11] or for diagnosing male infertility [12][13][14], to help identify and distinguish sperm and debris in testicular sperm samples [15,16], or to examine the quality of oocytes [17]. Models have also been developed to use embryo images or time-lapse videos to grade embryos [18,19] and to predict treatment outcomes such as implantation [20], pregnancy [21], and live birth [22][23][24].…”

Section: Introduction (mentioning)

confidence: 99%

Testing the reproducibility and effectiveness of deep learning models among clinics: sperm detection as a pilot study

Wang, Jin, Jiang et al. 2024

Preprint


Background: Deep learning has been increasingly investigated for assisting clinical in vitro fertilization (IVF). The first technical step in many tasks is to visually detect and locate sperm, oocytes, and embryos in images. For clinical deployment of such deep learning models, different clinics use different image acquisition hardware and different sample preprocessing protocols, raising the concern over whether the reported accuracy of a deep learning model by one clinic could be reproduced in another clinic. Here we aim to investigate the effect of each imaging factor on the reproducibility of object detection models, using sperm analysis as a pilot example.

Methods: Ablation studies were performed using state-of-the-art models for detecting human sperm to quantitatively assess how model precision (false-positive detection) and recall (missed detection) were affected by imaging magnification, imaging mode, and sample preprocessing protocols. The results led to the hypothesis that the richness of image acquisition conditions in a training dataset deterministically affects model reproducibility. The hypothesis was tested by first enriching the training dataset with a wide range of imaging conditions, then validated through internal blind tests on new samples and external multi-center clinical validations.

Results: Ablation experiments revealed that removing subsets of data from the training dataset significantly reduced model precision. Removing raw sample images from the training dataset caused the largest drop in model precision, whereas removing 20x images caused the largest drop in model recall. By incorporating different imaging and sample preprocessing conditions into a rich training dataset, the model achieved an intraclass correlation coefficient (ICC) of 0.97 (95% CI: 0.94-0.99) for precision, and an ICC of 0.97 (95% CI: 0.93-0.99) for recall. Multi-center clinical validation showed no significant differences in model precision or recall across different clinics and applications.

Conclusions: The results validated the hypothesis that the richness of data in the training dataset is a key factor impacting model reproducibility. These findings highlight the importance of diversity in a training dataset for model evaluation and suggest that future deep learning models in andrology and reproductive medicine should incorporate comprehensive feature sets for enhanced reproducibility across clinics.

