2018
DOI: 10.1007/978-3-030-01246-5_2

Progressive Neural Architecture Search

Abstract: We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. Our approach uses a sequential model-based optimization (SMBO) strategy, in which we search for structures in order of increasing complexity, while simultaneously learning a surrogate model to guide the search through structure space. Direct comparison under the same search space shows that our method …
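The progressive SMBO strategy the abstract describes can be illustrated with a minimal, self-contained sketch: cells are grown one block at a time, a surrogate trained on every architecture evaluated so far ranks the candidate expansions, and only the top-ranked cells are actually trained at each complexity level. Everything below is an assumption-laden stand-in rather than the authors' implementation: the op set, `train_and_eval`, and the beam size are hypothetical, and a ridge regression over padded one-hot encodings replaces the learned sequence-model accuracy predictor used in the paper.

```python
import random
import numpy as np
from sklearn.linear_model import Ridge  # simple stand-in for the paper's learned predictor

# Illustrative search-space primitives (hypothetical, not the paper's exact op set).
OPS = ["sep_conv_3x3", "max_pool_3x3", "identity"]
MAX_BLOCKS = 3   # maximum cell complexity explored
BEAM_SIZE = 4    # number of cells trained exactly at each complexity level

def encode(cell):
    """One-hot encode a cell (a tuple of op names), zero-padded to MAX_BLOCKS blocks."""
    vec = np.zeros(MAX_BLOCKS * len(OPS))
    for i, op in enumerate(cell):
        vec[i * len(OPS) + OPS.index(op)] = 1.0
    return vec

def train_and_eval(cell):
    """Placeholder for building a CNN from `cell`, training it, and returning
    validation accuracy; randomized here so the sketch runs end to end."""
    return random.random()

def progressive_search():
    # Level 1: enumerate and fully evaluate all single-block cells.
    evaluated = {(op,): train_and_eval((op,)) for op in OPS}
    beam = list(evaluated)

    for _ in range(2, MAX_BLOCKS + 1):
        # Refit the surrogate on every architecture evaluated so far.
        X = np.stack([encode(c) for c in evaluated])
        y = np.array(list(evaluated.values()))
        surrogate = Ridge().fit(X, y)

        # Expand each beam member by one block and rank expansions with the surrogate.
        expansions = [c + (op,) for c in beam for op in OPS]
        preds = surrogate.predict(np.stack([encode(c) for c in expansions]))
        beam = [c for _, c in sorted(zip(preds, expansions), reverse=True)][:BEAM_SIZE]

        # Train only the surrogate's top-ranked candidates exactly.
        for c in beam:
            evaluated[c] = train_and_eval(c)

    return max(evaluated, key=evaluated.get)

if __name__ == "__main__":
    print(progressive_search())
```

The sketch preserves the efficiency argument: at each complexity level only BEAM_SIZE cells are trained exactly, while the surrogate prunes the rest of the exponentially growing expansion set.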

Cited by 1,592 publications (1,320 citation statements)
References 19 publications
“…architecture depends on the difficulty and size of the dataset at hand. While these findings may encourage an automated neural architecture search, such an approach is hindered by the limited computational resources [19], [20], [21], [22], [23]. Alternatively, we propose an ensemble architecture, which combines U-Nets of varying depths into one unified structure.…”
Section: Table I
confidence: 99%
“…The progressive neural architecture search (PNAS) investigated the use of the Bayesian optimization strategy SMBO to make the search for CNN architectures more efficient by exploring simpler cells before determining whether to search more complex cells [71]. Similarly, NASBOT defines a distance function for generated architectures, which is used for constructing a kernel to use Gaussian processes for BO [72].…”
Section: Neural Architecture Search
confidence: 99%
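To make the NASBOT half of this excerpt concrete, the toy sketch below turns an architecture distance into a squared-exponential-style similarity that could serve as the covariance function of a Gaussian process. Both pieces are illustrative assumptions: the per-position mismatch count is a placeholder for NASBOT's optimal-transport (OTMANN) distance, and a kernel built this way is not guaranteed to be positive semidefinite.

```python
import math

def architecture_distance(cell_a, cell_b):
    """Toy distance between two cells (tuples of op names): per-position mismatches
    plus the difference in length. A stand-in for a real architecture distance."""
    length_gap = abs(len(cell_a) - len(cell_b))
    mismatches = sum(a != b for a, b in zip(cell_a, cell_b))
    return mismatches + length_gap

def gp_kernel(cell_a, cell_b, length_scale=1.0):
    """Map the distance to a similarity in (0, 1], squared-exponential style,
    so it can play the role of a covariance function for Gaussian-process BO."""
    d = architecture_distance(cell_a, cell_b)
    return math.exp(-(d / length_scale) ** 2)

print(gp_kernel(("sep_conv_3x3", "identity"), ("sep_conv_3x3", "max_pool_3x3")))
```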
“…This is often achieved by jointly training a fully connected layer for dimension control and reordering for each modality, together with the scalar weights for fusion. A recent study [126] employs neural architecture search with progressive exploration [127]-[129] to find suitable settings for a number of fusion functions. Each fusion function is configured by which layers to fuse and whether to use concatenation or weighted sum as the fusion operation.…”
Section: A Simple Operation-based Fusion
confidence: 99%
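As a rough illustration of the fusion-function configuration described in this last excerpt, the sketch below projects each modality through a fully connected layer for dimension control and then merges the results either by concatenation or by a weighted sum with learnable scalar weights. The module name, the softmax over the scalars, and the example dimensions are assumptions for illustration, not the cited study's implementation.

```python
import torch
import torch.nn as nn

class ConfigurableFusion(nn.Module):
    """Hypothetical fusion block in the spirit of the excerpt above: each modality
    is projected onto a common dimension, then fused by concatenation or by a
    weighted sum with learnable scalar weights."""

    def __init__(self, in_dims, out_dim, mode="weighted_sum"):
        super().__init__()
        assert mode in ("concat", "weighted_sum")
        self.mode = mode
        # One fully connected layer per modality for dimension control.
        self.projections = nn.ModuleList([nn.Linear(d, out_dim) for d in in_dims])
        # Learnable scalar fusion weights, used only in the weighted-sum mode.
        self.weights = nn.Parameter(torch.ones(len(in_dims)))

    def forward(self, features):
        projected = [proj(x) for proj, x in zip(self.projections, features)]
        if self.mode == "concat":
            return torch.cat(projected, dim=-1)
        w = torch.softmax(self.weights, dim=0)
        return sum(wi * xi for wi, xi in zip(w, projected))

# Example: fusing a 512-d image feature with a 300-d text feature (dimensions assumed).
fusion = ConfigurableFusion(in_dims=[512, 300], out_dim=256, mode="weighted_sum")
out = fusion([torch.randn(8, 512), torch.randn(8, 300)])  # -> shape (8, 256)
```

In a search setting, the `mode` argument (and the choice of which layers feed the block) is exactly the kind of discrete configuration an architecture-search procedure would select among.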