2018
DOI: 10.1007/978-3-030-01261-8_19

Unsupervised Hard Example Mining from Videos for Improved Object Detection

Abstract: Important gains have recently been obtained in object detection by using training objectives that focus on hard negative examples, i.e., negative examples that are currently rated as positive or ambiguous by the detector. These examples can strongly influence parameters when the network is trained to correct them. Unfortunately, they are often sparse in the training data, and are expensive to obtain. In this work, we show how large numbers of hard negatives can be obtained automatically by analyzing the output…
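The abstract is cut off before it states the mining criterion, so the following is only a hedged illustration of the general idea of mining hard negatives from a detector's output on video, not the paper's procedure. It assumes that a detection with no overlapping detection (IoU below a hypothetical threshold of 0.5) in both the previous and the next frame is temporally isolated and therefore a candidate hard negative. The box format `(x1, y1, x2, y2)` and the names `iou` and `isolated_detections` are illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def isolated_detections(frames: List[List[Box]], iou_thr: float = 0.5) -> List[Tuple[int, Box]]:
    """Return (frame_index, box) for detections unmatched in both neighbouring frames.

    Assumption (hypothetical criterion): such temporally isolated detections
    are treated as candidate hard negatives.
    """
    mined = []
    # Only interior frames have both neighbours available for the check.
    for t in range(1, len(frames) - 1):
        for box in frames[t]:
            has_prev = any(iou(box, p) >= iou_thr for p in frames[t - 1])
            has_next = any(iou(box, n) >= iou_thr for n in frames[t + 1])
            if not has_prev and not has_next:
                mined.append((t, box))
    return mined
```

Only interior frames are checked, because a detection in the first or last frame lacks one of the two neighbours needed to call it isolated.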

Cited by 67 publications (47 citation statements)
References 64 publications (121 reference statements)
“…We argue that if the baseline network is easily able to detect the person in a composite image, then it is an easy example and may not boost the network's performance when added to the training set. A similar metric has been proposed by previous works [19,40,51,54] for evaluating the quality of real data.…”
Section: Comparison With Previous Cut-paste Methods (mentioning)
confidence: 96%
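The selection criterion in this excerpt can be sketched as a simple filter: run the baseline detector on each composite image and discard those where it already detects the pasted person confidently, since they are easy examples. This is a minimal sketch under stated assumptions: the `Composite` record, its `baseline_score` field (taken here as the baseline detector's highest confidence on the pasted person), and the 0.8 threshold are all hypothetical, and the cited work's actual metric may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Composite:
    image_id: str
    # Hypothetical: highest baseline-detector confidence on the pasted person.
    baseline_score: float


def keep_hard_composites(samples: List[Composite], easy_thr: float = 0.8) -> List[Composite]:
    """Drop composites that the baseline already detects confidently (easy examples).

    Assumption: a score >= easy_thr means the pasted person is 'easily detected'.
    """
    return [s for s in samples if s.baseline_score < easy_thr]
```

For example, with scores 0.95, 0.40, and 0.10, only the last two composites would be kept for training.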
“…In their case, the unlabeled data was from the same domain as the labeled data, and pseudo-labeling was done by selecting the predictions from the baseline model using test-time data augmentation. Jin et al. [25] use tracking in videos to gather hard examples, i.e., objects that fail to be detected by an object detector (false negatives); they re-train using this extra data to improve detection on still images.…”
Section: Related Work (mentioning)
confidence: 99%
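The use of tracking described in this excerpt can be illustrated with a minimal sketch: a tracked box that has no matching detector output in the same frame is collected as a missed detection, i.e., a hard positive. The tracker itself is not shown; per-frame tracked boxes are assumed to be given, and the IoU threshold of 0.5 and the function names are hypothetical rather than taken from Jin et al. [25].

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2): assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mine_false_negatives(
    tracked: Dict[int, List[Box]],
    detected: Dict[int, List[Box]],
    iou_thr: float = 0.5,
) -> List[Tuple[int, Box]]:
    """Return (frame, box) for tracked boxes that no detection overlaps enough.

    Assumption: these tracker-only boxes are objects the detector missed.
    """
    mined = []
    for frame, boxes in sorted(tracked.items()):
        dets = detected.get(frame, [])
        for box in boxes:
            if not any(iou(box, d) >= iou_thr for d in dets):
                mined.append((frame, box))
    return mined
```

Keeping the tracker output and the detector output as separate inputs makes the mining step independent of which tracker is used.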
“…where "hard" label $y_i \in \{0, 1\}$ and the model's predicted posterior $p_i \in [0, 1]$. This is similar to the method of Jin et al. [25], which assigns a label of 1 to both easy and hard positive examples during re-training. Figure 2: (a) Pseudo-labels from detection and tracking: (1) In three consecutive video frames, high-confidence predictions from the baseline detector are marked in green, and faces missed by the detector (i.e.…”
Section: Training On Pseudo-labels (mentioning)
confidence: 99%
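The equation that this "where" clause defines is cut off in the excerpt. Assuming it is the standard binary cross-entropy over hard labels, averaged over examples, $\mathcal{L} = -\frac{1}{N}\sum_i \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$, a minimal sketch follows in which easy positives (confident detections) and hard positives (tracker-recovered misses) both receive $y_i = 1$, as the excerpt says Jin et al. [25] do. The function name and the clipping constant `eps` are illustrative choices.

```python
import math
from typing import Sequence


def bce_hard_labels(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with hard labels y_i in {0, 1} and posteriors p_i in [0, 1].

    Assumption: this is the loss the excerpt refers to. Probabilities are
    clipped to [eps, 1 - eps] so that log(0) is never evaluated.
    """
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        if y not in (0, 1):
            raise ValueError("hard labels must be 0 or 1")
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```

With labels `[1, 1, 0]` and posteriors `[0.9, 0.2, 0.1]` (an easy positive, a tracker-recovered hard positive, and a negative), the hard positive contributes $-\log 0.2 \approx 1.61$ to the sum while each of the other two contributes only about $0.105$, which is why such examples can strongly influence the parameters.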
“…More recently, methods such as [10] and [11] exploit temporal consistency in videos to automatically mine new examples using an existing weak detector. [12] demonstrates how sharing weights between an autoencoder-like ladder network on unlabelled data and the feature extractor used for labelled data can leverage the unlabelled data during training to increase performance.…”
Section: Related Work (mentioning)
confidence: 99%