Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
DOI: 10.1145/2911996.2912036

The ImageNet Shuffle

Abstract: This paper strives for video event detection using a representation learned from deep convolutional neural networks. Different from the leading approaches, which all learn from the 1,000 classes defined in the ImageNet Large Scale Visual Recognition Challenge, we investigate how to leverage the complete ImageNet hierarchy for pre-training deep networks. To deal with the problems of over-specific classes and classes with few images, we introduce a bottom-up and top-down approach for reorganization of the ImageNet…
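The bottom-up reorganization sketched in the abstract can be illustrated with a short example. The Python sketch below folds leaf classes that hold too few images into their parent synset; the Synset structure, the bottom_up_merge function, and the 500-image threshold are hypothetical illustrations, not the authors' code or the paper's actual parameters.

    # Hedged sketch of a bottom-up reorganization over an ImageNet-style
    # hierarchy. All names and the threshold are hypothetical.
    class Synset:
        def __init__(self, wnid, n_images, children=None):
            self.wnid = wnid              # WordNet id, e.g. "n02084071"
            self.n_images = n_images      # images attached to this node
            self.children = children or []

    def bottom_up_merge(node, min_images=500):
        """Fold under-populated leaf classes into their parent."""
        kept = []
        for child in node.children:
            bottom_up_merge(child, min_images)
            if not child.children and child.n_images < min_images:
                # Too few images: absorb the leaf's images into the parent.
                node.n_images += child.n_images
            else:
                kept.append(child)
        node.children = kept

    # Example: the 120-image leaf is merged upward, the 900-image leaf kept.
    root = Synset("n00001740", 0, [Synset("n02084071", 120),
                                   Synset("n02121808", 900)])
    bottom_up_merge(root)

A complementary top-down pass, as the abstract mentions, would then handle over-specific branches; that step is omitted here.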


Cited by 69 publications (12 citation statements)
References 26 publications
“…ing strategies. The SOMHunter and VIRET systems relied on the same BoW variant of the W2VV++ model [33,35], a query representation learning approach employing visual features obtained from deep networks trained with a high number of classes [43,44]. For more details about the employed W2VV++ variant and used similarity for each system, we refer to [35].…”
Section: Text Search. VBS 2020 Witnessed Various Search Models Based on Different Text-Image Matching Strategies (mentioning)
confidence: 99%
“…Many kinds of concepts such as objects, actions and scenes are extracted from the videos. For the concept extraction, VIREO uses ResNet152 [21] trained on several datasets such as ImageNet [12], ImageNet shuffle [43], OpenImage [29] and Place-365 [70] datasets, and the P3D network [49] trained on the Kinetics dataset [5] to get the concepts [46]. IVIST uses the SCAN model [31] that looks into the latent alignments in images and sentences and predicts the similarity between them.…”
Section: Text Search. VBS 2020 Witnessed Various Search Models Based on Different Text-Image Matching Strategies (mentioning)
confidence: 99%
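As a rough illustration of the concept-extraction step described in the quotation above, the snippet below scores a single video frame with a pretrained ResNet152 from torchvision. This covers only the standard 1,000-class ImageNet model; the multi-dataset concept banks (ImageNet Shuffle, OpenImage, Places-365) and the P3D action network mentioned in the quote are not reproduced, and the frame path is hypothetical.

    # Hedged sketch: top-5 ImageNet concept scores for one video frame.
    import torch
    from torchvision import models, transforms
    from PIL import Image

    model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    frame = Image.open("frame.jpg").convert("RGB")  # hypothetical frame path
    with torch.no_grad():
        probs = model(preprocess(frame).unsqueeze(0)).softmax(dim=1)
    scores, concept_ids = probs.topk(5)  # top-5 concepts for this frame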
“…Therefore, appropriate concept selection is necessary to characterize the dynamic evolution of time series based concepts. According to [56,67], the recognition performances of activities can be enhanced accordingly if more appropriate concepts are utilized. This is more obvious when the original concept detections are less satisfactory [56].…”
Section: Attribute-based Everyday Activity Recognition (mentioning)
confidence: 99%
“We use three DNN models currently regarded as the best, namely Alexnet [6], UvANet [7], and VGG [8], on the VSD 2014 benchmark data comprising nearly 62.18 hours of video. Experimental results show that using DNNs gives results 13% better than using low-level features, with the VGG-19 model achieving the highest result of 48.12%.…”
Section: Figure (unclassified)
“In the work of Mettes' group [7], instead of using only a part of the ImageNet data to train the network, as Alexnet does, the group used the entire reorganized dataset of 14 million images with 21,814 classes. The result of the training process is the UvANet models, which the authors' evaluation found to give the best results for the task of event detection in video.…”
Section: B. Some Studies Using DNNs in the Field of Computer Vision (unclassified)