Machine perception is a key challenge on the way towards autonomous systems. Especially in the field of computer vision, numerous novel approaches have been introduced in recent years. This trend builds on the availability of public datasets. Logistics is one domain that could benefit from such innovations, yet no public datasets are available for it. Accordingly, we create the first public dataset for scene understanding in logistics. The Logistics Objects in COntext (LOCO) dataset contains 39,101 images. Its first release provides 5,593 bounding-box annotated images, with a total of 151,428 annotated instances of pallets, small load carriers, stillages, forklifts, and pallet trucks. We also present and discuss our data acquisition approach, which features enhanced privacy protection for workers. Finally, we provide an in-depth analysis of LOCO, compare it to other datasets (namely OpenImages and MS COCO), and show that it has far more annotations per image as well as considerably smaller annotation sizes. The dataset and future extensions will be available on our website (https://github.com/tum-fml/loco).
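As a minimal sketch of working with such a dataset, the snippet below computes per-category instance counts and the average number of annotations per image, assuming the annotations are shipped in a COCO-style JSON file; the file name is hypothetical and the actual format should be checked against the LOCO repository (https://github.com/tum-fml/loco).

```python
import json
from collections import Counter

# Hypothetical file name; the real annotation files are distributed via the
# LOCO repository and may be split differently.
ANNOTATION_FILE = "loco_annotations.json"

with open(ANNOTATION_FILE) as f:
    data = json.load(f)  # assumed COCO-style keys: "images", "annotations", "categories"

# Map category ids to names (e.g. pallet, small load carrier, stillage, ...).
id_to_name = {c["id"]: c["name"] for c in data["categories"]}

# Count annotated instances per category.
instance_counts = Counter(id_to_name[a["category_id"]] for a in data["annotations"])

# Annotations per image, one of the statistics discussed in the paper.
annotations_per_image = len(data["annotations"]) / len(data["images"])

print(instance_counts)
print(f"Average annotations per image: {annotations_per_image:.1f}")
```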
This paper proposes a scalable approach for synthetic image generation of industrial objects that leverages Blender for image rendering. In addition to components common in synthetic image generation research, three novel features are presented: First, we model relations between target objects and randomly apply them during scene generation (Object Relation Modelling, ORM). Second, we extend the idea of distractors and create Object-alike Distractors (OAD), which resemble the textural appearance (i.e. material and size) of the target objects. Third, we propose Mixed-lighting Illumination (MLI), which combines global and local light sources to automatically create a diverse illumination of the scene. In addition to the image generation approach, we create an industry-centered dataset for evaluation purposes. Experiments show that our approach enables fully synthetic training of object detectors for industrial use cases. Moreover, an ablation study provides evidence of the performance boost in object detection achieved with our novel features.
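To make the mixed-lighting idea concrete, the following is an illustrative sketch in Blender's Python API (bpy) that combines one global sun lamp with several randomly placed point lights; the light types, counts, and energy ranges are assumptions, since the abstract does not specify the MLI implementation.

```python
import random
import bpy


def add_global_light():
    """Add a single sun lamp as the global light source (assumed setup)."""
    sun_data = bpy.data.lights.new(name="GlobalSun", type='SUN')
    sun_data.energy = random.uniform(2.0, 6.0)  # randomized intensity
    sun_obj = bpy.data.objects.new("GlobalSun", sun_data)
    sun_obj.rotation_euler = (random.uniform(0.0, 1.2), 0.0, random.uniform(0.0, 6.28))
    bpy.context.collection.objects.link(sun_obj)


def add_local_lights(n=3, area=5.0):
    """Scatter point lights around the scene as local light sources."""
    for i in range(n):
        point_data = bpy.data.lights.new(name=f"Local{i}", type='POINT')
        point_data.energy = random.uniform(100.0, 1000.0)
        point_obj = bpy.data.objects.new(f"Local{i}", point_data)
        point_obj.location = (
            random.uniform(-area, area),
            random.uniform(-area, area),
            random.uniform(1.0, 4.0),
        )
        bpy.context.collection.objects.link(point_obj)


add_global_light()
add_local_lights(n=random.randint(1, 4))
```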
Autonomous robots in logistics are a promising approach towards a fully automated material flow. In order to exploit their full potential, however, they must be able to extract semantic information from logistics environments. In contrast to other application areas of autonomous robots (e.g. autonomous driving, service robotics), the logistics domain lacks a common dataset and benchmark suite covering multiple sensor modalities and perception tasks. This paper conceptualizes a framework for artificial perception research in logistics that aims to close this gap in a sustainable, data-driven way. Our framework consists of three components: (1) a foundation based on logistics-specific standards, concepts, and requirements; (2) an open dataset covering multiple sensor modalities and perception tasks; and (3) a standardized benchmark suite. As shown in other research areas, a common and open platform for data-driven research facilitates novel developments and makes results comparable and traceable over time.
Object detection (OD) methods are finding application in various fields. The OD problem can be divided into two sub-problems, namely object classification and localization. While the former aims to answer the question of which class a given object belongs to, the latter focuses on locating an object within a given image. For localization, both implicit representations, which enclose the object and its features (e.g. bounding boxes, polygons, and masks), and explicit representations, which describe the object's pose in an image (e.g. 6D pose, keypoints), are used. The 2D pose is a simple yet effective representation that has so far been overlooked. In this paper, we therefore motivate and formulate the use of 2D poses for object localization. Furthermore, we present RetinaNet-2DP, an anchor-based convolutional neural network (CNN) capable of detecting objects using 2D poses. To this end, we propose the idea of Anchor Poses and the Gaussian Kernel Distance as a similarity metric between poses. Experiments on the DOTA dataset and two robotics use cases from industry demonstrate the performance of the network architecture and, more generally, the potential of the proposed localization representation. Finally, we critically assess our findings and present an outlook on future work.
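The abstract does not define the Gaussian Kernel Distance; one plausible reading, used here purely as an illustration, is to apply a Gaussian kernel to the per-keypoint Euclidean distances between two 2D poses and average the result, yielding a similarity in (0, 1].

```python
import numpy as np


def gaussian_kernel_similarity(pose_a, pose_b, sigma=1.0):
    """Similarity between two 2D poses given as (K, 2) keypoint arrays.

    Illustrative reading of a Gaussian-kernel pose comparison: corresponding
    keypoints are compared with a Gaussian kernel and averaged. The exact
    formulation in the paper and the choice of sigma are assumptions.
    """
    pose_a = np.asarray(pose_a, dtype=float)
    pose_b = np.asarray(pose_b, dtype=float)
    sq_dists = np.sum((pose_a - pose_b) ** 2, axis=-1)  # per-keypoint squared distance
    return float(np.mean(np.exp(-sq_dists / (2.0 * sigma ** 2))))


# Two toy 2D poses with three keypoints each.
p1 = [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]]
p2 = [[0.1, 0.0], [1.0, 0.2], [0.4, 1.1]]
print(gaussian_kernel_similarity(p1, p2, sigma=0.5))  # close to 1 for similar poses
```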