Real time object scanning using a mobile phone and cloud-based visual search engine

Zhong, Yu; Garrigues, Pierre; Bigham, Jeffrey P.

doi:10.1145/2513383.2513443

Cited by 40 publications

(23 citation statements)

References 11 publications


“…As seen in both related work [24] and our experiments, blind people usually have worse photography skills than sighted people, so an accessible camera interface with few restrictions and rich guidance is crucial to help them take better photos. Existing camera interfaces often fail to provide assistance, resulting in poor performance of photo-based assistive applications.…”

Section: Discussion (mentioning)

confidence: 62%

“…al's key frame extraction algorithm [24], we created a panorama interface for RegionSpeak which has no restriction that needs visual inspection. Users of RegionSpeak can move the camera in any direction, and the key frame extraction algorithm will detect substantial changes in view port and alert users to hold their position to capture a new image.…”

Section: Interface Details (mentioning)

confidence: 99%


RegionSpeak

Zhong, Lasecki, Brady, et al. 2015

Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems

Self Cite


Blind people often seek answers to their visual questions from remote sources; however, the commonly adopted single-image, single-response model does not always guarantee enough bandwidth between users and sources. This is especially true when questions concern large sets of information, or spatial layout, e.g., where is there to sit in this area, what tools are on this work bench, or what do the buttons on this machine do? Our RegionSpeak system addresses this problem by providing an accessible way for blind users to (i) combine visual information across multiple photographs via image stitching, (ii) quickly collect labels from the crowd for all relevant objects contained within the resulting large visual area in parallel, and (iii) then interactively explore the spatial layout of the objects that were labeled. The regions and descriptions are displayed on an accessible touchscreen interface, which allows blind users to interactively explore their spatial layout. We demonstrate that workers from Amazon Mechanical Turk are able to quickly and accurately identify relevant regions, and that asking them to describe only one region at a time results in more comprehensive descriptions of complex images. RegionSpeak can be used to explore the spatial layout of the regions identified. It also demonstrates broad potential for helping blind users to answer difficult spatial layout questions.


“…These include ThirdEye [5], VIZWIZ [3] and LendAnEye [13], and others [6,[13][14][15]. These solutions make use of the integration between human resource and the information technology.…”

Section: Related Work (mentioning)

confidence: 99%

“…Indeed, they can ask anyone for the unknown object but there were some confidential data and situations that he can't share with any strangers except close friend or a family's member [6].…”

Section: Introduction (mentioning)

confidence: 99%

Privacy-preserving Twitter-based Solution for Visually Impaired People

Abdraboo, Gaber, Wahed 2017

ijacsa


Visually impaired people form a large community all over the world. They usually seek help to perform their daily activities, such as reading the expiry date of food cans or medicine, reading out the PIN of a certain ATM Visa, identifying the color of clothes, or differentiating between money notes and other objects with the same shape. A number of IT-based solutions have been proposed to help and assist blind and/or visually impaired people. Generally speaking, however, these solutions do not support Arabic languages or protect blind users' privacy. In this paper, the Trusted Blind Society (TBS) mobile application is proposed. It is an Android application which allows blind users to recognize their unknown surroundings by utilizing two concepts: social network sites and friendsourcing. These two concepts were employed by allowing family members and trusted friends, who are registered on Twitter, to answer blind users' questions in real time. The solution is also bilingual (Arabic/English) and works with a screen reader through the Android talk-back service. The performance of the TBS system was evaluated using loader.io to check its stability under heavy load, and it was tested by a number of blind volunteers; the results showed good performance compared to most related work.


“…These descriptors are invariant to translation, scaling and rotation of objects and partially invariant to changes in illumination. Relatively quick calculation of the image features allows the development of systems for object recognition that work nearly in real-time [11]. Computer-vision-based techniques have a higher functionality, but also drawbacks such as: (1) high cost of server-side hardware and software; (2) still low recognition accuracy of such techniques resulting in safety concerns for use by BVI persons [9,12] (recognition depends not only on the descriptor used but also on the training data, training algorithm and type of classifier); (3) such systems are sensitive to the illumination; (4) intensive camera use of mobile devices quickly shortens battery life; (5) intensive network traffic, especially in systems where images are processed entirely by software from the server side, involves paying a higher price for mobile data transfer; and (6) it is still hard to extract detailed descriptions of objects from images.…”

Section: Introduction (mentioning)

confidence: 99%

Blind-environment interaction through voice augmented objects

Ivanov 2014

J Multimodal User Interfaces


This article presents a Java-based mobile service that enables blind-environment interaction through voice-augmented objects. To make this possible, it is necessary to tag the object with an associated radio frequency identification and record its voice-based description. The blind users can later use the service to scan surrounding augmented objects and verbalize their identity and characteristics. We use a user-centred design in order to guarantee the accessibility of the service for visually impaired and blind people. The required hardware is a near field communication-enabled mobile phone with built-in accelerometer. The client-side application does not require pushing any buttons, browsing any menus, or touching any screens to select and activate any of the supported modes: registration, calibration, voice recording, physical object identification, delete voice recording(s), cloud-based file sync and share. Twelve visually impaired individuals (aged 31-84, 6 men and 6 women) have tested the service in two different scenarios: (1) a test based on comparison with a PenFriend labeling unit, and (2) a users' experience test. The results show that the selected tangible, multimodal interface (object touching, phone shaking and tilt, voice output) can be used very easily (58 %) or easily (33 %) by blind and visually impaired users who have had no previous experience with other mobile services. Most participants from the test group agreed that the service could be useful for their daily activities. The service can be used both at home and in public buildings for voice description of objects such as food,
