Indoor Guiding System for the Visually Impaired



By KHANG, Minsoo & LEE, Lok Yin

Supervised by: Dr. Desmond Y. C. TSOI (CSE)

Motivations

Over 2.2 billion people suffer from visual impairment

Facing challenges in day-to-day activities

Hard to seek and retrieve items indoors

Solution

Object detection and stereo vision to locate items

Voice interface to receive commands from the user

Physical guiding robot that generates a sound signal to navigate the user to the target item

Application

Customized neural network models to recognize the user's voice and detect their necessity items

Navigate the user to any item indoors with convenient sound guidance

Inform the user of the item's height

[Figure: System Flow]

[Figure: System Setup]


Methodology

Guiding Car

Voice Recognition

Object Detection

Navigation

Application

Guiding Car

Design and build an agile mechanical structure with a battery system

Develop a PID position controller using feedback from an optical rotary encoder

Use a Bluetooth module to establish a USART communication link with the central computer
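The PID position controller mentioned above can be illustrated with a minimal sketch. It assumes the encoder reports wheel position in ticks and that the motor command acts like a speed setpoint. The class name, gains, output limit and toy plant are hypothetical values chosen for illustration, not the ones used on the guiding car.

```python
from dataclasses import dataclass


@dataclass
class PositionPID:
    """Discrete PID on encoder position (all gains are illustrative assumptions)."""
    kp: float
    ki: float
    kd: float
    out_limit: float          # clamp for the motor command, e.g. a PWM duty range
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, target_ticks: float, measured_ticks: float, dt: float) -> float:
        error = target_ticks - measured_ticks
        self.integral += error * dt                    # accumulated error
        derivative = (error - self.prev_error) / dt    # finite-difference rate
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Saturate the command so it stays within what the driver accepts.
        return max(-self.out_limit, min(self.out_limit, out))


def simulate(target_ticks: float, steps: int = 2000, dt: float = 0.01) -> float:
    """Toy plant (assumption): the command is a wheel speed in ticks per second."""
    pid = PositionPID(kp=4.0, ki=2.0, kd=0.5, out_limit=1000.0)
    position = 0.0
    for _ in range(steps):
        command = pid.update(target_ticks, position, dt)
        position += command * dt
    return position
```

Calling `simulate(500)` drives the simulated wheel towards 500 ticks. On real hardware the gains would need tuning against the actual motor and encoder resolution.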

Voice Recognition

Develop a pipeline that combines the audio processing library LibROSA with a convolutional neural network (CNN) built in TensorFlow Keras

Simplify the training process to achieve high accuracy regardless of accent or language (User Customization)
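To make the audio front end of such a pipeline concrete, the sketch below turns a recorded waveform into a log-power spectrogram, the kind of 2-D time-frequency input a small CNN can classify. It is a NumPy-only stand-in for the LibROSA feature extraction, not the project's actual code. The function name, frame length and hop size are assumptions (25 ms frames and 10 ms hops at a 16 kHz sampling rate).

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return a (n_frames, frame_len // 2 + 1) log-power spectrogram in dB.

    frame_len and hop are hypothetical defaults (25 ms / 10 ms at 16 kHz).
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad very short recordings so at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice overlapping frames and apply the window to reduce spectral leakage.
    frames = np.stack(
        [signal[i * hop: i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + eps)
```

The resulting array can be given a trailing channel axis and fed to a 2-D convolutional model. Retraining per user then only requires new recordings passed through the same function.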

Object Detection

Implement RetinaNet for its high accuracy and lower demand for training images

Enlarge the training set from a small number of user-provided images through data augmentation and segmentation with the public COCO image dataset
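One way a handful of user images can be multiplied into many labeled samples is copy-paste augmentation: a segmented object is pasted onto varied backgrounds and the bounding box is derived from the mask. The sketch below assumes this style of augmentation. The poster does not spell out its exact procedure, so the function name and the choice of transforms (random flip and random placement) are illustrative.

```python
import numpy as np


def paste_object(background, obj, mask, rng):
    """Paste the masked pixels of `obj` onto a copy of `background`.

    background: (H, W, 3) array, obj: (h, w, 3) array, mask: (h, w) bool array.
    Returns the new image and its box as (x_min, y_min, x_max, y_max),
    with x_max and y_max exclusive.
    """
    bh, bw = background.shape[:2]
    if rng.random() < 0.5:
        # Random horizontal flip of the object and its mask.
        obj, mask = obj[:, ::-1], mask[:, ::-1]
    oh, ow = mask.shape
    # Random top-left corner that keeps the object fully inside the image.
    y0 = int(rng.integers(0, bh - oh + 1))
    x0 = int(rng.integers(0, bw - ow + 1))
    out = background.copy()
    region = out[y0:y0 + oh, x0:x0 + ow]   # view into `out`
    region[mask] = obj[mask]
    # The label comes for free from the mask extent.
    ys, xs = np.nonzero(mask)
    box = (x0 + int(xs.min()), y0 + int(ys.min()),
           x0 + int(xs.max()) + 1, y0 + int(ys.max()) + 1)
    return out, box
```

Repeating the call with different backgrounds (for example images drawn from a public dataset) and a seeded `np.random.default_rng` yields a reproducible, automatically labeled training set.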

Stereo Vision

Determine the height of each object after it has been detected by the object detector

Separate the object from the background to obtain training data automatically (User Customization)
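The height estimate can be sketched with standard stereo triangulation. This assumes a rectified camera pair with a horizontal optical axis mounted at a known height above the floor. None of these setup details are given on the poster, and all parameter names below are hypothetical.

```python
def locate_object(disparity_px, v_px, focal_px, baseline_m, cy_px, camera_height_m):
    """Return (depth_m, height_m) of a detected object's centre.

    disparity_px    : horizontal pixel shift between left and right images
    v_px            : image row of the object's centre (rows grow downwards)
    focal_px, cy_px : focal length and principal-point row, in pixels
    baseline_m      : distance between the two cameras
    camera_height_m : height of the optical centre above the floor
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a valid match")
    # Pinhole stereo model: depth is inversely proportional to disparity.
    depth_m = focal_px * baseline_m / disparity_px
    # Rows above the principal point correspond to points above the camera.
    height_m = camera_height_m + (cy_px - v_px) * depth_m / focal_px
    return depth_m, height_m
```

For example, with a 700 px focal length, a 0.1 m baseline and a 35 px disparity the object is about 2 m away; if its centre sits 140 rows above the principal point of a camera mounted at 1.0 m, the estimated height is about 1.4 m.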

Navigation

Translate the positional data received from the object detection module

Create a real-time, collision-avoiding path planning system using the Dynamic Window Approach
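A simplified planning step in the spirit of the Dynamic Window Approach is sketched below. It samples velocity pairs reachable within one control period, rolls each out with a unicycle model, discards colliding rollouts and scores the rest. The scoring uses distance to the goal at the end of the rollout rather than the heading term of the original formulation. Every limit, weight and the stop-when-blocked fallback are assumptions for illustration.

```python
import math


def dynamic_window_step(pose, velocity, goal, obstacles, *,
                        max_v=0.5, max_w=1.5, acc_v=0.5, acc_w=3.0,
                        dt=0.1, horizon=2.0, robot_radius=0.15,
                        n_v=7, n_w=15, w_goal=1.0, w_clear=0.3, w_speed=0.1):
    """Pick the next (v, w) command; returns (0.0, 0.0) if every rollout collides."""
    x, y, theta = pose
    v0, w0 = velocity
    # Dynamic window: velocities reachable in one period under acceleration limits.
    v_lo, v_hi = max(0.0, v0 - acc_v * dt), min(max_v, v0 + acc_v * dt)
    w_lo, w_hi = max(-max_w, w0 - acc_w * dt), min(max_w, w0 + acc_w * dt)
    n_steps = round(horizon / dt)
    best, best_score = (0.0, 0.0), -math.inf
    for i in range(n_v):
        v = v_lo + (v_hi - v_lo) * i / (n_v - 1)
        for j in range(n_w):
            w = w_lo + (w_hi - w_lo) * j / (n_w - 1)
            px, py, pth = x, y, theta
            clearance = math.inf
            # Forward-simulate a unicycle model over the planning horizon.
            for _ in range(n_steps):
                px += v * math.cos(pth) * dt
                py += v * math.sin(pth) * dt
                pth += w * dt
                for ox, oy in obstacles:
                    gap = math.hypot(ox - px, oy - py) - robot_radius
                    clearance = min(clearance, gap)
            if clearance <= 0.0:
                continue  # this rollout touches an obstacle
            goal_dist = math.hypot(goal[0] - px, goal[1] - py)
            # Clearance reward is capped so far-away obstacles do not dominate.
            score = (-w_goal * goal_dist
                     + w_clear * min(clearance, 1.0)
                     + w_speed * v)
            if score > best_score:
                best, best_score = (v, w), score
    return best
```

Calling the function once per control period with the latest pose and obstacle positions gives a reactive planner. The immediate stop used when all rollouts collide ignores the acceleration limits and is only a placeholder for a proper recovery behaviour.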

User Customization

Only a small number of user-provided images containing the target objects are needed to generate an effective object detector through our customization pipeline. With heavy data augmentation and automatic labeling by stereo vision, the user only needs to wait at most one to two days for the model to train before it is deployed. The final trained model is customized to detect the target objects and fine-tuned for the user's needs.

Voice commands can also be customized for different users. By recording the speech signal and transforming it into different domains, a small yet effective CNN model can be retrained on each user's voice commands in a short time. Through this customization procedure, the voice recognition model learns to recognize the user's personal phrasing, language and accent, making our system user-friendly and easy to use.

Operation

Our user-friendly guiding procedure is visualized in the figure above. The user's voice command naming a target necessity is recorded by the system's microphone and, when the volume is above a certain threshold, recognized by our voice recognition model. After the system has located both the guiding car and the user, the guiding car approaches the user and then emits a "beep" sound signal. The guiding car then guides the user by navigating slowly towards the target necessity while maintaining the "beep" signal. Upon reaching the target destination, it emits a long "beep" signal to inform the user that the target necessity is nearby. The system then informs the user of the approximate height of the target necessity, such as "near shoulder level" or "near waist level". The user is now able to retrieve their target necessity without the assistance of another individual.
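The final height announcement can be sketched as a nearest-landmark lookup. Only the "near shoulder level" and "near waist level" phrases come from the description above. The other landmarks, the body-proportion fractions and the default user height are rough assumptions for illustration.

```python
# Hypothetical body landmarks as fractions of the user's standing height.
BODY_LANDMARKS = {
    "knee": 0.28,
    "waist": 0.58,
    "shoulder": 0.82,
    "head": 0.95,
}


def describe_height(object_height_m, user_height_m=1.7):
    """Map an estimated object height to a spoken phrase such as 'near waist level'."""
    if user_height_m <= 0:
        raise ValueError("user height must be positive")
    ratio = object_height_m / user_height_m
    # Choose the landmark whose relative height is closest to the object's.
    name = min(BODY_LANDMARKS, key=lambda k: abs(BODY_LANDMARKS[k] - ratio))
    return f"near {name} level"
```

With the default user height, an object estimated at 1.4 m maps to the shoulder landmark and one at 1.0 m to the waist landmark. Storing the user's actual height during customization would make the phrases more accurate.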