Seeing with Sound: Object Detection and Localization by YOLOv8 and Audio Feedback for Blind Individuals

TAVAKOLI YARAKI, ALI
2023/2024

Abstract

Object detection is a challenging Computer Vision (CV) application, particularly when it is used to assist blind individuals. With the rapid advancement of Deep Learning (DL), algorithms such as Convolutional Neural Networks (CNNs) have significantly improved video analysis and image understanding for this purpose. Blind individuals face substantial challenges when navigating indoor and outdoor environments, which underscores the pressing need for assistive technologies. This thesis presents a system developed to address that need by integrating the You Only Look Once (YOLO) object detection algorithm with audio guidance for blind users. The solution uses YOLOv8's State-Of-The-Art (SOTA) deep convolutional neural network architecture to detect objects in the user's environment, and it conveys spatial information and object counts through audio feedback. A text-to-speech engine converts all of this information into verbal instructions, so that in some cases the system acts as a virtual-assistant-style program. This context-aware feedback is available in multiple languages and has been optimized for three input modes: real-time webcam streams, images, and videos. The system has shown promising results in enhancing the autonomy and quality of life of blind users, a significant step towards addressing the challenges they face in daily environments.
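
As an illustration of how detections might be turned into the spoken spatial and counting feedback described in the abstract, here is a minimal Python sketch. It is not the thesis implementation: the function names `horizontal_zone` and `describe`, the `(label, x_min, x_max)` tuple layout, and the split of the frame into equal thirds are all assumptions made for this example. In a real pipeline the tuples would come from the YOLOv8 detector and the resulting sentence would be passed to a text-to-speech engine.

```python
from collections import Counter


def horizontal_zone(x_center, frame_width):
    """Map a box centre to 'left', 'center' or 'right'.

    Assumption: the frame is split into three equal vertical strips.
    """
    third = frame_width / 3
    if x_center < third:
        return "left"
    if x_center < 2 * third:
        return "center"
    return "right"


def describe(detections, frame_width):
    """Build one sentence from hypothetical (label, x_min, x_max) tuples."""
    counts = Counter()
    for label, x_min, x_max in detections:
        zone = horizontal_zone((x_min + x_max) / 2, frame_width)
        counts[(label, zone)] += 1

    if not counts:
        return "No objects detected."

    parts = []
    for (label, zone), n in sorted(counts.items()):
        # Naive pluralisation; irregular plurals are not handled here.
        noun = label if n == 1 else label + "s"
        where = "in the center" if zone == "center" else f"on the {zone}"
        parts.append(f"{n} {noun} {where}")
    return "Detected " + ", ".join(parts) + "."


# Example with made-up boxes in a 640-pixel-wide frame.
example = [("chair", 50, 150), ("chair", 100, 200), ("person", 500, 600)]
print(describe(example, 640))
```

Grouping by label and zone keeps the spoken message short, since one phrase covers every object of the same class in the same part of the frame.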
2023
Deep Learning
Blindness
YOLO (You Only Look Once)
Audio Feedback
Computer Vision
Files in this item:
Tavakoli Yaraki_Ali.pdf (open access, 1.71 MB, Adobe PDF)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/66485