Seeing with Sound: Object Detection and Localization by YOLOv8 and Audio Feedback for Blind Individuals
TAVAKOLI YARAKI, ALI
2023/2024
Abstract
Object detection is a challenging Computer Vision (CV) application, particularly for assisting blind individuals. With the rapid advancement of Deep Learning (DL), algorithms such as Convolutional Neural Networks (CNNs) have significantly improved video analysis and image understanding for this purpose. Blind individuals face substantial challenges when navigating indoor and outdoor environments, underscoring the pressing need for assistive technologies. In this thesis, a system has been developed to address this need, integrating the You Only Look Once (YOLO) object detection algorithm with audio guidance to aid blind users. The solution utilizes YOLOv8's State-Of-The-Art (SOTA) deep convolutional neural network architecture to detect objects in the user's environment, providing spatial information and object counts through audio feedback. The system, equipped with a text-to-speech engine, converts this information into verbal instructions, in some cases acting as a virtual assistant. This context-aware feedback, available in multiple languages, has been optimized for real-time webcam input as well as for images and videos. The system has shown promising results, enhancing the autonomy and quality of life of blind users and marking a significant step towards addressing the challenges they face in everyday environments.
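The abstract describes a pipeline that turns detections into spoken spatial cues and object counts. The sketch below is a minimal, hypothetical illustration of that idea, not the thesis's actual implementation: it takes YOLO-style bounding boxes (label plus pixel coordinates), splits the frame into left/center/right thirds, and composes the kind of sentence a text-to-speech engine could then read aloud. All names here (`describe_detections`, the tuple format) are assumptions for illustration.

```python
# Hypothetical sketch: convert YOLO-style detections into a spoken-style
# description with object counts and coarse left/center/right localization.
from collections import Counter


def describe_detections(detections, frame_width):
    """detections: list of (label, x1, y1, x2, y2) tuples in pixel coordinates.

    Returns a sentence summarizing counts per class and the horizontal
    position of each detection relative to the frame.
    """
    counts = Counter(label for label, *_ in detections)
    phrases = []
    for label, x1, y1, x2, y2 in detections:
        cx = (x1 + x2) / 2  # horizontal center of the bounding box
        if cx < frame_width / 3:
            side = "on your left"
        elif cx > 2 * frame_width / 3:
            side = "on your right"
        else:
            side = "ahead of you"
        phrases.append(f"{label} {side}")
    summary = ", ".join(f"{n} {lbl}" for lbl, n in counts.items())
    return f"I see {summary}. " + ". ".join(phrases) + "."


# Two detections in a 640-pixel-wide frame: one near the left edge,
# one near the right edge.
sentence = describe_detections(
    [("person", 0, 0, 100, 200), ("chair", 500, 0, 640, 200)], 640
)
print(sentence)
# In a full pipeline (not run here), the detections would come from
# YOLOv8 and the sentence would be voiced, roughly along these lines:
#   from ultralytics import YOLO
#   import pyttsx3
#   boxes = YOLO("yolov8n.pt")(frame)   # model weights downloaded separately
#   pyttsx3.init().say(sentence)
```

The three-way split of the frame is one simple way to phrase horizontal position; a real system might use finer bins or incorporate depth estimates.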
https://hdl.handle.net/20.500.12608/66485