Seeing with Sound: Object Detection and Localization by YOLOv8 and Audio Feedback for Blind Individuals

Object detection is a challenging Computer Vision (CV) task, particularly when it is used to assist blind individuals. With the rapid advancement of Deep Learning (DL), algorithms such as Convolutional Neural Networks (CNNs) have significantly improved video analysis and image understanding for this purpose. Blind individuals face substantial challenges when navigating indoor and outdoor environments, which underscores the pressing need for assistive technologies. This thesis presents a system that addresses this need by integrating the You Only Look Once (YOLO) object detection algorithm with audio guidance. The solution uses YOLOv8’s State-Of-The-Art (SOTA) deep convolutional neural network architecture to detect objects in the user’s environment, and it reports their spatial location and their count through audio feedback. A text-to-speech engine converts all of this information into verbal instructions, so that in some cases the system acts as a kind of virtual assistant. This context-aware feedback is available in multiple languages and has been optimized for three input types: live webcam streams (the real-time scenario), still images, and videos. The system has shown promising results in enhancing the autonomy and quality of life of blind users, a significant step towards addressing the challenges they face in daily environments.
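The step from detections to spoken feedback can be illustrated with a minimal sketch. It is not the thesis implementation: the `Detection` type, the `horizontal_zone` and `describe` helpers, and the rule that splits the frame into equal left, center, and right thirds are all assumptions made for illustration. The sketch takes bounding boxes (as a detector such as YOLOv8 would produce) and builds one sentence that counts objects per class and position.

```python
from collections import Counter
from typing import List, NamedTuple


class Detection(NamedTuple):
    """One detected object: class label plus bounding box in pixels (hypothetical type)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def horizontal_zone(det: Detection, frame_width: float) -> str:
    """Map the box center to 'left', 'center', or 'right'.

    Assumption: the frame is split into three equal vertical bands.
    """
    center_x = (det.x1 + det.x2) / 2.0
    if center_x < frame_width / 3.0:
        return "left"
    if center_x < 2.0 * frame_width / 3.0:
        return "center"
    return "right"


def describe(detections: List[Detection], frame_width: float) -> str:
    """Build one sentence that counts objects per (label, zone) pair."""
    if not detections:
        return "No objects detected."
    counts = Counter((d.label, horizontal_zone(d, frame_width)) for d in detections)
    phrases = []
    # Sort so the sentence is deterministic for a given set of detections.
    for (label, zone), n in sorted(counts.items()):
        noun = label if n == 1 else label + "s"  # naive plural, illustration only
        where = "in the center" if zone == "center" else f"on the {zone}"
        phrases.append(f"{n} {noun} {where}")
    return ", ".join(phrases) + "."


if __name__ == "__main__":
    frame = [
        Detection("person", 10, 0, 110, 200),
        Detection("person", 50, 0, 150, 200),
        Detection("chair", 300, 100, 400, 300),
    ]
    print(describe(frame, frame_width=640))
```

In a pipeline like the one described above, the returned string would then be handed to a text-to-speech engine; that step, and the detector call itself, are left out here to keep the sketch free of external dependencies.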

TAVAKOLI YARAKI, ALI

2023/2024

Faculty/Department: Dipartimento di Ingegneria dell'Informazione - DEI (Department of Information Engineering)
Degree programme: ICT for Internet and Multimedia (Ingegneria per le Comunicazioni Multimediali e Internet), Laurea Magistrale (D.M. 270/2004)
Academic year: 2023
English title: Seeing with Sound: Object Detection and Localization by YOLOv8 and Audio Feedback for Blind Individuals
Keywords: Deep Learning; Blindness; YOLO (You Only Look Once); Audio Feedback; Computer Vision
Supervisor: BATTISTI, FEDERICA
Appears in collections: Lauree magistrali (Master's theses)

Files in this record:

File	Size	Format
Tavakoli Yaraki_Ali.pdf (open access)	1.71 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/66485