From Text to Vision: Adapting Adversarial Techniques from LLMs to Build VLM-Resistant CAPTCHAs

MUSA, TEA

2025/2026

Abstract

CAPTCHA systems are a core security mechanism used across more than one million websites to distinguish human users from automated bots. Image-grid CAPTCHAs, the most commonly deployed variant, present a grid of photographs and ask the user to select every cell that contains a given object, a task designed to be quick and intuitive for people.

Vision-Language Models (VLMs) pose a direct threat to this design. Modern VLMs can read the CAPTCHA prompt, inspect the grid, and return a correct answer in a single forward pass, without any CAPTCHA-specific training. When a VLM can reliably solve the challenge that is supposed to block bots, the CAPTCHA ceases to function as an access-control barrier.

This thesis addresses this threat through a central research question: how can known adversarial weaknesses of Large Language Models (LLMs) be transferred to the visual domain to defend CAPTCHAs against VLM-based solving? To answer it, we propose a defensive framework that embeds adversarial perturbations directly into the CAPTCHA image, adapting attack strategies from the LLM security literature as visual defenses. We design six perturbation techniques (prompt injection, typographic attack, instruction conflict, phantom answer, authority escalation, and context overflow), each targeting a different stage of the VLM processing pipeline, with the aim of degrading VLM accuracy while keeping the challenge natural for human users.

The research question is evaluated on a fully automated benchmarking platform that covers quality-gated CAPTCHA generation across twelve object categories, adversarial perturbation under seven conditions, automated evaluation against two production-grade VLMs (Qwen2.5-VL-7B-Instruct and GPT-4o), and a human usability study. Performance is assessed using exact-match accuracy, F1 score, solve time, and perceived difficulty.
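The two accuracy metrics named in the abstract can be illustrated with a minimal sketch. The full text of the thesis is under restricted access, so the definitions below are assumptions and not the author's implementation: a grid answer is modeled as a set of selected cell indices, exact match requires the predicted set to equal the ground-truth set, and F1 is the usual harmonic mean of precision and recall over selected cells. The function names and the 0-based cell indexing are hypothetical.

```python
from typing import Iterable


def exact_match(predicted: Iterable[int], truth: Iterable[int]) -> bool:
    """True only if the solver selected exactly the ground-truth cells."""
    return set(predicted) == set(truth)


def f1_score(predicted: Iterable[int], truth: Iterable[int]) -> float:
    """Set-based F1 over selected grid cells (assumed definition)."""
    pred, gold = set(predicted), set(truth)
    if not pred and not gold:
        # Assumption: an empty answer to an empty target counts as perfect.
        return 1.0
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 3x3 grid, cells indexed 0..8, target object in cells 0, 4, 7.
truth_cells = {0, 4, 7}
solver_cells = {0, 4, 5}  # one correct cell missed, one wrong cell chosen

print(exact_match(solver_cells, truth_cells))
print(round(f1_score(solver_cells, truth_cells), 3))
```

Exact match is the stricter of the two, since a single wrong cell fails the whole challenge, while F1 gives partial credit and so shows how far a perturbation moves a solver away from the correct selection.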

Record

Faculty/Department: Dipartimento di Matematica "Tullio Levi-Civita" - DM
Degree programme: COMPUTER SCIENCE Laurea Magistrale (D.M. 270/2004)
Academic year: 2025
Keywords: VLM; LLM; CAPTCHA security; Adversarial defenses
Supervisor: BRIGHENTE, ALESSANDRO
Appears in collections: Lauree magistrali (Master's theses)

Files in this item:

File	Size	Format
Musa_Tea.pdf (restricted access)	1.02 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/108164