From Text to Vision: Adapting Adversarial Techniques from LLMs to Build VLM-Resistant CAPTCHAs

MUSA, TEA
2025/2026

Abstract

CAPTCHA systems are a core security mechanism, used on more than one million websites to distinguish human users from automated bots. Image-grid CAPTCHAs, the most commonly deployed variant, present a grid of photographs and ask the user to select every cell that contains a given object, a task designed to be quick and intuitive for people. Vision-Language Models (VLMs) pose a direct threat to this design: modern VLMs can read the CAPTCHA prompt, inspect the grid, and return a correct answer in a single forward pass, without any CAPTCHA-specific training. When a VLM can reliably solve the challenge that is supposed to block bots, the CAPTCHA ceases to function as an access-control barrier. This thesis addresses this weakness through its central research question: how can known adversarial weaknesses of Large Language Models (LLMs) be transferred to the visual domain to defend CAPTCHAs against VLM-based solving? To answer it, we propose a defensive framework that embeds adversarial perturbations directly into the CAPTCHA image, adapting attack strategies from the LLM security literature as visual defenses. We design six perturbation techniques (prompt injection, typographic attack, instruction conflict, phantom answer, authority escalation, and context overflow), each targeting a different stage of the VLM processing pipeline and each intended to degrade VLM accuracy while keeping the challenge natural for human users. We evaluate the framework on a fully automated benchmarking platform that covers quality-gated CAPTCHA generation across twelve object categories, adversarial perturbation under seven conditions, and automated evaluation against two production-grade VLMs (Qwen2.5-VL-7B-Instruct and GPT-4o), complemented by a human usability study. Performance is assessed using exact-match accuracy, F1 score, solve time, and perceived difficulty.
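As an illustration of the two accuracy metrics named in the abstract, the following is a minimal sketch of how exact-match accuracy and F1 score could be computed for an image-grid challenge. It assumes that each answer is a set of selected cell indices and that F1 is computed per challenge over those cells and then averaged; the abstract does not specify these details, and the function names (`cell_f1`, `exact_match`, `summarize`) are hypothetical, not taken from the thesis.

```python
def cell_f1(predicted: set[int], truth: set[int]) -> float:
    """F1 score over selected cell indices for a single challenge."""
    if not predicted and not truth:
        # Assumption: selecting nothing when nothing should be selected is perfect.
        return 1.0
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


def exact_match(predicted: set[int], truth: set[int]) -> bool:
    """A challenge counts as solved only if the selected cells match exactly."""
    return predicted == truth


def summarize(results: list[tuple[set[int], set[int]]]) -> dict[str, float]:
    """Aggregate (predicted, truth) pairs into the two accuracy metrics."""
    if not results:
        raise ValueError("results must contain at least one challenge")
    n = len(results)
    return {
        "exact_match_accuracy": sum(exact_match(p, t) for p, t in results) / n,
        "mean_f1": sum(cell_f1(p, t) for p, t in results) / n,
    }


# Three hypothetical challenges on a 3x3 grid (cells indexed 0..8).
example_results = [
    ({0, 4, 8}, {0, 4, 8}),  # all target cells found
    ({0, 4}, {0, 4, 8}),     # one target cell missed
    ({1}, {2, 3}),           # no overlap with the target cells
]
summary = summarize(example_results)
```

Under these assumptions, exact match is the stricter metric, since a single missed cell fails the whole challenge, while F1 gives partial credit for partially correct selections. Reporting both makes it possible to separate a solver that is slightly degraded from one that is fully misled.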
2025
Keywords: VLM, LLM, CAPTCHA security, Adversarial defenses
Files in this item:
File: Musa_Tea.pdf
Access: restricted
Size: 1.02 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/108164