Expressivity of In-Context Learning in Large Language Models

Lorenzon, Nicola
2024/2025

Abstract

In-context learning (ICL) drives much of the practical utility of large language models (LLMs), but its limitations, particularly on tasks requiring algorithmic reasoning, lack a precise characterization. Theoretically, transformer networks with unlimited chain-of-thought tokens in their output should be able to simulate any learning algorithm, but recent work has found that LLMs fall far short in practice. In this paper, we contribute to the growing body of work examining this discrepancy by evaluating the ICL capabilities of several LLMs (ChatGPT, DeepSeek, Qwen, and Llama) on a suite of formal language recognition tasks, which provide a controlled testbed for assessing reasoning ability grounded in the theory of computation. Our experiments span a range of language classes, namely sub-regular, regular, deterministic context-free, context-free, and context-sensitive languages. Bearing in mind recent work showing that a transformer network’s expressive power increases with the number of padding tokens in its input, we test several ways of encoding exemplars that result in varying numbers of input tokens. To assess the role of chain-of-thought, we also compare prompts that require the model to produce a label immediately after reading the input with prompts that permit unrestricted reasoning before the label is produced. We find that pretrained LLMs perform very poorly on these reasoning tasks in all cases, successfully learning only the language of binary strings that begin with a 1. Also, contrary to expectation, adding padding and chain-of-thought tokens does not consistently improve accuracy. Still, ICL with pretrained LLMs is consistently more accurate than training a small transformer from scratch on the same data, suggesting that pretraining imbues transformers with a learning mechanism that is at least more sample-efficient than training from scratch. These results reveal a disconnect between theoretical models of transformer capacity and the practical behavior of LLMs in ICL.
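
For illustration only, the following minimal Python sketch (not taken from the thesis; the prompt wording, exemplar counts, string lengths, and helper names such as in_language and build_icl_prompt are assumptions) shows how labeled exemplars for one of the membership tasks mentioned above, the regular language of binary strings that begin with a 1, could be formatted into an ICL prompt.

# Illustrative sketch: building an in-context learning prompt for a
# formal-language membership task. All naming and prompt wording here are
# assumptions for illustration, not the thesis's actual protocol.
import random

def in_language(s: str) -> bool:
    """Reference recognizer: binary strings that begin with a 1."""
    return s.startswith("1")

def random_binary_string(max_len: int = 8) -> str:
    """Sample a random binary string of length 1..max_len."""
    length = random.randint(1, max_len)
    return "".join(random.choice("01") for _ in range(length))

def build_icl_prompt(n_exemplars: int, query: str) -> str:
    """Format labeled exemplars followed by an unlabeled query string."""
    lines = ["Decide whether each string belongs to the language."]
    for _ in range(n_exemplars):
        s = random_binary_string()
        label = "yes" if in_language(s) else "no"
        lines.append(f"String: {s} -> {label}")
    lines.append(f"String: {query} ->")  # the model must supply this label
    return "\n".join(lines)

if __name__ == "__main__":
    random.seed(0)
    print(build_icl_prompt(n_exemplars=5, query="1010"))

In an experiment of the kind the abstract describes, a prompt of this form would be sent to an LLM and the completion compared against the reference label for the query string; the recognizer here serves only to label exemplars, and the encoding of exemplars could be varied to change the number of input tokens.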
Keywords: NLP, Formal Languages, LLMs, In-context Learning, Expressivity
File: Lorenzon_Nicola.pdf (4.22 MB, Adobe PDF), restricted access.

Use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12608/99633