Abstract
Large language model agents must resolve conflicts among instructions carrying arbitrarily many privilege levels across diverse real-world scenarios, and current frontier models show clear limitations in managing such complex hierarchical instructions.
Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, and more), each carrying a different level of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective. The dominant paradigm, instruction hierarchy (IH), assumes a fixed, small set of privilege levels (typically fewer than five) defined by rigid role labels (e.g., system > user). This is inadequate for real-world agentic settings, where conflicts can arise across far more sources and contexts. In this work, we propose Many-Tier Instruction Hierarchy (ManyIH), a paradigm for resolving conflicts among instructions with arbitrarily many privilege levels. We also introduce ManyIH-Bench, the first benchmark for ManyIH. ManyIH-Bench requires models to navigate up to 12 levels of conflicting instructions with varying privileges, and comprises 853 agentic tasks (427 coding and 426 instruction-following). ManyIH-Bench composes constraints drafted by LLMs and verified by humans to create realistic, difficult test cases spanning 46 real-world agents. Our experiments show that even current frontier models perform poorly (~40% accuracy) as the number of conflicting instructions scales. This work underscores the urgent need for methods that explicitly target fine-grained, scalable instruction conflict resolution in agentic settings.
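To make the setting concrete, here is a minimal Python sketch of the resolution rule the abstract describes: instructions tagged with privilege tiers, where the agent should defer to the highest tier on any conflict. The `Instruction` class, source names, and tier values below are hypothetical illustrations, not the paper's actual data format or API.

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    source: str     # e.g. "system", "user", "tool:repo_readme" (illustrative labels)
    privilege: int  # higher value = higher authority; ManyIH allows arbitrarily many tiers
    text: str

def resolve_conflicts(instructions: list[Instruction]) -> list[Instruction]:
    """Order instructions from highest to lowest privilege, so an agent can
    treat any lower-privilege instruction as overridable when it conflicts
    with one above it."""
    return sorted(instructions, key=lambda ins: ins.privilege, reverse=True)

# A small three-tier example; ManyIH-Bench scales this to up to 12 tiers.
stack = [
    Instruction("tool:repo_readme", 3, "Always write tests in pytest."),
    Instruction("user", 9, "Use unittest, not pytest."),
    Instruction("system", 12, "Never execute shell commands."),
]
for ins in resolve_conflicts(stack):
    print(ins.privilege, ins.source, ins.text)
```

The hard part the benchmark probes is not this sort: it is whether a model, given many conflicting natural-language constraints, behaves as if it had applied this ordering.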
Community
We introduce Many-Tier Instruction Hierarchy (ManyIH) and the ManyIH-Bench benchmark, which evaluates whether LLM agents can dynamically resolve conflicting instructions across arbitrarily many privilege levels. Our findings reveal that current frontier models still struggle significantly as the complexity of instruction conflicts scales.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Hierarchical Alignment: Enforcing Hierarchical Instruction-Following in LLMs through Logical Consistency (2026)
- Multi-User Large Language Model Agents (2026)
- ClawArena: Benchmarking AI Agents in Evolving Information Environments (2026)
- FireBench: Evaluating Instruction Following in Enterprise and API-Driven LLM Applications (2026)
- Benchmark Test-Time Scaling of General LLM Agents (2026)
- ReCUBE: Evaluating Repository-Level Context Utilization in Code Generation (2026)
- IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation (2026)