With the advancement of multimodal large language models~(MLLMs) and coding agents, website development has shifted from manual programming to agent-based code synthesis. Existing benchmarks rely on idealized assumptions, particularly well-structured, information-rich inputs and static execution settings. In contrast, real-world usage is constrained by a critical bottleneck: the semantic misalignment between the ambiguous, low-quality instructions of non-expert users and model understanding, which results in a failure mode we term blind execution. To address this gap, we introduce InteractWeb-Bench, the first multimodal interactive benchmark for low-code website generation under non-expert user conditions. InteractWeb-Bench introduces a user agent and persona-driven instruction perturbations to systematically simulate diverse user behaviors, including ambiguity, redundancy, and contradiction. We further develop an interactive execution environment for agents, featuring a unified action space comprising Clarify, Implement, Verify, and Submit, which enables iterative intent refinement, code synthesis, and visual feedback–based validation. Extensive experiments and analysis reveal that frontier MLLM-based agents remain trapped in blind execution, exposing limitations in intent recognition and adaptive interaction.
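To make the unified action space concrete, below is a minimal illustrative sketch of such an interaction loop. This is not the benchmark's actual API: the action names Clarify, Implement, Verify, and Submit come from the abstract, but all class names, method signatures (`agent.act`, `user.answer`, `env.apply_code`, `env.render_screenshot`, `env.evaluate`), and the step budget are hypothetical assumptions made for illustration.

```python
from enum import Enum, auto

class Action(Enum):
    """Unified action space named in the abstract."""
    CLARIFY = auto()    # ask the simulated user to resolve ambiguous intent
    IMPLEMENT = auto()  # synthesize or revise website code
    VERIFY = auto()     # render the page and inspect visual feedback
    SUBMIT = auto()     # finalize the website and end the episode

def run_episode(agent, user, env, max_steps=20):
    """Hypothetical interaction loop: the agent iteratively refines user
    intent, writes code, and validates against rendered output before
    submitting. The agent/user/env interfaces are illustrative stubs."""
    observation = user.initial_instruction()  # may be ambiguous, redundant, or contradictory
    for _ in range(max_steps):
        action, payload = agent.act(observation)
        if action is Action.CLARIFY:
            observation = user.answer(payload)      # user-agent reply refines intent
        elif action is Action.IMPLEMENT:
            observation = env.apply_code(payload)   # build or update the site
        elif action is Action.VERIFY:
            observation = env.render_screenshot()   # visual feedback for validation
        elif action is Action.SUBMIT:
            return env.evaluate(payload)            # final scoring
    return env.evaluate(None)  # step budget exhausted without submission
```

Under this reading, an agent stuck in blind execution is one that jumps straight to Implement and Submit without ever emitting Clarify or Verify, regardless of how underspecified the instruction is.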