In January 2026, the MCP Apps spec landed. It lets MCP Servers embed interactive HTML UIs — charts, forms, dashboards, maps — directly inside conversations in Claude, VS Code, and other AI hosts.
As a frontend engineer, the moment I saw "interactive HTML inside AI chat," I was hooked. Text-only tool outputs always felt limiting. MCP Apps finally close the gap between what the LLM produces and what the user actually sees and touches.
I spent some time digging into the spec and built mermaid-mcp-app — an interactive Mermaid diagram tool I now use daily. This post walks through MCP Apps using that project as the running example.
## What Are MCP Apps

### Three Roles

The architecture has three actors:
- **MCP Server** — The backend. Registers tools via `registerAppTool` and HTML resources via `registerAppResource`. A tool's `_meta.ui.resourceUri` field binds it to a specific UI.
- **Host** — The AI interface — Claude Desktop, VS Code Copilot, etc. When the LLM calls a tool, the Host executes it, fetches the HTML from `resourceUri`, and renders it in a sandboxed iframe.
- **View** — The frontend app running inside that iframe. It uses the `App` class to open a postMessage channel with the Host, receives tool results, and can call server tools or push messages back into the conversation.
All View ↔ Server communication goes through the Host. The wire format is JSON-RPC 2.0 over postMessage.
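To make the wire format concrete, here is an illustrative envelope as the View might post it to the Host. The method name comes from the spec's lifecycle; the `id` and the empty `params` are placeholders, not values from the spec.

```typescript
// Illustrative only: a JSON-RPC 2.0 request envelope crossing postMessage.
// "ui/initialize" is the spec's handshake method; id/params are placeholders.
const initializeRequest = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "ui/initialize",
  params: {
    // capabilities the View declares during the handshake would go here
  },
};

// In the browser, the View would send this with:
// window.parent.postMessage(initializeRequest, "*");
```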

### Lifecycle

Four phases:
1. **Discovery** — The Host connects to the server, reads the tool list, and flags tools carrying `_meta.ui` metadata. It can pre-fetch HTML resources here for caching and security review.
2. **Initialize** — Once the LLM calls a tool, the Host spins up a sandboxed iframe, loads the HTML, and completes a handshake via `ui/initialize`. Both sides exchange capabilities: the View declares which display modes it supports; the Host provides theme, container dimensions, and other context.
3. **Interactive** — The Host pushes the tool input and tool result into the View, which renders the UI. From here, the View can call server tools via `callServerTool`, send messages back to the conversation via `sendMessage`, or silently sync state to the LLM via `updateModelContext`.
4. **Teardown** — When the conversation ends or the user closes the UI, the Host sends `ui/resource-teardown` so the View can clean up.
## Beyond Display — Two-Way Communication
Rendering LLM output as a nice visual is table stakes. The real shift is this: after users interact with the UI, results flow back to the LLM.
The spec provides two APIs for this:
**`sendMessage`** — Acts as if the user typed something in the chat box. The message shows up in the conversation immediately.
```typescript
await app.sendMessage({
  role: "user",
  content: [{ type: "text", text: `I modified the diagram:\n\`\`\`mermaid\n${code}\n\`\`\`` }],
});
```
**`updateModelContext`** — A silent sync. No response is triggered, but the LLM will see the data the next time the user sends a message. Each call overwrites the previous — only the latest snapshot is kept.
```typescript
await app.updateModelContext({
  content: [{ type: "text", text: `Current Mermaid source:\n\`\`\`mermaid\n${code}\n\`\`\`` }],
});
```
| | `sendMessage` | `updateModelContext` |
|---|---|---|
| Triggers LLM response | Yes, immediately | No |
| Visible in conversation | Appears as a user message | Hidden |
| Multiple calls | Each one is independent | Last call wins |
| Best for | Completed actions that need the LLM to respond | Continuous state sync — the user decides when to ask |
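The last-call-wins behavior is easy to model. Here is a toy stand-in for the Host's context slot, purely to pin down the semantics; it is not the real Host implementation.

```typescript
// Toy model of the Host's model-context slot: updateModelContext overwrites,
// it never appends. Not the real Host, just the observable semantics.
class ContextSlot {
  private snapshot: string | null = null;

  update(text: string): void {
    this.snapshot = text; // each call replaces the previous snapshot entirely
  }

  read(): string | null {
    return this.snapshot; // what the LLM sees on the next user message
  }
}

const slot = new ContextSlot();
slot.update("graph TD; A-->B");
slot.update("graph TD; A-->B-->C"); // the first snapshot is gone
```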
These two APIs change what an MCP App fundamentally is. It's not "a container for displaying LLM output" anymore. It's a full interactive component: input goes in, the user acts, output flows back to the LLM.
Think about it: `sendMessage` after a user fills out a form, so the LLM can review it. `updateModelContext` after the user selects a region on a map, silently syncing coordinates until they ask "what restaurants are nearby?" A code editor where the LLM reviews changes the moment the user finishes editing.
Beyond these two core APIs, the View also has access to `callServerTool` (invoke server-side tools), `openLink` (ask the Host to open external URLs), `downloadFile` (the iframe sandbox blocks direct downloads, so the Host handles it), and `requestDisplayMode` (switch between inline, fullscreen, or picture-in-picture).
## Building One: The Mermaid MCP App
Enough theory. Let me walk through the mermaid-mcp-app I built — an interactive Mermaid tool that embeds diagrams directly in conversations, with drag-to-pan, scroll-to-zoom, and a split-view editor for live syntax editing.
### 30-Second Setup

Add this to your Claude Desktop or VS Code MCP config:
```json
{
  "mcpServers": {
    "mermaid": {
      "command": "npx",
      "args": ["-y", "mermaid-mcp-app", "--stdio"]
    }
  }
}
```
Restart Claude Desktop and ask it to "draw a user authentication flowchart." The diagram appears inline — draggable, zoomable, with an editor for modifying the syntax on the spot.
There's also a packaged Desktop Extension (.mcpb) on GitHub Releases. Double-click to install — no terminal needed.
### Server Side
The server handles three things:
**Registers the main tool.** `registerAppTool` defines `render-mermaid`, with `_meta.ui.resourceUri` binding it to the HTML resource:
```typescript
registerAppTool(server, "render-mermaid", {
  title: "Render Mermaid Diagram",
  inputSchema: {
    code: z.string().describe("The Mermaid diagram syntax to render"),
    theme: z.enum(["default", "light", "dark", "forest", "neutral"]).optional(),
  },
  _meta: {
    ui: { resourceUri: "ui://mermaid/view.html" },
  },
}, async ({ code, theme }) => ({
  content: [{ type: "text" as const, text: JSON.stringify({ code, theme: theme ?? "default" }) }],
}));
```
**Serves the HTML resource.** `registerAppResource` exposes the Vite-bundled single HTML file as a `ui://` resource. The server sends data only — all rendering happens client-side inside the iframe.
**Internal tools for draft persistence.** Two additional tools — `save-mermaid-draft` and `get-mermaid-draft` — let the View persist user edits to server memory and restore state when the iframe is recreated. These are internal: only the View calls them via `callServerTool`; the LLM never does.
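The tool names come from the project, but the storage behind them could be as simple as this Map-based sketch (the `id` key and the in-memory design are my assumptions about the implementation):

```typescript
// Hypothetical in-memory store behind save-mermaid-draft / get-mermaid-draft.
// Keyed by a session or diagram id so parallel conversations don't collide.
const drafts = new Map<string, string>();

function saveDraft(id: string, code: string): void {
  drafts.set(id, code); // overwrite: only the latest edit matters
}

function getDraft(id: string): string | undefined {
  return drafts.get(id); // undefined if the View never saved anything
}
```

Server memory is enough here because a draft only needs to outlive the iframe, not the server process.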
### View Side

The View is a standard frontend app. It uses the `App` class to set up the postMessage channel:
```typescript
const app = new App(
  { name: "MermaidViewer", version: "1.0.0" },
  {},
  { autoResize: true },
);

app.ontoolinput = (params) => {
  // LLM calls the tool — Host sends arguments before the server processes them
  handleMermaidData(params.arguments);
};

app.ontoolresult = (params) => {
  // Server finishes processing — Host sends the full result
  const data = JSON.parse(params.content[0].text);
  handleMermaidData(data);
};

await app.connect();
```
`ontoolinput` and `ontoolresult` fire at different moments. The former triggers when the LLM decides to call the tool (arguments are ready but the server hasn't processed yet); the latter fires after the server returns. For tools that don't need server-side computation — like `mermaid-mcp-app` — both carry essentially the same data, so you can use `ontoolinput` for early rendering. For tools that do compute on the server, the two payloads will differ.
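When both callbacks carry the same data, a small guard avoids rendering the same diagram twice. This is a hypothetical sketch, not code from the project:

```typescript
// Hypothetical dedupe guard: skip a re-render when ontoolresult repeats
// the payload already drawn from ontoolinput.
let lastRenderedKey: string | null = null;

function shouldRender(data: { code: string; theme?: string }): boolean {
  const key = JSON.stringify(data);
  if (key === lastRenderedKey) return false; // identical payload: skip
  lastRenderedKey = key;
  return true; // new payload: go ahead and render
}
```

Both handlers can then call `shouldRender` before doing any expensive Mermaid work.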
There's also `ontoolinputpartial`. As the LLM streams tool arguments, the Host patches the incomplete JSON into a valid shape and pushes it to the View. Not useful for Mermaid (half-written syntax won't render), but great for text-heavy UIs that want progressive rendering.
### Interactivity in Practice
When the user opens the split-view editor and modifies the Mermaid code, two things happen:
**Auto context sync** — Each edit triggers a debounced `updateModelContext` call, silently syncing the current source to the LLM. No immediate response, but the next time the user asks a question, the LLM already knows the latest diagram state.
**Send to AI (⌘ Enter)** — Calls `sendMessage` with the full modified source as a user message. The LLM sees it and responds right away.
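The debounced sync can be sketched with an ordinary trailing-edge debounce. The ~500 ms window and the wiring comment are assumptions, not values from the project:

```typescript
// Plain trailing-edge debounce: collapse a burst of edits into one call.
function debounce<A extends unknown[]>(fn: (...args: A) => void, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A): void => {
    clearTimeout(timer);                       // cancel the pending call
    timer = setTimeout(() => fn(...args), ms); // reschedule with fresh args
  };
}

// Hypothetical wiring: every keystroke schedules a sync, but only the last
// edit within the window actually reaches updateModelContext.
// const syncContext = debounce((code: string) => {
//   app.updateModelContext({ content: [{ type: "text", text: code }] });
// }, 500);
```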

## Gotchas I Hit During Development

### Iframe Sandbox Restrictions
Every View runs inside a sandboxed iframe with a default CSP (Content Security Policy). Libraries that rely on `eval()` won't work out of the box. If your UI needs external resources, you have to declare them via `connectDomains`, `resourceDomains`, and `frameDomains`. The official map example hit exactly this wall — CSP blocked the `eval()` calls it used for binding parsing.
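As an illustration only: the three field names come from the spec's terminology, but this exact nesting under the resource's UI metadata is my assumption; check the spec for where the declaration actually lives.

```typescript
// Illustration only: connectDomains / resourceDomains / frameDomains are the
// spec's names, but this nesting is an assumption, not a confirmed shape.
const uiMeta = {
  ui: {
    csp: {
      connectDomains: ["https://tiles.example.com"],  // fetch / WebSocket targets
      resourceDomains: ["https://cdn.example.com"],   // scripts, styles, fonts, images
      frameDomains: [] as string[],                   // nested iframes, if any
    },
  },
};
```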
### Viewport Size Management
The spec gives you two mechanisms for iframe dimensions:
- **autoResize** — Pass `autoResize: true` to the SDK. It attaches a `ResizeObserver` to detect body changes, temporarily sets `html.style.height` to `max-content` to measure the natural content height, then notifies the Host via `ui/notifications/size-changed`. Content drives height; the Host follows.
- **containerDimensions** — Query the container constraints anytime via `app.getHostContext()?.containerDimensions`:
```typescript
interface HostContext {
  containerDimensions?: (
    | { height: number }     // fixed: Host controls height, View should fill it
    | { maxHeight?: number } // flexible: View decides height, but within a ceiling
  ) & (
    | { width: number }      // fixed: Host controls width
    | { maxWidth?: number }  // flexible: View decides width
  );
}
```
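A hypothetical helper (not from the project) showing how a View might turn the height half of that union into a concrete render height:

```typescript
// Hypothetical helper: resolve a render height from the height constraint.
type HeightConstraint = { height: number } | { maxHeight?: number };

function resolveHeight(c: HeightConstraint | undefined, naturalHeight: number): number {
  if (!c) return naturalHeight;         // no constraint: content decides
  if ("height" in c) return c.height;   // fixed: fill whatever the Host gives
  if (c.maxHeight !== undefined) {
    return Math.min(naturalHeight, c.maxHeight); // flexible: honor the ceiling
  }
  return naturalHeight;                 // flexible with no ceiling
}
```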
### Wait for Layout to Settle Before Measuring
The iframe's dimensions shift for all kinds of reasons during loading. If you measure at that point for fit-to-container logic, you'll capture stale values. Use a ResizeObserver with a debounce — "wait until dimensions stop changing" — before reading the final size.
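One way to sketch "wait until dimensions stop changing": poll a size reader and resolve once it has been quiet for a while. Polling keeps the sketch self-contained; in the browser you would feed the same stability check from a `ResizeObserver` callback instead. The timing numbers are assumptions.

```typescript
// Resolve once the reported size has been unchanged for `quietMs`.
function waitForStableSize(
  read: () => { width: number; height: number },
  quietMs = 100,
  pollMs = 25,
): Promise<{ width: number; height: number }> {
  return new Promise((resolve) => {
    let last = read();
    let stableSince = Date.now();
    const timer = setInterval(() => {
      const current = read();
      if (current.width !== last.width || current.height !== last.height) {
        last = current;            // still moving: restart the quiet window
        stableSince = Date.now();
      } else if (Date.now() - stableSince >= quietMs) {
        clearInterval(timer);      // quiet long enough: safe to measure
        resolve(current);
      }
    }, pollMs);
  });
}
```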
### Draft Persistence Is Your Problem
The Host can destroy and recreate your iframe at any time — conversation scrolls out of view, user switches tabs, Host updates its UI. The Mermaid code your user spent five minutes editing? Gone. No warning.
My solution: register two internal tools on the server (`save-mermaid-draft` / `get-mermaid-draft`). The View calls `callServerTool` to persist state to server memory on every edit. When the iframe is recreated, it loads the last saved state automatically.
## Where the Ecosystem Stands
Host support is still limited. Confirmed so far: Claude Desktop, VS Code Copilot, Goose, Postman, and MCPJam. The spec requires servers to provide a plain-text fallback, so Hosts that don't support MCP Apps fall back gracefully — the tool doesn't break.
The spec keeps moving. SEP-1865 was consolidated in November 2025 from the experience of the MCP-UI community project and the OpenAI Apps SDK. It's in Final status now but still has active PRs. Future additions on the table include external URL support, built-in state persistence, and View-to-View communication.