Beyond Components: Designing Generative UI for MCP Apps… — Transcript

Ruben Casas explores the evolution of generative UI for MCP apps, discussing static, declarative, and future dynamic interfaces powered by advanced AI models.

Key Takeaways

  • AI models have significantly advanced in generating high-quality UI code rapidly.
  • Current UI paradigms are mostly static, but new approaches like declarative UI offer more dynamic and personalized experiences.
  • There are competing visions for future interfaces: chat embedded in every app versus a single super app (such as ChatGPT, Claude, or Gemini) that renders third-party UI through MCP apps.
  • Declarative UI uses AI-generated descriptors to render static components dynamically, improving personalization.
  • The UI for interacting with advanced AI systems is still evolving, with no definitive solution yet.

Summary

  • The talk covers the rapid evolution of AI models like GPT 5.2 and Opus 4.5 in generating high-fidelity UI code.
  • Ruben reflects on the transition from simple copy-pasting code from ChatGPT to AI producing better front-end code than developers.
  • He questions why UI remains mostly static despite advanced AI capabilities and explores new interface paradigms beyond chat.
  • Discussion of two potential UI futures: chat integrated everywhere versus a super app that uses MCP apps to render third-party UI inside one agent environment.
  • Explanation of static UI generation where agents orchestrate tool calls to predefined static components.
  • Introduction of declarative UI, where agents generate JSON/YAML descriptors mapped to static components for more dynamic and personalized UI.
  • Examples: the AG-UI protocol and Goose Auto Visualizer (static components); Netflix’s personalized, server-driven UI and Vercel’s JSON Render (declarative UI).
  • Declarative UI still relies on static components but allows for more flexible and personalized rendering driven by AI-generated descriptors.
  • The current challenge is defining the new interface language for interacting with advanced AI-powered systems, akin to inventing the GUI in the 1970s.
  • Ruben emphasizes the ongoing exploration and uncertainty about the final form of AI-driven UI and invites the audience to consider future possibilities.

Full Transcript

00:07
Speaker A
[music] Hello, everybody. So, I know I am the person standing between you and your lunch. But this is going to be a very interesting talk that combines the previous two talks into the future.
00:27
Speaker A
And that's what I want to talk about today. So, back in November 2022, what we used to do was we used to go to ChatGPT and ask ChatGPT to create a component. And we would just copy-paste. You had to ask for a reply in code blocks. Then, you know, again, fix it, repeat. And this is what I call the poor man's vibe coding.
00:42
Speaker A
And we have come a long way. It kind of worked. It was very exciting.
00:51
Speaker A
You could get models to actually build some UI for you. And I was sure that it was not going to write better code than me, right?
00:57
Speaker A
And then things improved very, very rapidly, very, very fast. What happened last year, and if you are aware of what happened in the last months of 2025, was this acceleration, an incredible inflection point where things changed. And it will
01:07
Speaker A
go down in the history books, as things changed very fast, all at once. And this is in part because of the release of two very important models, which were GPT 5.2 and Opus 4.5.
01:25
Speaker A
And they were not just very good at most of the tasks, long horizon tasks.
01:42
Speaker A
They were also very good at high fidelity UI generation. And they were producing very good working UI.
01:49
Speaker A
Sometimes thoughtful, sometimes really, really good. And also very fast. Now, I experienced this when I tried one of these models, tried to rewrite my blog. I know people have used this in more creative ways, but I just tried,
01:58
Speaker A
you know, a single prompt, rewrite my blog. And then it did this, which I didn't ask for.
02:14
Speaker A
It created a nice search box with a blur animation, with accessibility out of the box.
02:20
Speaker A
And then that's when I realized that in the space of three years from when ChatGPT was released to today, we went from, you know, few lines of code is great.
02:28
Speaker A
It can run. Oh. And now it can write better front-end code than me.
02:41
Speaker A
And you know, I don't mind. No ego. It's just reality. So, here's the question. If these models are so good at writing UI code, why are we still stuck in this mainly old paradigm of mostly static UI?
02:49
Speaker A
And where is that Jarvis moment that we've been talking about earlier? Where are my floating UI windows that appear and disappear? And why are we not there yet?
03:08
Speaker A
So, my name is Ruben Casas. I am a staff engineer at Postman. And I've been looking at UI and generative UI for the past year, and I've been working with MCP apps as well.
03:19
Speaker A
And today I want to show you what we're doing today and where we're going in the future.
03:29
Speaker A
So, the news is we have a new computer. And as Andrej Karpathy put it, interacting with this new computer is like talking to the terminal. You have direct access to this operating system.
03:33
Speaker A
And the GUI has not been invented yet. It's like we are in the '70s, where everything was just text.
03:47
Speaker A
And we have a super intelligence, but we don't have a mature interface language. And today we are still trying to figure out what is this new interface for this computer.
03:52
Speaker A
And people ask, is it chat? I'll show you what we're doing today. And actually, this was a very recent tweet last week, where people were complaining that most SaaS companies have been adding chat to their homepages, and
04:05
Speaker A
everybody's just putting chat everywhere. And that's fine. I don't have a problem with chat. It's not the final UI. It's okay for now.
04:19
Speaker A
But the question is, if it's not chat, then what is the interface for this computer?
04:30
Speaker A
On the other hand, as we have seen with MCP apps, there is another thought, which is we will have one app or super app to rule them all. And this is where MCP apps come in, where
04:36
Speaker A
instead of putting all of these chat windows into your homepages and to every single app that you use, we will have a super app like ChatGPT or Claude or Gemini, where you will be interfacing with most of the UI and the websites
04:50
Speaker A
that we have today. And this is good. This is the way we are using MCP apps today to render third-party UI inside one agent environment.
05:03
Speaker A
Now, these two options both could be valid. And I believe these are part of the evolution towards finding out what is that new interface for that computer. And to be honest, I don't know which one is going to be the final
05:14
Speaker A
one. Consumers will tell us. But one thing is that these are two different questions.
05:29
Speaker A
The question is where does the UI run? In this case, is it third-party UI, a super app, or in this case, chat everywhere?
05:35
Speaker A
But most interesting is what is the model generating. And this is what I want to talk about today in terms of how are we generating this UI? And we have seen this. We have mostly static, declarative, and generative UI.
05:44
Speaker A
And I'm going to describe briefly this one. So, we have, to start with, the static components way of running UI, which is what most agents do today.
05:58
Speaker A
The agent is just an orchestrator. The agent makes a tool call via MCP apps or direct agent tool call. Then we will have some parameters and data passed to predefined static components that have been created by developers. And
06:10
Speaker A
this is very similar to what we have been doing for the past 20 years with UI. And then the client renders the component.
06:27
Speaker A
And if you see here, it's very similar to just getting a server to send some data, and then the UI will be rendered by the client. But in this case, the agent will be generating that data and the props to do this. And some
06:35
Speaker A
examples that we have today are the AG-UI protocol. They have an SDK where you can register a client tool that maps to a React component. The tool call will receive some props. Those props will be mapped to a static component that then
06:47
Speaker A
will be rendered to the user. Another example is Goose. Goose is an MCP client where you can try most of the MCP features. And Goose has this really interesting feature called Goose Auto Visualizer, where you can just pass
07:04
Speaker A
any type of data to Goose, and Goose will try to match that data, organize it, and then pass it to a set of predefined components that the Goose team have created. In this case, we have a few interesting components that you
07:19
Speaker A
can use to visualize your data. So, that's the static way. That's the most common way of generating UI today.
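A minimal sketch of the static approach described above, assuming a hypothetical registry keyed by tool name. The names `registry`, `renderToolCall`, `weather_card`, and `price_tag` are illustrative and are not taken from AG-UI, Goose, or the MCP apps specification; the components return HTML strings here so the example runs without a UI framework.

```typescript
type Props = Record<string, unknown>;
type Component = (props: Props) => string;

// Escape values before placing them in markup; agent-supplied props are untrusted input.
function escapeHtml(value: unknown): string {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical registry of developer-written components, keyed by tool name.
const registry: Record<string, Component> = {
  weather_card: (p) =>
    `<div class="card"><h2>${escapeHtml(p.city)}</h2><p>${escapeHtml(p.tempC)} C</p></div>`,
  price_tag: (p) =>
    `<span class="price">${escapeHtml(p.currency)} ${escapeHtml(p.amount)}</span>`,
};

// Shape of a tool call as the agent might emit it (illustrative, not a protocol type).
interface ToolCall {
  name: string;
  args: Props;
}

// The client looks up the predefined component and renders it.
// The agent only chooses the tool and supplies the data; it never writes markup.
function renderToolCall(call: ToolCall): string {
  const component = Object.prototype.hasOwnProperty.call(registry, call.name)
    ? registry[call.name]
    : undefined;
  if (!component) {
    return `<p>Unknown component: ${escapeHtml(call.name)}</p>`;
  }
  return component(call.args);
}

console.log(renderToolCall({ name: "weather_card", args: { city: "Lisbon", tempC: 21 } }));
```

The design choice to note is that the set of possible interfaces is fixed by whatever developers put in the registry, which is what makes this approach predictable and also what makes it static.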
07:32
Speaker A
But I have seen an evolution recently, which we now call declarative UI. And declarative UI takes it to the next level. So, we will still have some predefined static components that developers build, and they contain your design system and all these
07:41
Speaker A
components that you have. But instead of the agent just passing the props and the data, the agent uses a descriptor that could be either JSON or YAML. Or I've seen Python as well, with FastMCP, where they have a
07:59
Speaker A
descriptor in Python that maps to these predefined static components. And then you have this translation rendering engine that takes those descriptors and converts them into the final UI.
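A minimal sketch of the descriptor-plus-rendering-engine idea described above. The descriptor shape (`type`, `props`, `children`) and the catalog entries `Stack`, `Heading`, and `Metric` are assumptions made for illustration; they are not the schema used by JSON Render or any other tool mentioned in the talk.

```typescript
// A descriptor node, as an agent might emit it in JSON (hypothetical shape).
interface UINode {
  type: string;
  props?: Record<string, unknown>;
  children?: UINode[];
}

type Renderer = (props: Record<string, unknown>, children: string) => string;

function escapeHtml(value: unknown): string {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// The catalog is the design system: predefined components the agent may compose.
const catalog: Record<string, Renderer> = {
  Stack: (_props, children) => `<div class="stack">${children}</div>`,
  Heading: (p) => `<h2>${escapeHtml(p.text)}</h2>`,
  Metric: (p) => `<p><strong>${escapeHtml(p.label)}</strong>: ${escapeHtml(p.value)}</p>`,
};

// The rendering engine walks the descriptor tree and maps each node to a catalog component.
// Anything outside the catalog is rejected rather than rendered.
function render(node: UINode): string {
  const renderer = Object.prototype.hasOwnProperty.call(catalog, node.type)
    ? catalog[node.type]
    : undefined;
  if (!renderer) {
    throw new Error(`Descriptor uses a component outside the catalog: ${node.type}`);
  }
  const children = (node.children ?? []).map(render).join("");
  return renderer(node.props ?? {}, children);
}

// What the model generates is only this JSON, not the components themselves.
const descriptor: UINode = JSON.parse(`{
  "type": "Stack",
  "children": [
    { "type": "Heading", "props": { "text": "Weekly usage" } },
    { "type": "Metric", "props": { "label": "Requests", "value": 1280 } }
  ]
}`);

console.log(render(descriptor));
```

Compared with the static approach, the agent now decides the layout and composition, while the building blocks stay under the developers' control.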
08:13
Speaker A
How is this different? Well, in this case, it is more dynamic. They are still static components, but it's more personalized.
08:26
Speaker A
And if you look at this and you think that this might look familiar as well, it's because it is not new.
08:36
Speaker A
Netflix has been doing this for a long time since the personalization and server-driven UI era, where when you go to the Netflix homepage, you will get a UI that is completely personalized to you. But that's still mapped to the
08:41
Speaker A
Netflix components and UI elements. Another very good tool that I've seen recently is JSON Render. JSON Render is being built by Vercel. And it is a way to map your components using JSON and also YAML.
08:56
Speaker A
They released the YAML support recently. And create all of these very dynamic, very good UI interactions that you can use today.
09:23
Speaker A
But now JSON Render still, they say, is constrained to your static components. And yes, they're still static components. The LLM is not generating these components. The LLM is generating the JSON.
09:40
Speaker A
However, I think at this point in time, declarative generative UI is probably the perfect balance today in terms of flexibility and consistency.
09:51
Speaker A
Because you would still want your design system. You still want to have predictability about what UI is going to be generated. It is also faster and potentially cheaper at this point, so you don't use a lot of
10:04
Speaker A
tokens to create the UI. But as I mentioned at the beginning, why are we still stuck here?
10:10
Speaker A
And what's the next level? I think the next level will be generative components.
10:15
Speaker A
And generative components go back to the premise I put at the beginning, where the models are good at writing front-end code. They are good at writing React. They're good at creating JavaScript and CSS.
10:27
Speaker A
And the question is, why don't we let them just write that on demand at runtime?
10:35
Speaker A
What could possibly go wrong with that, right? This model of generating the UI uses the agent capabilities. And in this case, you can also use a tool call, but instead of calling this layout rendering engine, you can call the same
10:50
Speaker A
model with reverse sampling or you can call another model that will generate the HTML, CSS, JavaScript on demand and then it will be passed to the client.
11:00
Speaker A
I did this experiment. I worked at Postman on this experiment, where I created this weather agent that goes to the weather API. It creates a joke. It creates the HTML, CSS, and JavaScript, all in one tool call.
11:14
Speaker A
And you get presented with this random but very imaginative UI where everything is created by the agent.
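A rough sketch of what a single tool call of this kind could look like. This is not the speaker's implementation: `weatherUiTool`, `fetchWeather`, and `generate` are hypothetical names, and both the weather lookup and the model call are injected as functions so the sketch does not depend on any real API or SDK.

```typescript
interface Weather {
  city: string;
  tempC: number;
  summary: string;
}

// Injected dependencies (both hypothetical): a weather lookup and a model call that returns text.
interface Deps {
  fetchWeather: (city: string) => Promise<Weather>;
  generate: (prompt: string) => Promise<string>;
}

interface UiResult {
  mimeType: string;
  html: string;
}

// One tool call: fetch the data, then ask a model to write the whole interface on demand.
async function weatherUiTool(city: string, deps: Deps): Promise<UiResult> {
  const weather = await deps.fetchWeather(city);
  const prompt = [
    "Return one self-contained HTML document with inline CSS and JavaScript only.",
    "Show the weather below and include a short joke about it.",
    JSON.stringify(weather),
  ].join("\n");
  const html = await deps.generate(prompt);
  // Minimal sanity check only. Generated code is untrusted and still needs a sandbox on the client.
  if (html.toLowerCase().indexOf("<html") === -1) {
    throw new Error("Model did not return an HTML document");
  }
  return { mimeType: "text/html", html };
}

// Usage with stubs standing in for the real weather API and the real model.
const stubs: Deps = {
  fetchWeather: async (city) => ({ city, tempC: 21, summary: "sunny" }),
  generate: async () => "<html><body><h1>Sunny in Lisbon</h1></body></html>",
};

weatherUiTool("Lisbon", stubs).then((result) => console.log(result.mimeType, result.html.length));
```

Unlike the static and declarative approaches, there is no registry or catalog here: the output is whatever the model writes.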
11:22
Speaker A
There is no component. There is no translation. So there is, of course, a problem with this approach. And the problem with this approach is, if we don't trust third-party code, well, we should not trust code that has been generated by LLMs and
11:39
Speaker A
then just present it to the user. Generative UI, and this level of generative UI, needs a distribution model.
11:47
Speaker A
And this distribution model requires a boundary, requires containment and requires a sandbox, which is what we were talking about earlier.
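To make the containment idea concrete, here is a minimal sketch that wraps generated markup in a sandboxed iframe using the standard HTML `sandbox` and `srcdoc` attributes. The function name `sandboxedFrame` is hypothetical, and this shows a single iframe only; it is not the MCP apps sandboxing arrangement itself.

```typescript
// In a srcdoc attribute value, ampersands and double quotes must be escaped.
function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Wrap untrusted, model-generated HTML in an iframe that may run scripts
// but gets an opaque origin: "allow-same-origin" is deliberately omitted,
// so the content cannot read the host page's cookies, storage, or DOM.
function sandboxedFrame(generatedHtml: string): string {
  return `<iframe sandbox="allow-scripts" srcdoc="${escapeAttribute(generatedHtml)}"></iframe>`;
}

console.log(sandboxedFrame('<p title="joke">Sunny & warm</p>'));
```

A production host would typically add further layers on top of this, such as a content security policy and a narrow message channel between the frame and the host.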
11:56
Speaker A
This is why I think MCP apps matter a lot, because MCP apps are the best delivery mechanism for generative UI.
12:06
Speaker A
We have the features provided by MCP, including authentication and tool calling and message passing between the UI and the agent. It's sandboxed by default with that double iframe. It is the default for third-party UI delivery today. Does this become the standard?
12:23
Speaker A
And one interesting thing is it's not just for third-party UI. It can also be used for first-party UI.
12:32
Speaker A
And this is why I think what Anthropic is doing with the visualizer feature is very interesting, strategically speaking, because they could have just created their own rendering and architecture mechanism for delivering this interaction in
12:49
Speaker A
Claude, but they decided to go with MCP apps because MCP apps provide most of those features that I mentioned earlier out of the box.
12:59
Speaker A
So if Anthropic decided to use MCP apps for their first-party UI, you can ask yourselves, why can't we do the same? It is a very, very strong protocol. And especially when the UI is being generated on the fly by the agents, by
13:13
Speaker A
the coding models, then it is the best mechanism for delivery. Now, today is probably not the final form.
13:26
Speaker A
And people keep asking, is chat the final form? Are MCP apps the final form?
13:33
Speaker A
We're still trying to figure this out. And the obvious future is probably too obvious.
13:41
Speaker A
And we asked, where is my Jarvis? Where are my floating windows? And if you think about it, that's the obvious thing that people would think of for how it would look if we were creating a new user
13:56
Speaker A
interaction. But what I think is we don't have enough imagination yet. And this analogy I heard recently is very interesting. When radio came out in the 30s, sorry, when
14:14
Speaker A
TV came out, the first TV shows were radio shows with cameras, because they could not imagine what you could do with this new technology. So this new technology that we have today is very similar to when television came out,
14:29
Speaker A
and we are still in the radio era, where we don't know all the amazing things that we will do in the future with this new medium, with this new power that we have with the new computer.
14:41
Speaker A
And we can see that we cannot even imagine what it's going to look like.
14:46
Speaker A
Of course, this is speculative, but what I think is actually going to happen is we are going to be moving beyond components and more towards human-agent collaboration.
15:00
Speaker A
If you haven't heard about the Excalidraw MCP app, definitely check it out, because the Excalidraw MCP app is not just for output and visualization of diagrams.
15:12
Speaker A
The Excalidraw MCP app does something very interesting, which is that it creates a shared artifact.
15:18
Speaker A
It creates a canvas where a human and an agent can collaborate in a shared space, where you can go back and forth with the agent and ask, you know, change this, but you can also click around, modify the UI the way that you
15:34
Speaker A
are used to. And that becomes the new way of interacting, the new way of experiencing the agent's powers.
15:44
Speaker A
And at the moment, again, we are very constrained by our imagination. And I believe these agents are too powerful to just use them as an orchestrator and a delivery mechanism to show me some visualizations.
15:59
Speaker A
So I believe that, beyond components, the future of generative UI will be more of a collaborative experience where, yes, we will have some generative UI, but that UI is going to be super personalized and it's going to be
16:14
Speaker A
collaborative. So we are still early. We don't have the answer. People say, you know, what is the future of user interaction, of user interfaces? We don't know yet.
16:29
Speaker A
But we can shape that future and create this new computer. And that's me. Thank you so much for listening to this.
16:38
Speaker A
[applause] You can find me and ask any questions. Thank you.
Topics: generative UI, MCP apps, AI-generated UI, declarative UI, static components, ChatGPT, GPT 5.2, Opus 4.5, JSON Render, Postman

Frequently Asked Questions

What is the main focus of Ruben Casas' talk?

The talk focuses on the evolution and future of generative UI for MCP apps, highlighting how AI models are transforming UI generation from static to more dynamic and personalized interfaces.

What are MCP apps and their role in future UI design?

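MCP apps are a way to render third-party UI inside a single agent environment, a "super app" such as ChatGPT, Claude, or Gemini. According to the talk, they are sandboxed by default and come with MCP features such as authentication, tool calling, and message passing between the UI and the agent, which is why the speaker considers them the best delivery mechanism for generative UI, for both third-party and first-party interfaces.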

How does declarative UI differ from static UI in this context?

Declarative UI involves AI generating descriptors like JSON or YAML that map to predefined static components, enabling more dynamic, personalized, and flexible UI rendering compared to purely static component rendering.
