How we Claude Code — Transcript

Workshop on how Anthropic uses Claude Code, focusing on agent capabilities, verification, and best practices.

Key Takeaways

More capable AI agents require less upfront constraint and more flexible prompting.
Using HTML files for specs enhances verification and human readability compared to markdown.
Effective prompting and verification are crucial to optimize agent performance and reduce errors.
Modes like fast and auto mode help manage agent effort and token consumption efficiently.
Continuous integration and automated verification improve reliability of agent-driven workflows.

Summary

Introduction to Claude Code and workshop setup with interactive coding and repo cloning.
Discussion of agent capabilities improving over time and the importance of adapting workflows accordingly.
Reference to a related talk by Tar and the blog post 'The Unreasonable Effectiveness of HTML Files'.
Emphasis on moving from markdown to richer HTML files for better verification and human interaction.
Explanation of the 'bitter lesson' and how more capable models reduce the need for hard constraints.
Guidance on prompting techniques to reduce ambiguity and improve agent performance.
Introduction to modes like fast mode and auto mode for managing agent effort and token usage.
Demonstration of verification processes, including running tests and interpreting failures with Opus 4.7.
Discussion on ergonomic engagement with agent-generated content and continuous integration testing.
Addressing concerns about token efficiency with HTML specs and benefits of richer specifications.

Full Transcript

Speaker A

Hello, hello, hello. Welcome all. Thank you for joining the workshop. We have a pretty, pretty full house here today. I'm very pleased to see that. Um, quick show of hands. Who here loves Claude Code?

Speaker A

Okay. Yeah, you're in the right place. Yeah, lovely to meet you all. My name is Ara. I'm a member of the Applied AI team. Uh, I'm an architect there. I'm here today to tell you how we Claude Code at Anthropic.

Speaker A

We'll do a workshop and, uh, you can code along. You'll get some credits. I think there'll be a QR code in just a moment.

Speaker A

So, make sure to grab that. Uh, there'll be people floating around that might support you. So, if you have technical difficulties, we can help you out. Um, anybody here in this room who hasn't used Claude Code before?

Speaker A

Okay, great. Good. Good. Because there's more to do. There's more to find out. I'm happy you're here. Let's get started. So, let me get my clicker. Ah, yes. Excellent. Yes, please, please, please grab the QR code,

Speaker A

set yourselves up. There'll be a repo here as well that you can clone. Uh, there are three phases in it that we're going to work through. There's an interesting verification setup that you can work through alongside me. Uh, and

Speaker A

it's quite detailed, so you'll probably want to investigate it a bit later yourselves.

Speaker A

What we're covering today is based on, uh, this version of the talk that Tar gave, uh, in San Francisco just about a week and a half ago. Who here follows Tar on, uh, on Twitter? Fantastic. Great. Yeah.

Speaker A

Yeah. So he published that as a blog post called "The Unreasonable Effectiveness of HTML Files." And, um, he's basically pitching that moving on from markdown files, we're going to be using HTML instead. And we're showing some of that today. Like I said, there's a repo

Speaker A

to go along, uh, enjoy that along the way and later on, uh, in the week as well. Where are we now? Where is this all going? I think everybody probably notices that agents are becoming more and more capable. Why is that happening?

Speaker A

That's happening because the models are becoming more and more capable. And so if the models become more capable, it means agents can run for longer and you can give them more and more complex tasks. But that also means that we have

Speaker A

to change our way of working. We have to change our habits. That's part of what today is about. How can you change the way that you're working with Claude Code to get more out of it?

Speaker A

If you're going to let your agent run for a longer period of time, then you can burn through a lot of tokens if it does the wrong thing. And you want to avoid that ideally at the beginning. And

Speaker A

so that's where the idea came from, to push and front-load more of the verification that the human would do in the spec into an HTML file, because it's a richer and more, um, more human

Speaker A

ergonomic way of engaging with the content that your agent is going to be building for you, and we'll see how that works.

Speaker A

There are a bunch of things around agents that are worth reading. We have a lot of engineering blogs on our web page. Um, we also have the standard blogs. You should check all of those out. There'll be a lot of information

Speaker A

there. There's a lot of interesting information about harnesses, about long-running agents. All of these things are important. Today, we'll focus on three levels. Um, there'll be something a bit more basic, something a bit more next level, and then something

Speaker A

that's quite interesting and that'll be a bit more in-depth. The first thing is, the more capable the models get, the more you should try to resist constraining them. Who here, by the way, has heard of the bitter lesson

Speaker A

by Richard Sutton? Great. Fantastic. This is great. So, for those who haven't raised their hands, uh, Richard Sutton is basically the father of reinforcement learning. If you're reading a book about reinforcement learning, it's probably

Speaker A

authored by him. And, um, his idea was basically that, you know, you could spend all your time trying to, with your human capabilities, hard-code up front and constrain the system, but in the end, pouring more data and more compute at it

Speaker A

ends up getting more capability than anything that you could have come up with. And there's a similar analogy here, right? The models are becoming more and more capable. And so you should accept that the model is probably better

Speaker A

at extracting requirements from you than you are at defining your requirements. The requirements are latent within you. Just like when you talk to your users: they have an idea, they know it when they see it, but

Speaker A

they're often not very good at articulating what they need. And likewise, you probably know what you want when you see it, but Claude is likely better at extracting what you want and what you need from you than you

Speaker A

are in specifying it to Claude. That's another direction to take this in. So, we'll talk about that: removing ambiguity, letting Claude prompt you and interview you in prompting. Then, how to understand and plan. We used to do

Speaker A

this all with markdown files. We still do it with markdown files. A colleague of mine once said the markdown file is the lingua franca of the AI-native software development life cycle. Thought it was pretty poetic, um, but it

Speaker A

seems like that format is a bit constrained. It's getting too long, a lot of lines of markdown file to read. Condense more information into HTML files, and that's what we're going to do today. And then how to verify,

Speaker A

not test, actually, but just to verify in this context, um, how to make verification native to the thing itself so that the agent can drive it alongside a human, or eventually headlessly as well.

Speaker A

That's where this is all going as well, right? The agents are going to be doing more and more of this natively, and how can you set up the artifacts that you produce to natively be testable and verifiable in the way that you need?

Speaker A

So, we'll do an example. By the way, has everybody had a chance to grab that QR code and, uh, set themselves up? Very good. There'll be a link to the repo eventually as well. I think that comes

Speaker A

in a bit. Uh, let's say we want to do a bill splitting app. Uh, very simple. Um, you know, uh, you go out with friends, you want to find out who owes what. Yeah. Um, let's let Claude

Speaker A

interview us around doing this. So in this case, I'm actually going to, yeah. Before I do that, give you an example of how you would do this and how you could do this better. Yeah. What's good prompting? What's bad prompting?

Speaker A

Bad prompting is when you say "just make it better." And a lot of people that I watch, uh, using Claude Code just type "make it better." Um, "make no mistakes." Yeah.

Speaker A

Yeah. Yeah. That's not good prompting. Um, you want to encourage Claude to extract specific details from you. Give the domains. Don't overspecify the outcome, but specify the areas that you are interested in.

Speaker A

Right? That's what makes this good prompting different and better. Um, you know, focusing on the audience, for example, or, uh, suggesting an open-ended way to answer the question as opposed to predefining it up front, and then that

Speaker A

will prompt Claude to iteratively interview. All right. So, um, I've got a Claude open here. Two different Claudes.

Speaker A

Who here has used fast mode? Okay, not that many people use fast mode. That's why I set it up here. Um, and who here uses auto mode?

Speaker A

Oh, I'm very happy that you all use auto mode. You need to be using auto mode. If you're not using auto mode, you need to be using auto mode. It makes it so much easier. Um, yeah, use auto mode. Good.

Speaker A

Who here is setting their effort parameter? Good, good. Our recommendation is X high, but you can also set max effort. I mean, in my case, I think I kept it at X high for this. Um, yeah, yeah, yeah.

Speaker A

So, just to touch on how those look, right? Like we've got forward slash effort, which is the effort parameter. Forward slash fast, which is the fast mode, which I've turned on, and then, uh, we have, uh, the auto

Speaker A

mode, which we cycle into like this. Shift tab. Yeah, auto mode is the best.

Speaker A

going to ask me, uh, you tab through these, right? Like you'll tab through these like this. Uh, and in my case, I want to, uh, I want it just for friends.

Speaker A

question tool that you saw me use in the prompt earlier. Let me submit those answers, and then it'll write that spec. Right? So you saw me in my prompt explicitly referring to the ask user question tool. That's what

Speaker A

It's going to take a while. So we'll go back to my slides. Could I go back to the slides, please? Great. So, let's say we have a plan. It generates a plan. We've answered some questions. It's getting better at extracting turn by turn from

Speaker A

what the thing is going to look like. You can even use a screenshot with it.

Speaker A

get more than about 200 lines long, it's unlikely you're going to read it and certainly unlikely that your colleagues are going to read them. Before we started here, I had Claude with Opus 4.7 generate a few examples for me of what

Speaker A

So we have them here. The prompt I used is in the repo that you'll have access to. I asked Claude: give me a few different directions, four different design directions, explore them, generate them as HTML and let me

Speaker A

Uh, and then this one, this one might look like this. Completely different aesthetic, right? And I click around, and this is much better for me to give feedback to Claude on than it would be to just try to infer from a markdown

Speaker A

that. That's very good. Um, especially when you're doing front end, it's really hard to articulate, like, the thing is slightly off or there's a misalignment here, and you'll find that you run into the limits

Speaker A

Great. So, a few examples. Tar has a lot more in his repo, which you'll find in the repo as well, online. So, so far we've covered letting Claude extract information from you interactively as an interviewer because the longer you let an agent run, the

Speaker A

Then what's a better and more efficient and ergonomic form factor? Um, it would be the HTML file.

Speaker A

I think we have, uh, I think there'll be one slide where you'll get to see the actual, uh, the actual URL to engage with. So what you want to do is make it part of the artifact and that's

Speaker A

React app, um, with components that you will have seen before, but remixed in a way to make it easy for Claude as an agent to extract the data contracts in the DOM.

Speaker A

get run through that, get a recording, and that's then put on S3 or shared somewhere else with colleagues, and that's how you turn these verification steps into something that is generated by the agents and then that you have

Speaker A

to go to that link, there'll be a repo. This is the repo under CWC workshops, Code with Claude workshops, and there's the one for today's session, which is How we Claude Code.

Speaker A

Claude. The second one is generating four different HTML design directions that we could explore, uh, for me to then feed back on or for me to decide. And then thirdly, we have a verification framework here. Uh, and it's in a

Speaker A

the readme for phase three and then also in the verification detail as well, which is here. So if you want the deep dive on how this is set up, you can read this and it's all provided in the repo. It's

Speaker A

I can add an item, "test". It's hard to see. There we go. Very good. And then I could take it off and drop it as well. And I could clear the finished items. And so many things are happening here in terms of the state.

Speaker A

going to do this in a human-readable way. Um, but then that same way of doing it in a human-readable dashboard, we're going to verify, uh, in an agent-first way, uh, and let Claude do that separately and

Speaker A

repo versus here, but the principle is the same. So human readable, agent driven if you're doing it from Claude Code or somewhere else, and then also just generally headless, like you might do it in CI. We'll have a look at it

Speaker A

done and active. Um, the component itself here is publishing its state to the DOM.

Speaker A

Uh, and so if I change my state here, if I add test again,

Speaker A

you see that updates, and I drop it, it updates again. So this is what the agent can read later, as opposed to having to scrape the DOM.
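To make the idea of "publishing state to the DOM" concrete, here is a minimal sketch in TypeScript. It is not the workshop repo's code: the attribute name `data-verify-state`, the `TodoState` shape, and both function names are assumptions made for illustration. The component serializes a small snapshot into one attribute, and a reader (an agent or a test) parses that attribute rather than scraping rendered markup.

```typescript
// Hypothetical shape of a to-do component's state; not taken from the workshop repo.
interface TodoState {
  items: { id: number; title: string; done: boolean }[];
}

type Counts = { total: number; done: number; active: number };

// Assumed attribute name; the real contract may use a different one.
const STATE_ATTR = "data-verify-state";

// What the component would write onto its root element after every state change.
function publishState(state: TodoState): Record<string, string> {
  const done = state.items.filter((item) => item.done).length;
  const snapshot: Counts = {
    total: state.items.length,
    done,
    active: state.items.length - done,
  };
  return { [STATE_ATTR]: JSON.stringify(snapshot) };
}

// What an agent or test would do: read one attribute instead of scraping markup.
function readState(attributes: Record<string, string>): Counts {
  const raw = attributes[STATE_ATTR];
  if (raw === undefined) {
    throw new Error(`missing ${STATE_ATTR}`);
  }
  return JSON.parse(raw) as Counts;
}

const published = publishState({
  items: [
    { id: 1, title: "test", done: false },
    { id: 2, title: "ship", done: true },
  ],
});
const seen = readState(published);
```

In a real component the returned attribute map would be spread onto the root element; keeping it as a plain object here lets the sketch run without a browser.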

Speaker A

If you publish the state here separately from the app internals, you'd be able to run the verification independently of whatever the state of the app is. Um, and that's how this is set up to work. Uh, and then you can run

Speaker A

this further down the line as well. Each one of them gets this, right? If I pull up the... That's a bit hard to read. Let's do that here. Um, every component gets one

Speaker A

of these, right? There'll be schemas, there'll be fixtures, um, the known states, and then the invariants. Uh, there are particular ones here that always have to hold. You'll test them with probes and

Speaker A

we'll do one example which we've hardcoded. We've hardcoded an example that will fail the verification, and we'll let the human-verified dashboard catch it, but then we'll also let the agentic way of doing it catch it as well

Speaker A

and we can let Claude Code find it and diagnose it for us. So that same approach also works here.
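The four pieces named here (schemas, fixtures, invariants, probes) can be sketched as one per-component contract. This is a hedged illustration only: the `Contract` interface, the to-do counts, and the probe names are hypothetical and are not the format used in the workshop repo. The runner applies every probe to every known state and reports which checks no longer hold.

```typescript
// All names here are illustrative; the workshop repo defines its own contract format.
type Snapshot = { total: number; done: number; active: number };

interface Contract {
  schema: (value: unknown) => value is Snapshot;         // shape of the published state
  fixtures: Record<string, Snapshot>;                     // known states to start from
  invariants: Record<string, (s: Snapshot) => boolean>;   // must hold in every state
  probes: Record<string, (s: Snapshot) => Snapshot>;      // actions that leave the happy path
}

const isCount = (n: unknown): n is number =>
  typeof n === "number" && Number.isInteger(n) && n >= 0;

const todoContract: Contract = {
  schema: (value: unknown): value is Snapshot => {
    if (typeof value !== "object" || value === null) return false;
    const v = value as Record<string, unknown>;
    return isCount(v["total"]) && isCount(v["done"]) && isCount(v["active"]);
  },
  fixtures: {
    empty: { total: 0, done: 0, active: 0 },
    mixed: { total: 3, done: 1, active: 2 },
  },
  invariants: {
    countsAddUp: (s) => s.done + s.active === s.total,
  },
  probes: {
    // Clearing finished items, including when there are none to clear.
    clearFinished: (s) => ({ total: s.active, done: 0, active: s.active }),
    // Completing an item when nothing is active should be a no-op.
    completeOne: (s) =>
      s.active === 0 ? s : { ...s, done: s.done + 1, active: s.active - 1 },
  },
};

// Run every probe against every fixture and collect the checks that fail.
function verifyAll(contract: Contract): string[] {
  const failed: string[] = [];
  for (const [fixtureName, fixture] of Object.entries(contract.fixtures)) {
    for (const [probeName, probe] of Object.entries(contract.probes)) {
      const next = probe(fixture);
      if (!contract.schema(next)) failed.push(`${fixtureName}/${probeName}: schema`);
      for (const [invariantName, holds] of Object.entries(contract.invariants)) {
        if (!holds(next)) failed.push(`${fixtureName}/${probeName}: ${invariantName}`);
      }
    }
  }
  return failed;
}

const failures = verifyAll(todoContract);
```

Because the contract is plain data plus pure functions, the same object could in principle back a dashboard, an agent-driven run, or a headless run.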

Speaker A

So I've got the dashboard defined here, and what we see here is, there we go. Um, like I said, we've got the different schemas, the invariants that we've defined, and we can run these individually, right? We can see how they would

Speaker A

execute here. Got an example here, or I could also run all of them. In fact, I'm going to do that now. I run them all. One of them triggers artificially because you planted it before.

Speaker A

I'm going to scroll down and find out what that is. And it's here. Uh, we will actually look at how we can replicate this for an agent as well. And we would do that like this. Go here and we can run

Speaker A

basically the manifest of all the different verification steps that are defined here as state in the DOM. We can get them in the same way that we have them here. We expand that. It's getting a little bit small.

Speaker A

as an example like that and then like that as well. Great. If we do that, then we can also actually run them. Uh, just specifically here, uh, I'll do it manually first. Let me replay.

Speaker A

Uh, we play them all. Close that. In this case, we're running this not to perform the verification, but to provide the evidence of the verification. Um, we can later on record this. Um, we can record these as clips, which would just

Speaker A

be videos that we capture, and then we would, uh, store them, share them with colleagues, put them on S3 or whatever it might be.

Speaker A

Let me pause that here. There we go. So, here's our summary. We have one that's deliberately failed.

Speaker A

I'll explain that in a second. But in each case, you can also see the details of how this was done.

Speaker A

The key thing here is that it's important to include the probes, uh, to push off the happy path, and then a lot of this will be generated by Claude for Claude, right? So there's

Speaker A

there's a way of scaling this further. Um, in the end, we'll go to the one that didn't work. It's the one where we hardcoded that the sums don't match. In this case, the state doesn't match.

Speaker A

Uh, 3 + 4 does not equal 10 in this case.
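The planted failure can be pictured as a sum invariant evaluated against a deliberately inconsistent state. The sketch below assumes a bill-splitting snapshot with `total` and `shares` fields; those names are hypothetical, and the transcript does not say which component the failing check actually belongs to.

```typescript
// Hypothetical bill-splitting snapshot; field names are assumptions for illustration.
interface BillSnapshot {
  total: number;    // amount on the bill, in whole currency units
  shares: number[]; // what each friend owes
}

// Invariant: the individual shares must add up to the bill total.
function sharesMatchTotal(bill: BillSnapshot): boolean {
  const sum = bill.shares.reduce((acc, share) => acc + share, 0);
  return sum === bill.total;
}

const healthy: BillSnapshot = { total: 7, shares: [3, 4] };
// Deliberately planted bad state, mirroring the demo's "3 + 4 does not equal 10".
const planted: BillSnapshot = { total: 10, shares: [3, 4] };

const results = {
  healthy: sharesMatchTotal(healthy), // 3 + 4 === 7, so the invariant holds
  planted: sharesMatchTotal(planted), // 3 + 4 !== 10, so the invariant fails
};
```

Whole numbers keep the equality check exact; a real money calculation would more likely work in integer minor units (for example cents) for the same reason.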

Speaker A

And we'll get the same result if we run verify all from Claude here as well. We'll let Claude do that in a moment as well.

Speaker A

There we go. Then it's a little bit hard to see that there's pass and fail. So there we go. Okay, good.

Speaker A

We can change this, right? The point here is that I'm trying to show that the idea is to let something agent-native be read as the DOM contract here. Um, that's what we established earlier, that the state is

Speaker A

managed here, or well, not managed, but at least, um, viewable to an agent, so that it can then run the verification end to end itself. And we can let Claude do that afterwards as well.

Speaker A

In this case, we can also make a change that will break more. Let me see if I can break the chain. I'll break the contract but not the app. And I'll do that here.

Speaker A

So I could change, for example, here under, is that under apps, total stats. I'll delete that. I'll undo that afterwards as well.

Speaker A

And if we now run, then all of these ones at the bottom here will fail.

Speaker A

And we'll get the same when we run this again from here as well. Oops.

Speaker A

There we go. All of these are failing. Not because we broke the app, but because we broke the contract.
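The distinction between breaking the app and breaking the contract can be sketched as two different failure outcomes. Everything below is illustrative: the key names, the `Outcome` labels, and the `check` function are assumptions, not the repo's verifier. A missing published key means the verifier can no longer see the state, which is reported separately from a state that is visible but wrong.

```typescript
// Illustrative only: key names and outcome labels are assumptions.
type Published = Record<string, unknown>;
type Outcome = "pass" | "invariant-failed" | "contract-broken";

const REQUIRED_KEYS = ["total", "done", "active"] as const;

function check(published: Published): Outcome {
  // If a required key is missing or not a number, the app itself may be fine,
  // but the verifier can no longer observe it: that is a contract failure.
  for (const key of REQUIRED_KEYS) {
    if (typeof published[key] !== "number") return "contract-broken";
  }
  const total = published["total"] as number;
  const done = published["done"] as number;
  const active = published["active"] as number;
  // The state is visible; now test whether it is consistent.
  return done + active === total ? "pass" : "invariant-failed";
}

const intact = check({ total: 3, done: 1, active: 2 });
const badState = check({ total: 10, done: 3, active: 4 });
const missingKey = check({ done: 1, active: 2 });
```

Reporting the two cases differently tells whoever reads the result, human or agent, whether to look at the application logic or at the code that publishes the state.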

Speaker A

That Claude can then natively verify. Excellent. Great. I want Claude to tell me what's going on with these ones. Well, not the ones that I broke, I'll undo the ones that I broke, but I'd like it to

Speaker A

tell me what's going on with this one. And so let me correct the change that I made before.

Speaker A

Control Z. Auto save. Rerun. And there we go. So we've done it manually and we've shown how we can match what the agent would see versus what we would do. But we can let Claude run this headlessly itself as well. So I'll do that and I'll

Speaker A

pull that open. Let it run. This one's broken deliberately. And the rest works. Great. So we have Claude running here. I mentioned fast mode before. I mentioned auto mode, which is great. I mentioned forward slash goal.

Speaker A

In this case, what we're going to do is let Opus 4.7 help us find out why that particular verification failed.

Speaker A

I've already connected the Playwright MCP for this, uh, and it's going to use that and it's going to run. Ska got rejected.

Speaker A

4 plus 3 does not equal 10. Uh, yes, if you were to run bun verify, then you would actually pass the tests in this case. Uh, because we've deliberately put the verification in wrong just to demonstrate that, then the

Speaker A

the test matrix itself will actually pass. Um, but the idea here is to separate out what you could do as a human versus what you could do via an agent directly from the browser, using basically the commands that you

Speaker A

saw, and then also what you could run headlessly directly from the CLI. If you wanted to do it, um, like this as well, then you could record the outcome that we talked about.

Speaker A

Yes. So in this particular setup, you've actually got the ability to show how you would record these, uh, the same delayed running that we've shown, you could run and record as evidence, basically, to show that it would work, um,

Speaker A

and you could store that, uh, and it could run like that. This is very common right now. Um, the Claude Code team, uh, records basically all the code changes that they do like this, um, all the front-end changes at

Speaker A

least, uh, especially at the pace of the shipping that we have at the moment.

Speaker A

Are people able to pull up the repo and get the verification setup working?

Speaker A

Yeah, very good. Very good. Yes, you can store them. I mean, like, you can just put them in S3 or whatever, or share them with colleagues, or, um, in our case we have a version, we have a

Speaker A

we have an internal, this is more automated than what I'm showing here. Uh, in our case we do record them. Um, not certain for how long or in what context, but it's part of a regular cadence.

Speaker A

And then you get the, in fact, you get the, I should have triggered that.

Speaker A

Did I see that? There we go. So you have each clip, you could download all of them or download them individually, and then that's basically the bundle that proves the verification worked. Um,

Speaker A

so I guess to bring it around to what we've covered so far and where it's going: three different surfaces. The human surface, right? Um, and then the agent-first from the browser, um, mapping

Speaker A

on the same way, and then you could also run it in CI with just run bun verify.
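As a rough sketch of the three surfaces, the same list of check results can feed a human-readable view, a structured view for an agent, and an exit code for CI. The `CheckResult` type, the three function names, and the sample check ids are hypothetical; the transcript only establishes that the human, agent, and headless paths map onto the same verification.

```typescript
// A sketch of one set of results feeding three surfaces; all names are hypothetical.
interface CheckResult {
  id: string;
  passed: boolean;
  detail: string;
}

// Human surface: a readable line per check, as a dashboard might list them.
function forHuman(checks: CheckResult[]): string {
  return checks
    .map((c) => `${c.passed ? "PASS" : "FAIL"}  ${c.id}  ${c.detail}`)
    .join("\n");
}

// Agent surface: structured data an agent can read without parsing prose.
function forAgent(checks: CheckResult[]): string {
  return JSON.stringify({
    failed: checks.filter((c) => !c.passed).map((c) => c.id),
    checks,
  });
}

// Headless surface: a process exit code for CI (0 means every check passed).
function forCi(checks: CheckResult[]): number {
  return checks.every((c) => c.passed) ? 0 : 1;
}

const run: CheckResult[] = [
  { id: "counts-add-up", passed: true, detail: "1 + 2 = 3" },
  { id: "shares-match-total", passed: false, detail: "3 + 4 != 10" },
];
const exitCode = forCi(run);
```

Deriving all three views from one result list is one way to keep the dashboard, the agent, and CI from disagreeing about what passed.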

Speaker A

The objective here at the end is to figure out how do you embed the verification into the artifact itself.

Speaker A

There's more detail. I encourage you to check out the repo itself, um, and actually run a few examples yourself and run a few tests, right? You can change the code, see what breaks, rerun it yourselves. Um, it's quite detailed and

Speaker A

it's what Tar and team use in the Claude Code team, uh, for their work already. This came, uh, pretty quickly, um, just a week and a half ago, into this demo. Um, so I encourage you

Speaker A

to check this out. What's new really is the remixing and the new arrangement of primitives that you're already familiar with, that you're already using, just to make it available to the agent first.

Speaker A

That concludes really what I wanted to cover. Um, I encourage you to spend more time on the repo. There's great documentation there and you can actually, um, get a lot out of it with Opus 4.7. Opus 4.7 works really well because it has a

Speaker A

better vision model. That's where this really excels. Um, if you use Sonnet, I wouldn't recommend that. So try using Opus 4.7 for it. Um, try using fast mode for it. Fast mode is great, costs more, but it's great for iterating quickly on

Speaker A

specs. People will sometimes ask, well, isn't an HTML spec, um, more token inefficient? And the answer tends to be no. Uh, and the reason is that in the long term you iterate less if you have a good and rich HTML spec,

Speaker A

even if on one-off instances you spend more tokens to generate it. So, um, you can even try it with fast mode. So that would be it. Uh, that's my recommendation to you. I enjoyed, uh, speaking to you and

Speaker A

thank you for your attention.

Topics: Claude Code, Claude, Anthropic, AI agents, HTML specs, Verification, Prompt engineering, Fast mode, Auto mode, Applied AI

Frequently Asked Questions

What is the main focus of this workshop?

The workshop focuses on how Anthropic uses Claude Code, emphasizing agent capabilities, verification processes, and improving workflows with richer HTML specifications.

Why does the workshop recommend using HTML files over markdown for specs?

HTML files provide a richer, more human-readable format that allows for better frontloading of verification tasks and improved interaction with AI agents compared to markdown.

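What are fast mode and auto mode in the context of Claude Code?

In the talk, fast mode is described as costing more but being well suited to iterating quickly on specs, and auto mode is a mode the speaker cycles into with Shift+Tab and strongly recommends. The effort level is set separately through the effort parameter, for which the speaker recommends X high.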
