Don’t Build Agents, Build Skills Instead – Barry Zhang … — Transcript

Barry Zhang and Mahesh Murag discuss why building skills, not agents, is the future of AI, focusing on code as a universal interface.

Key Takeaways

Building skills instead of agents addresses the lack of domain expertise in AI agents.
Code is the key universal interface that enables scalable and modifiable agent capabilities.
Agent skills package domain knowledge in reusable, composable folders with scripts.
The skills ecosystem is rapidly expanding and supports integration with existing workflows.
Skills empower enterprises and developers to customize AI agents for specific needs efficiently.

Summary

Agents today have intelligence but lack domain expertise needed for real work.
The traditional approach of building separate agents per domain is inefficient.
Code serves as a universal interface to the digital world, enabling more scalable agent design.
Claude Code is a general-purpose coding agent that inspired the shift to building skills.
Agent skills are organized folders containing composable procedural knowledge and scripts.
Skills allow agents to absorb domain expertise, learn over time, and improve consistency.
The skills ecosystem has rapidly grown with thousands of skills created in weeks.
Skills can include complex software, executables, scripts, and assets, making them highly versatile.
Skills work with existing tools like Git and Google Drive and can be shared easily.
Skills are categorized into foundational, third-party, and enterprise-specific, enabling broad adoption.

Full Transcript

Speaker A

All right, good morning, and thank you for having us again. Last time we were here, we were still figuring out what an agent even is. Today, many of us are using agents on a daily basis. But we still notice gaps. We still have slop, right? Agents have intelligence and capabilities, but not always the expertise that we need for real work. I'm Barry. This is Mahesh. We created agent skills. In this talk, we'll show you why we stopped building agents and started building skills instead.

Speaker A

A lot of things have changed since our last talk. MCP became the standard for agent connectivity. Claude Code, our first coding agent, launched to the world, and our Claude Agent SDK now provides a production-ready agent out of the box. We have a more mature ecosystem, and we're moving towards a new paradigm for agents. That paradigm is a tighter coupling between the model and a runtime environment. Put simply, we think code is all we need. We used to think agents in different

Speaker A

domains would look very different. Each one would need its own tools and scaffolding, and that means we'll have a separate agent for each use case, for each domain. While customization is still important for each domain, the agent underneath is actually more universal than we thought. What we realized is that code is not just a use case but the universal interface to the digital world. After we built Claude Code, we realized that Claude Code is actually a general-purpose agent.

Speaker A

Think about generating a financial report. The model can call the API to pull in data and do research. It can organize that data in the file system. It can analyze it with Python and then synthesize the insight in a file format, all through code. The core scaffolding can suddenly become as thin as just bash and file system, which is great and really scalable. But we very quickly run into a different problem, and that problem is domain expertise. Who do you want doing your taxes? Is it

Speaker A

going to be Mahesh, the 300 IQ mathematical genius, or is it Barry, an experienced tax professional, right? I would pick Barry every time. I don't want Mahesh to figure out the 2025 tax code from first principles. I need consistent execution from a domain expert. Agents today are a lot like Mahesh. They're brilliant, but they lack expertise.

Speaker A

They can do amazing things when you really put in the effort and give proper guidance, but they're often missing the important context up front. They can't really absorb your expertise super well, and they don't learn over time. That's why we created agent skills. Skills are organized collections of files that package composable procedural knowledge for agents. In other words, they're folders. This simplicity is deliberate. We want

Speaker A

something that anyone, human or agent, can create and use as long as they have a computer. These also work with what you already have. You can version them in Git, you can throw them in Google Drive, and you can zip them up and share with your team. We have used files as a primitive for decades, and we like them. So why change now? Because of that, skills can also include a lot of scripts as tools. Traditional tools have pretty obvious problems. Some tools have poorly written instructions

Speaker A

and are pretty ambiguous, and when the model is struggling, it can't really make a change to the tool. So, it's just kind of stuck with a cold start problem, and they always live in the context window. Code solves some of these issues. It's self-documenting. It is modifiable and can live in the file system until they're really needed and used. Here's an example of a script inside of a skill. We kept seeing Claude write the same Python script over and over again to apply styling to slides.
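
One way to picture a recurring script being kept inside a skill and re-run instead of rewritten is the minimal sketch below. The `scripts/` subfolder, the `save_tool` and `run_tool` helpers, and the toy script body are all assumptions made for illustration; the talk does not show the actual slide-styling script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical script body: a real one would restyle slides; this one only
# normalizes a title so the example stays self-contained.
SCRIPT_SOURCE = '''\
import sys

title = sys.argv[1]
print(title.strip().title())
'''


def save_tool(skill_dir: Path, name: str, source: str) -> Path:
    """Write a script into the skill folder so it can be reused later."""
    scripts_dir = skill_dir / "scripts"  # assumed layout, not from the talk
    scripts_dir.mkdir(parents=True, exist_ok=True)
    tool_path = scripts_dir / name
    tool_path.write_text(source)
    return tool_path


def run_tool(tool_path: Path, *args: str) -> str:
    """Run a saved script with the current interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, str(tool_path), *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tool = save_tool(Path(tmp) / "slide-styling", "format_title.py", SCRIPT_SOURCE)
        print(run_tool(tool, "  quarterly results "))
```

Because the script is an ordinary file, it stays out of the context window until it is run, and it can be edited like any other code.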

Speaker A

So we just asked Claude to save it inside of the skill as a tool for its future self. Now we can just run the script, and that makes everything a lot more consistent and a lot more efficient. At this point, skills can contain a lot of information, and we want to protect the context window so that we can fit in hundreds of skills and make them truly composable. That's why skills are progressively disclosed. At runtime, only this metadata is shown to the model,

Speaker A

MCP is providing the connection to the outside world while skills are providing the expertise. And finally, and I think most excitingly for me personally, we're seeing skills that are being built by people that aren't technical. These are people in functions like finance, recruiting, accounting, legal, and a lot more. And I think this is pretty early validation of our initial idea that skills help people that aren't doing coding work extend these general agents

Speaker A

just to indicate that it has the skill. When an agent needs to use a skill, it can read in the rest of the skill.md, which contains the core instruction and directory for the rest of the folder. Everything else is just organized for ease of access. So that's all skills are. They're organized folders with scripts as tools. Since our launch five weeks ago, this very simple design has translated into a very quickly growing ecosystem of thousands of skills. And we've seen this

Speaker A

a file system and the ability to read and write code. This agent, as many of us have done throughout this year, can be connected to MCP servers. And these are tools and data from the outside world that make the agent more relevant and more effective. And now we can give the same agent a library of hundreds or thousands of skills that it can decide to pull into context only at runtime when it's deciding to work on a particular task. Today, giving an agent a new capability

Speaker A

be split across a couple of different types of skills. There are foundational skills, third-party skills created by partners in the ecosystem, and skills built within an enterprise and within teams. To start, foundational skills are those that give agents new general capabilities or domain-specific capabilities that it didn't have before. We ourselves, with our launch, built document skills that give Claude the ability to create and edit professional quality office documents. We're also

Speaker A

effective for professionals in each of these domains. We're also starting to think about some of the other open questions and areas that we want to focus on for how skills evolve in the future as they start to become more complex. We really want to support developers, enterprises, and other skill builders by starting to treat skills like we treat software. This means exploring testing and evaluation, better tooling to make sure that these agents are loading and

Speaker A

really excited to see people like Cadence build scientific research skills that give Claude new capabilities like EHR data analysis and using common Python bioinformatics libraries better than it could before. We've also seen partners in the ecosystem build skills that help Claude work better with their own software and their own products. Browserbase is a pretty good example of this. They built a skill for their open-source browser automation tooling, Stagehand. And now

Speaker A

refer to either other skills, MCP servers, and dependencies and packages within the agent's environment. We think that this is going to make agents a lot more predictable in different runtime environments. And the composability of multiple skills together will help agents like Claude elicit even more complex and relevant behavior from these agents. Overall, this set of things should hopefully make skills easier to build and easier to integrate into agent products, even those besides Claude.

Speaker A

Claude is equipped with this skill, and with Stagehand can now go navigate the web and use a browser more effectively to get work done. And Notion launched a bunch of skills that help Claude better understand your Notion workspace and do deep research over your entire workspace. And I think where I've seen the most excitement and traction with skills is within large enterprises. These are company and team-specific skills built for an organization. We've been talking to Fortune 100s that

Speaker A

to do useful things. And as you interact with an agent and give it feedback and more institutional knowledge, it starts to get better and all of the agents inside your team and your org get better as well. And when someone joins your team and starts using Claude for the first time, it already knows what your team cares about. It knows about your day-to-day and it knows about how to be most effective for the work that you're doing. And as this grows and this ecosystem

Speaker A

are using skills as a way to teach agents about their organizational best practices and the weird and unique ways that they use bespoke internal software. We're also talking to really large developer productivity teams. These are teams serving thousands or even tens of thousands of developers in an organization that are using skills as a way to deploy agents like Claude Code and teach them about code style best practices and other ways that they want their developers to work internally.

Speaker A

specifically as a concrete step towards continuous learning. When you first start using Claude, this standardized format gives a very important guarantee: anything that Claude writes down can be used efficiently by a future version of itself. This makes the learning actually transferable. As you build up the context, skills make the concept of memory more tangible. They don't capture everything. They don't capture every type of information, just procedural knowledge that Claude can

Speaker A

So all of these different types of skills are created and consumed by different people inside of an organization or in the world. But what they have in common is anyone can create them, and they give agents the new capabilities that they didn't have before.

Speaker A

already create skills for you today using our skill creator skill, and we're going to continue pushing in that direction. We're going to conclude by comparing the agent stack to what we have already seen in computing. In a rough analogy, models are like processors. Both require massive investment and contain immense potential, but are only so useful by themselves. Then we start building operating systems. The OS made processors far more valuable by orchestrating the processes,

Speaker A

So, as this ecosystem has grown, we've started to observe a couple of interesting trends. First, skills are starting to get more complex. The most basic skill today can still be a skill.md markdown file with some prompts and some really basic instructions, but we're starting to see skills that package software, executables, binaries, files, code, scripts, assets, and a lot more. And a lot of the skills that are being built today might take minutes or hours to build and put into an agent.

Speaker A

unique points of view. We hope that skills can help us open up this layer for everyone. This is where we get creative and solve concrete problems for ourselves, for each other, and for the world just by putting stuff in a folder. So skills are just the starting point. To close out, we think we're now converging on this general architecture for general agents. We've created skills as a new paradigm for shipping and sharing new capabilities. So we think it's time to stop rebuilding agents and

Speaker A

But we think that increasingly, much like a lot of the software we use today, these skills might take weeks or months to build and be maintained. We're also seeing that this ecosystem of skills is complementing the existing ecosystem of MCP servers that was built up over the course of this year. Developers are using and building skills that orchestrate workflows of multiple MCP tools stitched together to do more complex things with external data and connectivity. And in these cases, MCP

Topics: AI agents, agent skills, code as interface, Claude Code, Anthropic, domain expertise, software automation, developer productivity, AI ecosystem, composable skills

Frequently Asked Questions

Why do Barry Zhang and Mahesh Murag suggest building skills instead of agents?

They argue that while agents have intelligence, they often lack domain expertise needed for real work. Building skills allows agents to absorb and apply domain-specific knowledge more effectively.

What role does code play in the new paradigm for AI agents?

Code is considered the universal interface to the digital world, enabling agents to perform tasks by calling APIs, organizing data, analyzing it, and synthesizing insights in a scalable and modifiable way.
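
As a rough illustration of that flow (pull data, organize it in the file system, analyze it, write up the result), here is a minimal Python sketch. Everything in it is an assumption for illustration: `fetch_quarterly_revenue` stands in for a real API call, and the revenue figures, file names, and report layout are made up.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def fetch_quarterly_revenue() -> list[dict]:
    """Stand-in for an API call; the figures are invented for this example."""
    return [
        {"quarter": "Q1", "revenue": 120.0},
        {"quarter": "Q2", "revenue": 135.0},
        {"quarter": "Q3", "revenue": 128.0},
        {"quarter": "Q4", "revenue": 150.0},
    ]


def save_rows(rows: list[dict], path: Path) -> None:
    """Organize the raw data in the file system as a CSV file."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["quarter", "revenue"])
        writer.writeheader()
        writer.writerows(rows)


def analyze(path: Path) -> dict:
    """Read the CSV back and compute a few summary statistics."""
    with path.open(newline="") as f:
        revenues = [float(row["revenue"]) for row in csv.DictReader(f)]
    return {
        "total": sum(revenues),
        "mean": statistics.mean(revenues),
        "best": max(revenues),
    }


def write_report(summary: dict, path: Path) -> None:
    """Synthesize the findings into a short Markdown report."""
    lines = [
        "# Revenue report",
        "",
        f"- Total: {summary['total']:.1f}",
        f"- Mean per quarter: {summary['mean']:.2f}",
        f"- Best quarter: {summary['best']:.1f}",
    ]
    path.write_text("\n".join(lines) + "\n")


def build_report(workdir: Path) -> Path:
    """Run the whole pipeline inside one working directory."""
    data_path = workdir / "revenue.csv"
    report_path = workdir / "report.md"
    save_rows(fetch_quarterly_revenue(), data_path)
    write_report(analyze(data_path), report_path)
    return report_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(build_report(Path(tmp)).read_text())
```

The scaffolding here is nothing more than a directory and a few functions, which is the sense in which the agent's tooling can stay thin.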

How are agent skills structured and used?

Agent skills are organized folders containing scripts and procedural knowledge. They are composable, easy to share, and can be versioned with tools like Git, allowing agents to use them to gain new capabilities.
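
The sketch below shows one way such a folder might be read in two stages, with short metadata first and the full instructions only when a skill is needed, as the transcript describes for progressive disclosure. The front-matter layout (`---` delimiters with `name` and `description` keys), the example skill, and the helper names are assumptions for illustration, not a specification.

```python
import tempfile
from pathlib import Path

# Hypothetical skill file: short metadata on top, full instructions below.
EXAMPLE_SKILL = """\
---
name: slide-styling
description: Apply the team's slide template to a presentation.
---
# Slide styling

1. Run scripts/apply_style.py on the deck.
2. Check that fonts and colors match the template.
"""


def split_skill(text: str) -> tuple[dict, str]:
    """Split a skill file into (metadata, body).

    Assumes the file starts with '---' delimited 'key: value' lines.
    """
    _, header, body = text.split("---", 2)
    metadata = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    return metadata, body.strip()


def list_skill_metadata(skills_root: Path) -> list[dict]:
    """Cheap step: collect only the metadata of every skill folder."""
    entries = []
    for skill_file in sorted(skills_root.glob("*/skill.md")):
        metadata, _ = split_skill(skill_file.read_text())
        entries.append(metadata)
    return entries


def load_skill_body(skills_root: Path, name: str) -> str:
    """Costlier step: read the full instructions for one named skill."""
    for skill_file in sorted(skills_root.glob("*/skill.md")):
        metadata, body = split_skill(skill_file.read_text())
        if metadata.get("name") == name:
            return body
    raise KeyError(name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "slide-styling").mkdir()
        (root / "slide-styling" / "skill.md").write_text(EXAMPLE_SKILL)
        print(list_skill_metadata(root))
        print(load_skill_body(root, "slide-styling"))
```

Keeping the metadata pass separate from the full read is what would let hundreds of skills sit side by side without filling the context window.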
