Yao Shunyu: Let Me Go a Little Crazy! Training Models a… — Transcript

Yao Shunyu discusses AI model development, differences in Silicon Valley AI experts, and the evolving challenges in AI innovation.

Key Takeaways

  • AI model capabilities are becoming homogenized, shifting focus to problem definition and user experience.
  • Top-down mechanisms in AI development are unique and difficult for many companies to implement.
  • Reliability, detail orientation, and responsibility are key traits for success in AI research.
  • Career paths in AI can be diverse, with backgrounds in physics and computer science converging.
  • Friendly rivalry and collaboration can coexist among leading AI researchers.

Summary

  • Yao Shunyu, a researcher at Google DeepMind, shares insights on AI model training and industry dynamics.
  • There are two prominent Yao Shunyus in Silicon Valley with overlapping careers but different academic backgrounds.
  • Yao Shunyu transitioned from theoretical physics to AI, working first at Anthropic and now on Gemini at Google DeepMind.
  • The discussion covers the unique top-down model training approach at Anthropic and challenges faced by other companies like OpenAI and Gemini.
  • Startups focus on making strategic bets, while big companies have fundamentally different AI development strategies.
  • AI model capabilities have become commoditized, making differentiation based on user experience and application more important.
  • The industry is shifting focus from whether AI can do something to defining the right problems to solve.
  • Yao emphasizes traits like reliability, attention to detail, and responsibility as crucial in AI research.
  • The two Yao Shunyus maintain a friendly relationship despite frequent comparisons.
  • The video touches on the evolution of AI benchmarks and how model performance differences are less clear on paper now.

Full Transcript

00:00
Speaker A
English subtitles were generated by AI and are for reference only. Hello everyone, I'm Xiaojun. Today our guest is Yao Shunyu, a researcher at Google DeepMind. There are two famous Yao Shunyus in Silicon Valley. One previously worked at OpenAI, then jumped ship to Tencent to become their Chief AI Scientist.
00:22
Speaker A
He's been on our show before. Today I've invited the other Yao Shunyu. He was previously at Anthropic. Now he's at Google DeepMind. We'll start by talking about the recent series of massive model changes. So next is my interview with Shunyu.
00:37
Speaker A
Anthropic as a company is able to implement this kind of relatively top-down mechanism, which is quite unique. But is this difficult for other model companies?
00:46
Speaker A
Very difficult. For example, OpenAI can't do it. And Gemini also finds it difficult. Big companies and startups, their strategies are fundamentally different. Because for startups, what's important is making bets. I have to bet on something. I think everyone right now is basically
01:01
Speaker A
everyone is a surfer. Fundamentally, what matters is the wave, not the surfer. But anyway, it just feels like this AI thing doesn't really require much brains. Doesn't require much brains. Really doesn't require much brains. Then what does it require?
01:12
Speaker A
I think in this industry, the most important trait is being reliable, being detail-oriented, and taking responsibility for what you do. These are the most important traits. Aren't there two Yao Shunyus in Silicon Valley?
01:30
Speaker A
Why don't you first introduce yourself to everyone and then explain to everyone the difference between the two Yao Shunyus? Ah, sure, yeah. So my name is Yao Shunyu. And obviously there's also a friend with an almost identical name (Yao Shunyu, Chief AI Scientist at Tencent, former OpenAI researcher).
01:43
Speaker A
And our main career paths also have some overlap. So it might look very difficult to tell us apart. Yeah, and I used to study physics. I did my undergrad at Tsinghua. I worked on condensed matter theory back then.
01:58
Speaker A
Then later went to Stanford to do theoretical high-energy physics and quantum information and black hole-related areas. After leaving Stanford, went to Berkeley, briefly stayed for two weeks as a postdoc (postdoctoral researcher). Then quit and went to Anthropic. Stayed at Anthropic for a year.
02:18
Speaker A
Around late September to early October last year, joined Gemini. Yeah, and if everyone insists on telling us apart, I think the biggest difference is that Shunyu, he has always been doing CS from the start, computer science-related stuff, while I actually
02:32
Speaker A
in a sense came to this halfway. Yeah, I mainly did theoretical physics before. Yeah. Are you two good friends?
02:40
Speaker A
You guys seemed to have known each other since college, and you were in the same year, right? (Yes) What kind of person is he?
02:43
Speaker A
What kind of person are you? Evaluate him. Evaluate yourself too. (Hahaha) Yeah, yeah, we knew each other since undergrad because we were in the same year in undergrad at Tsinghua. But he, of course, he studied computer science from the start.
02:52
Speaker A
So he was in that Yao Class, the computer science experimental class. And I studied physics. So I was in the Ji Class. Yeah, and later he went to Princeton. I went to Stanford. This might also be another somewhat puzzling point,
03:03
Speaker A
which is, it seems like in the general world people think Stanford is where computer science people should go and think Princeton is where physics people should go. But we happened to do the opposite. Haha. So that might also have caused some confusion.
03:15
Speaker A
And we really are quite different. I think he's a much more interesting person than I am, and I've learned a lot from him in the past, things that are quite different from my own strengths.
03:26
Speaker A
For example, he probably spends a lot of time thinking, like in AI. He spends a lot of time thinking about human-AI interaction and also some product-related things. And I think, for me, he's a very different kind of friend.
03:40
Speaker A
And I've also learned a lot from him. When you were in Silicon Valley, how often did you meet?
03:44
Speaker A
Do you still call each other frequently now? How frequently? We did meet quite frequently when we were in Silicon Valley, maybe every few weeks. But it seems like we mainly met just to hang out. Hahaha. Doing what?
03:58
Speaker A
Well, it was really just purely for fun, like going out for a walk and chatting about random stuff and sometimes having a meal, playing cards or something like that. Right, haha, right. And after he went back, we actually still
04:11
Speaker A
often call each other. What did you talk about in the most recent call? I think it was one or two weeks ago. Ah, how did you know?
04:18
Speaker A
Uh, probably just every few months. Then we catch up a bit, share recent updates, yeah. Has he tried multiple times to get you to join him?
04:27
Speaker A
Uh, hmm, ha, maybe he does, I guess. But I don't think it matters. It doesn't matter, hahaha. Why don't you go?
04:36
Speaker A
I think for myself, I haven't figured it out yet. Yeah, I think it's mostly my own reasons. And then I didn't join any Chinese companies either. And I think the main reason is around September or August-September last year.
04:54
Speaker A
I think when I left, left Anthropic, and when deciding where to go after leaving, my biggest motivation was I wanted to learn something different. Yeah, for me, I probably didn't consider, no no, more seriously consider being able to lead a project
05:11
Speaker A
or something like that. I was, at that time, more focused on prioritizing learning something new. So that's why I chose to go to Gemini, right? I noticed you two are always being compared and discussed together.
05:22
Speaker A
Is it more of a bother or more enjoyable for you? I don't really feel anything about it. Because I'm not really someone who pays attention to social media, I really don't feel anything about it. Yeah. Because Shunyu
05:37
Speaker A
he said last year that AI has entered the second half. This became a very famous viewpoint. What do you think of today's AI? What stage is it at?
05:46
Speaker A
Can you give it a definition? Yeah, for me, I might not see so clearly what the first half means, what the second half means, or rather, this definition has never been particularly clear to me. For me, AI has indeed entered a stage
06:01
Speaker A
where I think everyone has started to worry less about one thing, whether AI can do it, and more about whether the problem itself is well-defined. Yeah, I think this is a huge difference.
06:12
Speaker A
For example, I think a year ago or maybe early last year, at that time I was at Anthropic and what everyone was worried about was like, "Hey, OpenAI's reasoning is so strong, do we have a chance to catch up?
06:25
Speaker A
And how likely are we to surpass them?" Everyone was still very worried about this.
06:30
Speaker A
I think now, at least among, at least among Gemini, OpenAI, and Anthropic, these three, I don't think any of them is really worried about not catching up.
06:38
Speaker A
Mm-hmm. And I think what might be harder for everyone now is figuring out what to actually do.
06:45
Speaker A
This is something that I think is a bet, but also something that requires a lot of human insight. Yeah.
06:55
Speaker A
So that also means model capabilities have been leveled out, right? They've become homogenized, commoditized.
07:00
Speaker A
So there's not a huge difference between the models. In terms of good versus bad, there's not a huge difference.
07:06
Speaker A
But they need to differentiate. I think from the actual user experience, you can feel the differences between these three companies' models.
07:15
Speaker A
But the hard part is in the past, you could see this difference on paper too.
07:21
Speaker A
What do you mean by "on paper"? "On paper" means, like, publicly available, there are many kinds of benchmarks, these standardized measurement frameworks.
07:26
Speaker A
And for example, people used to look at SWE-bench. Yeah, yeah, yeah, you could look at SWE-bench.
07:31
Speaker A
And for math, back then people would compare things like simpler ones, AIME, and harder ones like IMO. Back then it felt like you could tell just from the numbers.
07:40
Speaker A
"Hey, this model seems stronger at reasoning," "that model seems stronger at coding," "that model is stronger at this." Now, on paper, everyone is actually pretty close.
07:48
Speaker A
And when you look at the numbers on paper, like looking at SWE-bench,
07:56
Speaker A
a slightly higher or slightly lower number there is mostly just noise. It's mainly noise rather than signal.
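The "noise rather than signal" point can be made concrete with a back-of-envelope calculation. The sketch below treats a benchmark score as a pass rate over a fixed set of tasks; the numbers used (a 500-instance benchmark, which is the size of SWE-bench Verified, and a 70% pass rate) are illustrative, and the task-independence assumption is a simplification:

```python
import math

def pass_rate_stderr(p: float, n: int) -> float:
    """Standard error of a pass rate p estimated from n independent tasks."""
    return math.sqrt(p * (1 - p) / n)

n = 500    # e.g. SWE-bench Verified has 500 instances
p = 0.70   # a pass rate in the range frontier models report

se_one = pass_rate_stderr(p, n)    # uncertainty of a single reported score
se_diff = math.sqrt(2) * se_one    # uncertainty of the gap between two models
ci95 = 1.96 * se_diff              # 95% confidence half-width on that gap

print(f"stderr of one score:   {se_one * 100:.1f} points")
print(f"95% CI on a score gap: +/-{ci95 * 100:.1f} points")
```

Under these assumptions, a one- or two-point gap between two models sits well inside the confidence band, which is one way to read his claim that such differences are mostly noise.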
08:04
Speaker A
Yeah. But on the other hand, in actual usage, people can still experience the differences.
08:09
Speaker A
I think, mm-hmm, from what I personally know, Claude is still the best-performing general-purpose tool-using agent.
08:22
Speaker A
And in pure coding, maybe Codex has caught up a bit recently, narrowing the gap a little.
08:31
Speaker A
And Gemini might be better at pure reasoning, and in some more everyday usage scenarios it might still be better for now.
08:40
Speaker A
And then in coding and agents, it's still in a state of catching up.
08:45
Speaker A
Mm-hmm. These capabilities— are they deliberately choosing which direction to prioritize or is it simply a matter of good versus bad?
08:53
Speaker A
Is it a capability issue or a prioritization issue? I think there is actually an element of prioritization involved.
08:59
Speaker A
Especially in the past, it was mainly about prioritization. When everyone could see the differences on paper, prioritization was definitely the dominant factor.
09:09
Speaker A
Because maybe like Claude has always valued this tool-use capability more. Mm-hmm. And including coding.
09:17
Speaker A
So maybe OpenAI also placed a lot of emphasis on reasoning for a while. Yeah, and of course now they're starting to focus on coding too.
09:22
Speaker A
So back then, prioritization definitely accounted for most of it. Because if you're more willing to prioritize something, it means you can spend more effort building the right infrastructure,
09:30
Speaker A
building the right data. And data especially is something that in a sense takes a lot of time and effort, right? So back then, it was definitely driven by willingness, but at this point I think both factors are at play,
09:46
Speaker A
because, well, on paper everyone looks pretty similar, and even if you do some more internal testing, the numbers aren't that different. And then the harder thing becomes how you define your problem, define the behavior you want. Mm-hmm.
10:06
Speaker A
And when this thing isn't defined very clearly, a lot of the model differences actually come from things that you wouldn't even imagine. Right, by "things you wouldn't imagine," what do you mean? I mean, things you wouldn't imagine. If you ask me now,
10:24
Speaker A
it's hard for me to give you a very clear answer. Maybe after some time, looking back, I'll be able to give a clear answer. But I can give an example of something you wouldn't imagine. Like, if we go back
10:36
Speaker A
maybe one, two, or even three years: back then, if you went online to collect pre-training data, you'd see models learning to write code. Of course, there wasn't this agentic way of writing code back then; it was just writing a piece of code (mm-hmm),
10:52
Speaker A
and you'd find that models wrote code very well. Back then, people didn't know why, but the unexpected reason behind this might be: if you just randomly collect from the web without any data filtering, the quality of code data would naturally be a bit higher than the rest,
11:08
Speaker A
because if you look at web pages, you'll find GitHub's quality is significantly higher than other normal web pages. Before we get into today's topic, I'd like to talk about some recent model news. You see, everyone's been talking about OpenClaw recently.
11:21
Speaker A
Mm-hmm. As a frontline researcher, what do you think of this new product form? What discussions are happening around you? What's interesting is, I feel like the discussion outside the industry seems more intense than inside it. Oh, no one inside the industry is talking about it?
11:38
Speaker A
People inside are talking about it, but I think for industry insiders it's not really, um, a particularly surprising thing. Oh, what do you mean? Like, maybe inside the company some people have already done similar experiments or demos,
11:54
Speaker A
it's just that they weren't packaged as a product, seriously marketed, polished, and launched. Right. And of course, the reality is, if you look at OpenClaw, the earliest version of the code on GitHub, that code was in a sense
12:09
Speaker A
not particularly clean. But I think what's important is it showed everyone this possibility. Mm-hmm. And after showing this possibility, the OpenClaw author himself joined OpenAI, and then probably these model labs or some larger startups will catch up quickly
12:29
Speaker A
and polish this into a truly usable product. Mm-hmm (right). So I understand that, actually, before OpenClaw was released, people at Google were already working on this; it just hadn't been released yet because big companies have longer processes, right?
12:41
Speaker A
At least personally, that's the impression I've gotten. What we're seeing is exactly that. Right.
12:46
Speaker A
So behind this product form, similar to OpenClaw, what does that inherently tell us? At this point in early this year, I think, actually, technically speaking, it doesn't really prove much.
12:59
Speaker A
I mean, this OpenClaw product, of course it relies on many things the model can do, but those capabilities weren't actually only ready by early this year.
13:11
Speaker A
I think maybe last year, like when Opus 4.5 (the Claude series) was released, and then, and then...
13:17
Speaker A
of course back then, Opus was actually ahead of OpenAI and Gemini 3 in terms of tool use capabilities.
13:22
Speaker A
So I think at that point, doing this thing, it was already something you could demonstrate.
13:26
Speaker A
And actually, it didn't blow up immediately upon release. It only went viral some time after the launch.
13:32
Speaker A
Hmm. So, for me personally, technically it's not really something so surprising. It's a natural overflow of model capabilities.
13:44
Speaker A
Right, right, right, I'd say so. But I think the surprise for everyone might be that perhaps nobody had realized this before.
13:50
Speaker A
It made everyone realize this could actually be done. Realize what? Realized that you can, like, let the model do very...
13:56
Speaker A
I mean, you can control many different models and do many different things, and then aggregate all of that, and after aggregating, do this kind of very, very, very long-horizon task.
14:03
Speaker A
This kind of work. I think maybe previously, people hadn't widely reached a consensus on this.
14:09
Speaker A
This thing showed everyone this kind of possibility. You see, what went viral early last year was Manus, and what went viral early this year is OpenClaw.
14:17
Speaker A
So from Manus to OpenClaw, what changed? Is it a change in model capabilities, or a change in the product?
14:21
Speaker A
This is also something I've never really understood. Hmm. Like, what is the qualitative difference between Manus and OpenClaw?
14:32
Speaker A
It's something I actually haven't quite figured out myself, to be honest, haha. OK. Hmm, or in other words, maybe OpenClaw went viral, but if you were to ask me retroactively why Manus couldn't do this, I don't understand why Manus couldn't do it.
14:52
Speaker A
Maybe they just didn't get it right. But you see, whether it's Manus or OpenClaw, they both chose to sell.
14:57
Speaker A
Manus was sold to Meta (Note: This acquisition has since been revoked; our program was recorded before the revocation).
14:59
Speaker A
OpenClaw was sold to OpenAI. What does this phenomenon tell us? Why did they both sell?
15:03
Speaker A
I think, hmm, my own feeling is that for something to survive long-term, it still needs to have some moats.
15:14
Speaker A
The moat is the model. I think at least for now, many moats are on the model side.
15:21
Speaker A
But whether product-side moats will emerge in the future, I think that's hard to say.
15:25
Speaker A
Because everyone... This is all an age-old topic in the market. Many people talk about this.
15:31
Speaker A
Things like data flywheels and such. For now, I don't think there's any scenario that has truly formed a data flywheel.
15:39
Speaker A
Even purely AI-native application scenarios: I think currently, other than agentic coding, other than writing code, no scenario that is truly AI-native
15:51
Speaker A
became hugely successful, because in a sense chatbots are actually an extension of search. A chatbot is an extension of search. Right, that's why it's not independent of search. Because, think about it, the most common way people interact with chatbots is:
16:08
Speaker A
I have a question, I ask the chatbot, and that's essentially what search has always done. But what it offers that's far better than search is that it's very interactive. It has interactivity: you can ask follow-up questions,
16:22
Speaker A
and it can even help you summarize some of the information you get through it, helping you distill it into a condensed answer to your question. Right, this is something search could never give you before. Mm-hmm (right). But of course it's not exactly the same need.
16:37
Speaker A
But from a broad demand perspective, it's fairly similar to the demand that existed before. Manus and OpenClaw, I think, are the most famous wrappers right now, but wrappers ended up being sold to model companies (Note: Meta's acquisition of Manus was later reversed; our show was recorded before the reversal).
16:48
Speaker A
Doesn't that show that wrappers still can't escape the grip of model companies? The escape velocity isn't enough, it's not fast enough, is it? I think, for wrappers to survive in the current environment, there are two approaches I can roughly imagine.
17:07
Speaker A
One approach is what you just said: escape fast enough. That is, my growth is fast enough that by the time model companies catch on, I've already captured significant user mindshare. And when model companies catch up to your product,
17:23
Speaker A
by that time I've already evolved my own model. I think Cursor is trying to take this path (mm-hmm). So Cursor, in this AI-native scenario, is pretty much the fastest-growing startup I can think of. Even a company like that
17:38
Speaker A
is feeling a strong sense of crisis right now. How strong is that sense of crisis? Anyway, my feeling is that for Cursor, its relationship with Anthropic has entered a very delicate phase. They used to be close, seamless partners:
17:55
Speaker A
Anthropic provided the model, Cursor provided the product. Later, Anthropic developed Claude Code itself, and Claude Code has become very successful. And now Cursor is trying to build its own model, working hard training its Composer. So
18:08
Speaker A
I don't even think we need to talk about the future; it's already happening right now. They're already in a fairly competitive relationship (mm-hmm). If they lose in this competition, I think it would be quite problematic. Because when it comes to coding,
18:21
Speaker A
when it comes to coding, at its core it's a professional need serving professional users. It's a productivity tool, and a common scenario with productivity tools is winner-takes-all. I think this applies whether to Cursor or to Anthropic,
18:36
Speaker A
or to any company doing coding; it's probably something they're all quite worried about. Mm-hmm (right). So that's what I was saying, that's one path (it has to be fast): you grow fast enough, you grow like crazy before anyone even thinks about acquiring you,
18:50
Speaker A
just grow wildly, so that by the time they want to acquire you, you're big enough. Another way is for the market to be small enough, so small that model companies can't even be bothered. I think Midjourney is exactly that example.
19:01
Speaker A
That's it: the market is so small that even though, say, Gemini could make an effort to replicate what Midjourney does, and it might take some effort, some money, some data to pull it off, it's small enough
19:13
Speaker A
that Gemini probably wouldn't want to spend much time on it; it's beneath them. Right, haha. I think that might also be a way to survive. Yeah. So even Cursor hasn't escaped the model's grasp today. Has anyone successfully escaped?
19:32
Speaker A
For the big ones, I haven't seen any so far. For smaller ones, maybe Midjourney is an example. Of course there must be other examples; I just haven't seen them yet. Right, smaller ones. I think there will be examples.
19:42
Speaker A
Does Lovart count? I think they have a shot. They have a shot. Anyway, you can't do the general-purpose thing. I think this is something the founder has to decide: whether you want to bet on something big with a one-in-ten-thousand chance of survival
20:04
Speaker A
and swing for the fences, or go with a one-percent chance of survival and lock down something small first. If it were you, what would you choose?
20:14
Speaker A
Hahahaha. If it were me, deep down I'd definitely want to swing for the fences. But honestly, I genuinely think you can't get there overnight. So if it were me, I'd choose to secure a small win first, but I'd pick a small one with huge upside potential.
20:34
Speaker A
Why do you think OpenAI acquired OpenClaw? Why did Meta want to buy Manus? (Note: Meta's acquisition of Manus was later revoked; our show was recorded before the revocation.) Why doesn't Google acquire anyone?
20:40
Speaker A
Oh, Google did acquire someone: Google bought the Windsurf team. Okay, Windsurf. Yeah, I don't get it. Haha. What do you mean you don't get it?
20:50
Speaker A
Honestly, it's just that I don't get it. I think, with Meta's acquisition of Manus, the biggest benefit for them, aside from how much they spent, was gaining a really strong product team in Asia.
21:12
Speaker A
What does being in Asia signify? Because on one hand, obviously everyone knows China's AI talent pool is still quite deep. Although perhaps currently, purely from a technical standpoint, Chinese AI hasn't really caught up with the US yet,
21:30
Speaker A
obviously there are many talented AI people in China, whether in pure technology or in product. In terms of product, I think China essentially has better talent than the US. Right. So for them, I think Manus became a
21:44
Speaker A
foothold in Singapore, so they can attract talent from, for example, China, Singapore, or East Asia. And I actually haven't fully figured out how important this product itself is to Meta. Or in other words, why couldn't Meta just build this product themselves?
22:06
Speaker A
But whether it's Manus or OpenClaw, they were in fact born from outside teams. Why weren't they built by this group of Silicon Valley researchers?
22:13
Speaker A
Have you thought about that? Yeah, I think, hmm, for me, the thing is that once a company gets big, its burden gets bigger too. Like, I might be a researcher, and we can build some really interesting-looking, very distinctive products.
22:37
Speaker A
But once I make that product public, there's a ton of responsibility that comes with it. First, you can't just launch this product and tell all your users: you need to go buy another computer to do this, otherwise it might gain access to everything on your computer,
22:51
Speaker A
all the permissions, and crash your system. Mm-hmm. So for a big company, take Google for example, Google would never release a product like this, right? Mm-hmm. So it takes a lot of time to polish the product, and you have to make sure
23:01
Speaker A
there are no legal risks and that it won't damage your brand with users. Plus, if you ship it, you probably have to allocate some relatively fixed resources to serve this model or this product line. So yeah,
23:20
Speaker A
for big companies, I think there's quite a lot of burden. But for individuals it doesn't matter. I mean, it's an open-source project anyway, so what if my code is terrible? Come help me write it, right? Hahaha. Yeah, I think whether it's Manus or OpenClaw,
23:33
Speaker A
they actually point to a direction, which is also a possible narrative for 2026. What are your thoughts on 2026, and what are your expectations? I think there are really so many possibilities. And for me, in terms of model capabilities,
23:52
Speaker A
I sometimes really love saying this slogan, which is that models should achieve "train with finite context, use with infinite context" (finite in training, infinite in use). In other words, you use a limited context length (context window) to train it,
24:09
Speaker A
but in usage, it can use a very, very long, even nearly infinite, context length. I think this has a chance of being realized this year, and once it is, I think it will unlock many new applications. Because, to give the simplest example,
24:25
Speaker A
you could potentially let this model interact with you continuously and continuously receive your information. And as it runs, it will continuously evaluate the current context and your conversation, and possibly discard information it deems unimportant. And then it becomes
24:40
Speaker A
the personal assistant everyone dreams of. Yeah, technically speaking, I think this will definitely be realized this year no matter what. But of course, I think what people haven't reached consensus on yet is how to achieve it technically.
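The behavior he describes, a model that evaluates its running context and discards what it deems unimportant, can be sketched as a toy context manager. Everything here is illustrative: the class names, the hand-assigned importance scores, and the entry-count budget (a stand-in for a token budget) are assumptions, and a real system would have the model itself score relevance:

```python
from dataclasses import dataclass, field

@dataclass
class ContextEntry:
    text: str
    importance: float  # higher = more worth keeping (here assigned by hand)

@dataclass
class RollingContext:
    budget: int  # max entries kept; a stand-in for a real token budget
    entries: list = field(default_factory=list)

    def add(self, text: str, importance: float) -> None:
        self.entries.append(ContextEntry(text, importance))
        # When over budget, discard the entries judged least important,
        # mimicking a model that prunes its own context as it runs.
        while len(self.entries) > self.budget:
            victim = min(self.entries, key=lambda e: e.importance)
            self.entries.remove(victim)

    def render(self) -> str:
        return "\n".join(e.text for e in self.entries)

ctx = RollingContext(budget=3)
ctx.add("user's name is Ana", importance=0.9)
ctx.add("small talk about weather", importance=0.1)
ctx.add("deadline is Friday", importance=0.8)
ctx.add("prefers short answers", importance=0.7)
print(ctx.render())  # the low-importance weather chat has been evicted
```

The point of the sketch is only the shape of the loop: training only ever sees a window of `budget` entries, while usage can run indefinitely because eviction keeps the window bounded.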
24:54
Speaker A
Mm-hmm. Obviously there are many technical paths, but I think right now it's more about trying to see which path can work. There might be several paths that all work; then we'll have to test them experimentally under common user scenarios
25:11
Speaker A
to see which path is the most efficient. Yeah, I think we're more at this stage right now, rather than a stage where no one has ideas. Everyone has ideas, but we need to figure out which idea is the right one.
25:23
Speaker A
Standing here in Q1 2026, as a frontline researcher, do you think the pace of model improvement is slowing down? I think not at all (not at all). I think not at all. How does its velocity curve compare to '25,
25:35
Speaker A
and what's changed from '24? Mm, it's hard to say quantitatively, because you need to give me a standard before I can answer quantitatively. Because if the standard you give is, like, I just look at some benchmark, say SWE-bench,
25:49
Speaker A
and how many points it gains each month, then this will definitely slow down, because by definition this benchmark maxes out at 100%. Mm-hmm. So the closer you get, the slower it definitely gets. But this doesn't necessarily mean that users feel the model's capability growth has slowed,
26:04
Speaker A
because going from 50% to 60% might feel like, hey, that's a bit better, but quite possibly, going from 70% to 75%, you'd find the felt gains are even greater than from 50% to 60%. Mm-hmm. That's entirely possible.
26:15
Speaker A
If it's from 80% to 90%, or 90% to 100%, would the difference feel even more significant? Not necessarily. Because maybe past around 80% to 90%, users wouldn't notice any difference; it might even get worse. You said it doesn't get slower at all,
26:27
Speaker A
based on what criteria? I think it's based on my personal feeling as a researcher. Like, my personal impression is the model's ability to learn things is getting stronger and stronger. It used to take a lot of effort to get the model to learn to do something,
26:43
Speaker A
but now it probably doesn't require that much effort. The most important thing is you need to clearly define the problem and figure out how to build the right data (mm-hmm). Of course, data is broader now, including environments and such,
26:56
Speaker A
and the rest often seems to fall into place naturally. Right. Why is the learning ability getting stronger?
27:05
Speaker A
The model's learning ability has improved... I think there could be many reasons, but one reason is pre-training. Actually, over the past few months, I think it has been getting stronger. Pre-training? Right, right.
27:21
Speaker A
Model pre-training has actually gotten stronger in the past few months (mm-hmm). I think this might be somewhat controversial in a sense, because a few months ago, I think many people were already discussing whether the scaling law had reached its limit.
27:36
Speaker A
Mm-hmm. My experience is that it hasn't. And my feeling is that in the next four months, I don't see any signs of it ending either. Mm-hmm. Why do people think it's reaching its limit?
27:50
Speaker A
I think, well, I obviously don't know why people think it's reached its limit, because I myself don't feel it has. But my guess would be: when someone thinks a pattern has reached its limit, it's basically one of these two situations.
28:06
Speaker A
One situation is they feel the applicable range of this pattern has reached its limit. Maybe, fundamentally speaking, the scaling law simply can't extend infinitely, which could be true. But this is just a guess. That is, this person might feel that
28:23
Speaker A
the applicable range of the pattern has reached its limit. Another possibility is this person feels one of the pattern's conditions can no longer be met. For example, they feel that data has already hit a wall, so they simply don't extend it further.
28:38
Speaker A
Another possibility But actually there's a third possibility The third possibility is that there's a bug somewhere in their work that they haven't noticed themselves So they think it's reached its limit Oh From my perspective From my observation I think
28:59
Speaker A
probably the vast majority of people who hit a wall it's because of the third reason It's because there's a bug What kind of bug?
29:05
Speaker A
I think There are many possible kinds of bugs For example, one possibility is When you're working on Scaling Laws Some scientific assumptions weren't quite right For example, what kind of token horizon you choose That is, for each model size, what kind of
29:19
Speaker A
expected training data volume you pick And then this amount of data Where this data comes from And then It's possible that these more scientific choices weren't made clearly That's one possibility But I think there's another possibility, which is
29:34
Speaker A
there's simply a bug Actually, I don't think this is surprising in the industry Many times Fixing a single bug The progress it brings is far greater than some fancy tricks Right And of course, there are other situations These two examples I just gave
29:55
Speaker A
are situations I've seen quite often So how do you deal with bugs How do you solve bug problems I think, right I feel like this is more of a mindset issue Because when you encounter a bug If you think it can't be fixed
30:06
Speaker A
You'll say we've hit a wall When you encounter a bug I think, oh This can definitely be fixed Then you'll feel like we haven't hit a wall yet Because everyone definitely encounters bugs I think, I think This might be like what you said
30:19
Speaker A
That is There are some things that are more about belief But for me A more important thing is the working system That is, when something is different from what you predicted Can you systematically rule out various possibilities I think this is a very important thing
30:37
Speaker A
Mmhmm This is something I think Gemini and Anthropic do well That is Especially in pre-training That is, when behavior at a certain scale might be different from what you imagined People can design reasonable what we call ablation experiments
30:55
Speaker A
reasonable experiments like this can help you see test whether some of your imagined possible factors are actually the real factors I think this systematic approach to problem-solving is the key Mmhmm You think Model capabilities can still improve Then its driving force
31:13
Speaker A
Data and compute Algorithms Which do you think is the main driving force I think they all contribute But in a sense Data and compute are two things that are actually very strongly correlated Data and compute, mmhmm Right because
31:31
Speaker A
When your compute goes up you'll naturally attract more data When data goes up you'll naturally need more compute Right, and then For algorithms, I think Algorithmic progress often has a phase transition That is, there's a phase where you haven't figured out what to do at all
31:49
Speaker A
At that stage, algorithms are extremely critical Because when you haven't figured out what to do at all you might have no way to scale up at all And then you might get stuck there But at a certain point
31:59
Speaker A
you might discover the most important thing in the algorithm Then it might suddenly go from completely impossible to possible And then after that, algorithmic improvements are more of a gradual improvement That is It might improve your computational efficiency
32:11
Speaker A
or the efficiency of using data Right, and then Let me give an example For example, from the perspective of language model pre-training The leap in algorithms was, I mean, the development of the Transformer But after the Transformer was discovered
32:31
Speaker A
It's been mostly gradual and smooth Improving its efficiency Or your use of data Or the efficiency of compute usage has been improving Right So the current drivers are compute and data I think within the relatively clear frameworks we have now
32:47
Speaker A
The main drivers are compute and data By clear framework, I mean For example, pre-training and post-training Whether it's post-training based on reinforcement learning Or based on supervised learning That is, post-training with supervised learning For example, within these two relatively clear
33:01
Speaker A
paradigms Indeed, compute and data are the main drivers But it's undeniable That in some other directions, the driving factors might be different Hmm, what do you mean?
33:12
Speaker A
To give a simple example For instance, multimodal generation Hmm Well I think it's probably something that, algorithmically speaking Hasn't been fully figured out yet So that's still a scientific problem That hasn't been solved yet Right But language is no longer a scientific problem
33:29
Speaker A
Natural language generation I think, for now Before this technical approach hits a wall I think it's relatively clear scientifically But in terms of engineering There's still so, so, so much to be done How much more do you think pre-training can improve?
33:43
Speaker A
Improving model capabilities through pre-training How much more How much further can it go Can we expect That's just how people are I mean, when you haven't hit the wall, you Don't actually know how long the road is What I can
33:55
Speaker A
What I can see is that we haven't hit the wall yet But I don't know when we'll hit it either If I really had to estimate a timeline As I just said I think four months The next four months will still see progress
34:06
Speaker A
But in the AI field No one can predict what happens after four months Hmm, so over the past few months When you look at pre-training and model capabilities You're still very excited Is this the general mindset and state around you?
34:19
Speaker A
I think so I think so Is this within a small environment at Google Or in the entire Silicon Valley environment I think it's hard to say for all of Silicon Valley Because Silicon Valley is too big a place
34:28
Speaker A
People working on products might be excited about products Right, for product people What excites them most might be something like OpenClaw Hmm But for people working on models It's probably That we get more excited about this kind of model progress
34:37
Speaker A
Hmm I think Uh For people working on models Is excitement a consensus? Over the past four months I personally think so Oh, I personally think so At least within the circle I have access to I think at Anthropic and Google, people
34:53
Speaker A
Or at Gemini, people are probably thinking more about How our AI will keep progressing And soon we'll be replaced After being replaced, what should we do?
35:03
Speaker A
Haha, rather than worrying about what to do when models hit a wall Hahaha Speaking of which Why Over the past few months Coding has been developing the fastest Why is this the case?
35:16
Speaker A
I think the coding scenario First of all, coding itself Hasn't just been developing the fastest over the past few months I think coding itself Actually From Claude 3.5 (new) Or some people out there called it Claude 3.6 (yeah)
35:32
Speaker A
After that It's been in a state of rapid development ever since And I think That was early last year Or the end of the year before That was October of the year before last
35:43
Speaker A
Yeah yeah It should be, maybe October or November But around that time From then on it's been in a state of rapid development I think the coding scenario has Two biggest advantages The first advantage is its reward signal
36:02
Speaker A
That is, its feedback signal Is very well-defined Because For example, something like a software engineering task Often the situation is I need to write some code To implement a feature A feature (Yeah) This feature needs certain inputs
36:25
Speaker A
And produces certain outputs This is something very easy to Very easy to test So its feedback signal is very clear Your input and output match up Then it means your implementation is successful If not, then it's unsuccessful (Yeah)
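The "inputs match outputs" signal he describes is essentially a pass/fail test suite. A minimal sketch, with invented names (`reward`, `model_attempt_*`) standing in for a real grading harness:

```python
def reward(candidate_fn, test_cases):
    """Binary reward for a code-generation episode: 1 if the generated
    function reproduces every expected output, 0 otherwise. This is the
    'inputs match outputs -> implementation succeeded' signal described above."""
    try:
        return int(all(candidate_fn(x) == y for x, y in test_cases))
    except Exception:
        return 0  # crashing code simply earns no reward

# Toy task: "implement a feature" that doubles its input.
tests = [(0, 0), (2, 4), (-3, -6)]

model_attempt_good = lambda x: x * 2   # correct implementation -> reward 1
model_attempt_bad  = lambda x: x + 2   # fails on (0, 0) -> reward 0

print(reward(model_attempt_good, tests))
print(reward(model_attempt_bad, tests))
```

Scoring crashes as 0 rather than raising is deliberate: in a training loop, a broken generation should just be a low-reward episode, not something that kills the run.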
36:41
Speaker A
But this is just one example In coding-related work There are many, many Many such well-defined feedback signals And another big advantage is Coding data has a very natural foundation That foundation is GitHub GitHub has aggregated, over roughly the past few
37:04
Speaker A
decades A lot of high-quality code written by many excellent programmers And starting from that code You can build a tremendous number of environments I think these two things, from a model perspective Are why coding can be done very well
37:21
Speaker A
Of course, I think from a product perspective There's another reason Which is that coding The demand for this product Is in a sense Relatively singular It's not like when you build something like a social media app Or a game
37:39
Speaker A
Where everyone might have different tastes And it might be hard To satisfy everyone's needs Then you might need recommendation algorithms But with coding The good thing is that excellent programmers writing code Actually have fairly similar styles What kind of style
37:58
Speaker A
Clean and concise Yeah, right, good code is (Not messy) There are some shared standards For example, like you said The code is concise Structurally clear Suitable for future development And has reasonable abstractions And of course many other standards
38:15
Speaker A
But I think good programmers tend to have A fairly consensus-driven standard On this matter So from a product perspective It actually makes the coding product much simpler In your current work What percentage of code do you write with Claude Code
38:33
Speaker A
How many times more productive does it make you You just asked a question that almost got me fired Google doesn't allow using Claude Code Hahahahaha, oh right I think, for me A conservative estimate Maybe 90% of the code is model-generated
38:55
Speaker A
But it might be I need to spend a lot of time reviewing the code To see if it's written appropriately Written reasonably Whether it's really what I wanted it to write And I think after having AI-assisted tools The most important thing about writing code
39:09
Speaker A
Has become How you design it How you design the logic of your code And which files it needs to interact with Files to associate with And what things need to be done And you need to give the model
39:22
Speaker A
Maybe provide some reasonable context I mean, like For example, this code You can use it as a reference to take a look Right, actually outputting code I think models are way more capable than humans So for me
39:36
Speaker A
If you actually count How many lines of code I wrote by hand How many lines of code the model wrote I'd say conservatively, the model wrote over 90% If not conservative, maybe 99% or 100% The remaining 10% is what it can't write
39:51
Speaker A
Or why you didn't let it write Conservatively 90% Giving myself some credit Hahaha I think what it can't write And the part I can write is becoming less and less Less and less and less What was it like in the past
40:04
Speaker A
It was what it couldn't write I think Very early on, maybe about a year and a half ago At that time To be honest, on the market Only Claude was able to Actually write this kind of software engineering code
40:21
Speaker A
At that time You could still feel many flaws in the model For example Sometimes when it wrote code It would only focus on this one file It wouldn't pay much attention to multiple The relationships between multiple files And if, say, a class
40:38
Speaker A
Its definition was buried many layers deep Or it wasn't directly nested in this This direct tree structure The model probably couldn't find it Now I think this is happening less and less Hehe Really less and less As a researcher
40:53
Speaker A
Your programming workload How many times that of the past Because from the perspective of writing code It's quite hard to quantify this But if we talk about, say, running experiments And the efficiency of implementing ideas I think compared to a year or even a year and a half ago
41:11
Speaker A
It could be 20 or even 50 times faster Right, because models have really become It can be pretty insane You can open several at the same time And you have several ideas And test them simultaneously And sometimes even
41:25
Speaker A
The model can help you monitor some experiments Monitor some results and stuff So It's really quite a significant efficiency boost Right but If we talk about personal working hours I feel like it has made my working hours longer
41:43
Speaker A
Why is that It's just that Because development speed has increased The more you try, the more you want to try There are more and more ideas to try So it feels like before, you might have had this situation
41:54
Speaker A
You have something Like this file You haven't seen it before You might not quite understand it yourself Then you'd definitely have to spend time finding that person And you'd schedule That person, maybe a few hours later But now it's not like that
42:06
Speaker A
You just see this file You don't understand it, just ask Claude or Gemini Gemini might tell you the result in five seconds And you just keep going Hahaha, so in terms of working hours I feel like working hours have actually gotten longer
42:18
Speaker A
And the intensity has increased too Well, Google isn't that Google anymore Is that so Not that Google where you can coast along Not that work-life balance Google I feel like in the GenAI field No one can just coast along
42:32
Speaker A
Hahaha So what hours are you keeping these days I usually start around 9 in the morning Get to the office at 9 At 9 AM, I might first get up and check emails And look at the experiments from the night before
42:45
Speaker A
Then get to the office around 10 And then at night If I'm alone in the US I might stay until around 10 or 11 Of course, if my family is here If my wife is here I might go home a bit earlier
43:00
Speaker A
But at home I'd be working anyway So I think in the GenAI field No one is just lying around Unless You've completely lost interest in technology And have no ambition for yourself Then no one would care if you just lay there
43:15
Speaker A
But I think most people are quite self-driven They just want to do it themselves, right Do you think other fields Will have more of these Claude Code moments Where will the next explosion happen after coding You asked a good question
43:30
Speaker A
If I could see it clearly I might have gone out to start a company already Hahahaha Right, but but It's true that besides coding We can already see That many Other directions are already having a big impact But if we only talk about those directions
43:44
Speaker A
They might not be a good market direction Because Coding is special in that It itself is A very large market But if you look at some other directions They might not be Such a large market For example Some people say the next direction is
44:03
Speaker A
This kind of AI-generated content or something But AI-generated content How big is that market Right I think If you say this content Is for people to consume Then people have limited time No matter how much content you generate
44:25
Speaker A
People's time is only 24 hours a day Right Unless it completely replaces people Like replacing TV Then that would be another story Like the Vision Pro that came out before Then that would be another story But that would be
44:38
Speaker A
A bigger story So I think Besides coding Everyone is still looking for The next big market And if there is one I think there will be But it's just Not necessarily that big I think the most likely one might be
44:56
Speaker A
This kind of interactive education Or maybe You said coding is not a direction for you Because coding itself is already very big Yeah, it's already a huge market Do you think AI researchers How should we treat coding Should we use coding to validate our ideas
45:13
Speaker A
Or should we make coding itself the end goal I think there are two types of people One type is They genuinely want to make coding better Another type is They want to use coding As a means to validate AI capabilities
45:28
Speaker A
I think both are fine Both directions are fine But I think The people who genuinely want to make coding better They need to think more about products And the people who want to validate AI capabilities They need to think more about
45:36
Speaker A
How to build better benchmarks Right I think both directions are very meaningful Just their focus is different Do you think The current state of AI research Is more like A gold rush Or more like A scientific revolution I think it's a bit of both
45:57
Speaker A
Like, there are many things that AI actually can't easily do But that, conversely, humans might do better For example, being a product manager To be honest, I think Being a good product manager Is something I currently can't figure out
46:15
Speaker A
How to train AI to do Why is that There's no standard There's no standard (no metric) Like what makes a good product I can't really figure it out There's no very objective standard You have to build it and let people use it
46:30
Speaker A
Only then do you know it's good Then everyone will say it's good Right, I think That's something with very unclear feedback signals Then I don't know how to train AI to do that Right When will programmers be completely replaced
46:43
Speaker A
Will there be such a day Mm-hmm I think that day will come But it won't come all at once It won't be like programmers are all still there And after one night The next day all programmers are fired
47:01
Speaker A
It won't be like that It will definitely be a gradual process But Everyone can already see this gradual process now Because some companies have already started laying people off Right, I think In a sense AI is a In a sense
47:17
Speaker A
Of course it's a very good thing But from another perspective It might also be A very unfortunate thing That AI is a very centralized technology It will make a small number of people stronger But will make most people lose
47:31
Speaker A
Their unique value Right, so I think For traditional software engineering The final result might be Now 1/1000 of the people do the work of everyone in the past Earning 100 times the current salary Then what advice do you have for programmers
47:54
Speaker A
I think Haha, I think maybe Embrace new things I think that's very important I think One very important thing for future programmers might be How to effectively collaborate with AI Mm-hmm like There are many things that AI might do
48:11
Speaker A
Not that well Like how to Reasonably design an implementation plan for something And how to design it So that it might align with the company's Future development Those kinds of things You might have a hard time telling a model
48:25
Speaker A
To make it understand these things Those things might still need humans to do But maybe things like specific Very specific Like the work many programmers did in the past Where your manager tells you to implement this plan And give it to me by next Friday
48:40
Speaker A
I think that kind of work Might not exist in the future Then what kind of programmers would be in that 1/1000 What are their traits 1/1000 is just a figurative number I really don't know if it would be 1/1000
48:49
Speaker A
Or 1/10,000 Or 1/100,000 Or maybe 1% Don't be so pessimistic I'm a famous pessimist So don't take it too seriously And I think Good programmers in the future First, technically speaking They will definitely be very strong Because if you're technically weak
49:09
Speaker A
There's no reason Why AI can't replace you But being technically strong might not be the only thing It won't be a necessary condition It might be a sufficient condition Another thing I think will be very important is that you have to understand how your part of the work
49:23
Speaker A
fits into a large organization or a big company how to how to adapt and integrate into it (Mm-hmm) This might also be an important thing Mm-hmm and And of course there might be many other things For example whether this person's planning ability is strong enough
49:38
Speaker A
If their planning ability is strong they can definitely take this big very complex thing and break it down into many relatively smaller things and hand them over to different AIs to do But right now these three abilities seem important
49:52
Speaker A
Things that AI might not be able to fully do yet doesn't mean it won't be able to in six months Maybe in six months you come ask me I find that the last thing AI can already do Then only two things remain
50:00
Speaker A
Another six months later Maybe the remaining two can also be done Then maybe my answer would become more pessimistic So No one can predict what will happen in six months I can only speak from the current perspective Over this past Spring Festival
50:12
Speaker A
Another thing many people paid attention to was Seedance Will Seedance make Google anxious I think actually Possibly yes But this anxiety Hasn't reached me yet Maybe it gives the Google DeepMind team responsible for multimodal generation some pressure But if you ask me
50:36
Speaker A
I think I might not think they have much to be anxious about Like I think It doesn't reflect any paradigm shift More importantly, I think ByteDance whether it's the product effect or possibly in terms of data and such
50:52
Speaker A
These details are done very very well I think indeed ByteDance has historically had a relatively strong advantage in multimodal generation But I think at least personally I haven't experienced that it's a paradigm shift Then maybe It's not enough to make everyone very anxious
51:12
Speaker A
Right but there is definitely pressure Right Does Seedance's product capability come from model capability Or product capability I haven't worked At ByteDance So I don't know the specific details But if you ask me to guess I think the model probably accounts for the majority
51:28
Speaker A
Mm-hmm What does good model capability come from Comes from data Because there probably isn't fundamental innovation in algorithms I think algorithms First of all because multimodal belongs to what we just said, still belongs to that scientific problem Multimodal generation belongs to scientific problems
51:41
Speaker A
Right, multimodal generation Still belongs to a relatively scientific problem (Has multimodal understanding been solved) Compared to generation it's definitely more systematic Has a more systematic understanding But compared to text tokens Definitely still not that The paradigm isn't that fixed yet
51:59
Speaker A
I think in generation it might be Because it's still something where the paradigm hasn't been fixed Maybe each company uses somewhat different techniques big or small differences And um right now we can mostly just see In terms of effects
52:14
Speaker A
Maybe ByteDance and Google DeepMind are In terms of effects The two that do it better Mm-hmm, so it might also come from details Done better Right if you ask me to guess I would guess data Data If you ask me to guess I'd guess data but
52:29
Speaker A
I haven't worked at ByteDance either So I'm just guessing blindly haha (Mm-hmm) What do you think about Wu Yonghui going from Google to ByteDance (ByteDance large model team Seed lead) Who am I to judge haha To evaluate Yonghui I think I think
52:43
Speaker A
Of course, I haven't worked with Yonghui in the past, so actually I can't really give a very good assessment or an objective evaluation But I think after I joined Gemini, I saw more of Yonghui's good side I think, by looking at him,
53:02
Speaker A
sneaking a peek at his past code commits and the projects he's led, my feeling is that he's one of the few people I've met at such a high level and also very senior yet still has very strong technical skills
53:17
Speaker A
I think that's extremely rare So I think I think I'm probably not yet at the level to evaluate Yonghui at that level But if you ask me I think Yonghui is extremely strong You say, taking a snapshot in Q1 2026
53:34
Speaker A
Do you think the capability gap between Chinese and US models is widening or narrowing?
53:39
Speaker A
How far apart are they? I think Um If we take a snapshot right now and look at the development trends over the past year or the past year and a half Obviously the gap between China and the US is getting smaller and smaller
53:56
Speaker A
But whether this gap will eventually close completely or even if China surpasses the US I think that's an open question I think for Chinese AI researchers and research institutions, it's also an opportunity And I think one very real thing is
54:16
Speaker A
that China is indeed at a significant disadvantage in terms of actual compute resources It's at a big disadvantage But this significant disadvantage may have actually forced out some interesting things For example, Chinese model companies are actually quite good at distilling from others
54:34
Speaker A
Right Recently Dario (Anthropic Co-founder and CEO) called out three companies for distilling from them I think distillation itself is actually an open secret But I think there are different ways to approach distillation
54:51
Speaker A
There's brute-force distillation and smart distillation two different approaches Um What do you mean by brute-force distillation?
55:02
Speaker A
To give the simplest example of brute-force distillation: It's taking a bunch of tokens generated by Claude and forcibly training on them If you do something like this I feel First, it's not very ethical from a business standpoint And intellectually, it's rather foolish
55:27
Speaker A
Because the companies doing this essentially demonstrate one thing they don't even know what they want to do The only thing they can do is copy others and make their model look a bit better on the benchmarks Right, but essentially it shows that
55:42
Speaker A
they don't even know what they should be doing That's brute-force distillation But actually, distillation also involves some very interesting scientific questions For example, is there a possibility that Just a random example Like, could it be that in my process of generating
55:56
Speaker A
my own training data pipeline I use other models as assistants Or the answers generated by my own model use other models as their evaluators This is actually, I think, commercially a bit of a gray area But from a technical perspective, it's quite interesting
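To make the contrast concrete, here is a schematic sketch of the two styles. Every class and method (`ToyModel`, `ToyTeacher`, `train_step`, …) is an invented stand-in, not anyone's actual pipeline:

```python
import random

# Toy stand-ins so the sketch runs; all names here are illustrative only.
class ToyModel:
    def __init__(self):
        self.data = []                       # (prompt, target) pairs "trained" on
    def sample(self, prompt):
        # Student generating from its own distribution (randomized suffix).
        return prompt + random.choice(["!", "?", "."])
    def train_step(self, prompt, target):
        self.data.append((prompt, target))

class ToyTeacher:
    def __call__(self, prompt):              # teacher as a generator
        return prompt + "!"
    def score(self, prompt, candidate):      # teacher as an evaluator
        return 1.0 if candidate.endswith("!") else 0.0

def hard_distill(student, teacher, prompts):
    """Brute-force distillation: train directly on teacher-generated tokens."""
    for p in prompts:
        student.train_step(p, teacher(p))    # imitate the teacher verbatim

def soft_distill(student, teacher, prompts, n_samples=4):
    """'Smarter' use of a teacher: the student samples its own candidates
    and the teacher only acts as an evaluator/reranker over them."""
    for p in prompts:
        candidates = [student.sample(p) for _ in range(n_samples)]
        best = max(candidates, key=lambda c: teacher.score(p, c))
        student.train_step(p, best)          # learn from its own best attempt

teacher = ToyTeacher()
hard_student, soft_student = ToyModel(), ToyModel()
hard_distill(hard_student, teacher, ["hello", "world"])
soft_distill(soft_student, teacher, ["hello", "world"])
```

In `soft_distill` the student still learns from its own samples, so its output distribution stays its own; the teacher only supplies the ranking signal — which is also why mixing evaluators from several different model families starts to resemble the multi-agent setup described here.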
56:16
Speaker A
Because if you think about it, in a sense Chinese labs may have become pioneers in Multi-Agent training Oh And it's true Multi-Agent Because if they use models from different companies with these smarter approaches and integrate them into a single training system
56:36
Speaker A
each model's distribution might be very different The distribution of their language is very different This is true Multi-Agent It might be more so than for example, using several Geminis together It's something more technically interesting So I think, for me, the distillation of intelligence
56:58
Speaker A
I don't know, commercially whether it'll end up being clearly wrong or clearly right But technically it's actually quite interesting Which companies are you referring to with these two types of distillation?
57:09
Speaker A
Can we bleep out the names in post-production? (Sure) Hahahaha First of all, I haven't worked in a Chinese lab So I don't know exactly who But my feeling is XXX probably used hard distillation And XXX might have done hard distillation before
57:25
Speaker A
But later they probably gradually tried to shift toward soft distillation I think it's fairly obvious The one that probably distills less is ByteDance I feel like ByteDance's model is still quite distinctive Hmm, what makes it distinctive?
57:41
Speaker A
For example, this model How smart would you say it is? I think Doubao is definitely not as smart as Gemini or Claude But first of all, Doubao For example, Doubao's voice generation is extremely good Wait, is that difficult?
57:56
Speaker A
Technically, Doubao is indeed the best at it Because I find that for life questions I just want to ask Doubao Because it's so fast But other models Why don't they optimize this product feature?
58:06
Speaker A
I think it still has to do with their user base In the US I think people are more focused on how to improve work efficiency Don't you have life questions?
58:23
Speaker A
I do have some in my life First of all I personally am indeed pretty boring in my personal life So I don't have many interesting life dilemmas to ask Doubao The questions I have more often in life are all technical ones
58:33
Speaker A
Asking a smart model like Gemini is the best Hahahaha Right I don't have this urge to open Doubao at midnight for late-night emotional support It's not just emotions, but many things Like when you're cooking Hmm You might run into some problem
58:46
Speaker A
You might need someone to tell you right away But you don't have such a person Hmm those I think it's probably more of a data issue And probably for US companies the main priority right now is intelligence or work efficiency
59:04
Speaker A
Someday in the future Will it become these daily matters? I think it's possible The fact is If you ask about these daily topics actually you'll find that Gemini from generation to generation does better and better Hmm Actually, many of my friends
59:19
Speaker A
including myself in the past When I was at Anthropic before I might ask Claude to write code But for daily lookups I would ask Gemini, right Have you used Doubao?
59:30
Speaker A
I've actually only used it once or twice I noticed you guys don't really use it much Hmm, first of all Is it a pecking order thing?
59:37
Speaker A
(There's an intelligence pecking order) Hahaha, no no, not that serious I just think first of all It's like people in China trying to use American models There are some complicated things involved Oh Me using Chinese models in the US
59:51
Speaker A
is actually quite complicated too Second, I simply don't have the motivation for it Especially since I think in my life Work is work When I'm relaxing, I just find different work to do So for me My best companions are Claude and Gemini
60:10
Speaker A
But it might not be like that for others So it might just be my personal thing The one or two times I used Doubao myself It was because someone showed me the Doubao phone Hahahaha right So what do you think of the Doubao phone?
60:25
Speaker A
I think it's a great idea Personally, in terms of results They actually did a pretty good job Of course, what I don't know is Technically, how well optimized it is I mean, it I think it executes some tasks in real time
60:43
Speaker A
From a results perspective, there's no problem But I don't know how much overhead it has If that overhead is very, very large Then it's probably a technical issue that needs to be solved. Mm-hmm.
60:52
Speaker A
Because you don't want, you know Your model to book a high-speed train ticket for you And end up costing more than the ticket itself That would definitely be unacceptable Right so Technically speaking I personally don't know how mature it is
61:08
Speaker A
And from a product perspective For everyone, it's still quite Can't say surprising But it's something that gets people pretty excited And I think Apple probably wanted to do something like this before It's just that Apple's own models haven't been that great
61:20
Speaker A
Apple doesn't seem to care much about its AI strategy Now, I think Apple definitely cares about AI strategy Because Siri, the phone assistant Was in Apple's product launches A very, very important highlight But their own models didn't catch up
61:42
Speaker A
Now they might be trying to do this through a partnership with Gemini To try to make it happen As for whether they care about it now First of all, I don't know If you ask me to guess, I'd definitely say they care
61:53
Speaker A
But if you ask me to explain Why from the outside it doesn't look like they care that much My only guess is that If from the outside it looks like you care a lot And you still can't pull it off
61:59
Speaker A
Then you just look stupid Ah Saving face Ah, right, hahaha (I don't care) Then let's talk about Doubao's model You just said Doubao's model is quite distinctive Can you be more specific?
62:12
Speaker A
One is that its voice is really well done That's the first point I think the voice is really well done It's the most distinctive thing I can feel I mean, I think the voice quality might be To put it politely, probably one of the best in the world
62:27
Speaker A
To put it bluntly I think it's simply the best in the world Mm. Is that hard?
62:30
Speaker A
Mm I haven't gotten to that level myself So I don't know if it's hard or not But I think it might be something that takes a lot of effort Whether in terms of data or various optimizations Is it a product thing or a model thing?
62:42
Speaker A
It has to be a model thing It might also include some product aspects But it's definitely a model thing Right. And then I think that's one aspect And on the other hand On the other hand, I don't have that much personal experience
62:52
Speaker A
Because I haven't actually used it that much So it's probably more from Feedback from friends and family That is Hey, this Doubao model is just fun to talk to It's just fun to chat with Haha right But I think that
63:03
Speaker A
Is more of some subjective feedback I think one is the voice And another is that it Generates very fast, which is also very important Because many models Are showing you their chain of thought But I'm talking about trivial things in your daily life
63:18
Speaker A
I don't want to see its chain of thought Right. I don't think this is technically difficult It's just that maybe People haven't spent more time on it yet On this And the fact is If you try Gemini 2.5 Pro and Gemini 2.5 Flash
63:32
Speaker A
You'll find Gemini 2.5 Flash When completing the same problem It's already much faster than before And much less fluff So I don't think this is a Mm-hmm, in my view it's not a technical difficulty It's more about when to pay attention to it
63:47
Speaker A
And do something about it I think maybe it's now Right now these American companies Are all still in the stage of Working hard to push the upper limits of intelligence forward And ByteDance Of course it's also pushing the upper limits
64:03
Speaker A
But I think It might just be doing very well in user optimization too Also doing quite well Recently there's another topic That Chinese robots are very hot right now At the Spring Festival Gala I don't know if you have any observations about this
64:17
Speaker A
I've watched some performances Also searched for some prices on Amazon I was really surprised they're so cheap Haha, did you buy one No haha I wouldn't have any use for it even if I bought one But indeed I used to
64:29
Speaker A
I don't know, in my mind I thought humanoid robots And Of course at the software level there's nothing really But mainly hardware I thought for hardware to be this mature It would probably cost something like Several million dollars or something
64:41
Speaker A
But it seems when I checked The price is much cheaper than that I think this still reflects China's hardware industry chain Still has a lot of advantages But I Don't really know if it As a As a robot
64:55
Speaker A
In terms of hardware I think it's indeed very very strong And from the software perspective I haven't quite figured it out I think robot models Are also something with relatively large disagreement right now Right What do you mean
65:10
Speaker A
What I mean is I think robot models are probably more in the Feature engineering era Like you have a given environment A given scenario You optimize for that scenario People know how to do that Mm-hmm but doing RL
65:27
Speaker A
Doing reinforcement learning Building appropriate virtual environments Still virtual This kind of data Then you do training And can improve But it doesn't have strong generalization I think whether there is generalization Is actually a watershed for many AI directions
65:46
Speaker A
A deterministic scenario A single, narrow scenario Can you do this well This wasn't solved just in recent years It could be done more than ten years ago Language was the same In the era before Transformer-like architectures It wasn't completely impossible
66:04
Speaker A
Right, back then You could also train very strong models to do translation Mm-hmm You could train a very strong model To do semantic analysis But what you couldn't do is I can improve all abilities across the board By improving at one level
66:17
Speaker A
Mm-hmm I think this is a watershed And I think language models After Transformer and GPT Entered that kind of stage Crossed a threshold Where you can improve all abilities by improving at one level And you might train at one point
66:32
Speaker A
It will abstract this ability And generalize it to all related things But I think robots haven't reached that stage More still before that stage Where I have a single scenario A single thing Then I can optimize for that
66:51
Speaker A
So what do you think About these robotics teams in Silicon Valley And there are also a lot of robotics people inside Gemini Mm What do you think That direction is a bit...
66:59
Speaker A
What would you call it Is it a sub-direction of yours Or a parallel direction Or what I think In the past, it was quite a parallel direction But now, for robotics I think people are also trying To see if they can leverage language models
67:13
Speaker A
As a base model And then train something like For example, VLA (Vision-Language-Action model) Especially multimodal models Right, right, right, and um So now It has become something closely related To the language model track Mm And personally, my feeling is
67:33
Speaker A
They will become very important in the future But they haven't found their own path yet But what they're doing is really interesting I highly recommend everyone go check out Robotics labs They're way more interesting than language model labs
67:49
Speaker A
Language model labs Feel like normal offices But robotics labs, they really Have people controlling these robots Collecting all kinds of data And watching the robot in like Shelves picking up all sorts of items and stuff Doing things like that
68:02
Speaker A
I think it's a very interesting thing Which one did you go to Ah, I went to Wait, Gemini's own lab No, not Gemini Google DeepMind's own lab I've been to see it And also that Dyna I've also been to see
68:14
Speaker A
They have a clothes-folding robot Right, their scenario might be a bit more narrow Like folding clothes Is one robot, maybe doing some other things Like pouring water and stuff Right, like that Your intuitive feeling In LLM years, where would you place robotics progress
68:29
Speaker A
It hasn't reached the GPT-1 moment yet, right Definitely not I think it definitely hasn't, right Mm It's like everyone still hasn't Figured out how to scale up I think for me Whether it's robotics or multimodal generation Neither has reached that point
68:46
Speaker A
Then let's get into today's main topic We're still very interested in you And chat about How you went from someone who studied physics Into the world of AI Mm Where did you grow up How did you grow up
68:59
Speaker A
I I was born in Ningxia In a very, very small city Called Dawukou See, that confused expression of yours Already shows how small this city is Mm This city existed in the past because of a coal mine Also because of Shitanjing
69:14
Speaker A
A coal mine And then this city came into being Right, so I was born there But I Went to Shanghai with my parents during elementary school And so The latter half of elementary school and my middle and high school were in Shanghai
69:25
Speaker A
Then I went to Beijing for undergrad What I just mentioned Undergrad in Beijing Then PhD in the US Right You had good grades since you were young, right You got into university through physics competition And studied theoretical physics at Tsinghua and Stanford
69:38
Speaker A
Right, I didn't get in through physics competition Hahaha I think I was quite mediocre when I was young Hahahaha Ah first of all The middle school and elementary school I attended were both nobodies Hahaha I think I The middle school I attended at that time, competitions
70:00
Speaker A
Were not something you should consider It was that kind of middle school Called Shangnan Middle School East Campus Another school that makes everyone confused A school that leaves people baffled Okay, since we're here, which elementary school was it
70:12
Speaker A
What was the elementary school called (Dezhou Second Village Elementary School) My context management ability is too strong I can't even remember what it's called actually Hahaha mm-hmm right And right It was that middle school It was um In a small environment within one class
70:27
Speaker A
There were still some classmates who wanted to do things properly But overall I think that middle school was in a relatively laid-back state Right and I think maybe my grades were okay (What do you mean by okay) Okay means at that time the situation was
70:46
Speaker A
Shanghai high schools had so-called At that time there were so-called four top schools Like Shanghai High School Then Hua Er, and the Jiao Tong and Fudan affiliated high schools Right And at that time the situation was I could get into these four schools
70:57
Speaker A
But couldn't get into the best classes in these four schools But at that time I really wanted to do competitions Because I had never done competitions before You started competitions in middle school I didn't do competitions in middle school
71:04
Speaker A
Oh, I never did competitions in middle school Why did you want to do competitions if you never did them Because I never did them So I wanted to do them How did you get that idea (Hahaha, that's just how I am)
71:11
Speaker A
My personality is I always love doing things I'm not good at Hahahaha right And at that time I hadn't done competitions But I knew about them So I felt that compulsory education Not compulsory education, but before going to college I should give it a try
71:31
Speaker A
So but then My grades weren't good enough for that So Going to the four top schools, the best four schools I couldn't get into their competition classes Then I discovered there was a slightly worse school That school was Gezhi High School
71:45
Speaker A
A slightly worse school But that school had a competition class And I felt this competition class In today's terms it's an underdog Hahahaha Impressive In the words of that time, I felt like the barefoot aren't afraid of those wearing shoes
72:02
Speaker A
Hahahaha I think, mm-hmm Worth a shot So actually at that time, back then At that time Shanghai still had this so-called early admission system Where before the high school entrance exam You could sign a contract with a school
72:17
Speaker A
And then you would reserve a spot at that school in advance And then go directly there So it was very natural to go there And do competitions in high school So you were actually choosing between the regular classes of Shanghai's four top schools
72:29
Speaker A
And the competition class of Gezhi High School Without hesitation Chose Gezhi High School's competition class Of course I can't say I can't say that when I made the choice Getting into the best four high schools Was a sure thing
72:40
Speaker A
Although my score was indeed enough later At that time the high school entrance exam hadn't happened yet Right right but at that time I felt Even if I could get in I should go to an underdog place and take a gamble
72:51
Speaker A
Why Because I wanted to do this What was your purpose for wanting to do competitions I think the main thing at that time was wanting to experience it I felt I hadn't done it I had to find an opportunity to do it
73:02
Speaker A
Why did you have to do it First, I felt it was indeed difficult Ah, it was indeed more There was just this excitement about difficulty Right It's indeed At least at that time Before I started The impression everyone gave me was
73:17
Speaker A
That this thing was much more challenging Than the stuff you learn without doing competitions Mm-hmm The people who do this seem really strong If you don't do it you're just the smoothest stone Among all the mediocre rocks So at that time I felt I should do it
73:33
Speaker A
So I went and did it Of course doing it actually brought some benefits Looking back later If I hadn't done competitions at that time I probably wouldn't have gotten into Tsinghua Oh, did you get bonus points or something
73:42
Speaker A
At that time actually The competition direct admission system had already declined significantly Only those who made the national training team could get direct admission My high school Anyway I think I wasn't at the level of making the national training team
73:55
Speaker A
So let's not talk about that But before taking the senior year competition exam By a twist of fate I went to Tsinghua for a summer camp And by a twist of fate on the last day of the summer camp
74:06
Speaker A
I heard they were doing Independent enrollment But mainly aimed at Beijing students I frantically texted the admissions office teacher Saying I wanted to take the exam with them He agreed And then he agreed to let us take the exam
74:23
Speaker A
You all or just you Just agreed Me And the few people from our high school who went together Those high school classmates from Shanghai who went to that summer camp Oh what reason did you use to convince him to text him
74:34
Speaker A
I've forgotten the specifics of that text But the general idea of that text was You give Beijing students the exam Why not give Shanghai students the exam Oh, you were quite righteous about it Did you think they were playing favorites at that time
74:47
Speaker A
I didn't think they were playing favorites I just felt they had this opportunity Why not give it to us Everyone's competing on the same playing field You were classmates at that time And so I sent this message And they actually let us take the exam
74:58
Speaker A
How many people I can't quite remember Maybe from Shanghai There were probably about seven or eight people in that exam room You sent that text Maybe Maybe other high schools had other students who sent texts too But from our high school I was the one who sent it
75:13
Speaker A
Oh so They were all Shanghai high schools Students who went to Beijing for that summer camp Students who attended the summer camp And then they let us take the exam And then we signed
75:23
Speaker A
That easy to talk to Right, so what I learned from that incident The most important life lesson is Be bold Haha If you don't fight for it you'll never get it Even if you fight for it you might not get it
75:38
Speaker A
But if you don't fight for it you definitely won't get it Were you nervous when you sent that text You were still in high school I can't remember anymore At that time I felt Was this a very bold thing for me
75:49
Speaker A
No, at that time I was completely thinking I have to fight for it now If I don't fight for it today I won't be able to fight for it tomorrow haha Like The day I heard about it I immediately started frantically texting
76:01
Speaker A
Frantically texting who Texting the admissions office That Tsinghua admissions office teacher Texting one person or multiple people Can't remember, probably one teacher Did he reply quickly Mm-hmm mm-hmm I think Tsinghua Just said yes I don't know if they discussed it among themselves
76:15
Speaker A
But anyway in the end they said they agreed And then we took the exam together Right So I so I Why do I feel like I've always had quite a soft spot for Tsinghua I just feel that this school is willing to give people opportunities
76:29
Speaker A
to provide equal opportunities for everyone How did you do on that exam? Well, when I came out, I felt like I totally bombed it Because I couldn't solve half a problem But later I found out others missed even more
76:42
Speaker A
So I did get in after all Hahaha yeah exactly How many of your Shanghai classmates got in that year?
76:50
Speaker A
Ah, I think two Independent recruitment Was it a score reduction or something? It lowered the cutoff to the first-tier university line Lowered to the first-tier line Oh So how did you do on the gaokao?
76:59
Speaker A
Later, sure enough, my gaokao wasn't high enough for Tsinghua But I could get into any school except Tsinghua and Peking University Oh So why Online it says you were recommended for admission I think it's just that people who didn't go to school during those years find it hard
77:17
Speaker A
to really understand what happened back then Because two cohorts before mine you could still get recommended admission with a provincial first prize A provincial first prize got you recommended admission What about your time?
77:28
Speaker A
In our time, with a provincial first prize you made the provincial team then represented the provincial team at the national competition and only by making the national training team could you get recommended admission I made the provincial team and went to the national competition
77:39
Speaker A
But I didn't make the national training team Right So in my year, I didn't have a recommended admission slot Oh Were you good at competitions?
77:47
Speaker A
I think I was pretty mediocre Like Isn't not being the best basically the same as being mediocre?
77:53
Speaker A
And I obviously wasn't the best So I was just mediocre What was your family's attitude toward you doing competitions?
78:01
Speaker A
What was their attitude? The best thing about my parents is they didn't really interfere much They may have tried to control me at some point but later found they couldn't Oh, how so?
78:11
Speaker A
I just didn't listen to them Oh I think most Chinese families it's already considered pretty good when kids discuss things with their parents I usually just informed them Haha, informed them of what?
78:27
Speaker A
Informed them, oh, I'm going to the independent recruitment exam Yeah and Including filling out applications for high school and college My parents might not have even seen my application forms Oh, they're pretty laid-back, huh?
78:42
Speaker A
I think they just when you can't understand what someone is doing the best thing is to not meddle I think my parents understood this very well Yeah hahaha So you're pretty rebellious, huh?
79:01
Speaker A
I think I am Pretty My personality is I really care about what I want to do If it's something I've figured out I want to do Don't try to stop me And I'll definitely do my absolute best But if it's something I don't want to do
79:20
Speaker A
Forcing me won't help, I won't do it. Right Are you very competitive? Pretty strong Yeah, but I think I'm more competing with myself pushing myself, I guess Not really willing to compete with others Oh right Of course, if
79:36
Speaker A
well it's something I think is important and you also think it's important then I definitely have to outdo you, hehe So then you got to Tsinghua, that was even more amazing You studied quantum physics, why?
79:49
Speaker A
Yeah, I was doing condensed matter theory at the time Why did you choose this major?
79:56
Speaker A
A twist of fate Looking back now Of course I can come up with some very reasonable-sounding explanations But honestly, going back to that time I think it was just a twist of fate So at that time we were in the Jixian class
80:11
Speaker A
And the Jixian class had a very good tradition First of all, although the Jixian class was in the physics department It didn't restrict what students could do Actually 2/3 of the students in the Jixian class wouldn't do physics
80:20
Speaker A
Ah And for Why did you enter this class Uh At that time the entire Tsinghua physics department was Jixian class Maybe not anymore now Anyway it was at that time And another good tradition it had was It encouraged students to learn through practice
80:33
Speaker A
So it encouraged students To enter research labs as early as possible And learn through research And at that time I really wanted to do theory Was it because you found it difficult It feels like you have a fascination with difficulty
80:53
Speaker A
Maybe it's also a kind of illness I can talk more about this later What are the bad consequences of this illness Hahaha Right and then then right Then I wanted to do theory And of course the Jixian class
81:04
Speaker A
Or what we call the Xuetang class Had a smaller class And then the Teacher recommended saying hey The Institute for Advanced Study is a great place Tsinghua Institute for Advanced Study The research institute founded by Mr. Chen-Ning Yang
81:14
Speaker A
Is a great place So I went there to find a teacher And there happened to be A teacher who was still young at that time called Called Wang Zhong, he was my undergraduate teacher Mm-hmm, at that time he didn't have many students either
81:25
Speaker A
And we chatted Of course I knew nothing But he was quite patient And gave me Gave me some papers to read And after reading I discussed with him Later I discovered condensed matter theory Especially the project we were doing at that time
81:38
Speaker A
Was related to topological insulators And these kinds of directions Actually Was a direction very suitable for undergraduates to get started with It didn't require too much background knowledge You only needed to know The most basic thing is you need to know quantum mechanics
81:55
Speaker A
Statistical mechanics Solid state physics Which are actually very very easy to learn Basic knowledge But it might really test The depth of your understanding of this knowledge So for undergraduates It's actually a particularly good direction Where you can get started quickly
82:10
Speaker A
And do some actual projects And then we did some work together Among which possibly The work in open quantum systems Looking back now is still quite important work Right and then In a sense I think looking back now
82:27
Speaker A
Doing that work Doing research during that period Is actually very very similar to doing AI now It's more that you have an idea You have an understanding And at that stage you can You can do a numerical experiment
82:40
Speaker A
To verify whether your idea and understanding are correct You find AI is actually the same AI is also you have an idea You have an understanding You design some experiments To verify whether your understanding is correct And then you design some model
82:53
Speaker A
Training pipeline To implement your ideas Right so actually these two are very similar Can you talk about your non-Hermitian system research Ah, I can talk about it I'll try to speak in human terms But it's also possible I'll actually be talking nonsense
83:10
Speaker A
So those who don't want to listen can skip ahead Hahahaha Slide the progress bar You can set two markers on the progress bar Right and then right Non-Hermitian systems are like this One of the most basic assumptions of quantum mechanics is
83:26
Speaker A
An isolated system Its evolution is described by unitary evolution 'Unitary evolution' is a bit of a jargon term Sorry What unitary evolution means is It's a linear process And this linear process Can be described by an operator Called the Hamiltonian Ah, the Hamiltonian, in a certain sense
83:47
Speaker A
It's somewhat like the energy of the system But not exactly It's somewhat analogous to it So It determines how the system evolves over time And if it's An isolated system This Hamiltonian will be a Hermitian matrix A Hermitian matrix is one where you transpose it
83:59
Speaker A
And then take the complex conjugate And it's the same as the original But real systems The vast majority are not isolated systems For example, you Me, as a human being Definitely have to exchange information with the outside world
84:11
Speaker A
And exchange matter Materials are the same If you put a piece of material there Unless you create an extremely high vacuum You always have to interact with the substrate You have to exchange with the external environment So real systems
84:24
Speaker A
Are mostly not isolated systems And non-isolated systems Won't be described by a unitary process And the corresponding Hamiltonian Won't be Hermitian either That's where the term 'non-Hermitian' comes from It's essentially for studying open quantum systems Quantum systems that exchange with the outside world
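The picture sketched above can be written out in standard notation (an editor's illustration of the textbook definitions, with ħ set to 1; these formulas are not from the conversation itself):

```latex
% Isolated system: Schrodinger evolution with a Hermitian Hamiltonian
i\,\frac{d}{dt}\,\lvert\psi(t)\rangle = H\,\lvert\psi(t)\rangle ,
\qquad H = H^{\dagger} \equiv \left(H^{*}\right)^{T}

% which integrates to a unitary (norm-preserving) evolution
\lvert\psi(t)\rangle = e^{-iHt}\,\lvert\psi(0)\rangle ,
\qquad \left(e^{-iHt}\right)^{\dagger} e^{-iHt} = \mathbb{1}

% Open system: the effective Hamiltonian need not be Hermitian,
% so the evolution is no longer unitary and norm is not conserved
H_{\mathrm{eff}} \neq H_{\mathrm{eff}}^{\dagger}
\;\;\Rightarrow\;\;
\left(e^{-iH_{\mathrm{eff}}t}\right)^{\dagger} e^{-iH_{\mathrm{eff}}t} \neq \mathbb{1}
```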
84:39
Speaker A
Their behavior And at that time, something very puzzling was discovered We were initially trying to study Some topological phenomena in these open quantum systems And then we found The theoretical results from hand calculations Just couldn't match the numerical results no matter what
84:58
Speaker A
More precisely The hand calculation result Assumed the system Had periodic boundary conditions For example, on a ring Or on the surface of a torus And numerically Because it's closer to the actual situation It would calculate with open boundaries
85:11
Speaker A
For example, the behavior of a material in a square shape And these two results just couldn't be reconciled So we tried to understand this And later found The basic paradigm people used to describe Hermitian systems A fundamental paradigm
85:24
Speaker A
Is the so-called Bloch wave Which assumes the eigenstates of the system are Linear combinations of waves Sine and cosine waves, that kind of thing Linear combinations of such waves This assumption In non-Hermitian systems, it actually
85:43
Speaker A
breaks down — it becomes wrong The fact is Later we found In non-Hermitian systems Actually, the energy eigenstates All Can potentially accumulate at one edge of the system Right, and then we systematically established this Set of descriptive methods
85:57
Speaker A
And then built a framework To describe a non-Hermitian system with open boundaries How to describe its eigenstates And thereby describe its time evolution And some dynamics So That was the work at that time And later there was a lot of
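The mismatch described above between periodic-boundary (Bloch) predictions and open-boundary numerics can be reproduced in a few lines. This is a minimal editor's sketch using the Hatano-Nelson chain (nearest-neighbor hopping with unequal left/right amplitudes), the simplest standard model showing the non-Hermitian skin effect; it is an illustration of the phenomenon, not the specific model from the work discussed:

```python
import numpy as np

def hatano_nelson(n, t_right=1.5, t_left=0.5, periodic=False):
    """Hatano-Nelson chain: nearest-neighbor hopping with unequal
    left/right amplitudes, the simplest non-Hermitian lattice model."""
    H = np.zeros((n, n))
    for i in range(n - 1):
        H[i + 1, i] = t_right  # amplitude to hop one site to the right
        H[i, i + 1] = t_left   # amplitude to hop one site to the left
    if periodic:               # close the chain into a ring
        H[0, n - 1] = t_right
        H[n - 1, 0] = t_left
    return H

n = 20
# Open boundaries: the spectrum comes out real, and every eigenstate
# is exponentially localized at one edge (the "skin effect").
evals_obc, evecs_obc = np.linalg.eig(hatano_nelson(n))
# Periodic boundaries: the very same hoppings give complex eigenvalues
# tracing a loop in the complex plane -- the two results don't match.
evals_pbc = np.linalg.eigvals(hatano_nelson(n, periodic=True))

# Fraction of all eigenstate weight sitting in the right half of the chain
skin_weight = (np.abs(evecs_obc[n // 2:, :]) ** 2).sum() / (np.abs(evecs_obc) ** 2).sum()
print("max |Im E|, open boundaries:    ", np.abs(evals_obc.imag).max())
print("max |Im E|, periodic boundaries:", np.abs(evals_pbc.imag).max())
print("eigenstate weight in right half of chain:", skin_weight)
```

With `t_right > t_left`, almost all of the eigenstate weight piles up on the right half of the open chain, which is exactly the breakdown of the Bloch-wave assumption described in the conversation.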
86:16
Speaker A
Because it was actually a A paradigm shift So later there was a lot of Follow-up work But later I actually switched directions So I didn't continue much in this direction Why didn't you continue with it It's hard to catch a paradigm shift, isn't it
86:32
Speaker A
It's hard to catch a paradigm shift Yes yes This is the weakness of human nature I feel like I always love challenging myself with things I don't know Hahaha especially at that time Just I don't know what I was feeling in that direction
86:46
Speaker A
Maybe looking back at that work a few years later It would become the most important work in that direction Later when you do some more work It might indeed make you more famous Get more citations Write more good journal articles
86:58
Speaker A
Find a good faculty position But it feels like for a scientific career It wouldn't be that exciting So at that time I wanted to switch to something else Switch to something I wasn't good at Do it right And then
87:11
Speaker A
So when doing my PhD I switched directions To do high energy theory High energy theory, right High energy physics, right So your undergraduate and PhD were also different Also different It's not just jumping from physics to AI Actually your undergraduate and PhD both look like physics
87:25
Speaker A
But the directions had already changed significantly Right, two directions with almost no connection Oh, that's quite amazing Including your choice of competitions Going to Gezhi High School was also quite amazing Right What kind of human nature is this
87:38
Speaker A
I think it's just To put it badly, I love torturing myself Hahaha, to put it nicely, challenging myself Hahaha Mm-hmm, are you happy being tortured I think if someone tortures themselves just for the sake of being tortured Then that person has psychological issues
87:57
Speaker A
But If a person is being tortured in order to learn more things And enrich their experiences and abilities I think it's worth it Your undergraduate teacher Teacher Wang Zhong was also an underdog, right Does he count No hahaha
88:11
Speaker A
He was doing quite well How can you say that about him haha (At that time) I just said he was very young No no no, he was very young But he My impression of him has always been He is a very sharp person
88:21
Speaker A
Very capable of seeing problems Trying to understand problems Understanding them very clearly Indeed he might not be like many teachers who are Very famous In society or very dazzling At least not at that time Now he's very famous
88:38
Speaker A
At that time he wasn't that famous yet But I think in terms of ability I think he's very strong Right, and actually he started out When he was doing his PhD he studied with Teacher Shoucheng Teacher Zhang Shoucheng
88:51
Speaker A
So People who can be chosen by Teacher Shoucheng Basically won't be too bad Mm-hmm Did he say anything about you changing directions for your PhD He didn't say anything I think he is He is someone who doesn't like to interfere with others
89:12
Speaker A
Hahahaha I don't know what he was thinking inside But I think He is someone who doesn't like to interfere with others Eh, quantum physics What kind of worldview is it as a whole It and, um I think I think the biggest difference is I think, um
89:27
Speaker A
There are many Many differences from classical physics But I think They are two corresponding concepts, right Classical physics and quantum physics They are theories at different energy and time Or spatial scales That is, essentially our world is all quantum
89:43
Speaker A
Of course right now We don't know what exists at smaller scales Right, like At smaller scales There are many different ideas For example, string theory is an idea And then look at other ideas Quantum gravity is also an idea, things like that
89:54
Speaker A
Right, but none of those can be verified Verified The tiniest, tiniest scales That can be experimentally verified The effective theory at those smallest scales is quantum Of course, this includes quantum mechanics and quantum field theory
90:11
Speaker A
And classical physics is When the spatial scale you're looking at Is relatively large This quantum physics Will gradually, gradually reduce to classical physics Actually, it's more about at different scales Having different effective theories This thing Is actually a very profound idea in physics
90:29
Speaker A
It's what's called the renormalization group What the renormalization group says Is that The theory describing a system At different energy scales May look completely different Right, and even if they may ultimately, at the root Are all a grand unified theory
90:47
Speaker A
Of course, right now There isn't really a true grand unified theory If one exists Even if they share the same root at the origin But at different scales They may also look completely different So classical physics and Quantum physics
90:58
Speaker A
Are more like two descriptions at different scales Speaking of quantum physics There are several terms that seem related For example, the butterfly effect For example, quantum entanglement Can you talk about these I think this is something everyone can understand
91:10
Speaker A
And I don't know physics either Don't blame me, everyone I don't know quantum physics either Right, I think Quantum entanglement Is indeed something relatively well-known And quite unique to quantum physics And then it's very simple It's like, say I have two particles
91:25
Speaker A
For example, they're in an entangled state And then maybe they're actually very far apart But actually If I perform some measurement on one of them Or perturbation It will also affect the state of the other This is real
91:36
Speaker A
This is real, right What kinds of things have quantum entanglement What kinds of two objects There are many actual situations It's actually When you look closely enough At a small enough, microscopic scale
91:51
Speaker A
The vast majority of particles may be in entangled states But practically speaking You can For example, create one spin and another spin First bring them together Then collapse them into an entangled state Then you can pull one of them very far away
92:05
Speaker A
Then it becomes an entanglement A state entangled over a long distance And I think even, I remember a few years ago There were people who specifically did experiments Putting a bacterium and some other thing Into a quantum entangled state
92:18
Speaker A
What do you mean by preparing them Into a quantum entangled state This is something that can be manually operated How do you operate it Generally speaking It's through some measurements and the action of evolution operators
92:32
Speaker A
Can put it Into this state But the hard part here Is actually how to implement this process experimentally You can imagine It's like you perform some quantum measurements And some so-called quantum gate operations Actually It's quite difficult
92:46
Speaker A
Which brings us back to the question just now That every system is actually not isolated You might have these two spins And you think, hey If I prepare them this way Don't I get an entangled state?
92:56
Speaker A
Then I just separate them and I'm done But the real problem is These two particles actually live in our world Other particles constantly Bump into them Or external heat disturbs them a bit And the state is gone just like that
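The perfect correlation of an entangled pair he describes can be sketched in a few lines of pure Python. This is an illustrative toy, not how a real experiment works: the state vector of the Bell state (|00> + |11>)/√2 and the Born-rule sampling are textbook quantum mechanics, and all names below are my own.

```python
import random

# Two-qubit state as amplitudes over the basis |00>, |01>, |10>, |11>.
# The Bell state (|00> + |11>)/sqrt(2): the two qubits are entangled.
bell = [2 ** -0.5, 0.0, 0.0, 2 ** -0.5]

def measure_both(state):
    """Sample a joint outcome with Born-rule probabilities |amplitude|^2."""
    probs = [abs(a) ** 2 for a in state]
    r = random.random()
    cum = 0.0
    for idx, p in enumerate(probs):
        cum += p
        if r < cum:
            return idx >> 1, idx & 1  # (outcome of qubit A, outcome of qubit B)
    return 1, 1  # guard against floating-point rounding

outcomes = [measure_both(bell) for _ in range(10_000)]
# Measuring A immediately fixes B, no matter how far apart they are:
assert all(a == b for a, b in outcomes)
# ...yet each individual outcome is still a fair coin flip:
print(sum(a for a, _ in outcomes) / len(outcomes))
```

The decoherence he mentions is exactly what this toy ignores: in reality, stray interactions with the environment scramble the amplitudes before you can measure.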
93:06
Speaker A
So the hard part is How to actually implement this process experimentally Right, and then Another example of entanglement might be more well-known I should actually mention that example Which is Schrödinger's cat Schrödinger's cat That's a much more famous example
93:21
Speaker A
It says its state is actually a superposition Of a radioactive source emitting a particle And the cat being dead That's one state The other state is the radioactive source not emitting a particle And the cat being alive, a superposition of these two
93:34
Speaker A
So for example If you measure that radioactive source And find that it emitted a particle You know the cat is dead No matter how far apart the cat and the source are Right, so that's entanglement But the butterfly effect is a
93:48
Speaker A
Is a different thing And the butterfly effect Well the famous part of the butterfly effect Is actually from classical physics What people hear about in classical physics The butterfly effect is that famous example Where maybe a butterfly in South America
94:02
Speaker A
Flaps its wings Half a month later A typhoon hits North America But from a more mathematical formulation It says that at the initial moment If you make a very tiny perturbation And then measure the impact of this perturbation
94:21
Speaker A
How large it becomes in the future You'll find That this perturbation grows exponentially Right, that's mathematically A description of the classical butterfly effect But something people were puzzled about before Is how could this phenomenon exist in quantum systems
94:38
Speaker A
Because as we just said, isolated An isolated quantum system undergoes unitary evolution It's a very linear process So in a certain sense If you have one state That is, one vector and another vector With not too large an angle between them initially
94:51
Speaker A
Then after some evolution This angle shouldn't change So this situation where initial states are Very slightly different And in the future, bam, the difference grows exponentially Seems, from quantum mechanics, like Something unlikely to happen
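The contrast he draws — tiny classical perturbations growing exponentially, while linear unitary evolution preserves the separation between two states — can be sketched in pure Python. The logistic map and the 2D rotation below are standard textbook stand-ins of my choosing, not anything from the conversation.

```python
import math

# Classical chaos (the butterfly effect): iterate the logistic map x -> 4x(1-x).
def orbit(x0, steps):
    x = x0
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
    return x

eps = 1e-12
gap = abs(orbit(0.2, 30) - orbit(0.2 + eps, 30))
print(gap / eps)  # the perturbation has grown by many orders of magnitude

# Unitary-style evolution: a rotation preserves the distance between two
# state vectors, so this naive gap can never grow exponentially.
delta = 1e-6
u, w = (1.0, 0.0), (math.cos(delta), math.sin(delta))  # nearly parallel

def rotate(v, theta=0.7):
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

for _ in range(30):
    u, w = rotate(u), rotate(w)
print(math.hypot(u[0] - w[0], u[1] - w[1]))  # still ~delta: no growth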
95:06
Speaker A
But as we just said Our world is actually quantum at the microscopic level And becomes classical at the macroscopic level But they're part of the same continuum How can one have it and not the other That's what people were trying to understand
95:16
Speaker A
And of course Later people gained a better understanding Which is that actually When discussing the butterfly effect in quantum systems You shouldn't discuss the change between two states Instead you should discuss something called local observables That is, the change in local observables
95:31
Speaker A
That actually corresponds to what you see In classical physics, those changes So after four years of studying quantum physics What were you thinking at the time What do you think physics helped you with When you were about to graduate as a senior
95:45
Speaker A
I think the biggest benefit of studying physics as an undergraduate Is first of all learning to Think things through clearly Reading isn't about reading a lot But about reading deeply Reading a lot doesn't mean you can discover new things But if you have
96:00
Speaker A
A perspective different from others on something That's what's more valuable To society That's one thing And another thing is don't trust pure theory too much I came to this conclusion Because the main reason that discovery happened at that time
96:17
Speaker A
Was because we could do numerics It started because numerics and theory didn't match Then we carefully studied that problem And discovered this thing Then why did you go study high energy physics for your PhD That's also a theory
96:31
Speaker A
This brings us back to the topic we just discussed That always loving to challenge very difficult things Sometimes also brings some bad results What bad results For example I feel like I think my PhD, for myself personally I learned a lot
96:46
Speaker A
Grew a lot But for this world It didn't produce any contribution Haha, this high energy theory direction It's difficult enough Very very difficult But the bad thing about it is It's actually not particularly verifiable There are no objective evaluation criteria
97:06
Speaker A
Because High energy theory has developed to the point where Experiments completely can't catch up to what you're discussing in theory Whether it's energy scales Or these microscopic scales Right How does it progress
97:22
Speaker A
What does its progress depend on If not experiments One source of progress Comes from mathematical self-consistency Mm-hmm, like for example You propose a framework To describe these things Then can you be self-consistent with existing Already verified theories at lower energy scales
97:43
Speaker A
Like for example You study string theory Then naturally the question everyone asks is Can string theory at low energy Return to quantum field theory And then return to classical physics Then this self-consistency is one criterion I think this is very reasonable
97:56
Speaker A
A very scientific thing Of course there are also some unscientific factors That when this field completely lacks experiments And objective standards There definitely won't be just one framework that appears There definitely won't be just one self-consistent framework that appears
98:11
Speaker A
At this time who does well Who doesn't do well Actually depends on The subjective judgments of some old-timers in the field Did someone hurt you I wasn't hurt by anyone It's just that the longer I stayed in that field
98:28
Speaker A
The more I felt this thing was stupid, like A person's life isn't that long Why waste your own time Serving old-timers Right So it feels like spending 5 years learning a lot of knowledge And learning a hard lesson This lesson is
98:50
Speaker A
This big lesson is to do experiments Hey, it's about doing Things with relatively objective evaluation criteria Mm-hmm, or from another perspective Like Do things that can have an impact on this world So actually your undergraduate went relatively smoothly, right
99:09
Speaker A
In the quantum physics research field Very quickly You very quickly had very good academic results And it was paradigm-level change But you quickly felt it wasn't attractive anymore So You wanted to challenge something more difficult in your PhD
99:20
Speaker A
Right And during the PhD period it was actually quite lonely At least in terms of results it was like that Hahaha The outside world couldn't tell From the outside it all looks like a very glamorous resume PhD at Stanford
99:31
Speaker A
Right, I think In terms of actual research output I think No one would say my PhD papers were bad But if I'm being completely honest How much impact did they have on the world?
99:42
Speaker A
I think almost none No impact, practically zero Right, so for me personally I was really unhappy with that But I also wasn't unhappy enough to, you know worry that people would say I was slacking off I really wasn't slacking off
99:58
Speaker A
You can still meet all the external expectations Right How do you pull that off?
100:03
Speaker A
Well, this is something that You know how it really feels, right? Right exactly I think meeting external expectations Or meeting the standards of a small circle It's like training a model Once you're in that small circle And you know what their evaluation criteria are
100:19
Speaker A
It's easy to do well Even if you don't actually believe in those standards You can still meet them Mhm But deep down, you know you don't buy into them Because sometimes even when you don't believe in it And you hit those marks
100:30
Speaker A
You can fool yourself and just keep moving forward But I eventually realized I couldn't fool myself Couldn't lie to myself Mhm Right When did you realize that?
100:42
Speaker A
I think probably around The last two years of my PhD I started having that feeling But back then, I hadn't really figured it out yet Hadn't figured out what to do if not this So I spent some time
100:57
Speaker A
Exploring different directions For example At first I mostly looked into Quantum computing Or quantum information, that kind of direction Then I got a postdoc offer After getting the postdoc offer It felt more urgent Because when you're still in school
101:16
Speaker A
You can still have a student mindset After leaving school, it's your own career You have to carve out a path for yourself So at the time I felt Quantum computing and AI were probably two directions I think they offer young people
101:32
Speaker A
More opportunities So what was your postdoc direction? The postdoc had no direction It was basically just theoretical physics A postdoc is a very independent position You basically do whatever you want Right, it's more like In a way, it's kind of like doing charity
101:48
Speaker A
Huh? Who's doing charity? Well, there are probably some Whether it's government organizations that care about research Or private organizations They donate money To the university Or allocate funding to the school The school uses that money to hire postdocs
102:02
Speaker A
Who then do research in a department And share their research Broadly with other people in the department I think it's more about creating a kind of social atmosphere This kind of work Right, and so There really aren't many restrictions
102:16
Speaker A
You can basically do whatever you want But I didn't actually do The postdoc for very long I was probably at Berkeley for two or three months in reality But officially, I was only there for two weeks What do you mean by officially?
102:28
Speaker A
I mean I had actually already gone there before I officially started Because I was already in the Bay Area anyway I went there before I officially started But after I officially started I only stayed for two weeks before quitting
102:38
Speaker A
What happened during those two weeks? Nothing happened in those two weeks I wasn't even planning to start the position But the people at Berkeley were just too nice They were like, uh No worries, just wait until things are settled
102:46
Speaker A
Come for as long as you can Oh, so you told them you were actually talking to Anthropic Right I told them Actually I think I might go do AI Maybe I shouldn't join Mm-hmm But Berkeley wasn't Not just Berkeley
102:59
Speaker A
I think the Bay Area Teachers at both these schools are very nice They really take care of you They felt you haven't fully finalized things yet So better hold onto the current job first Do you think physics helped you later when doing AI
103:11
Speaker A
In what ways I think in terms of hard skills there wasn't much help In terms of pure tool-based skills Actually the transfer from physics to AI Is very very little But I think if you really have to ask
103:30
Speaker A
I think maybe the main thing No, can't say it's an ability It's personality Maybe physics people want to get to the bottom of things more Want to understand something more And want to do things more systematically Because we're used to this very systematic
103:45
Speaker A
Whether it's experimental methods Or theoretical methods So I think this might be A good thing But I don't think this is unique to physics people either Like Why wouldn't computer science people have this trait I know many computer science people
104:00
Speaker A
Who also have this trait Many chemistry people also have this trait Biology students also have this trait So I don't think it's unique to physics Right but actually it's quite interesting There are indeed many in this field Especially with language models
104:13
Speaker A
This kind of large scale AI There are indeed many people from physics backgrounds Who have been very successful Right especially at Anthropic this company When many people describe this generation of AI They all say it's a black box
104:26
Speaker A
Can you use a scientific perspective To understand this black box The operating principles of artificial intelligence I think Everything in this world is a black box Like even physics Something everyone thinks they understand We don't really have An understanding that goes from its microscopic behavior
104:47
Speaker A
All the way to macroscopic manifestations Like whether it's quantum mechanics Or quantum field theory They all describe behavior at that energy scale Essentially the system is still a black box You still don't know at its most microscopic level
104:59
Speaker A
What kind of dynamics it has AI is the same Whether it's a black box or not Is actually all relative We indeed don't understand language models with Neurosurgery-level precision It's not that I understand this behavior To the extent of
105:12
Speaker A
Saying this behavior is caused by which neuron Which artificial neuron's which activation Producing this behavior We don't have that Haven't reached that level of understanding Except in some very sparse Very small networks Like Anthropic Has this so-called
105:29
Speaker A
Interpretability team They might do some similar work But in practically usable language models We haven't reached such understanding But it doesn't mean we have no understanding at all For example Scaling Law It describes how models at that scale
105:42
Speaker A
Improve in perplexity as model size and data grow Get better and better under this metric Mm-hmm so if you say there's no understanding at all Well if Scaling Law Doesn't count as a small part of understanding Then can we also say
105:57
Speaker A
We actually don't understand this world at all either This world is also a complete black box So Scaling Law is a scientific law It's an empirical law An empirical law Right But The boundary between empirical laws and scientific laws
106:12
Speaker A
is quite blurry For example If we look back at these thermodynamic various different laws The first law, the second law The Clapeyron equation and whatnot all this messy stuff When they were first discovered they were also empirical laws
106:29
Speaker A
It's just that later on as time went by we gradually understood their microscopic mechanisms Then they might have become scientific laws Right, I think maybe something like Scaling Law or things like that Right now it's definitely still very impressive
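The empirical shape of the Scaling Law he's describing can be written down in a few lines. The functional form L(N, D) = E + A/N^α + B/D^β follows the published Chinchilla-style fit, but treat the constants below as illustrative, not as a claim about any particular model family.

```python
# Chinchilla-style parametric loss: L(N, D) = E + A/N^alpha + B/D^beta.
# Constants loosely based on published fits; illustrative only.
def loss(n_params, n_tokens, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    return E + A / n_params ** alpha + B / n_tokens ** beta

small = loss(1e8, 1e10)   # ~100M params trained on ~10B tokens
large = loss(1e10, 1e12)  # ~10B params trained on ~1T tokens
print(small, large)
# Loss falls smoothly with scale, but never below the irreducible term E.
assert large < small and large > 1.69
```

Like the gas laws before statistical mechanics, this is a fitted regularity: it predicts well without saying anything about what individual weights are doing.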
106:41
Speaker A
But in the future, when the technology becomes more fixed and people start to understand it more and more the microscopic process will it become a scientific law if such a definition exists I think it's possible Can you explain in scientific terms
106:58
Speaker A
this so-called intelligence emergence First of all, this term itself isn't very scientific So naturally there's no way to use scientific language to describe something unscientific Intelligence emergence?
107:10
Speaker A
Well, I think intelligence emergence to me it's more of a subjective feeling rather than an objective phenomenon When many people talk about intelligence emergence what they might have in mind is that previous language models could only do one type of thing
107:26
Speaker A
like only translation only analysis only certain things But now it seems like the model can do everything But this thing Again, I think it's like to me it's more of a technical emergence rather than a behavioral emergence It's that through research
107:45
Speaker A
we discovered how to do this kind of large-scale training and then be able to lift all capabilities across the board I think this is the more fundamental thing As for intelligence emergence itself Actually, I think, um everyone probably has a different definition in mind
108:00
Speaker A
Right Your definition is To me, there's no definition Haha, to me The only qualitative difference is whether there's been a technical breakthrough that allows us to scale up and lift all capabilities across the board This, to me is a well-defined thing
108:20
Speaker A
You ended up choosing AI between quantum computing and AI How did this shift happen Right, I think I still spent some time understanding where the bottlenecks lie in both directions I think the good thing is they both give young people opportunities
108:35
Speaker A
The good thing is both have opportunities But quantum computing seemed to you to be closer to your main path at that time, right Well, that's why I needed to understand the details Because after understanding the details, I found out it's not
108:46
Speaker A
It's the opposite Because quantum mechanics Oh, not quantum mechanics I mean quantum computing I think its main bottleneck right now is actually in the experiments It's not about how you design those algorithms or design those operators It's more about how you implement it experimentally
109:00
Speaker A
That's something I'm actually not good at It's actually quite unrelated to many things I'm interested in On the other hand, the things related to me Like AI, as I just mentioned It's more about having an idea
109:15
Speaker A
and then you can use some numerics to verify it This numerical aspect in AI might be training a model or something like that Right and this is actually quite similar to doing physics That's why I've always liked to compare this
109:26
Speaker A
With 18th century physics Make comparisons It's more like physics of that era In that era theory and experiment weren't separated There were no theoretical physicists Experimental physicists You just did physics Just did physics You could do experiments yourself
109:40
Speaker A
And also do theoretical speculation I think AI is a bit like that era So actually The distance from theoretical physics to experimental physics Is farther than directly jumping to AI Farther mm-hmm Actually farther And in terms of interest it's also farther
109:54
Speaker A
You don't like experimental physics (I think) You don't like doing experiments I think, um It's indeed not where my interest lies Mm-hmm although I'm not willing to do it myself But I am indeed very interested In knowing how other people's experiments are going
110:06
Speaker A
Hahahaha Doesn't AI require doing experiments Yes, but it's more like numerics Right it's not quite like Going to the lab and building an optical table And whatnot I think experiments are really something
110:21
Speaker A
Maybe because I don't understand I haven't reached that level So some things seem quite mystical to me For example Everyone knows how to build this optical table But some people can build it for you Some people just can't build it after 6 years
110:37
Speaker A
This is hands-on ability I just don't get it Hahahaha I sometimes think This thing is a bit mystical Oh Mm-hmm so numerics are still better Numerics are much clearer Right right right, for me Doing numerical experiments Or like AI
110:55
Speaker A
Training models And studying various different techniques To look at certain details This is something where I can understand why it's done this way Mm-hmm but when it comes to building the table I'm completely at a loss
111:11
Speaker A
You've done it before I of course have Physics students have definitely all Done basic experimental training But more importantly I have many friends who do experiments Whether visiting their labs And watching how they do experiments
111:25
Speaker A
Or chatting with them about how to design experiments I feel like there are many things I can't quite understand But indeed some of them do it well Some don't do it well So you say doing AI research now
111:35
Speaker A
Is like doing thermodynamics research in the 17th century What it's actually expressing is Although everyone can't very clearly Scientifically explain and understand this thing It won't stop it from developing Right it's more like Why Comparing to thermodynamics of that era
111:52
Speaker A
In that era Everyone actually didn't understand the microscopic theory of heat Everyone didn't know what heat was Just like now Everyone can't understand Which matrix element in this language model Is doing what
112:08
Speaker A
Actually everyone doesn't understand But it doesn't prevent you from having some good empirical laws Like various laws of thermodynamics And various Scaling Laws now So from this perspective Yes
112:25
Speaker A
At this level It's something like that And from a researcher's perspective It's that other point I was making Theory and experiment actually go hand in hand So how did you end up interviewing at Anthropic How did your Anthropic journey unfold
112:40
Speaker A
I think the main thing was I had former colleagues at Anthropic Haha yeah Former colleagues So Anthropic actually has a lot of people from physics backgrounds especially theoretical physics backgrounds Why is that In terms of their hiring choices
112:53
Speaker A
why did they choose this group of people I think Of course A lot of people might come up with reasons like physicists are good at this or that But from my personal perspective I think the main reason is still connections
113:09
Speaker A
Just connections Because in Anthropic's founding team there were actually three or four fairly technical people at the time and two of them are still very much on the technical front lines in leadership Both of them came from physics backgrounds
113:25
Speaker A
And the people they might have recruited also came from physics backgrounds So it just continued that way But actually, at this stage after I joined they barely hired any more people with no AI background at all. Right.
113:37
Speaker A
So it's also a I think it's also a product of its era Right, and then Anyway, I decided to go into AI at that point So I tried to reach out to a few places And then You only looked at
113:49
Speaker A
Anthropic? No, I also reached out to OpenAI and GDM That is, Google DeepMind But Google DeepMind was too slow back then Hahaha, so it Just didn't end up in consideration Too slow You mean their interview process was slow
114:07
Speaker A
But later Obviously later They made huge strides with Gemini They moved really fast after that Haha yeah And then Anthropic Well anyway What about OpenAI I reached out to OpenAI too But OpenAI probably didn't find a particularly good fit in terms of projects and people
114:23
Speaker A
And Anthropic was because I reached out at that time And then it was my first manager And he used to do theoretical physics too And he said at the time We're trying to do reinforcement learning
114:39
Speaker A
Trying to do this kind of large-scale reinforcement learning There are many scientific questions to understand That was in '24 Around August or September At that time actually reinforcement learning wasn't as mature as it is now Back then most people didn't really know how to do it
114:52
Speaker A
Because o1 hadn't been released yet Back then Everyone knew o1 was out there But no one had seen the results yet But Anthropic didn't actually know how to do it back then
115:04
Speaker A
They had a general idea at the time But there were many details that needed careful study So he told me, hey There's this thing Would you like to come interview And I thought, hey It might be a good opportunity
115:19
Speaker A
How did you perceive reinforcement learning back then No clue, haha You roughly know pre-training Post-training yeah exactly I roughly knew the pipeline But I didn't really know the specifics of how industrial-grade language models are trained Mm I only knew how it's done in academia
115:36
Speaker A
Right, and then So looking back, what I knew then In hindsight, it was basically nothing Right, and then, mm More than anything I felt at the time that this was an uncertain thing But it was a good opportunity
115:53
Speaker A
So I just went for it Mm Of course there was some interview prep and the interview process, right How did you prepare What did you talk about At the time Who did I interview with At Anthropic, some of my later colleagues interviewed me then
116:02
Speaker A
And then The interview questions weren't too hard Anyway haha right But for me I didn't know how to prepare back then either I just went through all the courses I could find Learned everything I could on my own
116:16
Speaker A
Did all the assignments I could do And then I hand-rolled a whole system myself That Andrej Karpathy He has that famous project called I think it's called nanoGPT or something Anyway, he has one where You can train a tiny GPT model inside a Google Colab Notebook
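nanoGPT itself trains a small Transformer with PyTorch. As a zero-dependency stand-in for the same kind of exercise, here is the simplest possible character-level language model, a bigram count model; this is my own illustrative sketch, far below what the actual interview prep involved.

```python
from collections import Counter, defaultdict
import random

# Toy corpus; any text works. "Training" is just counting character bigrams.
text = "hello world, hello language models, hello reinforcement learning"

counts = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    counts[a][b] += 1

def sample_next(ch):
    """Draw the next character proportionally to observed bigram counts."""
    options = counts[ch]
    r = random.randrange(sum(options.values()))
    for nxt, c in options.items():
        r -= c
        if r < 0:
            return nxt

# Generation: start from a character and repeatedly sample the next one.
out = "h"
for _ in range(30):
    out += sample_next(out[-1])
print(out)  # gibberish, but every adjacent character pair occurs in the corpus
```

A GPT replaces the count table with a Transformer that conditions on a long context instead of one character, but the train-then-sample loop is the same shape.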
116:31
Speaker A
And I hand-rolled that And then I went to the interview And that was it Right And got the offer pretty quickly And then Your first direction was large-scale reinforcement learning Actually, back then two teams reached out
116:48
Speaker A
Two team managers Came to talk to me One was doing evaluation Basically model evaluation And the other was doing reinforcement learning I chose reinforcement learning You chose reinforcement learning back then Because it was more unclear, right Mm-hm, and back then
117:07
Speaker A
Anthropic wasn't the big company it is now The company was actually quite small back then How many people When I joined Our big team only had about 10 or 11 people What was the big team called
117:19
Speaker A
It was called Horizon Right, and then Back then that big team So like the parallel teams to this big team What were they That big team later basically became The team that covered every aspect of reinforcement learning Right, but back then
117:34
Speaker A
Its whole larger group Was just reinforcement learning Well, for a startup It's hard to say what that group's goal was Because They probably had many different goals at various points But just at that stage
117:47
Speaker A
The main goal was probably doing reinforcement learning Right, and then Of course there were also teams more focused on data below that Teams more focused on environments and infra and infrastructure And teams more focused on research and algorithms
118:02
Speaker A
And the team I joined Was more on the research and algorithms side Mm, how many people did Anthropic have back then Uh, back then probably Around seven or eight hundred in total But the whole company Seven or eight hundred, right
118:20
Speaker A
What was your first impression when you joined I think my impression of Anthropic Has actually been pretty consistent I mean, after joining My impression of the company was that it had very strong execution It's just that
118:38
Speaker A
It's actually a relatively top-down company Right and then So after many things are decided They go all in And The atmosphere between employees in the company is actually very good Everyone Doesn't hide things And especially when I first joined it was very small
118:54
Speaker A
So Everyone knew each other So the atmosphere was very good And I think If you're doing language model related things Actually looking back now That was a very very good learning opportunity Where you could get exposed to every aspect of
119:13
Speaker A
Training this model And could find corresponding people to ask Did Anthropic at that time already have What we all know now That very firm bet Yes yes Where did this bet come from Why did this bet exist I don't know its complete source
119:34
Speaker A
One obvious source I could see Was the previous generation model After Claude 3 was released On Twitter, which might not have been called X yet Many people on Twitter were discussing That Claude 3 seems to write code better than GPT-4
119:53
Speaker A
In that era GPT-4 was still a model with a huge gap from everyone else So being able to do one important thing better than GPT-4 Was quite impressive So it was discovered through trial I think at least that's one of the reasons
120:06
Speaker A
It was very quick feedback on the market Right, this is also something I think this company is very strong at Its execution is very very strong Once it gets a signal That makes it feel very reasonable Something this company should do
120:22
Speaker A
Then it will go all in It doesn't have that redundancy of large organizations Why was its coding definitely better than GPT-4 Can't say haha Oh there is a reason There is a reason, right But it's a random reason
120:38
Speaker A
Not because I chose this So this result happened It's a purely technical reason But Indeed, I can't determine whether it was randomly tried at first Or deliberately chosen If you ask me to guess I would definitely think it was randomly tried
120:52
Speaker A
Oh A purely technical reason There was someone who did something There was indeed a certain team that did something Was it top-down Or bottom-up I think at first it might have been bottom-up But later it became a top-down thing
121:12
Speaker A
To quickly capture some market Right, internal and market signals Right right I think this is Need to quickly go all in Right right I think this is something Anthropic is very very strong at It's very very reactive Reacts very quickly
121:23
Speaker A
Where does its execution come from Comes from this person Dario Comes from his certain trait I feel like Mm-hmm Anthropic As a company It can implement this Relatively top-down mechanism Is a very unique thing Why Because Implementing top-down actually has one very difficult point
121:44
Speaker A
That the person making technical decisions Must also be the company's decision maker Mm-hmm First of all you have to be technically convincing Then you can Convince the researchers below to do this thing
122:02
Speaker A
On the other hand, you have to be the decision-maker at the company You have to be able to take responsibility for the company Anthropic has that going for it That is, its technical leader Is actually a cofounder of the company
122:15
Speaker A
Who are you referring to? Not Dario Amodei Like Jared Kaplan And Sam McCandlish And both of them are cofounders of the company They make this decision themselves It's their company So they have the authority to do this top-down
122:27
Speaker A
Then Dario, as CEO Does he get to say yes or no? I don't know about their decision-making discussions Hahaha okay What role did Dario play?
122:38
Speaker A
I can only say The technical leader has the decision-making power I can only say For my work at that time The person I worked with the most was Jared But is this hard for other model companies?
122:51
Speaker A
Very hard. For example, OpenAI couldn't do it When Ilya was there, wasn't it possible?
122:55
Speaker A
When Ilya was there, it might have been possible But Ilya later, on one hand I don't know for what reason He seemed to have lost the ability to make decisions And then he left So...
123:08
Speaker A
What about other companies? I think other companies all find it pretty difficult Even Gemini finds it pretty difficult But I think Gemini has a completely different playbook It's a bit different That is, um I think big companies and startups
123:20
Speaker A
Their playbooks are fundamentally different Because for startups, what's important is to make bets That is, I have to bet on something If I want to bet It means there's risk So that means I can make decisions very quickly
123:37
Speaker A
And push decisions through strongly So perhaps in this situation Top-down is a big advantage, I think So I think organizationally, Anthropic Has an advantage over OpenAI But as a big company It might have a different mindset Because a big company's mindset might be
123:51
Speaker A
Not only can I minimize the gambling aspect But I can also have reserves in every area And then if anything succeeds I can catch up And if I succeed at something myself I might even take the lead That's probably the big company mindset
124:06
Speaker A
So at Gemini Google is a very traditional Very bottom-up organization At the company level There may be some well-defined frameworks To evaluate whether your work is good or bad To guide you to do things the company needs But essentially
124:22
Speaker A
It's still you deciding what you do yourself So you think Anthropic can make bets (referring to betting heavily on coding) Because of its unique culture Organization and culture, yes This sounds like Something other companies should be able to do too
124:36
Speaker A
But it's very strangely found that Other companies find it hard to do While Anthropic can do it Yes, I think it still requires technical credibility Or the company's leaders need to have credibility I think this is actually quite difficult
124:50
Speaker A
You're not even talking about the CEO having credibility It's the #1 technical person having credibility Yes, to me I think it's very important for the #1 technical person to have credibility But at the same time The CEO may not have become an obstacle
125:02
Speaker A
Yes Is this hard? Ah, I think it depends on your This cofounding team Whether there's enough mutual trust This is also crucial I think Anthropic is also strong in this regard Very strong among startups Its cofounding team Not a single person has left the company
125:21
Speaker A
If you look at their past They are a group of people who have truly fought battles together In the past They originated from, they were all former OpenAI employees Mm-hmm right And Many of them were even Co-authors on a series of key papers
125:37
Speaker A
Co-authors, because like The Scaling Law paper Was Jared Kaplan and Sam And of course Dario And some others Maybe Tom Brown was there too I can't quite remember if Tom Brown was there And the GPT-3 paper had Tom Brown
125:50
Speaker A
And Benjamin Mann And Jared Kaplan and Sam were both there Dario was also there So they are people who have been in the trenches together I think mutual trust is still very key Mm-hmm, many companies might just be doing their thing
126:04
Speaker A
And can't even keep this small group united Then how can you expect This big company to stay united You're talking about OpenAI right Mm-hmm hahaha When you joined Anthropic What was the most important Project the company was working on
126:19
Speaker A
Did you participate in that big project Right At that time the goal was to do large-scale Large-scale reinforcement learning And use it to improve coding ability That was the most important thing at that time Mm-hmm and we were doing this
126:33
Speaker A
This team The research focus at that time was this thing This is also why this team later gradually grew bigger And became more and more important And The final result was Everyone trained this 3.7 together The Claude 3.7 model
126:47
Speaker A
Hey you said internally there was a 3.6 This is Not internally called It's from the outside Claude 3.5 actually had two versions One might be the June version Another October version, and then You can also see Anthropic this company
127:00
Speaker A
Used to have no product capability either Actually calling two models by one name Hahahaha So later outsiders to distinguish Called the later version of 3.5 as 3.6 So Anthropic followed this outside convention And called it 3.6 Called this newer model 3.7
127:21
Speaker A
So If you look at the actual product timeline of this company It's actually 3.5, 3.5new, 3.7 How could there be a 3.5new What were they thinking Haha I can only say Anthropic at that time Probably really had no product ideas
127:37
Speaker A
So your first project was 3.7 or 3.5 3.7 3.7 Or 3.5new 3.5new Actually I Didn't participate, almost didn't participate But 3.5new Already showed signs of coding Really? When you first started At the time of 3.5new Already saw Anthropic's model
127:53
Speaker A
Would be stronger than other models in agentic coding Why is that Can't say hahaha So when you went in It was exactly when They knew about this thing That management also knew about this sign Right and when they wanted to make bets
128:09
Speaker A
You had very good luck I think I think, right I think when I joined Everyone had definitely already seen This thing could be done and was important But didn't quite know how to do it And when I went in
128:21
Speaker A
I was researching with everyone how to do it Right so the method was large-scale reinforcement learning Right from the big picture perspective But of course There are many technical details that need to be researched What know-how is in here
128:37
Speaker A
Haha there are lots of NDA (Non-Disclosure Agreement) contents Hahaha Would NDAs be written in such detail Actually in principle In principle Employees cannot during their employment and after leaving Disclose any information related to the company's internals Of course in reality
128:56
Speaker A
Everyone probably has a sense of degree in their mind That is If this technology hasn't been made public Definitely won't discuss it publicly But although I can't discuss it publicly I think Doing simple things cleaner than anyone else
129:12
Speaker A
Is the most critical thing What do you mean by clean You also used this word just now Right it's it's I think there are many fancy techniques For example doing reinforcement learning The simplest algorithm is Policy Gradient But that doesn't mean it's the only algorithm
129:28
Speaker A
There are other algorithms Like various complex Search algorithms and such But Are these complexities necessary And these complexities might bring you Some efficiency That is efficiency improvements But they might bring you some For example Infrastructure difficulties Then how do you trade off these things
129:51
Speaker A
These things actually need to be understood in research How to balance these different factors And choose the best path The most stable path Right and I think a lot of know-how Is actually in these These details How to handle all these aspects of details
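The trade-off described here, the simplest algorithm versus fancier ones, can be made concrete with plain policy gradient (REINFORCE), which the speaker names as the baseline. The sketch below is purely illustrative: a softmax policy on a toy bandit, not anything any lab actually runs, and the function names are made up for this example.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample(probs, rng):
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def reinforce(rewards, steps=2000, lr=0.1, seed=0):
    """Plain policy gradient (REINFORCE) on a k-armed bandit.

    Gradient of log-softmax: d/d theta_j log pi(a) = 1[a == j] - pi(j).
    Update rule: theta += lr * advantage * grad_log_pi.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        a = sample(probs, rng)
        r = rewards[a]
        advantage = r - baseline           # baseline reduces variance
        baseline += 0.01 * (r - baseline)  # running mean of reward
        for j in range(len(theta)):
            grad = (1.0 if j == a else 0.0) - probs[j]
            theta[j] += lr * advantage * grad
    return softmax(theta)

final_probs = reinforce([0.0, 1.0])  # policy concentrates on arm 1
```

The point of the passage survives even in this toy: everything beyond these few lines (search, replay, off-policy corrections) buys sample efficiency at the cost of infrastructure complexity, and that trade-off is what has to be researched.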
130:09
Speaker A
Then how was coding described as important at that time I think Is it considered a branch of large language models An important branch Or what I think everyone might have different ideas For me For me There are two reasons it's important
130:24
Speaker A
One reason is One reason is What Anthropic has been talking about That coding itself Is also part of language model research If you can do coding very well It might make your research efficiency Improve by multiples Mm-hmm, forming a research flywheel
130:40
Speaker A
This is one reason For me Another reason Is because coding is actually a model Using tools and interacting with the environment A very good abstraction First of all the benefits of this abstraction What are the benefits of this abstraction
130:53
Speaker A
For example the feedback signal is clear And data is abundant And Actually it's very hard in other scenarios To find Tool-using scenarios that have both these traits simultaneously So for me this is a good abstraction Some research done in this area
131:11
Speaker A
Might be useful for more general Those abilities to use tools and interact with the environment Some useful Useful lessons What was Cursor's status at that time At that time Cursor was still a Pure product company I think in a sense
131:29
Speaker A
It seems like before I went to Anthropic During that period Claude and Cursor were both in relatively underdog states And somehow at 3.5new, which is 3.6 The outside world's 3.6 generation First the model capability went up Then Cursor discovered
131:51
Speaker A
This model Could really do this kind of Agentic coding tool It's just a shell Right but this shell wrapping this model Suddenly let the public experience Not the public The public here means the software engineering community At that time, um
132:05
Speaker A
I realized Wow, this really seems like a productivity tool So after that, it just took off So around that time Anthropic realized Cursor is a future competitor I don't know about that You'd have to ask Dario, hahahaha, alright
132:19
Speaker A
How was 3.7 made This was a watershed moment For Anthropic It was a watershed model I think for Anthropic's post-training It was a watershed Before 3.7 Post-training was in a relatively Um Small-scale And It was more like patching things up
132:40
Speaker A
That kind of state for the model People didn't value post-training, right? It's not that they didn't value it Everyone from the start For a long time No one really figured out How post-training should scale up Oh, but during that period
132:52
Speaker A
Whether OpenAI or Anthropic Or even like China's DeepSeek, right They realized how to scale this up And how to scale it up You have to find The right environment Where the feedback signal is clear enough And the environment itself is a strong data source
133:09
Speaker A
And then On top of that You can make the training very stable Then it can work Yeah, I remember back then Actually no one knew What OpenAI's secret project was Just knew it was called Strawberry Called Strawberry And then, um
133:25
Speaker A
People thought it would bring a new paradigm A new paradigm of post-training reinforcement learning But no one knew much more than that Yeah actually I think when I joined Anthropic People already had a pretty good idea About how this should roughly be done
133:44
Speaker A
The general direction of how to do it And then Later on, as time went on As I learned more and more about this field I discovered At that moment The way OpenAI was doing things And Anthropic were actually quite different
133:58
Speaker A
How so? In terms of the specific algorithms And the way they used data They were actually quite different Although both are called post-training and reinforcement learning Um, although both are called that But of course I don't think those are the fundamental differences
134:09
Speaker A
In terms of the big picture They're the same They found some Found some very regression-like Very clear signals Very objective And the data itself is relatively clean And learnable for the model And do stable reinforcement learning training on top of it
134:25
Speaker A
In the big picture, that's the direction But the specific implementations differ quite a lot But later it was proven The specific implementation Each company actually went in different directions But they all succeeded Um, and at the time OpenAI's goal wasn't coding either
134:38
Speaker A
From what I understood, the narrative was Pre-training as the first paradigm The gold mine is almost exhausted So now we're opening a second gold mine Which is post-training and reinforcement learning To let the Scaling Law continue, right I think for a long time
134:53
Speaker A
OpenAI had this idea I don't know if their thinking has changed now For me My thinking has gone through shifts Around the era of 3.7 At that time I also had the feeling that pre-training was almost
135:08
Speaker A
Party is over This kind of feeling And right when you were about to join Right when I first joined And at that time when doing these 3.7 related These kinds of experiments I also once had this idea But later as my understanding deepened
135:24
Speaker A
I felt I discovered Actually there's still room to do things And um Pre-training Scaling Law It doesn't tell you to keep getting bigger It's actually a very systematic framework That can tell you what kinds of things are more effective
135:44
Speaker A
Right mm-hmm And So later discovered Actually there are still many things to do The fact is Later Anthropic And Gemini's pre-training Have also been continuously progressing OpenAI itself was stuck for a long time Haha, are they paying attention to pre-training again now
136:00
Speaker A
They should have been paying attention to pre-training for quite a while It's just recently they might have made some progress So pre-training and post-training as two paradigms Neither has reached its plateau I think neither has But you say predicting how far it will go
136:18
Speaker A
Can't do that Right I think I think reaching a plateau has Two possibilities Two possibilities One possibility is the technology itself has reached Where you still have things you want the model to do But these two technologies just can't teach it
136:35
Speaker A
Another possibility is The things you want to do have reached a plateau I think now it's the latter Right now we know oh There's a Chatbot You can teach it to do this And then there's coding You can teach it to do this
136:46
Speaker A
And then we don't know Right, don't know what else to teach it That is to say This model is still a very smart kid Right you can actually teach it many things Right but we humans as teachers Now don't know what the next thing to teach is
136:58
Speaker A
Right right Or how to reasonably teach it Using current paradigms Speaking of 3.7, what other know-how How many months did this take This finally all in all From starting training to release Probably took about four or five months
137:20
Speaker A
From when you first joined From when everyone started Doing research for this thing That probably took two or three months And then later from starting training to training completion With bumps along the way Many things to handle And there was a lot of new infrastructure
137:35
Speaker A
Actually infrastructure is really important Very time-consuming And then probably took about two months or so What important work did you do in it I don't think I did anything important Hahaha I think My personal contribution I personally My contribution to any model
137:54
Speaker A
My statement Is always I feel like I'm not that important to that thing I think more importantly I was very lucky To have the opportunity To join an important project at that time And did some things Mm-hmm, because in a sense
138:08
Speaker A
I think AI in recent years This thing itself is unstoppable It doesn't depend on whether you do it or not If you don't do it someone else can do it just as well So I think in this era
138:24
Speaker A
Actually all things that give individuals credit Are somewhat hyped Suspicious Of being hyped But indeed I think for me I am very lucky Being able to join at that stage was a big deal And, well, I learned a few things
138:39
Speaker A
So you were lucky to be there at that stage At Anthropic this company's large-scale reinforcement learning team what did you do I think around the 3.7 era, what we mainly worked on was still working on this agentic coding thing
138:55
Speaker A
how to scale this thing up or how to prepare like how to set up all kinds of environments and data including what algorithmic problems you'd run into Most of the research at the time was on this part Any tips on this?
139:10
Speaker A
Looking back, there aren't really any particularly useful tips, haha I think When it comes to technical tips this is actually something that on one hand, people are really eager to hear about but companies won't let you talk about
139:23
Speaker A
and in reality isn't very useful Why? Because a lot of algorithm design isn't actually independent independent of the algorithm itself It's very strongly dependent on your infrastructure A simple example is some companies there's a problem people often discuss
139:37
Speaker A
which is during reinforcement learning the sampler machine, the one that generates these traces, these tokens — that machine and the trainer used to actually train the model and then update the model weights — that machine these two machines might be different
139:54
Speaker A
But the difference is partly due to numerical differences and partly because of using this kind of asynchronous training architecture so naturally fundamentally they're different So different companies might have different degrees of this difference so your algorithm design will also differ
140:10
Speaker A
Some companies might have these two differences being very, very large then the biggest part of your algorithm might be how to control this and how to keep the training stable Things like the actual training effectiveness will be weighted slightly less
140:26
Speaker A
But some companies might have particularly excellent infrastructure so the difference between these two isn't that big then you can probably spend more effort on the training effectiveness So a lot of these small tips are actually not very useful
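The divergence described above, a sampler whose weights lag or differ numerically from the trainer, makes samples slightly off-policy. One standard way to "control this and keep the training stable" is a clipped importance-weight correction in the style of PPO's surrogate objective. This is a generic illustration of that technique, not a claim about what Anthropic or any specific lab does; `corrected_loss` and its arguments are invented names for the sketch.

```python
import math

def corrected_loss(logp_trainer, logp_sampler, advantages, clip=0.2):
    """Clipped importance-weight correction for off-policy drift.

    Each sample is weighted by the probability ratio
    pi_trainer / pi_sampler; the ratio is clipped so that samples
    where the two policies have drifted far apart cannot dominate
    (or destabilize) the update.
    """
    total = 0.0
    for lt, ls, adv in zip(logp_trainer, logp_sampler, advantages):
        ratio = math.exp(lt - ls)                      # pi_new / pi_old
        clipped = max(min(ratio, 1 + clip), 1 - clip)  # e.g. [0.8, 1.2]
        # pessimistic (min) surrogate, negated to form a loss
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Identical policies: ratio is 1, loss is just -mean(advantage)
loss = corrected_loss([-1.0], [-1.0], [2.0])
```

This also illustrates the speaker's larger point: how aggressively to clip, and how much to trust stale samples, depends entirely on how big the sampler/trainer gap is in your infrastructure, which is why the "tip" travels poorly between companies.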
140:38
Speaker A
A lot of know-how is actually not very useful I say this because I've indeed noticed that many other labs — well, not people in these three labs probably really want to know like how Anthropic does this or how Gemini does that
140:53
Speaker A
But sometimes I'm reluctant to answer One main reason is that fundamentally I think answering this question would mislead them Modern AI training is a large system You actually need to understand all aspects of this system to have a holistic understanding
141:09
Speaker A
of what makes something useful and why Rather than saying the thing itself is useful What happened from 3.7 to 4.5?
141:18
Speaker A
Both pre-training and post-training, yes And um Of course it's just more scaling up And data Whether it's data or training the compute is at a much larger scale But I think in terms of paradigm, there wasn't anything particularly major that changed
141:40
Speaker A
How many people was it when you left Anthropic? Close to 2,000, I think More than doubled Ah um So during your time at Anthropic it happened to be going through its most dramatic transformation Ah, I probably just caught
141:54
Speaker A
the tail end of it being a small company Actually, I think after three or four months, the company already started and suddenly there were way more people Did the culture change?
142:06
Speaker A
There were still some rather chaotic phases And then Especially around the time when I left The period right before I left I think culturally, it went through some some chaos Because some people came in from outside and there was probably some conflict with the original culture
142:24
Speaker A
Oh, the previous culture was I think before, it was just pretty simple Yeah, it was very simple It was more like a small workshop Everyone was friends And everyone knew what the others were doing And No one was particularly
142:38
Speaker A
you know, doing too much self-promotion or anything like that Doing pointless things No one was doing pointless things Everyone had a lot on their plate And the company back then probably had a stronger sense of urgency And later on, people probably felt that
142:52
Speaker A
with more people this kind of culture would definitely take some hits What kind of atmosphere did it bring?
142:59
Speaker A
I think There were indeed some people I personally didn't like very much Of course, that doesn't mean they're actually bad I'm just saying I personally didn't like them I mean, I probably don't like people who talk a lot in this field
143:17
Speaker A
Like, I think 'idea is cheap' Ideas are cheap Many ideas are actually quite obvious, everyone knows them The hard part is how to implement them How to break it down into small actionable steps and actually get it done
143:29
Speaker A
I don't think I like those who spend a large part of their day on Slack, I mean Slack is a workplace software used in the US and spending a lot of time on Slack talking about grand principles I think it's just
143:45
Speaker A
not very useful, haha Why did you suddenly leave later on? Had you completed some milestone at the time?
143:53
Speaker A
How long had you been thinking about it? At the time, I think I'd been thinking about it for a month or two about a month or so a little over a month That was fast, yeah yeah I think one aspect was
144:07
Speaker A
Um, it was I actually didn't really agree with Dario's anti-China stance Ah, I think as a company CEO For him personally whatever views he holds, I think it's fine But as a company CEO I think pushing this view to such an extreme
144:28
Speaker A
was a very emotional reaction Yeah, and this was a relatively minor reason But on the bigger picture There are many companies Like I just mentioned There were some cultural shocks at the company And including myself I probably wanted to learn some different things
144:43
Speaker A
I mean, Anthropic is after all very focused And you might be doing If you really want to work on everything related to language models in all aspects And working on this kind of tool use, this Agentic stuff and coding and such
144:55
Speaker A
then Anthropic is actually great You can learn a lot But there are many things Anthropic doesn't do For example, no one at Anthropic is doing this kind of multimodal generation You want to learn but there's nowhere to learn it
145:05
Speaker A
And Anthropic probably didn't spend too much energy on this kind of more low-level engineering infrastructure Right So probably wanting to learn more things was also one of my motivations for leaving at the time What percentage was the anti-China stance?
145:22
Speaker A
Because of Dario's personal reasons I've said in public Combined maybe 40% But this number anyway just listen to it This number just tells you It's not the main reason But it is indeed a very big reason Not controlling Not a controlling reason
145:35
Speaker A
Right not a controlling reason But it's a majority holder reason Your choice is also quite amazing Because most people When it's still an underdog Joining will create more emotional attachment Willing to accompany the company for a longer time
145:54
Speaker A
But you instead jumped to Google Because many researchers once they enter Google They feel Google doesn't give enough scope Mm-hmm So they instead want to jump to places like xAI Or smaller organizations like Anthropic Your move seems to be the opposite
146:11
Speaker A
Right I think Actually depends on what you yourself want If what you really want is I have a very clear Like you said a very clear scope And this thing Is closely related to my final product model I must get one of my ideas
146:27
Speaker A
Into this model Then Google might be a very bad place Because after all there are so many researchers So many already mature organizations Doing this thing Has a very complicated process But I think Gemini is very If what you want is research freedom
146:46
Speaker A
Freedom to explore And want to learn from broader humanity I think in this world You probably can't find a second place stronger than Gemini So So it's I think Essentially it still depends on what you yourself want But I think many people when they leave
147:06
Speaker A
Regardless of where they leave from After switching to another place The main reason they might feel unhappy Is because they didn't figure out what they wanted For example if you came to Google But told me At first you thought you wanted research freedom
147:18
Speaker A
And more motivation was learning And after you went Discovered you still wanted product impact Then you might feel very uncomfortable haha You don't pursue impact You also said this Now AI is a very large system And is a
147:35
Speaker A
Very large collaborative effort What are you pursuing in it I think it's divided into stages I think At Anthropic After experiencing too much Product-related things I might also want to change my mindset To learn some different things But you say is there any day
147:54
Speaker A
I might switch back to this mindset And want to produce some product influence That's also possible How do you quantify product influence This is very clear internally Really Hard to quantify I think Because when publishing papers there was still first author
148:09
Speaker A
This kind of lead author Now Mm-hmm actually there's no way to quantify The reality is there's no way to quantify This is also why I think in this era Actually talking about each individual's influence Is a very very ethereal thing
148:25
Speaker A
I think essentially it's still the organization that did Such a thing Or the world needs this So producing product impact is a subjective feeling At least on the model side it is At least on the model side it is
148:36
Speaker A
Right and then Of course actually you can I think you can The details are about what things you yourself have done Specific technical contributions And the effects produced technically This can be discussed objectively But more subjective things are
148:51
Speaker A
You were saying how much did this effect account for in the final product No one can really say for sure Can you describe what you did on 3.7 What kind of technical work did you do that actually had an impact on the model
149:04
Speaker A
It was mainly related to agentic coding and the environment around it And some algorithmic work as well On the algorithmic side, it was mainly about making the training more stable To be honest But I do think there were definitely some algorithmic improvements
149:21
Speaker A
but they didn't achieve particularly ideal results To be honest It's definitely better than the previous algorithms Yeah But I don't think that was my personal contribution I think it was a collective effort from everyone, haha Right, every time I ask you
149:34
Speaker A
you always say it's a collective effort It's not an era of individual heroism anymore Right, I think the era of individual heroism for language models has probably passed When was it?
149:46
Speaker A
It was the Transformer moment Right, at that point when the technology hadn't yet reached the scale-up stage The person who discovered that technology might be a hero Or a small group that discovered it might be heroes After that technology was found
149:59
Speaker A
for probably a long time from the model side, it's all been I think more about collectivism whether this group can work together whether they can toward a common goal spending their own time together and their own energy That's the most important thing
150:12
Speaker A
Rather than what each individual contributed The reason you say collectivism is because the capability actually comes from AI, is that right?
150:21
Speaker A
The reason I say collectivism is because I think AI as a field is fundamentally simple Like I don't think there's any Except maybe that leap moment where the idea might require some really deep insights In the process after that
150:34
Speaker A
many ideas are actually very trivial Very stupid, basically Anyone could think of them Anyone could do them It's just that you got lucky and happened to seize the opportunity to do it Including when you described Anthropic doing coding, it seemed like there was some randomness to it too
150:49
Speaker A
But you have to seize it Right, right. But I think when it comes to coding it might still involve more than the technical stuff on the model side a bit more corporate heroism, perhaps That is, whether you can bet on it fast enough
151:02
Speaker A
Yeah, Anthropic was indeed very strong in that regard But if Anthropic hadn't done it today some other company probably would have I think so. It's inevitable So it's all about emergent capabilities in AI It's just about whether you can seize that capability
151:13
Speaker A
Whether it's a company or an individual Right right I think before usable language models before large-scale language models emerged a lot of things were not inevitable Like whether someone could invent something whether a language model could be trained at scale
151:29
Speaker A
and whether the GPT paradigm could be discovered There was a lot of uncertainty But like you said, for example if there had been no Google Brain back then Transformer might not have been discovered It might have taken many, many years
151:43
Speaker A
before another well-funded organization with talented people discovered it That would have been a huge impact But after entering that stage especially now, the situation has reversed Any organization that wants to stop AI progress can't do it Anthropic has
152:02
Speaker A
Anthropic is very concerned about AI safety But does Anthropic have the ability to stop AI development?
152:06
Speaker A
It doesn't If you stop developing Others will continue Your voice will only get smaller Right, actually right now it's It's more like this kind of situation The world is pushing us forward Rather than us pushing the world forward
152:20
Speaker A
I feel like in the future it'll be even harder for us to stop AI Haha, I think we already can't stop it I just think Trying to prevent one specific thing from happening with AI Probably isn't the right mindset to begin with
152:34
Speaker A
This also relates to what we were just talking about Because we were just talking about Anthropic One of Anthropic's very important motivations Is so-called AI safety I think when it comes to AI safety The motivation when it was founded
152:48
Speaker A
Right What does that have to do with it now The relationship now is complicated, meaning A natural Question people might ask is A company focused on AI safety Why is it now training frontier models Anthropic's explanation is that
153:08
Speaker A
First, I need to have the most cutting-edge model Only then do I have a voice to push my AI safety agenda So actually, its thinking all along has been I want to build the best model in the world
153:19
Speaker A
Everyone will have to listen to me To push forward my safety policies But from my personal perspective I think this idea is very naive Looking at this now It's not going to happen What's more likely to happen is
153:31
Speaker A
Everyone will have great frontier models And you won't be able to stop anything from happening Maybe for this issue What we should focus on and think more about now is If you really want to avoid AI Bringing about some crisis
153:48
Speaker A
What would be a more self-enforcing approach Let me give an example of a self-enforcing mechanism Like nuclear weapons, for example Nuclear weapons are also something that everyone thinks, hey This might have the power to destroy the world But with nuclear weapons, in the end
154:01
Speaker A
The way they were ultimately controlled Is multi-party control In this world There are many countries with nuclear weapons They all have the ability to destroy each other So stability is maintained through this kind of balance of power I think if you want to stop AI from doing bad things
154:18
Speaker A
Maybe Ultimately, you'll need a similar mechanism to achieve that Rather than hoping Pinning your hopes on One company setting a law to do something Mm right And it sets it itself It can only govern itself Mm, you also just mentioned
154:30
Speaker A
Anthropic has an interpretability team Right How far has their interpretability gotten In some relatively simple, relatively sparse neural networks They can do some interesting research For example Looking at what the internal representation of a certain output, or of an input text or image, looks like
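The kind of inspect-and-invert study described here can be illustrated with a toy sketch (purely hypothetical: the two-unit linear network, `forward`, `top_unit`, and the `invert` rule are invented for this example — real interpretability work studies learned features in far larger models):

```python
# Toy sketch of representation inspection (illustrative only, not any
# lab's actual method): a tiny linear layer whose hidden units we can
# read out and "invert" back to input space.

def forward(W, x):
    # hidden representation: h[i] = dot(W[i], x)
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def top_unit(h):
    # which hidden unit fires most strongly for this input?
    return max(range(len(h)), key=lambda i: h[i])

def invert(W, i):
    # for a linear map, the input direction that maximally activates
    # unit i is simply its weight row (up to normalization)
    return W[i]

# two hand-built "feature detectors"
W = [[1.0, 0.0, 0.0],   # unit 0 responds to the first input dim
     [0.0, 1.0, 1.0]]   # unit 1 responds to the last two dims

x = [0.1, 1.0, 0.5]
h = forward(W, x)        # -> [0.1, 1.5]
i = top_unit(h)          # -> 1
print(invert(W, i))      # -> [0.0, 1.0, 1.0]
```

In a real network the map is deep and nonlinear, so "inverting" a representation is itself a research problem; the linear case above only shows what question is being asked.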
154:54
Speaker A
And then maybe you invert that representation somehow What kind of thing it can output after that Doing this kind of research You also just mentioned a viewpoint That AI is essentially simple Can you describe what you mean by this
155:08
Speaker A
This is a conclusion Right, I think this is This isn't even a conclusion It's just my statement It could be right or wrong Oh, and my explanation for this This is your view Right, my explanation for this
155:20
Speaker A
My explanation for this statement is I think the reason it's essentially simple is That you can run experiments Compared to things that are fundamentally difficult Like physics, for example The difference is that without experimental data at that energy scale
155:38
Speaker A
You simply can't understand the theory at that energy scale But AI isn't bound by this It doesn't matter if you don't understand it It can still move forward And also right now The fact is I can do any experiment I can think of
155:51
Speaker A
It's just that possibly I need some time To scale up the compute Or get the infrastructure ready But there's no fundamental difficulty Right So I've always been saying I feel AI doesn't give people the sense That it's hitting a wall because
156:10
Speaker A
First, you can try many things Second It's not that everyone has run out of ideas With no ideas left to try More often it's that there are too many ideas And they need to be tried one by one Which takes time
156:22
Speaker A
Mm-hmm Feels like humans are so insignificant In front of these experiments Yes so I think very soon AI might start doing experiments itself How soon is very soon Within 4 months I think in the next 6-12 months AI will do experiments itself
156:39
Speaker A
I think of course this statement Is not very well-defined Sorry I said something very vague Like um AI improving itself Or speeding up its own development process This is actually already happening Right Like we discussed earlier It's already helping us
156:57
Speaker A
To achieve some of the things we want And speed up our experimental pace But I think in the next six to twelve Sorry What it currently can't do is Whether it can From start to finish complete an AI research project
157:14
Speaker A
Like not only can it write this code It can also run this experiment Run this experiment Can also see the results See the results Can also analyze the results Analyze the results Know where it did wrong Then propose new hypotheses
157:24
Speaker A
Design new code Run new experiments This chain is not yet complete But I think This chain Might be the next thing to gradually become complete Based on your various reasons At the moment you left Decided to leave Anthropic
157:39
Speaker A
What were your expectations for this company's future I think when I left I was actually quite pessimistic about this company But later obviously I was overly pessimistic Hehehe why pessimistic The reason I was pessimistic at that time was
157:53
Speaker A
I think when I left Anthropic Anthropic actually um Its main revenue source was API Selling tokens And This is a bad business Is a bad business Because this business Is only a good business for one company Which is Google
158:11
Speaker A
Because this This business eventually leads to price wars Eventually it will be price wars In price wars if you don't have the complete chain There's not much advantage But later Anthropic obviously on the product side I think indeed there were many clever ideas
158:30
Speaker A
Did many good things Whether it's Claude Code getting better and better And Claude Cowork And various Work and efficiency related things All slowly converged So it feels like it has now become more than What I thought at the time
158:44
Speaker A
If you ask me which of OpenAI and Anthropic would die first Of course they won't really die Just which would become less important first At that time I would think hey Maybe Anthropic would become less important first But later first OpenAI got punched by Google
158:57
Speaker A
Then Anthropic itself got on track So now it seems Anthropic has more advantage Haha Have you ever regretted it Mm-hmm not really I think for me personally My personal motivation was still wanting to switch places Improve myself I think for this
159:13
Speaker A
For the thing I wanted to do This choice wasn't wrong You also mentioned Anthropic's products have many clever ideas Especially this year Like Cowork and such Where does this come from I think I didn't see Cowork's development process
159:30
Speaker A
So I don't know And Claude Code I think the person, the product Might also Really have some opportunities for individual heroism Is it a researcher or a product manager Boris Cherny I think Claude Code almost At least the beginning of this thing
159:48
Speaker A
Was him wanting to do this thing himself To improve his own or colleagues' work efficiency Finally became something Important to everyone What kind of person is Boris I didn't have too much personal contact with him I mostly just saw his work when I was at the company
160:03
Speaker A
He's a researcher right Right but he's mainly on the product side So Anthropic does have a dedicated product department It didn't use to be so separated Later it had a separate one Right, Anthropic seems to really understand AI products Right I think
160:18
Speaker A
I think this is why When we first started talking I felt that product managers Might still be quite hard to replace with AI currently Hahaha mm-hmm Good product managers Hey he doesn't seem to be the previous generation of product managers
160:31
Speaker A
He's not the kind who arranges features and such He seems to be the kind of product manager Who knows how to collaborate with AI Right I think the previous generation of product managers might But not entirely The previous generation also had some
160:44
Speaker A
Interaction-level changes But every interaction-level change Actually brings a very big product Like maybe Douyin Is a product with interaction-level change Then it immediately brought huge Mm-hmm opened new directions And I think Maybe Claude Code is also a product at this level
161:04
Speaker A
Claude Code and Cowork were both by Boris I don't know who did Cowork OK I already left I see Then tell me about after you arrived at Google DeepMind, has your work focus changed Work focus changed or not
161:16
Speaker A
Mm-hmm still Some changes happened And I anyway mainly focus on Doing ML coding And some relatively long horizon things These two things were actually both roughly mentioned just now Like ML coding Actually it mainly wants to achieve
161:38
Speaker A
The complete AI training itself process we just talked about Of course in this process There are many practical problems Many practical details to solve I think in the big picture Everyone actually has quite a consensus on how to do it
161:51
Speaker A
But still back to details There are many things to handle in details Like how to choose appropriate data How to choose appropriate feedback signals And it brings new infrastructure challenges And Now it's about slowly figuring out these things
162:06
Speaker A
Slowly figuring them out And Like long horizon Is the other thing we just talked about That is wanting to achieve That this model can Still that slogan Train with finite context But use it as if infinite I think wanting to make this training
162:27
Speaker A
Length longer and longer and longer Making a single training segment's length keep increasing Might not be a very realistic solution But a very realistic thing is How do you do longer work under limited context
162:43
Speaker A
Actually if you think about it Humans are actually like this Human context is actually very very short If you ask me now what I ate last night I can't remember at all Ah you might still remember Hahaha I can't remember at all
162:53
Speaker A
Because why Because it's not critical to my current scenario Right Like even if I knew what I ate last night So what So I choose to forget it So human context is essentially very short But they can selectively forget
163:04
Speaker A
And selectively retrieve To bring back these important Information relevant to the current scenario So I think that might also be for me A very interesting direction These two things are actually somewhat related Somewhat complementary Why, these two things
163:22
Speaker A
Actually both are within the large category of models using tools, interacting with the environment And with different models and different people Within this category The node everyone completed in the past Is Agentic coding, which is both tools and environment Environment is this virtual machine
163:40
Speaker A
Or interacting within your own computer And this thing Actually horizontally it grows different usage scenarios Then doing AI research Is actually horizontally Another scenario in this scenario This scenario Actually not only horizontally is it a new scenario Vertically
164:00
Speaker A
It also makes the scale of this thing longer Because completing a code completion or something Is a very quick thing But doing a complete AI research Or doing this kind of computer science research Is a very long process
164:17
Speaker A
Right so It's actually like a T-shape Horizontal extension Vertical extension too Is long horizon still a scientific problem Mm-hmm there are scientific problems Also engineering problems I think its scientific problems are more about How to try different solutions
164:35
Speaker A
After trying in a more scientific way To find the path we ultimately want to take This solution What are the ways Mm-hmm I might not be able to say too specifically But broadly speaking Some solutions are from the pre-train perspective
164:52
Speaker A
From the pre-training perspective Some solutions Are similar to this sparse attention Sparse attention For example DeepSeek also has some work And academia also has a lot of work And from the post-training perspective Also have post-training solutions Like for example externally
165:09
Speaker A
Like what you use every day, Cursor and such They have very strong context management ability Like it can let the model choose I think this middle segment is unimportant Just throw it away And that segment is important so store it in some file
165:21
Speaker A
Retrieve it when needed These two broadly speaking These two solutions Both have people researching Of course the specific implementation details Are more than the examples I just mentioned The examples I just mentioned Are relatively public examples The specific implementation details
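The post-training approach he describes — discard unimportant segments, file away important ones, and retrieve them when needed — can be sketched with a toy example (hypothetical design: `ContextManager`, `MAX_SEGMENTS`, and the keyword archive are invented for illustration, not Cursor's or any lab's real implementation):

```python
# Minimal sketch of selective forgetting and retrieval: keep a bounded
# working context, archive segments marked important, and bring them
# back when the current scenario needs them.

MAX_SEGMENTS = 3  # toy working-context budget

class ContextManager:
    def __init__(self):
        self.context = []   # active working context (bounded)
        self.archive = {}   # "files" of important segments, keyed by topic

    def add(self, text, important=False, topic=None):
        self.context.append(text)
        if important:
            self.archive[topic] = text        # store for later retrieval
        while len(self.context) > MAX_SEGMENTS:
            self.context.pop(0)               # forget the oldest segment

    def retrieve(self, topic):
        # bring an archived segment back into the working context
        text = self.archive[topic]
        self.add(text)
        return text

cm = ContextManager()
cm.add("user wants a parser in Go", important=True, topic="goal")
cm.add("tried regex approach, too slow")
cm.add("benchmark numbers: 120ms")
cm.add("switching to hand-written lexer")   # evicts the oldest segment
print(cm.context)           # the "goal" text has been forgotten...
print(cm.retrieve("goal"))  # ...but comes back from the archive
```

This mirrors the human analogy from earlier: the working context stays very short, and what matters is deciding what to forget and what to fetch back.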
165:36
Speaker A
Of course each company has its own little secrets Well, I think ultimately it all comes down to that And then I personally spend a lot more time on post-training approaches Because Well, first of all, because I myself
165:56
Speaker A
haven't actually spent official work time on pre-training Pre-training is more of an interest to me something I want to learn about But I myself haven't actually done that much work on it And on the other hand, I think post-training approaches
166:10
Speaker A
actually align better with my own understanding of this My understanding of this is exactly what we've been talking about whether you can train with short context but still handle long-context tasks Pre-training approaches essentially still require you to have long context
166:26
Speaker A
Training it requires the data to contain it Right yeah. Right, right. So it doesn't quite fit my philosophy on this problem Oh right.
166:34
Speaker A
So do you think it's possible now? Training long with short context I think It's definitely possible but we're not sure which approach works best Gemini does long-context really well Why is that?
166:49
Speaker A
There are some tricks [laughter] There are some tricks that really surprised me, haha Oh, this is about pre-training, right?
167:01
Speaker A
Doing long context well definitely requires both sides But I'm just saying, for me, that trick on the pre-training side still really surprised me [laughter] Right, OpenAI doesn't do it as well as Gemini on long context But there are also different opinions
167:16
Speaker A
Some people say that with this Gemini 3 generation long context actually got a bit worse and stuff like that. Right.
167:22
Speaker A
Again, when you joined Gemini it felt like people didn't have high expectations for Gemini No, I already had pretty high expectations for Gemini at the time Haha, what year and month was that?
167:32
Speaker A
I joined at the end of September last year That was before Gemini released Gemini 3 You had high expectations for it What about others?
167:44
Speaker A
I think people in the industry still had a pretty good impression of Gemini back then I mean, I think before, everyone thought Google was in real trouble under OpenAI's impact I think people's perception probably shifted with the Gemini 2.5 generation
168:00
Speaker A
Because 2.5 was clearly you could tell Google was getting the hang of it Of course, even before that, Gemini's 1.5 also had some, you know, small things where it was already pretty strong in specific areas It was clearly no longer far behind
168:19
Speaker A
But 2.5 was really truly a generation I think it was when people actually started using the model Anyway, I myself have used 2.5 quite a bit used it quite a lot You went to Gemini because you saw 2.5?
168:28
Speaker A
My going to Gemini had nothing to do with that Mainly it's because I knew what kind of atmosphere Gemini had There were a lot of people doing different kinds of research And I also knew some people actually doing really interesting research
168:41
Speaker A
And many Gemini engineers I think their technical skills are extremely, extremely strong I think I learned so, so much from them And um that's the reason for me But I think from everyone's perception I think people in the industry, after seeing Gemini 2.5
169:01
Speaker A
probably realized that Gemini was catching up So for you that wasn't a signal for you to join Gemini, right?
169:09
Speaker A
It wasn't a signal for me to join Then why did you join Gemini? Well, like I just said, Mainly because I wanted to accomplish something back then Actually, I wanted to have that But you know Gemini has strong people
169:18
Speaker A
Right? Yeah, exactly It's because when they came When they approached me, they'd definitely want me to Go talk to their people, right?
169:28
Speaker A
So from those conversations You can actually get a sense of how things are Oh, so they came to you Yeah But I think in the end it became a two-way street So hahaha Wasn't OpenAI an option for you back then?
169:39
Speaker A
If you wanted to leave Anthropic OpenAI was also an option at the time OpenAI should still have been stronger than Gemini In terms of momentum, right?
169:47
Speaker A
At that time But Though back then Weren't there all those internal politics Infighting was starting to emerge I think so So OpenAI was indeed an option for me back then And of course there were also options like xAI
170:00
Speaker A
And I think The main reason I didn't end up at OpenAI Was that I had concerns about its Culture, at least at that time I had pretty big concerns about its culture I just felt that To put it bluntly, people who actually get things done
170:16
Speaker A
There weren't as many as at Gemini Even fewer than at Anthropic Right? I really care about that Hahaha yeah So a sense of cultural and personal connection brought you to Gemini Yeah And then you also caught that Gemini 3 inflection point, right?
170:32
Speaker A
Hmm Gemini 3 should have been a major turning point for them A turning point period, right?
170:38
Speaker A
I think in terms of actual impact I think it was two things That created a major turning point for Gemini Turning it into a heavyweight player in the market The two things are Nano Banana and Gemini 3
170:52
Speaker A
Two things back to back, which is I think if there were only Gemini 3 It probably wouldn't have had such great results Because when your market share is less than Even 10% Whether your model is slightly better or worse
171:05
Speaker A
It just spreads too slowly But what Nano Banana did was First, it went viral in the market, it was a huge hit Then a ton of people downloaded the Gemini app And then Gemini 3 was released right after
171:19
Speaker A
Retaining those users So Now it's become a major player I think if Gemini hadn't thrown this punch OpenAI's position would be really comfortable Its market share is so high that Whatever you do with the model It doesn't actually matter that much to them
171:35
Speaker A
To be honest I think when ordinary people use models Their perception of the model's capabilities Is actually very, very weak Most people don't even use the o-series models Most people just use the regular ChatGPT one Right, so I think for Gemini
171:53
Speaker A
This Nano Banana built up the user volume And then Gemini 3 retained those users Was something critical How many ChatGPT users did it actually take away?
172:04
Speaker A
Hmm, I don't know the exact numbers now But my feeling is Gemini's market share is probably around 20% But I haven't really checked the current data carefully Looking at it with hindsight These two factors Together contributed to Gemini's challenge to OpenAI today
172:26
Speaker A
So from an insider's perspective you must have known earlier What happened and why Google would undergo such changes Yeah, I think First of all, Google's technical reserves Have always been sufficient Hmm, enough talent Yeah, they've always been sufficient
172:40
Speaker A
And then Organizationally speaking It became increasingly clear later on It's having a better framework to let Everyone work together on this thing So there might slowly be some progress Right and then I think in a sense As an outsider
173:00
Speaker A
In a sense I think OpenAI saved Google's life Oh because everyone used to worry This chatbot Would completely replace search Right if this really happened Google would actually be in a tough spot But fortunately OpenAI did this thing first
173:19
Speaker A
Then made Google realize this thing is important But it didn't take this thing all the way Didn't take this thing to the extreme Didn't completely kill off search Maybe just ate some market share As a result Let Google itself catch up on chatbots too
173:34
Speaker A
Now the one in a tough spot is them What if For example there's a company, just hypothetically In a fictional world A company not only made a chatbot But also marched forward triumphantly Doing better and better Really just ate up your search in one go
173:50
Speaker A
Completely didn't give you a chance to fight back Then it would be very tough Did the chatbot not eat up search Because OpenAI didn't do it well Or why Or because it can't kill off search I think Both sides actually have reasons
174:02
Speaker A
That is first um Current chatbot interaction methods Actually won't completely eat up search Because it's stronger than search Like we said earlier The one point it's stronger than search Is that it has strong interactivity You can follow up
174:17
Speaker A
And It can help you condense some very complex information This is where it's very strong So this portion of usage scenarios It will indeed steal people from search but There are still some very stupid scenarios in search Where you have a very simple thing
174:31
Speaker A
You don't want to waste this time On a chatbot Like I just I just search buy rice I search buy and it's done Just Do I have to ask ChatGPT Do I have to ask which one is good
174:44
Speaker A
And it's still spinning there Spinning for half a day Then gives you a link You click again Then go to the webpage to buy Right there's no need for that So from actual usage Its current form Is not enough to completely eat up search
174:56
Speaker A
Right and Of course from another perspective It might not have reached the peak in the chatbot thing either It really let Google catch up Now it's not quite caught up yet In terms of product I think in terms of product it's not caught up
175:10
Speaker A
But in terms of model it has already caught up But if you want investors to invest in OpenAI They would say When they placed their bet They recognized clearly OpenAI is actually a product company Its moat is actually product and brand
175:25
Speaker A
Then from today's perspective It seems Google hasn't been able to in this matter Catch up Can't say surpass OpenAI Catch up to OpenAI Right I think This is actually Anyway this is all from my perspective as an outsider
175:42
Speaker A
An observer's perspective You're a commentator today Hahaha From an observer's perspective I think Google has traditionally been a bit slow with products Has always been relatively slow And so Do you think OpenAI has an advantage when it comes to products?
176:00
Speaker A
I think it's possible. Right. And what's one thing Google is particularly good at? Finding an extremely simple product form.
176:09
Speaker A
Everyone looks the same. Then it just competes with you relentlessly on technology. And you can't outcompete it.
176:15
Speaker A
Oh right. That's exactly what Google is good at. Because search engines are exactly like that.
176:22
Speaker A
Search is a classic example. Everyone has the same search box. One button, but it just searches faster than you.
176:26
Speaker A
And more accurately than you. There's nothing you can do about it. Mm-hmm. So that's why.
176:32
Speaker A
Like. It feels like all along. Google has been in this state of doing very well, but...
176:40
Speaker A
Wall Street never really bought into it. Everyone always wondered where this company's moat really is.
176:46
Speaker A
There's no product ingenuity. No retention mechanisms either. But it has survived until now. So what's the reason its technology is so good?
176:54
Speaker A
I think it's still about the people, right? I think it's the culture. It's said to be.
176:58
Speaker A
A place that particularly, particularly values. In the past, it particularly valued engineers. Later, it particularly valued research.
177:03
Speaker A
That's the kind of culture. So it's very well suited for. Products where technological capability spills over.
177:08
Speaker A
Capability-based products. Right, if you look at it from this angle. Then do you think OpenAI's position is secure?
177:14
Speaker A
Now? I don't think anyone's position is secure right now. Hahahaha right. I think the form of AI.
177:23
Speaker A
Still has a long way to go. Mm-hmm. We're not at any endgame yet. That's the feeling about this.
177:32
Speaker A
Right. It feels like back home there's already a bit of this sentiment. Yeah, I don't get it.
177:37
Speaker A
Like, why don't I get it? I'm really puzzled. Like. So back home, people think we're fighting over a super app.
177:42
Speaker A
A super app is zero-sum, right? I think conditioned on the chatbot thing (taking the chatbot as the condition to build on) that's the super app.
177:51
Speaker A
Then maybe there's something to fight over. But the problem is. Is this form the super app form?
177:59
Speaker A
What if someone else. Comes out with a completely different form one day. And your functionality becomes a subset.
178:04
Speaker A
Of that thing. That's quite possible, right? I don't think there's anything. I don't see anything impossible.
178:11
Speaker A
Why wouldn't the chatbot be the ultimate form? But after all these years, this is all we've seen.
178:17
Speaker A
Right, it's all just a chat box. I think on this matter, I really don't have any.
178:21
Speaker A
Rational or quantitative criteria. To explain it. More like you just feel like this whole thing is stupid.
178:28
Speaker A
Like this model clearly has so many capabilities. But the way we use it is a chatbot (Note: This video was recorded over 2 months ago, when the agent paradigm was not yet clear).
178:33
Speaker A
It just doesn't quite make sense. You know what I mean, so. We need a product manager.
178:37
Speaker A
To unlock the model's capabilities. Hahaha. Humans have only communicated with AI through chatbots until now.
178:45
Speaker A
That seems stupid to you, right? It's stupid because. Then what should we use to communicate with AI?
178:49
Speaker A
Haven't figured it out. If I had figured it out, I'd already be doing it.
178:50
Speaker A
Hahahaha. Hey, you didn't tell me. What exactly changed inside Google. To lead to what the outside world saw.
178:58
Speaker A
The rapid leap in model capabilities. Right, like I just said, it's one thing. I think the organization has more clarity now.
179:02
Speaker A
And. Once the organization is clear. Did the organization change? Right. Especially pre-training. Has become very, very clear now.
179:10
Speaker A
That is who is responsible for what And every point Who is the responsible person at every node These things are very clear Was it chaotic before It was very chaotic in the earliest days I wasn't there in the earliest days
179:21
Speaker A
But based on descriptions from colleagues or people I knew It was still more chaotic before Mm-hmm right right And now At least pre-training has also become very very clear And plus This Google Has always had This relatively strong technical background
179:36
Speaker A
And it does things relatively systematically So I feel Pre-training at Google Is a very very controllable thing Mm-hmm predictable thing You can You can know The next generation won't be bad Oh you might even know how good it will be
179:53
Speaker A
Anthropic's top-down management also works Mm-hmm not bad Then Google is this bottom-up It's still bottom-up right It's definitely more top-down than before Compared to the earliest days But compared to Anthropic It's still more bottom-up Like different cultures can both work
180:11
Speaker A
Right right For model training Right that's I think big companies have big company ways Startups have startup ways So big companies are You also just said It's a completely different narrative It's a different Method, what is Google's method
180:24
Speaker A
Now I think Google more says Like this kind of relatively deterministic thing Like pre-training Is already a relatively deterministic paradigm Then maybe Google will be more like Making it into an engineering project Google's engineering management ability is very strong
180:38
Speaker A
So it can slowly do it well Mm-hmm what is an engineering project Engineering project means You are actually Actually very very Very top-down organization And very clear What we need to do in the next stage Then go do this thing
180:53
Speaker A
What nodes need to be handled in between And even doing research is like Having a very clear framework Telling you how to Verify whether your results are good or bad Evaluate whether your results are good or bad Right so this is
181:09
Speaker A
Something Google is very strong at In any big engineering project in the past So pre-training Actually I think has now entered Google's comfort zone And Post-training of course has more uncertainty Then maybe post-training currently Is still more bottom-up
181:25
Speaker A
Everyone can try more broadly You say pre-train is also a kind of RL Why do you say that I think it's It's hard to say from a pure technical perspective Pre-train is pre-training Or supervised learning What is the essential difference between SFT and RL
181:43
Speaker A
Because pre-training and SFT Of course pre-training and SFT are essentially not that different That is You just take the data you get As your ground truth Then you treat that as your expert Treat that as your expert output
181:57
Speaker A
Then you align toward the distribution of that expert output Reinforcement learning might be a broader level One level, it's saying first this This original output Is also not a given expert But something I produced myself And among them there are good results
182:15
Speaker A
And also bad results So you want good results to move closer to that and bad results to move away from it, something like that So in a sense pre-training and SFT are a subset of reinforcement learning But these two things do, in this era
182:27
Speaker A
have their differences Of course, for me the biggest difference lies in the data For pre-training data what matters more is having a good distribution The distribution needs to be broad enough, or aligned well enough with the scope you want to cover
182:43
Speaker A
But data quality doesn't need to be extremely high But for post-training it's the opposite In terms of distribution, it may be much narrower But for the data, the quality requirements are very high Yeah right So for now
182:58
Speaker A
for me the most fundamental difference between the two is still in the data distribution rather than in algorithms or training paradigms So how do different labs organize these teams?
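His point that pre-training and SFT are a subset of reinforcement learning can be sketched as a weighted log-likelihood (illustrative only: `weighted_nll` and the toy probabilities are invented, and real trainers add baselines, clipping, and much more on top):

```python
# Both objectives minimize a weighted negative log-likelihood over
# outputs. SFT fixes the outputs to expert data with uniform weight 1;
# RL uses the model's own samples, weighted by how good they turned
# out (their "advantage"), so bad samples are pushed away.

import math

def weighted_nll(samples):
    # samples: list of (log_prob_of_output, advantage_weight)
    return -sum(w * lp for lp, w in samples)

# SFT: expert outputs, every weight fixed to 1
sft_loss = weighted_nll([(math.log(0.5), 1.0), (math.log(0.25), 1.0)])

# RL: model's own outputs, good ones pushed up (+), bad pushed down (-)
rl_loss = weighted_nll([(math.log(0.5), +1.0), (math.log(0.25), -1.0)])

print(sft_loss)  # log 2 + log 4 = 3*log(2), approx. 2.079
print(rl_loss)   # log 2 - log 4 = -log(2), approx. -0.693
```

Setting every weight to 1 on a fixed expert dataset recovers SFT exactly, which is the sense in which it is a special case; the practical difference he names — broad-but-noisy data for pre-training versus narrow-but-clean data for post-training — lives in where the samples come from, not in this formula.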
183:07
Speaker A
Are pre-training and post-training different? Or are they the same? Anthropic and Google are pretty similar Both of them have one team for pre-training and another team for post-training OpenAI might be more chaotic In the early days they had three teams
183:29
Speaker A
They had pre-training and they also had reinforcement learning the Strawberry team and they also had a post-training team And I never worked there but my understanding is its post-training wasn't really post-training Its RL team, Strawberry, and its post-training
183:47
Speaker A
are actually what other companies call post-training and product Oh so they might have divided it in a different way and sliced it up They treat the later stages as product work As part of it their post-training is actually intertwined with product
183:58
Speaker A
they're building the product Is it just that the name hasn't been updated? Not entirely Because at most companies, the product team doesn't really train models anymore They mostly communicate the desired model traits to the team training the model
184:16
Speaker A
But it seems like their post-training is in a sense its own product team but it can also train models Is that because their understanding of product is that people who train models should also build the product Yeah yeah possibly
184:28
Speaker A
It could be a good thing Yeah, but their org has also changed a lot since then So I don't know what their org looks like now You guys have released several models recently and I saw you were involved in all of them
184:39
Speaker A
Gemini 3 Deep Think Gemini 3.1 Pro Well, I think I can only say that I was fortunate to be involved Hahaha yeah Again, it all feels like collective work Hahaha yeah How did you become such a public figure now
184:54
Speaker A
getting singled out and mentioned separately every time I don't get it I actually don't think it's great Every time I see it I feel like how am I going to face my colleagues in the office tomorrow Hahaha Does it feel awkward?
185:09
Speaker A
At the office it's fine I think my colleagues are just good people Like they probably don't care too much about these things But honestly I feel like every project I've been part of whether at Google or at Anthropic
185:24
Speaker A
It would all happen the same even without me The outcome wouldn't get any worse I think everyone now is a surfer Essentially there's a wave, and you're the surfer Mm-hmm, and the wave is AI?
185:40
Speaker A
Right it's AI This thing itself is this wave It will move forward Whether you surf this wave or not This wave will crash on shore Just that some people might surf this wave Some people might be a bit late
185:53
Speaker A
Didn't catch the crest of the wave Okay You were fortunate to participate in these two projects — what did you work on Mainly some small details in algorithmic design that we would discuss together And some things on the data side
186:09
Speaker A
But things on the data side I think Might have more impact on future work Do these models have paradigm changes Mm-hmm I don't think any No change is big enough to From not knowing how to do large-scale reinforcement learning
186:30
Speaker A
To large-scale reinforcement learning That level of change — no change is big enough to that extent There are definitely some small changes Can you talk about these small changes in the new models There are definitely some small changes Honestly, recently I feel numb about model releases already
186:52
Speaker A
A bunch of domestic models And many foreign models too OpenAI you all Mm-hmm domestic GLM, ByteDance DeepSeek has been expected but hasn't released yet Kimi can you highlight the key points for everyone I think In a sense None are that worth paying attention to
187:16
Speaker A
Hey what are people competing over now Feels like chaos I think some things people are competing over Actually looking at it now In this era Already not that important Because of inertia from the past Everyone would compete for first place on various Benchmarks
187:30
Speaker A
To prove their model's basic capability is strong But by now, the benchmarks that get public attention are somewhat maxed out Think about it: earliest everyone paid attention to SWE-bench Before long everyone hit 80-something Fortunately no one exceeded 83
187:48
Speaker A
Well, recently OpenAI did release a post saying they exceeded 83 But some of those problems are not well-defined, so it was just as well no one exceeded it — whoever exceeds it risks embarrassment Anyway For reasoning, everyone finished off AIME, then IMO After IMO, what's next
188:03
Speaker A
Not much comes to mind except benchmarks like ARC-AGI Then ARC-AGI Mm-hmm before Gemini 3 Everyone probably forgot what the highest score was At that time maybe around 10 or so And everyone was like wow Hard as climbing to heaven Then Gemini 3 made it 30-something
188:18
Speaker A
Then Claude 4.5 or 4.6 — 4.6 should have hit 60-something Then Gemini 3 Deep Think hit 80-something So this is also maxed out So now just relying on hitting these publicly recognized capability benchmarks Actually doesn't have much meaning anymore
188:42
Speaker A
And um So from this perspective I just Essentially there aren't too many key points Although everyone is releasing very fast Mm-hmm Releasing fast also shows Actually this problem has become easy For everyone Everyone knows the know-how now There are no secrets anymore
189:00
Speaker A
Right right It's still this, it's still that It's still that same thing The surfing theory, right It's still this The wave is moving forward What's the next goal everyone might be looking for What's the next paradigm-level change Will there still be one
189:18
Speaker A
Ah, I think The two things I just mentioned — ML coding and long horizon, right And these two, I think Um Yes I think they might be things that haven't reached paradigm-level change
189:34
Speaker A
But I think it is Something very valuable for Google Because first of all, ML coding is Because Google itself is a major player in AI research And it's also the most full-stack in AI research That is Not only does it have these model training parts
189:50
Speaker A
It also has hardware design The part connecting hardware to models If this entire system can be accelerated Or better managed That could be very valuable for this company Long horizon goes without saying Everyone knows Everyone thinks it's very important
190:08
Speaker A
Right So I think that might be, for me Can't say it's paradigm-level Definitely not at the paradigm level But it's something I think is very valuable That needs to be able to, within the next few months Show some light at the end of the tunnel, and um
190:25
Speaker A
I think paradigm-level Might still be those more uncertain things Like multimodal generation, that kind of thing I think there might be a hero Or a group of heroes Haha, and um, right That kind of thing might have some
190:40
Speaker A
Um, another thing talked about a lot is continual learning What about world models I think continual learning and the long horizon work I just mentioned have no fundamental difference Because um Because people used to think these two things were very different
190:54
Speaker A
It's because continual learning changes some of the model's weights Whereas, for example in open source, everyone does a lot of this kind of context management, which doesn't change model weights
191:05
Speaker A
But actually, if you think about it, there's no fundamental difference between these two things Because those tokens in the context Their own KV cache is also a kind of weight, isn't it So You think between these two approaches, which one can
191:14
Speaker A
Which one will be more useful More useful in the long run I think it's unclear But essentially they Are both for doing what I just mentioned, long horizon This type of thing And world models Ten thousand people have ten thousand world models
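The point just made — that the KV cache of context tokens is itself a kind of weight — can be made concrete: in attention, the output is a softmax-weighted read over cached keys and values, so appending a token to the cache changes all subsequent outputs without touching any trained parameter, much like a "fast weight" update. A minimal single-head NumPy sketch (shapes and values purely illustrative):

```python
import numpy as np

def attend(q, K, V):
    # Attention read over a KV cache: softmax(q . K^T / sqrt(d)) @ V
    scores = K @ q / np.sqrt(q.size)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 4
K = rng.normal(size=(3, d))   # cached keys  — act like input-dependent weights
V = rng.normal(size=(3, d))   # cached values
q = rng.normal(size=d)

out = attend(q, K, V)

# Appending one token to the cache changes future reads without
# modifying any trained parameter — the "context as weights" view.
K2 = np.vstack([K, rng.normal(size=d)])
V2 = np.vstack([V, rng.normal(size=d)])
out2 = attend(q, K2, V2)
```

The same query now returns a different output purely because the cache grew, which is why the speaker sees context management and weight-updating continual learning as two routes to the same end.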
191:30
Speaker A
What does that mean? The definition isn't clear That is First of all, I don't know what a world model is And secondly When everyone talks about the world models they're building They might be talking about different things For example, the world model that Gemini builds might be different from
191:44
Speaker A
For example, like Fei-Fei Li The world models they're building are not the same thing Um sigh Describe the difference I don't particularly understand what labs like Fei-Fei Li's What these labs are doing What it's actually like But um
191:58
Speaker A
Gemini's world model is more of an end-to-end kind of training The result it wants is that, for example Video generation is: given a description Then generate a video But the result it wants to achieve is
192:14
Speaker A
Not only can I generate a video I am able to generate a scenario What is a scenario Scenario means I generate The state at this moment And then I can also give it a condition A condition This condition is that under this state I did some
192:27
Speaker A
What kind of action And then the state at the next moment Becomes a function of the previous moment's state and the action And it's trained end-to-end for this kind of capability Right so this might be one solution And first, I don't know
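The transition framing just described — next state as a function of the current state and an action, learned end-to-end — can be sketched in miniature. Below, a linear least-squares fit stands in for the end-to-end-trained model on a toy linear system; the dynamics and all names are invented for illustration, not anything Gemini actually uses:

```python
import numpy as np

# Toy "world model": learn f(state, action) -> next_state from trajectories.
rng = np.random.default_rng(1)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # true dynamics (assumed for the demo)
B = np.array([[0.0], [0.1]])             # how the action enters

states = rng.normal(size=(500, 2))
actions = rng.normal(size=(500, 1))
next_states = states @ A.T + actions @ B.T   # s' = A s + B a

# Fit f([s, a]) -> s' by least squares — a stand-in for end-to-end training.
X = np.hstack([states, actions])
W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

pred = X @ W   # the model's predicted next states
```

With noiseless linear data the fit recovers the dynamics exactly; a real world model replaces the linear map with a large generative network, but the interface — condition on state and action, emit the next state — is the same.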
192:41
Speaker A
What result everyone ultimately wants And I also don't know what everyone's Definition of their own world model is So I think it's more of an exploratory state We haven't talked about one organization just now, xAI We just talked about Anthropic
192:55
Speaker A
Talked about OpenAI Talked about DeepMind What about xAI xAI I don't understand haha As a commentator let's talk about it Why are they so turbulent recently I think they've always been quite turbulent Hahaha why so turbulent recently I don't know either
193:13
Speaker A
And Actually I don't have that much contact with xAI And Some people I contacted have also left now Actually I don't know what happened to them Hahaha When you were talking about Anthropic just now You said The technical number one being able to make bets
193:33
Speaker A
Is very important Then at Google who is this number one Who is this hero I think heroes Might be different people at different stages Mm-hmm but behind every hero there is one person Sergey Brin Google's cofounder Oh right
193:50
Speaker A
I think ultimately many many big decisions Might not be decided by him on how to do them But in the end he has to be the one to make the final call Mm-hmm even now What about Demis Hassabis
194:07
Speaker A
I think the person who appears more on the front lines Is Koray Kavukcuoglu Right Yes DeepMind CTO And he's now also that Google SVP Oh what is Demis responsible for I think Demis might manage more of those Things leaning toward science
194:26
Speaker A
Like for example drug design Isomorphic Labs and such things Right right right Oh, Gemini he doesn't manage much At least from my perspective The person I see more is Koray Of course it's possible that In matters of company management there are many parts I can't see
194:44
Speaker A
Then I'm not clear about that You also mentioned AI is a whole system Mm-hmm What understanding do you have about how to systematically do AI Now After these two years of your work Several aspects One aspect is from the whole system perspective
194:59
Speaker A
It needs a relatively scientific attitude That you need to clearly understand like Scaling Law You need to clearly understand What assumptions you have made And when I make a change What factors are actually related to it What factors are not related
195:12
Speaker A
Right And this is from the organizational perspective From the people's perspective Actually requires people to be very reliable Requires very responsible people Actually every system Every evaluation framework Is very easily hacked Because you can always do something To make your metrics look very good
195:32
Speaker A
But a trustworthy Or down-to-earth person He would actually think If the thing he did works well Is it really For example effective at large scales Did I miss some factors in between Right Actually doing things systematically Sounds like one sentence
195:50
Speaker A
But actually doing it is very complex There are many details Many resistances It actually goes against human nature Oh Because every individual's human nature Might be to make their own things Show up better But for a company or an organization
196:05
Speaker A
The most beneficial thing Is to make the entire company's system Very solid systematically This is actually the best for you personally Because once this system is solid You can leverage this system To produce more output But the bad thing is
196:16
Speaker A
This system will make your individual heroism Not shine But you can rest assured that others' individual heroism Also won't shine But if you are in a system Where individual heroism can shine Then this system might Not be particularly stable
196:37
Speaker A
Because one person leaving Might cause the entire thing to collapse For example like OpenAI You say you love to challenge difficult things But this industry seems to require Doing simple things well repeatedly Actually I think the so-called simple things
196:55
Speaker A
Doing them well repeatedly Is actually a very difficult thing Because human nature doesn't like Doing repetitive things Because the most difficult thing in this industry Is actually doing simple things cleanly Why Because everyone can do simple things If you can't do them cleaner than others
197:12
Speaker A
It requires researchers themselves to have a good understanding of how this system works And to feel responsible to the company — only then can you do it Otherwise one thing is very easy to do Which is For example, maybe your approach looks better than others' when you only consider training But worse when you consider training plus sampling You can always choose to report only the training part And that's very bad Right So this requires both personal responsibility And that the system the organization builds can catch, as much as possible These intentional or unintentional boundary-gaming moves But as an individual, how do you know what's best for the whole picture It actually requires — I think if a researcher can't think about the whole picture They're not a good researcher in this era Mm And this I think is very different from doing research in academia Oh Because doing research in academia is essentially a state where if one person eats, the whole family is fed I'm responsible for my own project, right I'm responsible for my own reproducibility But in a company, more often it's that I have to be responsible for the company These are two completely different mindsets So where does this self-discipline of yours come from
198:26
Speaker A
I don't know hahahaha I think I just can't bring myself to shirk it Hahaha What do you mean by that Just that Being responsible to a company is part of your contract with that company Actually I see no reason not to do it There's no justification for doing otherwise So this individual heroism undermines that integrity? I think If you're just doing it for personal heroism and acting on that basis it's very likely to undermine the bigger picture Of course, in reality you might be very capable
199:00
Speaker A
and you actually become a hero that's also possible Since you've also been through two organizations what kind of organization do you think is better at fostering intelligence in this era I think this is actually a very controversial topic
199:17
Speaker A
I mean as we were just discussing different organizations some tend to be more top-down some more bottom-up so the natural question is for example which of these two types fosters more innovation The traditional view was bottom-up was a necessary condition for fostering innovation
199:34
Speaker A
because everyone needs freedom, right only with freedom can there be innovation But purely bottom-up you find it doesn't actually work either because it just becomes chaotic That's what Google was like before Was it?
199:43
Speaker A
Yes At least in my impression from what I understand, that's how it was It was just chaotic People didn't even know what the point of what I was doing was That might not be great either So you probably need someone
199:54
Speaker A
or a small group who can blend these two approaches somewhat Mm-hmm That's why I think whether an organization runs well or not it looks like an organizational issue but ultimately it comes down to the tech leader Mm-hmm It's about whether this tech leader has the qualities
200:13
Speaker A
to keep the organization running stably Because the optimal state is often the most unstable one It easily collapses toward a worse state Right, so you need a leader to control that So do you think it should always be the tech leader doing this
200:29
Speaker A
rather than the CEO Well of course every company's CEO may have different responsibilities But there needs to be a leader I think you need at least one leader who has two qualities to be able to do this One quality is that they can fight fires themselves
200:47
Speaker A
It's not just talking about what to do What to do What to do but rather when something really runs into trouble they can step in and lead the team to solve the problem Of course most of the time
200:57
Speaker A
a leader probably won't have time to do this But at least they have the capability The second important quality is that they need to understand others Even if it's something that they wouldn't do themselves they can understand why what others are doing matters
201:13
Speaker A
They can tolerate and accommodate others That might be another quality What do you think about Google's TPU In what ways does it outperform GPUs What are its weaknesses I think From a purely hardware perspective it's hard to say which hardware is truly better or worse
201:29
Speaker A
especially at this kind of large-scale commercial deployment Because fundamentally, between GPUs and TPUs Setting aside the hardware differences The biggest difference in terms of usage is GPUs have a better open-source ecosystem TPUs don't
201:43
Speaker A
But this actually isn't an issue at large-scale commercial deployment It's not a problem Because for example, Google itself uses TPUs so naturally they'll spend time building this infrastructure And infrastructure is For example, if you're only running a thousand cards
201:55
Speaker A
it could be a heavy burden But if you're running a cluster of hundreds of thousands of cards then building out the infrastructure isn't really that big of a deal And in practice So basically when it comes to large-scale commercial deployment
202:06
Speaker A
neither one is inherently superior or inferior But these two do have some differences in design philosophy Take GPUs, for example At least for the more recent GPU generations I haven't used them much Like the Hopper generation of GPUs
202:20
Speaker A
The H-series GPUs The design philosophy is that inside one pod (node) there might not be that many cards say, just eight cards and these eight cards can all interconnect with one another NVLink (NVIDIA's high-speed interconnect bus) is extremely fast
202:29
Speaker A
So within one pod, there's basically no communication bandwidth bottleneck (insufficient bandwidth between GPUs) But TPUs take the opposite approach It means that they've abandoned pairwise interconnection between cards but they try as much as possible to fit as many cards as possible
202:40
Speaker A
into one big rack It has this kind of 3D torus topology design So each card connects only to its nearest neighbors along the three axes but the entire cluster can be connected into one big torus
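The wrap-around neighbor structure being described can be sketched directly: in a 3D torus each chip links to its neighbor in both directions along each of the three axes — six links total — yet the whole cluster stays connected with short worst-case paths. A toy illustration (not actual TPU interconnect code):

```python
def torus_neighbors(coord, dims):
    """Neighbors of a chip in a 3D torus: +/-1 on each axis, wrapping around."""
    x, y, z = coord
    X, Y, Z = dims
    return [
        ((x + 1) % X, y, z), ((x - 1) % X, y, z),
        (x, (y + 1) % Y, z), (x, (y - 1) % Y, z),
        (x, y, (z + 1) % Z), (x, y, (z - 1) % Z),
    ]

# In a 4x4x4 torus every chip has exactly 6 links, and thanks to the
# wrap-around edges any chip reaches any other in at most 2+2+2 = 6 hops.
nbrs = torus_neighbors((0, 0, 0), (4, 4, 4))
```

The wrap-around term `(x - 1) % X` is what turns a plain 3D mesh into a torus: chip (0, 0, 0) is directly wired to (3, 0, 0), so no chip sits at an "edge" with fewer links.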
202:53
Speaker A
And if your compiler or your sharding (data sharding strategy) logic is written well enough you can take advantage of this architecture Effectively you get more memory capacity and also remove a lot of communication bottlenecks What's the downside?
203:14
Speaker A
I think one downside is that compared to GPUs, it definitely at least at a small scale is more of a rigid structure So its ease of use or its general versatility might not be as strong Recently many neo labs have emerged in Silicon Valley
203:35
Speaker A
What do you think of this trend? Why are they all leaving jumping ship from these big model companies to start neo labs I don't really get it Haha, my feeling is that the vast majority of neo labs will die. And
203:50
Speaker A
Well, I think some labs genuinely have good people And some labs might actually be starting to do some real work For example, like Thinking Machines is still delivering some new things But some neo labs Please bleep out the names
204:05
Speaker A
Haha, like XXX, that XXX I have absolutely no idea what they're trying to do And These two have actually been away from the field for a long time I think in 2026 China will place a lot of emphasis on the consumer-side narrative
204:20
Speaker A
Who becomes that super app What do you think? Do you think this It seems like nobody in Silicon Valley talks about this Right, because American enterprise is just...
204:30
Speaker A
It's companies Or rather, the productivity software market is just too big and the profit margins are too high So for the US there was basically only ChatGPT doing consumer before and there wasn't much money in it Not much profit
204:43
Speaker A
So now everyone will probably focus first on productivity software or enterprise And So the trends in China and the US have already diverged I think Not just AI The entire internet industry in the past was like this too
204:56
Speaker A
It was all different What China is really strong at is the consumer side It can come up with, like really, really complex product features or structures and in a way that seems very indirect to you In a very unnatural way
205:09
Speaker A
to snowball that profit For example What do I mean by indirect? (laughs) Like, take something like Douyin (TikTok) It's not like you watch a video and I charge you 20 cents per video, right?
205:21
Speaker A
It says you can watch videos for free but I can quietly slip in ads I can quietly do live streaming I can quietly do e-commerce But that doesn't work for productivity software Productivity software is very straightforward Like, I help you write code
205:35
Speaker A
My cost is 150 a month I sell it to you for 200, I make 50 It's that straightforward Mm Yeah, I think what the US has shown in the past is that with these very straightforward products it can push technology to the extreme
205:47
Speaker A
But there's never been a product that felt so sophisticated that you can't live without it yet you don't feel like it's taking your money but it's actually making money from you Hearing you say that, I suddenly feel Meta should just copy ByteDance
206:05
Speaker A
Yeah, but I don't think Meta is as strong as ByteDance Because Meta can't find its own niche either And there's no American company doing this No one has found the niche that Doubao occupies Then Meta should just copy Doubao
206:17
Speaker A
It doesn't need such strong model capabilities either But I still think the Americans making products fundamentally, the people doing consumer products aren't good enough Far behind China This is the accumulation of the past decade, right?
206:31
Speaker A
Yeah Mm Because the positive feedback loop in the US over the past decade all came from doing B2B A lot of enterprise stuff Or it's just too easy to make money in the US Mm When it's too easy to make money
206:44
Speaker A
you won't rack your brains over how to make money Hey Haven't a lot of people come to chat with you?
206:49
Speaker A
Any interesting people? Oh, well A lot of people from China came Tech companies I think they're all pretty interesting And I did find that Chinese people doing products probably think in more sophisticated ways More sophisticated Yeah, they think more...
207:08
Speaker A
Their thought process is more convoluted Yeah, it's a completely different style from the US America is like As I just said about America It's like you build something and sell it directly Yeah, it's simple That's how it is
207:21
Speaker A
You just need this capability Once you have it, you just need to be cheaper than others Then I can earn more than you And you can't do anything about it Okay What about China?
207:31
Speaker A
China seems to be all about this pattern Not making money at first But once it starts making money you can't stop it It's just that it can really form that that self-sustaining that loop When it really gets that flywheel spinning
207:45
Speaker A
you can't break in anymore Do you think American companies understand ByteDance now? My feeling is no Not yet It's already so big Oh, you mean whether they take it seriously?
207:58
Speaker A
Of course they do Everyone definitely knows ByteDance is a severely undervalued In terms of its valuation It's a severely undervalued company I think that's very clear to everyone And I think it's also clear that in the consumer market
208:13
Speaker A
On this end, I actually think No American company can compete with ByteDance But after all it's a Chinese company At least in terms of public perception After all it's a Chinese company So do people understand it I don't think people understand it
208:31
Speaker A
But look at Meta It's also actively poaching people from ByteDance Mm-hmm, do you have any idols in the AI industry Or people you admire Although you've been in the AI industry for a short time No no no, nothing
208:43
Speaker A
I just feel When I came to this industry The era of individual heroism had already passed So there are no heroes Sometimes you even think old-era heroes are a bit stupid Ah right So really there's nothing Who do you think is quite stupid
209:03
Speaker A
Let's not talk about this No comment hahahaha Right, I think it's Different from doing physics I think when doing physics There were still some People I think really much smarter than me Like me When I was doing my PhD my young advisor was
209:20
Speaker A
I think he, Douglas Stanford I think he's just much smarter than me I think he Maybe also seeing him Made me feel in that field Not very useful With him around what do they need me for Right haha
209:33
Speaker A
You came to AI to do a dimensionality-reduction attack, right Not a dimensionality-reduction attack But anyway it feels like AI, this thing Doesn't really need brains
209:41
Speaker A
Really doesn't need brains Then what does it need I think this The most important trait in this industry Is being reliable Doing things carefully And being responsible for what you do This is the most important trait You say how much brains those things need, I think
210:01
Speaker A
They're all things undergraduates can do But you say AI has no individual heroism Now an AI researcher is priced so high Like a star player transfer I don't know if it's a good thing or bad thing For me personally
210:14
Speaker A
Of course I'm very happy I benefit from this Right hehehe But um Actually speaking I don't know if this thing Is a good thing Why do you think the price has become so high I think maybe on one hand
210:30
Speaker A
Everyone thinks this thing is scarce But actually it might not be that scarce Because training a person Although this thing isn't that hard But training a person requires an environment You need to have that opportunity to be exposed to this thing
210:43
Speaker A
To learn this thing Without that opportunity No matter how smart you are it's useless Maybe in the past people who could encounter that opportunity Weren't that many So in the market it might be relatively scarce From this perspective
210:55
Speaker A
Mm-hmm But I think another aspect is Maybe the hype around individuals is a bit excessive Right People really like to mythologize individuals now Right I think Really I'll say it again This is a collectivist endeavor haha Then many people are also very curious
211:16
Speaker A
Because Maybe many companies also want to recruit AI people Then you think the most important thing is still being reliable What metrics are there for this How can you quickly judge whether a person is reliable Whether they do things carefully
211:29
Speaker A
Everyone has some methods they use to measure Of course I also have some of my own tricks I used to design an interview question Let me briefly explain it It shouldn't be confidential So I should be able to talk about it
211:46
Speaker A
Um So the interview question is actually quite simple I need this person to, within 24 hours, complete a reinforcement learning project from scratch They have to choose on their own what kind of model I tell them what resources are available
212:02
Speaker A
and they choose what model to use what data to use what algorithm to use and train the model Within 24 hours I give them 24 hours to get this done And after the 24 hours are up they'll have a one-hour discussion with me
212:15
Speaker A
So this thing isn't that hard in the AI era Without AI this would be impossible No one could do it in 24 hours But with AI, it's actually quite easy Because AI can do the whole thing for you
212:26
Speaker A
But why still do this? There are many reasons Among them, two reasons why it was designed this way One reason is that I think in this era, evaluating someone on whether they write good code
212:38
Speaker A
is actually useless Because most people don't need to write code themselves anymore What's more important is whether they can effectively leverage AI So that's one aspect of evaluating this The second aspect is that there's a trap here If you let AI do everything
212:53
Speaker A
but you don't really try to understand what AI did for you you'll be exposed during that one-hour discussion That's a That's where people fail So the other thing this tests is whether you've truly formed a collaboration with AI
213:06
Speaker A
Or if you just completely handed it off That's something I personally value very much That also reflects whether this person is someone reliable Of course, this The design of this question itself also has some rather dark cleverness to it
213:21
Speaker A
Like why it was designed as 24 hours is to see how much this person values this opportunity Can they stay up all night Right hahaha If they're willing to pull an all-nighter they can survive these 24 hours If they can't make it
213:34
Speaker A
then it just means they probably don't value this opportunity that much Haha So for people younger than you Do you think AI is still a blue ocean a place with lots of opportunities I think purely working on language models
213:48
Speaker A
is no longer a blue ocean I think it's too late — the last train has already left The last train has already left Which last train is that?
213:57
Speaker A
I feel like I got in on that last train And there might have been some people after I got in some new people But I think they won't have the opportunity to encounter such good opportunities Like being able to
214:09
Speaker A
do something in a relatively small team Chances to encounter such opportunities will be rare Right, and then But I think AI is a very vast field Language models are just a tiny, tiny part of it A very small part
214:21
Speaker A
There are many other things Like the multimodal generation we just mentioned There may still be many opportunities there Robotics probably has even more opportunities And even more extreme, there's like whether you can use AI to help with real scientific problems
214:35
Speaker A
Like helping with quantum control and things like that Then it might be more blue ocean Those are all blue sky things Right so I think for People young enough Maybe doing the hottest thing right now Is not the right choice
214:53
Speaker A
Doing things no one has done now Might be more of a good choice Right How will you develop in the future Will you be at Google for a long time I think probably not Hahahaha Saying this so publicly
215:07
Speaker A
I think probably not I think I will still try to challenge myself Right and Need to torture myself Right need to torture myself But I just might need to find something Worth torturing myself for If AI is not fundamentally difficult
215:24
Speaker A
Won't you find it boring Where is your challenge Although it's not difficult But knowing and not knowing There is still a gap From completely not knowing the details To slowly understanding the details Understanding how it works and such
215:40
Speaker A
These things I think still require spending time and effort And after you understand I think this thing will also be helpful for your future Like whether you do product related Or develop toward other AI directions I think all
215:52
Speaker A
In the long term Will be helpful Where do you want to develop in the future I think anything is possible Haha haven't figured out how to torture myself You probably won't jump to another big company again Probably not
216:06
Speaker A
Mm-hmm What differences do you feel between what you learned at Anthropic And what you learned at Google DeepMind I think they're quite different I think Anthropic Is where you can understand one thing One line, language model Every aspect of this line very thoroughly
216:21
Speaker A
It gives you that opportunity And at Google It's more horizontal It has many different aspects Many different people And you can also see different perspectives Also see different research directions You can see all of them Right Anthropic is because it bets firmly enough
216:39
Speaker A
So you can understand more vertically Right Have you thought about using AI to solve physics problems (Your theoretical physics) Someone is doing it So I don't think I need to do it haha You don't have essential interest in this
216:52
Speaker A
I think this thing First Currently it's not the highest priority for me I think if one day I think I solve the highest priority thing on my hands And haven't found anything else to do I might go do this thing
217:04
Speaker A
What is your highest priority now My highest priority now is To push the two things I just mentioned Oh — ML coding and long horizon To at least a relatively stable state That colleagues can carry forward
217:19
Speaker A
That I think is my highest priority Of course there might be other priorities later But Using AI to do physics I think is something Many people are already trying to do One more of me is not too many
217:32
Speaker A
one less of me is not too few. Might as well let others do it first. Do you have any physicists you particularly admire? Not really... well, yes, but there are quite a few; I don't know where to start, hahahaha. Physicists, yes.
217:44
Speaker A
AI scientists, no. But this is related to a person's growth experience, I think. Like, an adult finds it hard to truly worship a person; a child might. Who have you worshipped? In physics, there are actually many who are really quite strong.
218:06
Speaker A
But let's set aside the ones everyone talks about, the people from 100 years ago like Einstein and Heisenberg, and also the ones everyone later came to know, like Chen-Ning Yang (Frank Yang).
218:17
Speaker A
And, like, when I was doing topology before, there was actually someone who later also won the Nobel Prize, Haldane. You'll find these people have an almost abnormal foresight; they seemed out of place in their era. But look at Haldane:
218:34
Speaker A
when he first did the Haldane model and those fractional quantum Hall effect related things, it was decades away from when everyone finally figured out these topological states, many decades later. Mm-hmm. Even then, he could feel this thing was important
218:48
Speaker A
and kept pushing it himself. I think that's not easy. Of course, if you really want to find a similar person in AI, I think maybe Geoffrey Hinton: when everyone felt this thing was optional, or not that certain,
219:00
Speaker A
he kept working in this direction. So I think he might be a hero-level figure. After him, in AI, I think there might also be some heroic collectives, like, for example, the Transformer authors: Noam, and that Ashish,
219:20
Speaker A
Niki and the others. That might be a heroic collective. You said something that made a very deep impression on me: "I don't have any mentors in this industry, don't have any old friends, I can criticize whoever I want." This might be the benefit of not doing AI.
219:33
Speaker A
Hahaha, the benefit of not coming from AI. Right, like, really no burden: no old-timer is your relative, so if you think he's stupid, he is stupid, and you can just say so directly. It doesn't matter. Were you like this before too?
219:55
Speaker A
I think I was quite restrained when I was a student. Oh. But later I found restraint useless: no benefit to myself, no benefit to others either. Better to be more direct. Expressing your own ideas is the most critical thing.
220:09
Speaker A
I think directly expressing your own ideas is something where, in the short term, people will definitely hate you, but in the long term everyone will appreciate it. Who have you heard speaking particularly stupidly recently? Bleep out that name, thank you. I think XXX has always been quite stupid, haha,
220:26
Speaker A
and consistently stupid, haha. Could he possibly be the right one? I think what he says, in Pauli's words, is "not even wrong", because it's not well defined; it's hard to say whether what he says is right or wrong. Right, like, one day
220:45
Speaker A
maybe a different paradigm happens, and he can jump out and say, hey, I said this back then. But then you discover that if the paradigm had ended up in some other state, he could have said the same thing. This is why I hate this kind of very vague,
220:59
Speaker A
very vague people, because a thing that is vague is meaningless. Why do you think he speaks very vaguely? No proper definition. Like, it's kind of ambiguous. If it has a proper definition, I can explain why it's properly defined, but if it doesn't have a proper definition,
221:16
Speaker A
I have no way to explain why it isn't properly defined, because it really isn't properly defined, hahaha. What about XXX? I think at least XXX is still a well-defined thing. Like, it's trying to do XXX, and their approach might lean more toward this
221:33
Speaker A
more traditional kind of neural-network model approach, rather than a more end-to-end approach. I think at least it's well defined; as for whether it's right or wrong, that's something the future will test. Are most old geezers actually fine, though?
221:50
Speaker A
I think when people get old, they don't necessarily turn into old geezers. When people get old, they split into two types. One type is the venerable elder: they might stop nitpicking so much, and actually put effort into mentoring young people.
222:04
Speaker A
The other type is the old geezer: they don't know what they're talking about, yet love to nitpick and boss people around. Yeah, so getting old doesn't necessarily make you an old geezer. Hey, who got you all riled up?
222:12
Speaker A
I don't even know who got me riled up, but I've definitely met plenty of old geezers, hahaha. When did you change? Like, becoming so direct when you speak, you stopped holding back. Or have you always thought this way, but just didn't say it?
222:23
Speaker A
I think in the past I might have been pretty direct too, but not this direct. After getting into AI, I became even more direct. So it's like nothing is holding you back, right? One, there's nothing holding me back.
222:34
Speaker A
Two, this field is objective enough. Like, you don't really have to worry too much about offending people with your opinions. As long as your views are internally consistent, like, you have a coherent framework for your views, you're not just randomly trashing people.
222:45
Speaker A
That would definitely offend people. But if you have your own understanding of things, I think people will actually respect you for it, because ultimately how well you do in this field is judged by objective standards. Every guest we have recommends a life-changing book.
223:02
Speaker A
It has to be a book that genuinely had a major impact on you. What book would you say? This is the hardest question of the day. I feel like you're overestimating my cultural sophistication, hahahahaha. Honestly, I don't really have a life-changing book.
223:22
Speaker A
Okay, I did read a book recently. Recently... last time Ji Yichao mentioned 'The Line Puppy'. The book I read recently is Yukawa's autobiography, that is, the autobiography 'Tabibito' (The Traveler) by Hideki Yukawa (winner of the 1949 Nobel Prize in Physics). And then, if I had to name books that left an impression:
223:40
Speaker A
first of all, I genuinely don't like reading. I feel like I'm not very well-read, and the books I read, other than professional ones, all feel like leisure reading to me. Yukawa's autobiography is essentially leisure reading too, but I found it quite interesting.
223:59
Speaker A
Like, you get to see a scientist who later seemed so successful struggling in his youth. Very authentic. And then maybe some other leisure reads, like novels and such. There's a novel I really like, 'From the New World', a Japanese novel.
224:20
Speaker A
Yeah, if you really force me to recommend some leisure reading, I could recommend that one. Have you watched any movies lately? TV shows, or played any games? Nothing at all, hahaha. A favorite food from anywhere in the world?
224:38
Speaker A
Sushi, probably. A favorite place anywhere in the world? A favorite place anywhere in the world... I think if you really force me to choose, I'd probably choose Hawaii, because I really love the ocean. Yeah, but it's hard to say for sure,
224:56
Speaker A
because after I visit more coastal places, I might have a new favorite, hahaha. Something not many people know but probably should? Don't trust old-timers, does that count? Hahaha. Have you ever been superstitious?
225:13
Speaker A
Hmm, I... I haven't, fundamentally. But I think sometimes superstition can be a way to comfort yourself. I meant, have you ever been superstitious about old-timers?
225:21
Speaker A
Oh, superstitious about old-timers. Never? Really, never. But I probably didn't hate old-timers this much before; then I started hating them more and more. Why?
225:35
Speaker A
Maybe it's just that when you develop more judgment of your own, stupid people just look even stupider. But they haven't hurt you, so why hate them?
225:45
Speaker A
It's just stupidity intolerance; everyone has stupidity intolerance. Hey, what's your MBTI? No idea. Why has such an unfriendly term for older people emerged among young people in recent years?
225:59
Speaker A
Where does it come from? No idea. No no no, I haven't looked into it. You could ask Gemini to do a Deep Research and see where the term "laodeng" comes from. So what are the papers that have influenced AI progress the most, in your mind?
226:12
Speaker A
Sequence-to-sequence is one. And then that language-model work at the peak of the feature-engineering era, I think. And then Scaling Laws is one, the one by Jared Kaplan; their Scaling Laws paper at OpenAI is also one. It's a paper that introduced this systematic research methodology
226:34
Speaker A
into the field. Of course, the actual methods in Scaling Laws may not have been exactly right, but it was the first to introduce this idea, and I think that's crucial. Based on your current understanding, what's a key important bet?
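The systematic methodology attributed here to the Kaplan et al. Scaling Laws paper can be sketched in a few lines: fit a power law L(C) = a * C^(-alpha) to (compute, loss) pairs by linear regression in log-log space. The data points below are synthetic, purely for illustration, not from the paper.

```python
import numpy as np

# Hypothetical (compute, loss) measurements lying on an exact power law
compute = np.array([1e3, 1e4, 1e5, 1e6, 1e7])  # training compute (arbitrary units)
loss = 5.0 * compute ** -0.05                   # synthetic losses: L = 5 * C^(-0.05)

# In log space the power law is linear: log L = log a - alpha * log C,
# so an ordinary least-squares line recovers the exponent and prefactor.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
alpha, a = -slope, np.exp(intercept)

print(round(alpha, 3), round(a, 3))  # recovers alpha = 0.05, a = 5.0
```

The fitted curve can then be extrapolated to larger compute budgets, which is the "systematic" part: predicting the loss of a bigger run before training it.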
226:52
Speaker A
Long horizon (long-horizon tasks), hahaha. Our studio is called Language is World Studio. When you first heard this name, what were you thinking?
227:01
Speaker A
I think this name is a bit... too normal, too mediocre. Hahahaha, fair enough, hahahaha. I think this name might have been a very unique perspective ten years ago, but now there's just too much consensus around it. Ten years ago it really was.
227:24
Speaker A
Maybe it's been more than ten years now. Sorry, I feel like I'm getting old too. Maybe it's been more than ten years. Like, around 2014 or 2015, everyone thought vision was the most important thing. Back then, I think, realizing
227:36
Speaker A
that language is an important carrier of intelligence was probably something different. But I don't think our name was meant in an AI context. Hmm. Hmm. Hahaha. Well then, that's worth some deep thought, hahaha.

Frequently Asked Questions

Who is Yao Shunyu and what is his background?

Yao Shunyu is a researcher at Google DeepMind with a background in theoretical physics. He studied at Tsinghua and Stanford before transitioning to AI, working at Anthropic and Gemini.

What are the main differences between the two Yao Shunyus in Silicon Valley?

One Yao Shunyu has a computer science background and worked at OpenAI and Tencent, while the other, featured in this video, came from physics and worked at Anthropic, Gemini, and DeepMind.

What is the current stage of AI development according to Yao Shunyu?

Yao Shunyu believes AI has entered a stage where the focus is less on whether AI can perform tasks and more on defining the right problems to solve, with model capabilities becoming more commoditized.
