Building Anthropic | A conversation with our co-founders — Transcript

Anthropic co-founders discuss their AI journey, safety focus, and innovations like constitutional AI in building safer, scalable language models.

Key Takeaways

  • Long-term collaboration and shared vision among founders fueled Anthropic’s creation.
  • Scaling language models and safety research are deeply interconnected.
  • Concrete, practical framing of AI safety helped gain broader acceptance in the field.
  • Constitutional AI represents a novel, promising method to align AI behavior with human values.
  • Simplicity and principled approaches remain central to effective AI safety solutions.

Summary

  • The co-founders share their personal motivations for working on AI, including transitioning from other fields and early collaborations.
  • They recount their long-standing professional relationships formed at Google Brain and OpenAI, spanning over a decade.
  • The discussion highlights the importance of scaling laws and language models like GPT-2 and GPT-3 in advancing AI capabilities.
  • Safety is emphasized as a core motivation, particularly ensuring AI systems understand human values and can communicate effectively.
  • The concept of Reinforcement Learning from Human Feedback (RLHF) is explained as a key technique intertwined with model scaling.
  • They describe the 'Concrete Problems in AI Safety' paper as a foundational effort to ground AI safety research in practical machine learning.
  • The paper also served as a consensus-building political project to legitimize AI safety concerns within the research community.
  • The founders reflect on the early skepticism and eventual acceptance of safety-focused approaches in AI development.
  • Constitutional AI is introduced as an innovative approach where AI behavior is guided by a written constitution, leveraging AI’s ability to follow multiple-choice style prompts.
  • They emphasize the power of simple, principled methods in AI and the ongoing commitment to safety and scalability at Anthropic.

Full Transcript

00:00
Speaker A
Why are we working on AI in the first place? I'm just going to arbitrarily pick Jared.
00:07
Speaker A
Why are you doing AI at all?
00:08
Speaker B
I mean, I was working on physics for a long time and I got bored.
00:11
Speaker B
And I wanted to hang out with more of my friends, so.
00:14
Speaker C
I thought Dario pitched you on it.
00:15
Speaker D
I don't think I explicitly pitched you at any point. I just kind of showed you results of AI models and was trying to make the point that they're very general, that they don't only apply to one thing, and then at some point, after I showed you enough of them, you were like, oh yeah, that seems like it's right.
00:32
Speaker C
How long had you been a professor before you started?
00:34
Speaker B
I think like six years or so.
00:36
Speaker B
I think I helped recruit Sam.
00:38
Speaker E
I talked to you and you were like, I think I've created a good bubble here and like my goal is to get Tom to come back.
00:44
Speaker A
And did you meet everyone through Google when you were doing the interpretability stuff, Chris?
00:48
Speaker F
No, so I guess I actually met a bunch of you when I was 19 and I was visiting the Bay Area for the first time.
00:54
Speaker F
So I guess I met Dario and Jared then, and I guess they were postdocs, which I thought was very cool at the time.
01:00
Speaker F
And then I was working at Google Brain and Dario joined and we sat side by side actually for a while.
01:06
Speaker F
We had desks beside each other.
01:08
Speaker F
And I worked with Tom there as well.
01:10
Speaker F
And then of course, I got to work with all of you at OpenAI when I went there.
01:16
Speaker F
So I guess I've known a lot of you for like more than a decade, which is kind of wild.
01:20
Speaker A
If I remember correctly, I met Dario in 2015 when I went to a conference you were at and tried to interview you, and Google PR said I would have to read all of your research papers.
01:28
Speaker D
Yeah, I think I was writing 'Concrete Problems in AI Safety' when I was at Google.
01:33
Speaker D
I think you wrote a story about that paper.
01:35
Speaker A
I did.
01:36
Speaker E
I remember right before I started working with you, I think you invited me to the office to come chat, and you just told me everything about AI and explained it all.
01:43
Speaker E
I remember afterwards being like, oh, I guess this stuff is much more serious than I realized.
01:49
Speaker E
And you were like probably explaining the big blob of compute and like parameter counting and how many neurons are in the brain and everything.
01:54
Speaker F
I feel like Dario often has that sort of effect on people: this is much more serious than I realized.
01:58
Speaker D
I'm the bringer of happy tidings.
02:00
Speaker A
But I remember when we were at OpenAI, there was the scaling law stuff and just making things bigger, and it started to feel like it was working.
02:07
Speaker A
And then it kind of kept on eerily working on a bunch of different projects, which I think is how we all ended up working closely together, because it was first GPT-2 and then scaling laws and GPT-3, and we ended up.
02:13
Speaker D
We were the blob of people that were making things work.
02:16
Speaker B
Yeah. I think we were also excited about safety, because in that era there was sort of this idea that AI would become very powerful but potentially not understand human values, or not even be able to communicate with us.
02:20
Speaker B
And so I think we were all pretty excited about language models as a way to kind of guarantee that AI systems would have to understand that kind of implicit knowledge.
02:29
Speaker D
And RL from human feedback on top of language models. The whole reason for scaling these models up was that, you know, the models weren't smart enough to do RLHF on top of.
02:36
Speaker D
So that's the kind of intertwinement of safety and scaling of the models that we, you know, still believe in today.
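(For reference, a minimal sketch of the preference-learning step at the core of RLHF: a reward model is trained on human comparisons between pairs of completions, then used as the signal for RL fine-tuning of the language model. The code below is an illustrative Bradley-Terry loss in PyTorch with dummy values, not anyone's actual training code.)

```python
import torch
import torch.nn.functional as F

def reward_model_loss(reward_chosen: torch.Tensor,
                      reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss over human preference pairs: maximize the
    log-probability that the human-preferred completion outranks the
    rejected one under the reward model."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Dummy scalar rewards for a batch of 4 (prompt, chosen, rejected) triples;
# in practice these come from a reward model scoring each completion.
chosen = torch.tensor([1.2, 0.3, 0.8, 2.0])
rejected = torch.tensor([0.9, -0.1, 1.1, 0.5])
loss = reward_model_loss(chosen, rejected)  # backpropagated into the reward model
```

The fitted reward model then steers RL fine-tuning of the language model, which is why the base model has to be capable enough for its outputs to be meaningfully compared in the first place.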
02:38
Speaker F
Yeah. I think there was also an element of the scaling work being done as part of the safety team that, you know, Dario started at OpenAI, because we thought that forecasting AI trends was important to have us take them seriously and take safety seriously as a problem.
02:49
Speaker A
Correct.
02:50
Speaker A
Yeah, I remember being in some airports in England sampling from GPT-2 and using it to write fake news articles, and Slacking Dario and being like, oh, this stuff actually works and might have huge policy applications.
03:00
Speaker A
I think Dario said something like, yes.
03:04
Speaker G
Yes.
03:05
Speaker A
In his typical way.
03:06
Speaker A
But then we worked on that a bunch, as well as the release stuff, which was kind of wild.
03:12
Speaker H
Yeah, I remember the release stuff.
03:14
Speaker H
I think that was when we first started working together.
03:16
Speaker A
Yeah.
03:16
Speaker H
That was a fun time.
03:18
Speaker A
The GPT-2 launch.
03:19
Speaker A
Yeah. But I think it was good for us, because we did a kind of slightly strange safety-oriented thing all together, and then we ended up doing Anthropic, which is a much larger, slightly strange safety-oriented thing all together.
03:26
Speaker F
Well, so I guess just going back to the concrete problems, because I remember I joined OpenAI in like 2016, as one of the first 20 employees or whatever, with you, Dario.
03:30
Speaker F
And I remember at that time, 'Concrete Problems in AI Safety' seemed like it was the first mainstream AI safety.
03:38
Speaker D
Yes.
03:38
Speaker F
Paper. I guess I don't really know if I ever asked you what the story was for how that came about.
03:42
Speaker D
Chris knows the story because he was involved in it. I think, you know, we were both at Google.
03:48
Speaker D
I forget what other project I was working on, but like with many things, it was my attempt to procrastinate from whatever that project was; I've now completely forgotten it. But I think Chris and I decided to write down some open problems in AI safety. AI safety was usually talked about in this very abstruse, abstract way.
03:55
Speaker D
Can we kind of ground it in the ML that was going on at the time?
04:00
Speaker D
I mean, now there's been like, you know, six, seven years of work in that vein, but it was almost a strange idea at the time.
04:09
Speaker F
Yeah, I think there was a way in which it was almost a kind of political project, where at the time a lot of people didn't take safety seriously.
04:15
Speaker F
So I think there was sort of this goal to collate a list of problems that people agreed were reasonable, that often already existed in the literature, and then get a bunch of people across different institutions who were credible to be authors.
04:22
Speaker F
And like I remember I had this like whole long period where I just talked to like 20 different researchers at Brain to build support for publishing the paper.
04:30
Speaker F
In some ways, if you look at it in terms of the problems and a lot of the things it emphasized, I think it hasn't held up that well, in that, you know, it's not really the right problems.
04:39
Speaker F
But I think if you instead see it as a consensus-building exercise, that there's something here that is real and worth taking seriously, then it was a pretty important moment.
04:49
Speaker A
I mean, you end up in this really weird sci-fi world, where I remember at the start of Anthropic we were talking about constitutional AI. And I think Jared said, oh, we're just going to write a constitution for a language model and that'll change all of its behavior.
05:00
Speaker A
And I remember that sounded incredibly crazy at the time.
05:03
Speaker A
But why did you guys think that was going to work? Because I remember that was one of the first big early research ideas we had at the company.
05:08
Speaker B
Yeah, I mean, I think Dario and I had talked about it for a while.
05:10
Speaker B
I guess I think simple things just work really, really well in AI. And so the first versions of that were quite complicated, but then we kind of whittled it away until it was just: use the fact that AI systems are good at solving multiple-choice exams, and give them a prompt that tells them what they're looking for. And that was a lot of what we needed.
05:20
Speaker B
Um, and then we were able to just write down these principles.
05:22
Speaker D
I mean, it goes back to like the big blob of compute or the bitter lesson or the scaling hypothesis.
05:29
Speaker D
If you can identify, you know, something that you can give the AI data for and that's kind of a clear target, you'll get it to do it.
05:36
Speaker D
So like, here's this, here's this set of instructions, here's this set of principles.
05:44
Speaker D
AI language models can read that set of principles and compare it to the behavior they themselves are engaging in. And so you've got your training target there.
05:57
Speaker D
So once you know that, I think my view and Jared's view is, there's a way to get it to work.
06:02
Speaker D
You just have to fiddle with enough of the details.
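(For reference, a minimal sketch of the multiple-choice framing described here: the model itself judges which of two responses better follows a written principle, and that judgment becomes the training label. The `log_prob` helper below is a stub standing in for any language-model API that scores continuations; the prompt wording is illustrative, not the actual constitutional AI implementation.)

```python
# Stub standing in for any language-model API that can score how likely
# `answer` is as a continuation of `prompt` (e.g. summed token logprobs).
def log_prob(prompt: str, answer: str) -> float:
    return 0.0  # placeholder value so the sketch runs end to end

def pick_by_principle(principle: str, question: str,
                      response_a: str, response_b: str) -> str:
    """Frame 'which response better follows the principle?' as a
    multiple-choice exam, which language models are already good at."""
    prompt = (
        f"Consider the following principle: {principle}\n"
        f"Question: {question}\n"
        f"Response (A): {response_a}\n"
        f"Response (B): {response_b}\n"
        "Which response better follows the principle? Answer:"
    )
    # Whichever answer the model rates as more likely becomes the
    # preference label, i.e. the training target, with no human labeler.
    better_is_a = log_prob(prompt, " (A)") > log_prob(prompt, " (B)")
    return response_a if better_is_a else response_b
```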
06:05
Speaker B
Yeah. I think it was always weird for me, especially in those early eras, because I was coming from physics.
06:12
Speaker B
And I think now we forget about this because everyone's excited about AI, but I remember talking to Dario about concrete problems and other things, and I just got the sense that AI researchers were very, very psychologically damaged by the AI winter, where they just kind of felt like having really ambitious ideas or ambitious visions was very disallowed.
06:20
Speaker B
And that's kind of how I imagine it was in terms of talking about safety.
06:25
Speaker B
In order to care about safety, you have to believe that AI systems could actually be really powerful and really useful.
06:33
Speaker B
And I think there was kind of a prohibition against being ambitious. And I think one of the benefits is that physicists are very arrogant, so they're constantly doing really ambitious things and talking about things in terms of grand schemes, and so, yeah.
06:40
Speaker D
I mean, I think that's definitely true.
06:45
Speaker D
Like I remember in 2014, there were just, I don't know, some things you couldn't say, right?
06:50
Speaker D
But I actually think it is kind of an extension of problems that exist across academia, other than maybe theoretical physics, where they've evolved into very risk-averse institutions for a number of reasons.
07:00
Speaker D
And even the industrial parts of AI had kind of transplanted or forklifted that mentality. And it took a long time.
07:09
Speaker D
I think it took until like 2022 to get out of that mentality.
07:11
Speaker F
There's a weird thing about what it means to be conservative and respectable, where you might think one version is that being conservative means taking the risks or the potential harms of what you're doing really seriously and worrying about that.
07:16
Speaker F
But there's another kind of conservatism or caution, and I think we were sort of in a regime that was very controlled by that one.
07:23
Speaker F
I mean, you see it historically, right? Like if you look at the early discussions in 1939 between, you know, people involved in nuclear physics about whether nuclear bombs were a serious concern.
07:32
Speaker F
You see exactly the same thing, with Fermi resisting these ideas because it just seemed like kind of a crazy thing.
07:40
Speaker F
And other people, like Szilard, taking the ideas seriously because they were worried about the risks.
07:45
Speaker D
Perhaps the deepest lesson that I've learned in the last 10 years, and probably all of you have learned some form of it as well, is that there can be this kind of seeming consensus, these things that everyone knows, that, I don't know, seem sort of wise, seem like they're common sense, but really they're just kind of herding behavior masquerading as maturity and sophistication.
07:52
Speaker D
And the consensus can change overnight. Before you've seen it happen a number of times, you suspected it, but you didn't really bet on it.
08:00
Speaker D
And you're like, oh man, I kind of thought this, but what do I know?
08:05
Speaker D
How, you know, how can I be right and all these people wrong?
08:10
Speaker D
You see that a few times, then you just start saying, nope, this is the bet we're going to make.
08:15
Speaker D
I don't know for sure if we're right, but just ignore all this other stuff. And, I don't know, even if you're right 50% of the time, being right 50% of the time contributes so much, right?
08:20
Speaker D
You're adding so much that is not being added by anyone else.
08:22
Speaker A
Yeah. It feels like that's where we are today with some safety stuff. There's a consensus view that a lot of this safety stuff is unusual or doesn't naturally fall out of the technology.
08:28
Speaker A
And then at Anthropic, we do all of this research where weird safety and misalignment problems fall out as a natural dividend of the tech we're building.
08:34
Speaker A
So it feels like we're in that counter-consensus view right now.
08:38
Speaker H
But I feel like that has been shifting over the past even just like 18 months.
08:41
Speaker A
We've been helping to shift that.
08:42
Speaker H
We've definitely been helping.
08:43
Speaker A
I know.
08:44
Speaker A
I mean.
08:45
Speaker A
We've been doing research.
08:46
Speaker H
Constant force.
08:47
Speaker H
Yeah. But I also think world sentiment around AI has shifted really dramatically. And, you know, it's more common in the user research that we do to hear customers, regular people, say, I'm really worried about what the impact of AI on the world more broadly is going to be.
08:55
Speaker H
And sometimes that means, you know, jobs or bias or toxicity, but it also sometimes means like, is this just going to mess up the world, right?
09:02
Speaker H
How is this going to contribute to fundamentally changing how humans work together and operate? Which, I wouldn't have predicted that, actually, you know?
09:10
Speaker E
Yeah. I mean, I guess two things that at least connect to what you were saying earlier. One is, I feel like people frequently join Anthropic because they're scientifically really curious about AI.
09:18
Speaker E
And then they kind of get convinced by AI progress to share the vision of the need not just to advance the technology, but to understand it more deeply and to make sure that it's safe.
09:29
Speaker E
And I feel like it's actually just exciting to have the people you're working with more and more united in their vision for both what AI development looks like and the sort of sense of responsibility associated with it.
09:38
Speaker E
And I feel like that's been happening a lot due to a lot of advances that have happened in the last year, like what Tom talked about.
09:46
Speaker E
Another is that, I mean, going back really to concrete problems, I feel like we've done a lot of work on AI safety up until this point.
09:55
Speaker E
A lot of it's really important.
09:57
Speaker E
But I think we're now, with some recent developments, really getting a glimmer of what kinds of risks might literally come about from systems that are very, very advanced, so that we can investigate and study them directly, with interpretability and with other kinds of safety mechanisms, and really understand what the risks from very advanced AI might look like.
10:09
Speaker E
And I think that's something that's really going to allow us to further the mission in a really deeply scientific, empirical way.
10:19
Speaker E
And so I'm excited about sort of the next six months of how we use our understanding of what can go wrong with advanced systems to characterize that and figure out how to avoid those pitfalls.
10:27
Speaker H
Perfect.
10:28
Speaker A
Finn.
10:28
Speaker H
Good timing.
10:29
Speaker A
Good timing.
10:30
Speaker H
We got to do this.
10:31
Speaker A
This is the only time we ever get.
10:32
Speaker H
We got to do this.
10:33
Speaker A
I do want to have.
Topics: Anthropic, AI safety, language models, GPT-2, GPT-3, scaling laws, reinforcement learning from human feedback, constitutional AI, machine learning, AI research

Frequently Asked Questions

Why did the founders decide to work on AI?

The founders were motivated by a mix of personal interests, such as moving from physics to AI, and recognizing the broad potential of AI models through early experiments and collaborations.

What is the significance of the 'Concrete Problems in AI Safety' paper?

The paper helped ground AI safety research in practical machine learning challenges and served as a consensus-building effort to legitimize safety concerns within the AI research community.

What is constitutional AI and why is it important?

Constitutional AI is a method where AI behavior is guided by a written set of principles or a 'constitution,' leveraging AI's ability to follow prompts, which helps align AI systems with human values in a scalable way.
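A minimal sketch of the critique-and-revision loop this implies, assuming a hypothetical `generate` function wrapping any instruction-following language model; the principles shown are illustrative, not Anthropic's actual constitution:

```python
# Hypothetical wrapper around any instruction-following language model.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug a language model API in here")

# Illustrative principles, not Anthropic's actual constitution.
CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that could help someone cause serious harm.",
]

def constitutional_revision(user_prompt: str) -> str:
    """Draft a response, then self-critique and revise it against each
    principle. The revised outputs become training data, so the written
    principles, rather than per-example human labels, steer behavior."""
    draft = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Principle: {principle}\nResponse: {draft}\n"
            "Critique any ways the response falls short of the principle:"
        )
        draft = generate(
            f"Response: {draft}\nCritique: {critique}\n"
            "Rewrite the response to address the critique:"
        )
    return draft
```

Because the principles are written down once rather than expressed through many individual labels, this is one reason the approach scales with model capability.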
