The Real AI Bottleneck? Graph Routing for Skills. — Transcript

Explore a new graph-based AI skill selection method tackling complexity and cybersecurity in large skill libraries for LLM agents.

Key Takeaways

Selecting AI skills at scale requires understanding complex interdependencies beyond simple similarity matching.
A typed skill graph enables LLM agents to navigate and select appropriate skills more effectively.
Security threats from AI-powered malware are rising, but new runtime solutions are being developed to counteract them.
Open-source repositories like Google DeepMind’s skill sets provide a solid foundation for scientific AI applications.
The integration of graph structures into AI skill retrieval marks a significant advancement in LLM agent capabilities.

Summary

The video discusses a new June 2026 paper on a self-evolving typed skill graph for large-scale LLM skill selection.
It addresses the challenge of selecting the right skills from tens of thousands available for AI tasks.
Traditional vector similarity matching fails due to skill dependencies, redundancies, and conflicts.
The proposed solution is a typed directed graph exposing inter-skill relationships to the LLM for better retrieval.
The graph supports multi-stage search that returns vector matches, typed neighbors, and conflict signals, and it allows dynamic edge edits.
Google DeepMind’s scientific skill repository is highlighted as a reliable resource for scientific AI tasks.
The video also covers emerging cybersecurity threats from adaptive AI-driven computer worms and corresponding defense solutions.
Tsinghua University’s agent libOS is introduced as a runtime environment enhancing security for long-running LLM agents.
The skill graph approach transforms skill retrieval from a one-shot ranking problem into a structured, inference-time graph retrieval.
The concept of 'typed' in the graph means nodes and edges have predefined categories ensuring data consistency.

Full Transcript

Speaker A

Hello, community. So great that you are back. We have to talk about AI. We have a new development, especially regarding your AI skills and a new methodology here to optimize the skill use for your AI system. So let's start. This is

Speaker A

here a brand new paper, June 2nd, 2026. This is by FAN University, National University of Singapore, AAR, and they talk about a self-evolving typed skill graph for the LLM skill selection at scale. And if I say scale, I really mean 10,000 skills

Speaker A

or more than 100,000 skills that are available on the internet. And your AI has now to find the right skill for your particular task. And they have some brand new ideas. So let's have a look.

Speaker A

But on the other side here, the very same day, June 2nd, 2026, I just want to show you AI is opening up a can of worms now because literally we are talking now about a computer worm that is now highly

Speaker A

intelligent. You have never seen a worm like this before because now AI agents enable adaptive computer worms. And guess what? They are directly connected here to our new skill availabilities. So a computer worm, if you know, is a

Speaker A

malware that spreads on a network by replicating itself from one AI machine to another but with completely new methodologies that are just amazing. But of course, let's be positive. We have solutions for both sides, so let's have

Speaker A

a look at the first solution. You ask me a lot which tools or which skills I use. Now, after some weeks, I would like to show you one source I really like: Google DeepMind. Google DeepMind

Speaker A

published some science skills and you see here a GitHub repo and they have almost for every scientific task that you can think about, for literature search, for, I don't know, PubMed databases here, for particular proteomics, for whatever search, here particular skills,

Speaker A

search here for particular options, and they are tested, they are battle-tested, and if you like, you can just optimize those skills. But they are a great place to start for your scientific skills.

Speaker A

Google DeepMind is not something that can go massively wrong if you ask me. On the other side here regarding the cybersecurity threats, it is amazing.

Speaker A

There's a whole bunch of literature I haven't shown you up until now, but I think we should talk about it because there are also solutions. And one solution, which I also posted at the same time that I generated this video, is, for

Speaker A

example, here by Tsinghua University, the Department of Computer Science and Technology, and they built here an agent libOS. So this is a library OS-inspired special runtime for our long-running and capability-controlled LLM agents, for our AI agents. So this is absolutely

Speaker A

fascinating. I have a little bit more technical detail here in my post or you just go and read the paper. They are now really trying to make systems more secure. But given the intelligence of our computer worms, this is a continuous

Speaker A

fight. And this is just amazing to watch this fight here in real time. Every day they publish new papers.

Speaker A

So you see we have solutions like beautiful skills, and yes, we have solutions to fight against your cybersecurity threats. I just want to show you these skill MD files. Yes, absolutely. They give you unbelievable detail. And if you go here

Speaker A

for the categories, the medRxiv or arXiv categories, you have, I think, everything from genetic and genomic medicine, health informatics, nursing, neurology, medical ethics, medical education. If you're doing science, I think one of the skills will

Speaker A

fit your particular thing. But let's go with this because this is exactly the topic. Let's say you have, I don't know, 20 skills now just from Google. There are at least 20 skills just to find literature, scientific literature,

Speaker A

on the net, on databases, on particular servers, on particular not so well-known servers. So you have a job to do, and now the problem is: imagine you have more than 20 skills just for scientific search algorithms. So which one should

Speaker A

your AI system utilize? Which one should it pick here? So let's have a look. And they said, hmm, you know what? We build a self-evolving skill graph for the skill selection by the LLM. And they make it

Speaker A

now a little bit different. Of course, they built here on the research that just happened last week. And of course, a lot of people ask me, hey, where's the GitHub repo? Here you have the GitHub repo here with everything that we are

Speaker A

going to talk about. Beautiful. So, let's be clear. What is the problem that we are trying to solve in this video and that the authors solve in their publication? As LLM agents adopt large skill libraries, thousands,

Speaker A

tens of thousands of skills, selecting the right subset (and maybe it is not one skill, but you need a consecutive A, B, C sequence of skills) becomes a structural problem rather than the cosine similarity matching that you have in a

Speaker A

vector database, because skills depend on, conflict with, specialize, or duplicate one another. So we have heavy dependencies between those, let's say, 10,000 skills, and it is not easy to pick one just by embedding similarity.

Speaker A

This is not good enough anymore. So the authors say, hmm, we identify that a flat retrieval over a large skill library silently omits the structurally necessary skills and concatenates redundant or interfering ones, and this is definitely a failure mode that is invisible to

Speaker A

similarity matching alone. So we just have so many skills that we don't know which skill to trust. I think this is one of the main features, but let's go with complexity, interdependencies, redundancy, and interference.
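To make the failure mode concrete, here is a minimal sketch of flat top-k retrieval by cosine similarity. The skill names and the two-dimensional embeddings are invented for illustration and are not taken from the paper; the point is only that a prerequisite skill can sit far away from the query in embedding space and therefore never enter the top-k list.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-picked for this example only.
SKILLS = {
    "coffee_maker": [0.9, 0.1],
    "espresso": [0.8, 0.2],
    "instant_coffee": [0.7, 0.3],
    "grind_beans": [0.1, 0.9],  # needed by coffee_maker, but dissimilar to the query
}

def flat_top_k(query_vec, k):
    """Rank skills by similarity to the query and keep the first k."""
    ranked = sorted(SKILLS, key=lambda name: cosine(query_vec, SKILLS[name]), reverse=True)
    return ranked[:k]

top3 = flat_top_k([1.0, 0.0], k=3)
print(top3)  # the prerequisite "grind_beans" is silently missing
```

The ranking itself is perfectly reasonable as a similarity ranking; what it cannot express is that one returned skill needs another skill that was not returned.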

Speaker A

So the solution by the authors is simple, no? They say, okay, you know what, let's check out the inter-skill relationships. And what do we build? We build a graph, of course, what else? A typed directed graph, and we expose it to

Speaker A

the LLM agent, and this is now new, as an inference-time, agent-callable structural retrieval interface. So we don't squeeze in some results that we got from a graph search and put them into the context. No, now we build a

Speaker A

graph and we provide the graph to the LLM. This is new. So the search now, and we have multiple stages here, returns vector matches, typed-edge neighbors, and conflicting signals. Beautiful. We also have the

Speaker A

possibility now to propose a new edge in our graph structure or the function to edit an edge. Let's make it more prominent. Let's say, hey, we have new dependencies or whatever. So let the agent register execution-backed edges that are developed in the inference

Speaker A

run and are detected in the inference, and then we have a graph that accumulates a more complex or maybe a sleeker structure across all the episodes. So we want to have a graph that is visible, that is now the

Speaker A

agent-callable structural retrieval interface for our LLM. And of course, this is a new idea. Quick start. If you want to jump in the code, yes, you find it here with your MIT license. But we now want to

Speaker A

understand the main idea. What is the brand new genius idea that they had? What is the insight that they apply now in their mathematics, in their execution before they went to code this whole thing? Now last time there was a

Speaker A

question here: when I say typed, what does it mean? So okay, definition, you are right. In graph databases, typed means that every node, every edge (or every relationship, if you want), and every property is assigned a specific predefined category

Speaker A

or class, and this creates a strict schema, like classes in a programming language, ensuring the data remains consistent and reliable. And let's talk about skills. A skill is a self-contained package: a unique identifier, a natural-language description of when it applies,

Speaker A

and a body of instructions or code that the agent runs when invoked, optionally with auxiliary scripts. Beautiful. So, back to the main topic.
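The two definitions above (a typed graph and a skill package) can be sketched in a few lines. This is an illustrative schema, not the authors' code: the class names, field names, and the exact set of edge types are assumptions, with the edge types borrowed from the relations named in the video.

```python
from dataclasses import dataclass
from enum import Enum

class EdgeType(Enum):
    """Predefined relation categories (assumed set, taken from the talk)."""
    DEPENDS_ON = "depends_on"
    SPECIALIZES = "specializes"
    SIMILAR_TO = "similar_to"
    COMPOSES_WITH = "composes_with"
    CONFLICTS_WITH = "conflicts_with"

@dataclass(frozen=True)
class Skill:
    skill_id: str      # unique identifier
    description: str   # natural-language note on when the skill applies
    body: str          # instructions or code the agent runs when invoked

class TypedSkillGraph:
    def __init__(self):
        self.skills = {}
        self.edges = set()

    def add_skill(self, skill):
        self.skills[skill.skill_id] = skill

    def add_edge(self, source, target, edge_type):
        # "Typed" means the schema is enforced: free-form labels are rejected.
        if not isinstance(edge_type, EdgeType):
            raise TypeError("edge_type must be an EdgeType member")
        if source not in self.skills or target not in self.skills:
            raise KeyError("both endpoints must be registered skills")
        self.edges.add((source, edge_type, target))

graph = TypedSkillGraph()
graph.add_skill(Skill("coffee_maker", "Use when brewing filter coffee.", "1. Fill water ..."))
graph.add_skill(Skill("grind_beans", "Use when whole beans need grinding.", "1. Set grinder ..."))
graph.add_edge("coffee_maker", "grind_beans", EdgeType.DEPENDS_ON)
```

Rejecting anything outside the enum is what gives the strict schema mentioned in the definition: an edge can only ever carry one of the predefined categories.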

Speaker A

This new SkillDAG now turns skill retrieval from the one-shot ranking problem that we had up until now into a typed and, this is now interesting, editable and agent-visible relational substrate. And this is a str...

Speaker A

more than any other. Hey, we have new retrieval or new retriever heuristics, and we have re-ranking, and whatever; we do something different here. Okay.

Speaker A

And I think, pardon my simple words, the brand-new contribution is not just a new graph. No, it is that we do something different. The graph stores, of course, the typed relational evidence, but now it is our agent itself that

Speaker A

interprets the evidence during inference, while choosing our particular action to execute, and the execution can feed back into the graph complexity, into the graph evolution, through some controlled edits. So if you want, we are again back to our old topic: a self-

Speaker A

evolving skill complexity, and they also call it a self-evolving skill memory, because we understand, when we used it the last time, how successful it was and what the dependencies were. So again, skill memory, and this is a topic

Speaker A

we encountered already multiple times on this channel. If you remember, these are the two videos that immediately come to your mind as a subscriber of this channel. We have here skills with their own memory and deck compression.

Speaker A

You see almost the same idea, but it is still different: evolving, self-evolving skills here with skill opt. So you see all the researchers around the globe, in every university more or less, wherever you are, in China or the US, they

Speaker A

are working on this. And why? Because this is the benchmark to beat. This got published almost a week ago, May 27, 2026, here in version three (take care about the versions), and here you have GitHub and everything: Graph of Skills. So

Speaker A

here we are: dependency-aware structural retrieval for massive agent skills, and they tried to solve the same topic, no? Here it was University of Pennsylvania, University of Maryland, Brown University, Cunningham University and Lehigh

Speaker A

University. Beautiful. Now, what was the idea? They said, okay, what we had until now: vanilla skills, no? You have a query, and then we have, I don't know, 12,000 skills.

Speaker A

So long context, token inefficient. So not the greatest idea. What do we do? We go with a semantic top-k without a structure. We go on the semantic similarity either in the title or in the description of the skill. Not great. If

Speaker A

you go into science, the semantic similarity can be everything. And here Graph of Skills said, hey, we have to optimize this. We have to move from vector skills with a semantic similarity, a cosine similarity mathematical function, to understanding

Speaker A

the environment around the particular target skills. So we don't pick just one skill, but we diffuse this a little bit into the epsilon environment, and we have a look at everything: if you have one node in our graph, what

Speaker A

are the connected nodes, how strongly are they connected to our target node? And we try to diffuse, if you want, the probability mass not on a single node but over the connected environment so that we have something

Speaker A

that is maybe relevant [clears throat] to, let's go with, a T5 node. And therefore they extracted this Graph of Skills with a graph structure. Beautiful. They started with a hybrid semantic-lexical seed node application and then performed the

Speaker A

structure-aware retrieval to recover the prerequisite skills and assemble a compact execution bundle. But they did not provide the graph to the LLM. They just provided the text of this whole bundle as text into the context

Speaker A

window. And you can guess now what the next evolution step is. Yeah, by the way, we also have another one. This is from February 26, 2020. Maybe you know SkillNet. This is the effort headed by Zhejiang University to create,

Speaker A

evaluate, and connect AI skills. They built a skill graph network of 500,000 skills, but this was used for platform organization rather than for runtime routing. And we're talking about the routing. This is not going to the next step. But you see this is really a

Speaker A

leading and bleeding edge for skill development. I mean, 500,000 skills, come on, who uses this? So, just to be absolutely clear, if you have not read the paper, which I highly recommend: what is the difference? Graph

Speaker A

of Skills, our number one up until now and the closest baseline to this new publication, does what? It builds a typed skill graph offline and retrieves a bundle via personalized PageRank, as described in the paper, that

Speaker A

is concatenated into the context window as some text. Beautiful. But now, today, this new study inverts two choices. It goes a step further.
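For readers who have not met personalized PageRank, the diffusion idea behind the baseline can be sketched as follows. This is a generic power-iteration version under stated assumptions: the restart probability `alpha`, the iteration count, the toy graph, and the choice to send mass from dead-end nodes back to the seeds are all illustrative, and the transcript does not say how Graph of Skills configures its own scoring.

```python
def personalized_pagerank(adjacency, seeds, alpha=0.15, iterations=50):
    """Power iteration for PageRank with restarts that jump only to the seed nodes."""
    nodes = list(adjacency)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        updated = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            flow = (1.0 - alpha) * scores[n]
            targets = adjacency[n]
            if targets:
                for m in targets:
                    updated[m] += flow / len(targets)
            else:
                # Dead end: hand the mass back to the seeds so the total stays 1.
                for m in nodes:
                    updated[m] += flow * restart[m]
        scores = updated
    return scores

# Toy graph: edges point from a skill to the skills it relies on (hypothetical data).
GRAPH = {
    "coffee_maker": ["grind_beans", "milk_foam"],
    "grind_beans": [],
    "milk_foam": [],
    "pdf_export": [],
}

scores = personalized_pagerank(GRAPH, seeds={"coffee_maker"})
bundle = sorted((n for n in GRAPH if scores[n] > 0), key=lambda n: -scores[n])
print(bundle)
```

Probability mass starts on the seed skill and spreads to its connected neighborhood, so related skills receive a score even if their text is not similar to the query, while unconnected skills stay at zero.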

Speaker A

They say, you know what, the graph is now the agent-callable structural retrieval interface. So not the text from the personalized PageRank that we squeezed into the context, but now we give the graph itself to the LLM

Speaker A

and we make it dynamic. So the edges are now editable during the execution rather than frozen after the construction or at the construction.

Speaker A

This is something where we have a self-learning dynamic system that really responds at inference time, and this is the beauty of it. So you see we increase the complexity, but we are standing on the shoulders of giants. We

Speaker A

go every week a step further. Great. So therefore SkillDAG, this new methodology, is built to be on the same footing as the tools that the agent already uses for execution. So you see skills and here

Speaker A

the directed acyclic graph come closer to tool use, and therefore the LLM, and not the graph ranking policy, now decides how the structural evidence that manifests itself in the graph complexity should affect the execution of the reasoning trace of our large

Speaker A

language model. Beautiful. Again, if you want to understand it, SkillDAG, I guess, is easiest to understand as a memory system for skills that behaves more like a living graph that grows, that evolves, you know, that is dynamic, rather than just a static retrieval

Speaker A

index that we had up until now. Let me give you a simple example. Suppose the agent needs to make coffee. So, how to make coffee? Unsolved problem. We have a retriever that goes out and returns you the following skills from 10,000

Speaker A

skill libraries somewhere. So, you know, the normal retriever goes ranking by cosine similarity, the semantic similarity, let's go with the title, no? So for make coffee, we have a coffee maker skill or the buy coffee

Speaker A

skill, which is already in the environment, or the espresso skill. As you see, this is ranking by similarity: in the mathematical vector space, the query is positioned close to espresso. A graph, if it exists at all, is only

Speaker A

behind the scenes, helping to produce this particular list, and this is the result that the retriever normally brings back. Now with SkillDAG it is different. Why? We provide a higher complexity level. We provide more

Speaker A

context to the task make coffee. Look at this: SkillDAG now lets the agent see the complex structure of the graph. And the graph has not only make or prepare, because now we have: the coffee maker skill depends on another skill, and

Speaker A

this is the grind-the-beans skill; or the espresso skill is just a specialization of the coffee maker skill, so it understands this is more or less a subset; or the instant coffee skill is similar to the coffee maker

Speaker A

skill; or the milk foam skill now composes with the coffee maker skill; and the espresso skill now conflicts with the instant coffee skill. So you see the structure is now much denser. We have more context. Before, we only had the

Speaker A

cosine similarity of a linguistic semantic similarity, like espresso and coffee. Yeah. But instant coffee and espresso, what is the relation between them? Easier, simpler?

Speaker A

Oh, it conflicts because it has a different procedure if you prepare an instant coffee or you prepare a real Italian espresso.
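The coffee example can be written down as a small typed edge list, with a search function that returns vector matches, typed neighbors, and conflicts as three separate fields instead of one flat ranked list. This is a sketch only: the function name `search`, the result keys, and the string edge labels are assumptions for illustration, not the interface from the paper's repository.

```python
# Typed edges from the coffee example: (source, edge_type, target).
EDGES = [
    ("coffee_maker", "depends_on", "grind_beans"),
    ("espresso", "specializes", "coffee_maker"),
    ("instant_coffee", "similar_to", "coffee_maker"),
    ("milk_foam", "composes_with", "coffee_maker"),
    ("espresso", "conflicts_with", "instant_coffee"),
]

def search(vector_matches):
    """Attach graph structure to a list of similarity hits.

    vector_matches stands in for the output of an embedding search;
    here it is passed in directly to keep the sketch self-contained.
    """
    matched = set(vector_matches)
    typed_neighbors, conflicts = [], []
    for source, edge_type, target in EDGES:
        if source not in matched and target not in matched:
            continue  # edge does not touch any matched skill
        if edge_type == "conflicts_with":
            conflicts.append((source, target))
        else:
            typed_neighbors.append((source, edge_type, target))
    return {
        "vector_matches": list(vector_matches),
        "typed_neighbors": typed_neighbors,
        "conflicts": conflicts,
    }

result = search(["coffee_maker", "espresso"])
for key, value in result.items():
    print(key, value)
```

Keeping the three fields apart is the design point: the agent, not the retriever, decides what to do with a dependency or a conflict.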

Speaker A

So, you need more information. You need more context. And this is exactly what our skill directed acyclic graph brings into the arena, because the agent is now not just asking what skill looks semantically similar, but now it can ask:

Speaker A

hey, what is the prerequisite? What is the relation of the prerequisite? What is the specialized version? How many specialized versions do we have: 3, 4, 10? What should not be loaded together? What is a potential conflict? And do I need to

Speaker A

revise the graph because I learned a new relationship? Did I discover something that is not yet materialized as a skill? So therefore I can now write a new skill markdown file.
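The idea of revising the graph with execution-backed edges can be sketched as two agent-callable operations. The method names `propose_edge` and `edit_edge`, the evidence strings, and the `descale_machine` skill are hypothetical; the transcript only says that the agent can propose a new edge or edit an existing one based on what happened during a run.

```python
ALLOWED_EDGE_TYPES = {
    "depends_on", "specializes", "similar_to", "composes_with", "conflicts_with",
}

class EditableSkillGraph:
    def __init__(self):
        # (source, target) -> {"type": str, "evidence": [str, ...]}
        self.edges = {}

    def _check(self, edge_type, evidence):
        if edge_type not in ALLOWED_EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        if not evidence:
            raise ValueError("an execution-backed edge needs evidence from a run")

    def propose_edge(self, source, target, edge_type, evidence):
        """Register a new typed edge, justified by something observed in execution."""
        self._check(edge_type, evidence)
        if (source, target) in self.edges:
            raise ValueError("edge already exists; use edit_edge")
        self.edges[(source, target)] = {"type": edge_type, "evidence": [evidence]}

    def edit_edge(self, source, target, new_type, evidence):
        """Change the type of an existing edge and keep the evidence trail."""
        self._check(new_type, evidence)
        edge = self.edges[(source, target)]  # KeyError if the edge was never proposed
        edge["type"] = new_type
        edge["evidence"].append(evidence)

graph = EditableSkillGraph()
graph.propose_edge("coffee_maker", "descale_machine", "similar_to",
                   "episode 12: both skills operated on the machine")
graph.edit_edge("coffee_maker", "descale_machine", "depends_on",
                "episode 15: brewing failed until descaling ran first")
print(graph.edges)
```

Requiring evidence on every edit is one simple way to keep the edits controlled, so the graph accumulates structure across episodes without accepting arbitrary guesses.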

Speaker A

So you see, this is what the agent-callable structural interface really means. This is what we are dealing with. The graph is not merely an internal lookup table like it was before, but now it is an active part of

Speaker A

the agent's reasoning surface. Again, going back to Graph of Skills: up until yesterday, we saw the graph as a substrate. A retriever uses the graph only internally to complete your particular bundle and then hides this

Speaker A

graph structure. Instead of "here are five skills, trust the retriever", as Graph of Skills more or less says even though it searches in an epsilon environment around our target node, the new agent, SkillDAG, now gets all the matches,

Speaker A

all the neighbors, and all the conflicts, dependencies, and other interrelations separately. So the context is massively enhanced.

Speaker A

Yeah, those are the three phases. First we build the skill library and the routing. Then we have the runtime search. Beautiful. And then the LLM analyzes this and proposes an additional edge or

Speaker A

edits an edge because it found either a new factor or a new intensity or whatever. So the graph evolves across the episodes, if you want to have it simple. And of course there's the cold-start question: how do we start

Speaker A

this complete thing now. So let's say we have a skill library. Let's say we have 1,000 skill MD files each with 1,200 character body in your markdown file.
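The cold-start construction walked through in this part of the video (two-view embeddings, a cosine-threshold candidate stage, then an LLM pair classifier) could look roughly like the sketch below. Several pieces are assumptions made only so the example runs: the three-dimensional vectors are hand-picked, the adaptive threshold is taken to be the mean plus half a standard deviation of all pair scores, and `classify_pair` is a lookup-table stand-in for the LLM call.

```python
import math
from itertools import permutations
from statistics import mean, pstdev

def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

# Two views per skill: what it does and what it needs (hypothetical vectors).
SKILLS = {
    "coffee_maker": {"does": [1.0, 0.0, 0.0], "needs": [0.0, 1.0, 0.0]},
    "espresso": {"does": [0.9, 0.1, 0.0], "needs": [0.0, 1.0, 0.0]},
    "grind_beans": {"does": [0.0, 1.0, 0.0], "needs": [0.0, 0.0, 0.0]},
    "pdf_export": {"does": [0.0, 0.0, 1.0], "needs": [0.0, 0.0, 0.0]},
}

def pair_score(a, b):
    """High if a resembles b, or if what a needs matches what b does."""
    return max(
        cosine(SKILLS[a]["does"], SKILLS[b]["does"]),
        cosine(SKILLS[a]["needs"], SKILLS[b]["does"]),
    )

def candidate_pairs():
    scored = {(a, b): pair_score(a, b) for a, b in permutations(SKILLS, 2)}
    values = list(scored.values())
    # Assumed form of the adaptive threshold: it moves with the score distribution.
    threshold = mean(values) + 0.5 * pstdev(values)
    return [pair for pair, score in scored.items() if score >= threshold]

# Stand-in for the LLM pair classifier; a real system would prompt a model here.
STUB_LABELS = {
    ("coffee_maker", "grind_beans"): "depends_on",
    ("espresso", "grind_beans"): "depends_on",
    ("espresso", "coffee_maker"): "specializes",
}

def classify_pair(a, b):
    return STUB_LABELS.get((a, b))  # None means "no typed relation"

def cold_start_graph():
    edges = []
    for a, b in candidate_pairs():
        label = classify_pair(a, b)
        if label is not None:
            edges.append((a, label, b))
    return edges

edges = cold_start_graph()
print(edges)
```

The threshold stage exists to keep the number of classifier calls small: only pairs that look related in at least one view are sent on for typing.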

Speaker A

Then we have now two-view embeddings. One is e_self: this is more or less the what-the-skill-does embedding. And the other covers what the skill needs to be functional, operational: all the dependencies that come before, that you

Speaker A

have to upload or install before. Then we have our embeddings, and we go for candidate pair generation. Here we have our cosine threshold, so we apply an adaptive cosine threshold. Then we run an LLM pair classifier, where we have now

Speaker A

the following edges: similar to, or specializes (I showed you this), or depends on, or composes with, whatever your edge classes are. And then it materializes into the initial cold-start graph. This is now a typed skill

Speaker A

directed acyclic graph structure, and this is what we can build on. Beautiful. Results. Okay, [laughter] let's come to the results. What are the main results? So, we have SkillsBench. I introduced you to this at the beginning

Speaker A

of the week, and then we have ALFWorld. This is an old friend of ours. So, let's have a look. They go with two LLMs. First, we have MiniMax M2.7, and then, a little bit more powerful, a GPT-5.2 Codex

Speaker A

system. So, let's go. As you see, what we are looking for is here. Just look at the reward, in percent: up until now, GoS has 18.7, and now the new SkillDAG has 27.3. Isn't this

Speaker A

beautiful? Now you see, if we go to the, quotation marks, more powerful GPT-5.2 Codex, GoS is at 34 and SkillDAG is at 36. So it's a little bit better, but it is not as massive, I would call it,

Speaker A

as if you only have MiniMax M2.7. So, okay, you see this also beautifully with ALFWorld, no? GPT-5.2 Codex with GoS has 93.6. And guess what? The new SkillDAG has a reward of 93.6.

Speaker A

What a coincidence. Great. So you might ask: is it really that much better, or is it about the more powerful GPT, I don't know, 5.5, whatever you have, whatever the latest GPT is whenever you see this video? Is

Speaker A

it not that it really depends on the, let's say, intelligence of the LLM? So this is now interesting, so they further investigated this. Now, what is interesting is that the agent, the new agent, downloads only the typed

Speaker A

neighbors and the skill bodies that it really needs, thereby avoiding the context-bloat failure we had up until now, where the fixed bundle retrieves a flood of marginally relevant skills and puts them all in the prompt, and some

Speaker A

even corrupt the downstream reasoning trace. So you want to make sure that you only have the perfect skills, 1, 2, 3, 4, 5, in your context window, the ones that are absolutely needed and that have the right dependencies. They might

Speaker A

depend on each other: input-output relations, complexity structure, whatever, domain relevance. And you want to make sure this is it, and you don't want to flood your prompt with nonsense.

Speaker A

Now this is one of the most interesting results, I think. If you read the paper you will find it too, although it's not that prominent in the paper. But let's have a look at this. I

Speaker A

like this. Why? This is not measuring the task performance. This is measuring, separated from the, if you want, intelligence of the LLM, the retrieval quality in isolation. And they define 87

Speaker A

SkillDAG queries. Beautiful. And then you run these 87 queries on what we had up until now, GoS, which is in red, and the new one is SkillDAG, in blue. They have on the y-axis the retrieval quality going up. Beautiful. And they have now

Speaker A

three further detailed structures. So what is the question that they want to answer? Before the agent even starts reasoning or acting, does SkillDAG retrieve the right skills faster, better, more elegantly, cheaper, whatever, than GoS, the Graph of Skills? Is there

Speaker A

really an improvement if we provide the context complexity in a graph to the LLM to improve the reasoning process? Yes or no? So let's have a look at this.

Speaker A

And they also ask which skills the agent should now load into the context window, because the skill library becomes larger and larger, and you see here the skill pool scale: we start with 200 skills in the pool, then we go to

Speaker A

2,000 skills, and if you see skill libraries with 20,000 skills, you have an idea where we go.

Speaker A

And the question is simple: as the skill library becomes larger and larger, does the retrieval break? And they investigated this on a scale from 200 to 2,000 skills in our pool.

Speaker A

Okay, let's do this. So the first one is, if you want, recall@5. So what does it answer? It answers: did the correct skill appear anywhere in the top five that we retrieved? And you see here SkillDAG

Speaker A

significantly outperforms our GoS, or Graph of Skills. There is a particular Goldilocks point where it comes close, but otherwise, if you go to 2,000 skills in our pool, you see it just goes down, down, down.

Speaker A

So this is the retrieval quality: that the correct skill will appear anywhere in the top five retrieved skills. So our new SkillDAG is better.

Speaker A

What about the middle? In the middle we have @1. So it asks: is the first retrieved skill really the correct skill, or one of the correct skills if you have multiple dependencies, you know? This matters because LLMs often

Speaker A

overweight the early context. So, if you want, this tests in a certain way the ranking precision, and as you see, SkillDAG, the blue line, is really above the red line. Beautiful. Now you see that the more

Speaker A

the skill pool increases, the more our performance decreases. Of course, if you have 2,000 skills to choose from, placing the right one at position number one is not an easy thing. But you see the difference between 42% and 58%.

Speaker A

This is a massive difference in performance. So SkillDAG absolutely is an improvement over GoS.
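The retrieval metrics in this part of the video can be computed with a few lines of code. The sketch covers recall@5 and the @1 score as described in the talk (did a correct skill appear in the top five, and is the first result correct), plus mean reciprocal rank, which averages 1 divided by the position of the first correct skill. The two toy queries are invented for illustration and are unrelated to the 87 queries in the paper.

```python
def hit_at_k(ranked, relevant, k):
    """1.0 if any correct skill appears in the first k results, else 0.0."""
    return 1.0 if any(skill in relevant for skill in ranked[:k]) else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / position of the first correct skill, or 0.0 if none was retrieved."""
    for position, skill in enumerate(ranked, start=1):
        if skill in relevant:
            return 1.0 / position
    return 0.0

def evaluate(queries):
    """Average each metric over (ranked_list, relevant_set) pairs."""
    n = len(queries)
    return {
        "recall@5": sum(hit_at_k(r, rel, 5) for r, rel in queries) / n,
        "hit@1": sum(hit_at_k(r, rel, 1) for r, rel in queries) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in queries) / n,
    }

# Hypothetical retrieval results: (ranked skill ids, set of correct skill ids).
QUERIES = [
    (["espresso", "coffee_maker", "buy_coffee"], {"coffee_maker"}),
    (["grind_beans", "milk_foam"], {"grind_beans"}),
]

metrics = evaluate(QUERIES)
print(metrics)
```

In the first toy query the correct skill sits at position two, so it counts for recall@5 but not for @1, and it contributes 0.5 to the reciprocal rank. That is the sense in which mean reciprocal rank rewards retrieving the correct skill earlier.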

Speaker A

And then the third one is really interesting. This is the mean reciprocal rank. This is much more nuanced. If you want, this is asking: can it retrieve the correct skills earlier? Are we faster? Is it more to

Speaker A

the point? And again you see the blue line is just outperforming our GoS system. And look at the spreading as the skill pool increases. Look at the differences here. So SkillDAG: absolutely

Speaker A

beautiful. The next step in the evolution: if you have a skill library or a skill bank, whatever, of 20,000 skills, 200,000 skills, and your little AI must now decide, my goodness, which skill should I select, use, combine with other skills, what is

Speaker A

the complexity that I can handle? And this SkillDAG methodology, at the time of recording this video, June 3rd, 2026, seems to be the best intelligent methodology, the best algorithmic code implementation, especially for this task. I hope you had

Speaker A

a little bit of fun. I hope there was some new information for you in your professional work with AI, and I hope to see you in my next

Topics: AI skill selection, large language models, typed skill graph, LLM agents, graph retrieval, Google DeepMind, cybersecurity, adaptive computer worms, agent libOS, Tsinghua University

Frequently Asked Questions

What problem does the typed skill graph solve in AI skill selection?

It addresses the challenge of selecting appropriate skills from thousands available by modeling dependencies and conflicts between skills, which simple similarity matching cannot handle.

How does the new skill graph improve LLM agent performance?

By providing a structured, typed directed graph that exposes inter-skill relationships and conflicts, allowing multi-stage retrieval and dynamic updates during inference.

What cybersecurity concerns are raised in the video?

The video highlights the emergence of AI-enabled adaptive computer worms that spread malware intelligently, and discusses new runtime solutions like agent libOS to enhance AI system security.
