EDH power RANK SYSTEM FINISHED better then the bracket system


00:04
Speaker A
It's finished! My power ranking system for EDH is finished!
00:09
Speaker A
Now, this is going to be a super long video, so grab your tea, some cookies, lean back and enjoy. Hot!
00:17
Speaker A
So, for some time, I've been working on a ranking system for Commander.
00:23
Speaker A
How to evaluate cards on an individual basis, but also to kind of understand how strong decks are and say, this deck is stronger than that deck, et cetera.
00:34
Speaker A
Currently, we have something of a bracket system that is helping you communicate something you could say.
00:40
Speaker A
I'm not going to replace the bracket system, because I am a nobody, so that's not going to happen. But the goal is to do something similar, but in my opinion, better.
00:51
Speaker A
So here we have a bunch of decklists, and you see names, for example, you have Etali, Kinnan, Kraum and Tymna, et cetera, and we also have a, in yellow text, Power Rank, and yeah, that's the rank.
01:06
Speaker A
That's how good they are. And as you can see, the value here decreases as you go down the list. For example, here at the bottom, I have an Azula casual deck.
01:58
Speaker A
But here I have a confession to make: this power ranking system is not complete, and it's limited in its capabilities.
02:07
Speaker A
You see, I have collected 471 tournament games from only top four and top 16 from this website.
02:16
Speaker A
That means that I've gone in here, and I've gone to view brackets, and I click pairings, and then top four, and I've collected this pod. So here we have a winner and three losers, and I've marked who the commanders were, but also their decklist. Then I've gone here, collected this, but I haven't touched any of this. Basically, the entire tournament is a method to eliminate or just get a certainty that we're only collecting people with a high skill in Commander matches, in playing Commander, so to say.
03:33
Speaker A
Which removes the factor of skill level and puts more focus on Commander card power. But that means that, as it's usually the same commanders over and over always reaching the top, all of the other casual things don't show up. This Doctor Octopus, Master Planner, he's a good example. Lord Windgrace, another great example. Winota is very rare, she gets up there sometimes, but rarely. Toph, the First Metalbender, I've never seen her reach top 16 yet. Lady Octopus, Inspired Inventor, she doesn't get to top 16 either. Urza, Lord High Artificer, doesn't reach the top that much at all. Emiel the Blessed, Yaganta, and yeah, you get the point. There's a big variety in strong decks and weak decks in tournaments. And as I'm only focusing on top 16, I don't collect that much data on casual.
05:14
Speaker A
Which means that this system that is currently showcasing a power rank on various different commanders can only showcase a power rank on the topmost common commanders. So we do have a good verdict on Etali, we have a good verdict on Kinnan, Kraum, Rograkh, TNT, and then around here at Kefka, Sisay, actually TNT too, it starts to drop. I want to focus on the missing here, 828. We're going to get to that later. Ah, this tea is actually really good.
05:46
Speaker A
But there's a key thing we can do here. We can go into Etali's decklist and look at each individual card. So we can see here the power rank is at 30,961.20. But we can also see each individual card. You can see here total power, 622. And then we scroll down, and I've made it rank all the individual cards one by one, how much power rank they contribute to the total system.
06:57
Speaker A
And with this, you can basically see, okay, don't have this card here actually, so ignore that one. I have data on it, but I don't have the Scryfall picture. But as you can see, you can kind of see what cards are strong and weak. So you can take out cards and put in cards, and yeah, look, the power ranking goes up and down depending on what cards I put into it. This is Kraum and Tymna, and according to the system, Birgi is the strongest card so far. We'll get to that later. Then we have Ad Nauseam, Tymna is really strong, Diabolic Intent, Culling the Weak, Faerie Mastermind, and we're getting all sorts of cool numbers on these different cards, how strong they are, according to the system. What, what is the weakest card if we just keep scrolling down here, actually? What does it view as some of the weaker ones? Well, all cards are kind of amazing and great, just saying, but still. Cyclonic Rift and Mindbreak Trap, Counterspell, Dragon's Rage Channeler. Yeah, I can agree with this in some sense.
08:40
Speaker A
However, this system can be used to create tournaments in a certain way. As I've showcased, you can use it to look at how strong certain individual cards are. But you can also create a tournament. Currently, someone could create a tournament and say, let's go with bracket-free tournament. Perfect. Or let's go with a budget tournament. If you want to power down this format, that is something you can do. However, you can also use this system and say, let's put a limit of 20,000 on the power rank, and we'll only allow decks that are at 20,000 or lower, which is basically the same idea as a budget tournament. If you put a budget of 500 US dollars on Moxfield, and every deck has to be below that, then yeah, that's the same thing. The problem is that budget isn't really related to pure strength and a card's true power evaluation. This system is.
10:21
Speaker A
However, it can be, as you can see, a little bit tricky reading this. So, the system is all in binary, obviously, because of file space. Now, here, look, you have Temple Garden, great. We have Spellseeker, great. We, oh, we have Vexing Bauble, perfect. Oh, we also, here, look, we have Sacred Foundry. But, uh, yeah, have fun trying to figure this out. So, if you are a patron of mine, write to me and I will see what I can do, and I can send you something, et cetera. I highly recommend writing to me as well on Discord, and we can work something out, depending on how you want to use it, I can see what I can do, if I can build something individually for you.
11:47
Speaker A
But this system is also getting updates. For example, these tournaments here were actually added just today. And I usually get somewhere between 15 and 20 new games into the system each week. So it should climb from 471 to 500 pretty soon.
12:05
Speaker A
So now we've talked about the goal of this system, where the data comes from, and also how you could theoretically use it, and that I can send it to you, et cetera.
12:18
Speaker A
But how in the world does it work? Can we really trust this monster's figures? Well, that's what we're going to talk about now for the rest of this video. And now it's going to be a lot of math and philosophy, so I hope you have some tea, some cookies, and time. This is not going to be a video for you if you like to talk about cards in general. This is now going to be a video for you if you like math. And tea.
13:27
Speaker A
So, in an older video of mine, I talked about how I'm going to save casual. And this video right now is basically that answer. I've solved the problem that video is talking about. If you haven't watched that video, you should, but you don't need to. Instead, we're going to fast forward and scroll down here. I wrote this two months ago. This is a two-month-old video. True story, I actually had a eureka moment. Right after uploading this video, I went grocery shopping, and I suddenly got an idea on how all of this would be solved. Still, I would love some feedback and suggestions of potential ideas. But my idea is this: we should create an ID for every two-card combination. This means we get both synergy and individual card power, while we can still use a smaller dataset to get true answers. Because something I talked about in that video was that Magic: The Gathering cards have synergy. Sometimes they are strong in combination with another card, and sometimes they're great on their own. So how can you evaluate a card based on these two factors?
15:05
Speaker A
And here is that answer, just like I kind of wrote in the video below. We are not going to rank or evaluate an individual card. We're going to evaluate all different two card combinations. And that's what you're seeing right here. Underworld Breach together with Flare of Duplication, have a winrate of 53%. You can actually see winrate 5.3. That is just divided by 10, so it becomes a smaller number for simplicity's sake, otherwise the numbers will explode to high values for no good reason at all. So it wins 3% more than the total average, you could say. We'll get to that. Then it has confident challenges, and then it has a log multiplier, and then a final value of power rank 6.416. Remember that 30,000? So this is a super small portion that climbs to that 30,000.
17:03
Speaker A
Now we can evaluate a lot of cool cards in here like Thassa's Oracle and Tainted Pact. No, sorry, Consult. Or we can do Tainted Pact too, as well, actually. You can actually see here, they have a different power ranking, by the way. We can also do Underworld Breach and Thassa's Oracle, which is also a different power ranking.
17:23
Speaker A
Now, in a 100-card Commander deck, you have a total capacity of 4,950 possible pairs. And now some of those values up there in the corner might be easier to understand. You see pairs: 4,950. Found: 4,944. Missing: six. Average pair power: 6.1358. And what the code does is look at the total average, and okay, we have six missing, so let's just add the total average times six to reach a full 4,995. Sorry, 4,950. Which is how you get the total power of 30,372 for Tymna and Kraum here.
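The arithmetic in this step can be sketched in a few lines of Python. This is only an illustration of the fill-in rule as described, not the actual code behind the system; the function name `deck_total_power` and its parameters are made up for the example.

```python
from math import comb


def deck_total_power(deck_size: int, found_pairs: int, average_pair_power: float) -> float:
    """Total deck power when missing pairs are filled in with the average."""
    possible_pairs = comb(deck_size, 2)  # 100 cards -> 4,950 unordered pairs
    missing_pairs = possible_pairs - found_pairs
    found_power = found_pairs * average_pair_power
    # Assumption: every missing pair is credited with the average of the found pairs.
    return found_power + missing_pairs * average_pair_power


total = deck_total_power(deck_size=100, found_pairs=4944, average_pair_power=6.1358)
print(comb(100, 2), round(total))  # 4950 30372
```

With the figures quoted for Tymna and Kraum (4,944 found, six missing, average 6.1358), this lands on the same 30,372 total.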
18:53
Speaker A
Then you can look at each individual card. For example, Birgi has pairs 99. That means that she has been paired with every single other card inside the entire decklist. And the total pair power from all of those pairs together becomes 668.3080. Then you have Mockingbird, also 99, and its total pair power is 656.0470. So now you can kind of see how each individual card is contributing to the total performance and power of the deck, so to say. But also how this becomes very dynamic. For example, Birgi and Ragavan have a power rank of 6.647, while Birgi and Birds of Paradise have a power rank of 6.024, which is weaker compared to Ragavan. And you can find synergy between certain cards. For example, here we have The One Ring three times with three different other cards. So The One Ring and Voltaic Key, that is in general a good synergy. It is achieving a power rank of 5.49. Then you have The One Ring and Mana Vault, which is also a good synergy. Okay, they don't synergize, but with a Mana Vault and The One Ring, you can have The One Ring in play turn two. So, that's great. They don't synergize more than that. But Mana Vault is a pretty good card on its own, The One Ring is a pretty good card on its own, so they perform pretty okay too. But then you have The One Ring and Seedborn Muse, and suddenly the power rank spiked up to 6.271, which is a lot higher than 5.49. Or well, not a lot higher, but yeah, you see my point. But we still get an evaluation on cards that don't have any synergy with each other whatsoever. Now, sure, someone out there could argue that The One Ring and Mana Vault have great synergy. But I'm just going to argue they're great cards on their own, which ultimately means that we are getting the evaluation of how good a card is on its own, but also whether there's a good synergy between two different cards. And as you can see here, it just chains in a long way. 
For example, here we have The One Ring and Sol Ring, then we have Smothering Tithe, Esper Sentinel, and we could basically make an entire decklist with only these cards here, and we're kind of getting a picture of how good all the cards together become something great, or something wonky and weird and nothing good at all. Because if you just put a lot of things that don't synergize with each other, have a very weak power, like a strength to them on the individual level, you're going to end up with a casual deck, and you're going to end up with a deck with a low power ranking in general. Now, I'm not going to sit here in this video and showcase every single pair possibility, because that will be forever and boring. So if you want to know something specific, write to me in the comments below of this video, and I can potentially send it to you. Warning, if you're asking me of fringe cards, we don't have data on them. I want to emphasize, we only have data on very common cards that is played frequently, because it actually demands a lot of games for the system to actually add a card in here, which is what we're going to talk about now.
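To make the per-card view concrete, here is a small Python sketch of how a card's "pairs" count and total pair power could be summed from a table of pair values. The table layout and the name `card_contributions` are assumptions for illustration; the four pair values are the ones quoted above, with card names shortened.

```python
from collections import defaultdict


def card_contributions(pair_power):
    """Return (card, pair count, summed pair power) rows, strongest card first."""
    totals = defaultdict(lambda: [0, 0.0])  # card -> [pair count, summed power]
    for (card_a, card_b), power in pair_power.items():
        # Each pair's power is credited to both of its cards.
        for card in (card_a, card_b):
            totals[card][0] += 1
            totals[card][1] += power
    rows = [(card, count, power) for card, (count, power) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Pair values quoted in the video; a real deck would have up to 4,950 entries.
pairs = {
    ("Birgi", "Ragavan"): 6.647,
    ("Birgi", "Birds of Paradise"): 6.024,
    ("The One Ring", "Voltaic Key"): 5.49,
    ("The One Ring", "Seedborn Muse"): 6.271,
}

for card, count, power in card_contributions(pairs):
    print(f"{card}: pairs {count}, total pair power {power:.3f}")
```

In a full 100-card list every card would show pairs 99 when all of its pairings have data, which is what the Birgi and Mockingbird rows are showing.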
25:11
Speaker A
So, how in the entire world is this monster reaching this math? Well, kind of simple, but also quite long and complicated. But you will understand, don't worry. So from this specific game, we found 52,386,941 challenges.
25:34
Speaker A
Remember, each of those four 100-card decks can create 4,950 unique pairs. We're putting all of those pairs up against each other pair, with the winner's pairs versus each individual loser's pairs, not combining the pairs from loser to loser. So first, in this case, Esika won. So it will first look between Esika and Rocco. Now, first and foremost, if there is an identical pair in both decks, let's just say Command Tower and Sol Ring, both of them probably have those cards inside their decklist. So that pair is removed from the entire evaluation. Then once it has removed all duplicates from both sides, it compares all pairs, and then it does the same thing with the second loser and the third loser. And that's how it reaches 52 million different unique challenges.
27:18
Speaker A
And that looks like this. Now, this is not specifically from that game, actually. But you can see here all the different pairs. You can see Chain of Vapor plus Polluted Delta, with the plus basically combining the two, and then a versus as a split, and then you see Jeska's Will plus Temple Garden, and you see a 0/1. That means that the winning pair here was Jeska's Will and Temple Garden. And then you just keep going down. For example, you have Jeska's Will plus Temple Garden versus Mental Misstep plus Mindbreak Trap, and here Jeska's Will and Temple Garden won again, and yeah, this just keeps going until you have, yeah, those 52 million. This also means that the closer to a mirror match, the more identical the decklists actually are, the fewer challenges you actually get. Sometimes I've actually gotten something like 24 million unique challenges, and my record is 69 million challenges. 52 million is quite common. This is actually the very typical value I get.
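The challenge-building step described here can be sketched in Python as follows. It is a minimal illustration of "drop the pairs both decks share, then match every remaining winner pair against every remaining loser pair", with made-up helper names (`deck_pairs`, `challenges`) and two tiny decklists standing in for real 100-card lists.

```python
from itertools import combinations


def deck_pairs(decklist):
    """All unordered two-card pairs in a deck (duplicate card names collapse)."""
    return {frozenset(pair) for pair in combinations(sorted(set(decklist)), 2)}


def challenges(winner_deck, loser_deck):
    """Yield (winner_pair, loser_pair) after removing pairs both decks share."""
    winner_pairs = deck_pairs(winner_deck)
    loser_pairs = deck_pairs(loser_deck)
    shared = winner_pairs & loser_pairs  # e.g. Command Tower + Sol Ring
    for winner_pair in winner_pairs - shared:
        for loser_pair in loser_pairs - shared:
            yield winner_pair, loser_pair


# Hypothetical mini-decks; the winner is compared with each loser separately.
winner = ["Sol Ring", "Command Tower", "Jeska's Will", "Temple Garden", "Underworld Breach"]
loser = ["Sol Ring", "Command Tower", "Mental Misstep", "Mindbreak Trap"]

print(sum(1 for _ in challenges(winner, loser)))  # 45
```

For scale: with nothing shared, one winner against three losers would top out at 3 × 4,950² = 73,507,500 challenges, so the 24 to 69 million range mentioned above is what you would expect once shared pairs are removed.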
29:00
Speaker A
Okay, but then what? Well, we're going to combine those together and save them. So once again, here at the top, Chain of Vapor plus Polluted Delta versus Jeska's Will plus Temple Garden. That's one game. Once they encounter each other again in this situation, it will accumulate and climb in value. Now, I actually can't showcase that to you because it's all in binary. I mean, we can just do this here, output, go in here. It's going to think for an eternity. Let's just take this one, read, and yeah, that's how it looks. However, this is more or less how it looks once you translate the binary. So, here we have an example of a challenge. You have Thassa's Oracle plus Demonic Consultation versus Underworld Breach and Rite of Flame. And in this case, you can see the 3/5, three wins against five, which means that Thassa's Oracle and Demonic Consultation have a winrate in this individual challenge of 37%. Then, you have a new challenge. Thassa's Oracle and Consultation versus two casual cards. Those two cards could be anything, it doesn't really matter. And suddenly it wins 9 of 10, which gives it a 90% winrate, because 9 of 10 is 90. Then you combine all of those, so 37 plus 90 equals 46.5. Okay, I've actually forgotten to divide it by two. But you get the point. We're basically combining every single one of those challenges. This continues for like ever. The amount of challenges every pair has, well, we're going to get to that, is very high. But what you get is a total average of all of that.
32:22
Speaker A
Then we're basically doing the same thing with Underworld Breach and Rite of Flame, versus that Thassa's Oracle pair, and versus that casual pair too. Now, in this case, Underworld Breach and Rite of Flame had a winrate. Remember, the Thassa's and Consult got 45. Well, that means that the other side gets 62.5, while it gets 100% versus that casual pair. So they end up with an 81.25 average winrate by combining the two values, 62.5 plus 100, divided by two. And then, of course, we do the exact same thing with the casual pair. And as you can see, they don't perform that well at all. They lose a lot, and yeah, they end up with an average winrate of five.
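As a rough Python sketch of this averaging step: each challenge gets its own winrate, and a pair's score is the plain average of those winrates. The sketch assumes a tally like 3/5 means three wins and five losses, and the 10-0 record against the casual pair is a hypothetical stand-in for "100%"; the function names are made up for the example.

```python
def winrate(wins: int, losses: int) -> float:
    """Winrate in percent for a single challenge (one pair versus one pair)."""
    return 100.0 * wins / (wins + losses)


def average_winrate(records) -> float:
    """Plain average of the per-challenge winrates of one pair."""
    rates = [winrate(wins, losses) for wins, losses in records]
    return sum(rates) / len(rates)


# Underworld Breach + Rite of Flame, as (wins, losses) per opposing pair.
breach_rite = [
    (5, 3),   # versus Thassa's Oracle + Demonic Consultation -> 62.5%
    (10, 0),  # versus the casual pair (hypothetical count) -> 100%
]

average = average_winrate(breach_rite)
print(average, average / 10)  # 81.25 8.125
```

The second printed value is the same winrate divided by 10, which is the smaller form that shows up in the leaderboard.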
33:39
Speaker A
And now suddenly we have a pairing leaderboard, you could say. So in this small simulation, Thassa's Oracle would receive a, we also divide it by 10. Remember. So what you see is actually the winrate, then divided by 10. This is just to make the numbers smaller. But the simple logic is this. What we are creating is a wrestle match of all different unique pairs and versus different other cards. And a pair that is in general really strong will defeat a lot of opponents. Now, if a pair is equally strong to another pair, the challenge is 50/50. Both will get a 50% winrate versus each other. But once they are starting to fight versus casual pairs, weaker pairs, they're going to beat them more often and get a higher winrate. So if you beat a lot of weak, you get a higher winrate. If you fight versus the same kind of strength, you get the same winrate. And this is, by the way, why the entire system is just going to generate a very equal outcome, because I've only included the top of the top best decks you can find. And that means that the spread inside the system is very small. All decklists in general in the system are going to be very narrow, because something I've said in a lot of my videos, the card choices doesn't truly matter that much. What matters is player skill. At the top, top best, it's kind of equal in general.
35:35
Speaker A
However, we're doing one more thing. We're creating a criterion called confident challenges, and we're putting the value at 20. So, for example, here we have two different challenges. First, we have the pair Underworld Breach and Dragon's Rage Channeler. They go up against Fellwar Stone and Arcane Signet. Now, this is a very common interaction. They go up against each other quite often. So they have a 17 plus 9, which equals 26. So that means it is above 20, and it is approved. This is more of a confident challenge. But then you have another example. Faithless Looting and the Green Goblin thing. No, sorry, this is not the Green Goblin, this is the Ragavan goblin. Sorry. However, in this situation, that Commander thing has won three times. Now, that doesn't mean much. You need more games. And that is why we have a confidence criterion of 20. This could, of course, increase over time. It just depends on how much trust we want to demand of this system, you could say.
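The confidence rule is simple enough to show directly. This Python sketch assumes a challenge is stored as the two sides' win counts and that it counts as confident once their sum reaches 20 (the transcript only shows 26 passing and 3 failing, so whether exactly 20 passes is a guess); the names `CONFIDENT_MIN_GAMES` and `is_confident` are invented for the example.

```python
CONFIDENT_MIN_GAMES = 20  # threshold named in the video; could be raised later


def is_confident(wins_a: int, wins_b: int, minimum: int = CONFIDENT_MIN_GAMES) -> bool:
    """A challenge counts once both sides together have enough recorded games."""
    return wins_a + wins_b >= minimum


# (challenge label, wins for the first pair, wins for the second pair)
tallies = [
    ("Breach + Dragon's Rage Channeler vs Fellwar Stone + Arcane Signet", 17, 9),
    ("Faithless Looting pair vs its opponent", 0, 3),
]

confident = [label for label, a, b in tallies if is_confident(a, b)]
print(confident)
```

Only the first challenge survives the filter: 17 plus 9 is 26 games, while three games is not enough to say anything yet.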
36:49
Speaker A
But if we go back to this graph right here, you can see Underworld Breach and Birgi. They have confident challenges. What is that? 19,539. So they have a lot of confident challenges. And all of those confident challenges have combined into an average winrate of 5.5, or 55%, or 55.15% to be more specific. Now, they have a lot more challenges. Trust me. But it has basically excluded every challenge that is below 20. Over time, with more data, more games, the confident challenges will increase in value. But then what? You're seeing winrate 5.5, but the power rank is at 6.6. This takes us to the log 10 system. So what this means is that each time the number of confident challenges is multiplied by 10, the multiplier on the winrate goes up another step. So, for example, if you have 10 confident challenges, you multiply your winrate by 1.05. If you have, in that situation that we showcased, what was it, 19,000, you multiply your winrate by 1.2, as you can see on this graph. This means that the next multiplier demands more challenges over time. This is a great system to acquire confidence. A pair that is used a lot will acquire a higher log multiplier and get a higher reward. Because something I've noticed is that when you have very few challenges, you can end up with a very high winrate or a very low winrate. So it's a system that is just balancing everything more together. However, a new card coming in will usually get a good winrate or a bad winrate, and it will have in the end a pretty equal evaluation to older cards that have basically found themselves somewhere around the 50% evaluation. This is definitely where you could debate and discuss other methods of acquiring this, but I think this is a good system.
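The exact formula is not spelled out in the video, so the following Python sketch is only one reading that happens to reproduce the quoted numbers: the multiplier grows by 0.05 for every full factor of 10 in confident challenges. The stepped rounding, the `step` parameter, and both function names are assumptions.

```python
import math


def log_multiplier(confident_challenges: int, step: float = 0.05, stepped: bool = True) -> float:
    """Assumed form: 1 + step per factor of 10 in confident challenges."""
    if confident_challenges < 1:
        return 1.0  # no confident data, no bonus
    decades = math.log10(confident_challenges)
    if stepped:
        decades = math.floor(decades)  # only completed factors of 10 count
    return 1.0 + step * decades


def power_rank(winrate_percent: float, confident_challenges: int) -> float:
    """Winrate divided by 10, scaled by the log multiplier."""
    return (winrate_percent / 10.0) * log_multiplier(confident_challenges)


print(round(log_multiplier(10), 2))            # 1.05
print(round(log_multiplier(19539), 2))         # 1.2
print(round(power_rank(55.15, 19539), 3))      # 6.618
```

Under this reading, Underworld Breach and Birgi (55.15% over 19,539 confident challenges) come out at about 6.6, in line with the value on screen. Passing `stepped=False` gives a smooth curve instead, which would be an equally plausible reading of "log 10".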
42:32
Speaker A
In any case, the total finished product is containing 3,922 unique cards, because the amount of cards inside cEDH tournaments are very few. Well, 4,000 is not that small, but still. If you compare it to the amount of unique cards in casual, it's tiny. Basically, every player is playing the same identical card, more or less. Like, from decklist to decklist, sometimes the difference is 10 cards, which is also kind of why this system will only answer currently the top of the top most common cEDH decklist you can find. Now, from that, we get a, what is that? 569,040 different unique pairs. And among those, we have 53,619 that are confident. And as you can see here, we have an enormous amount of not confident, not finished pair. So there's a lot of pairs out there that don't have a single confident challenge in them so far. Once again, also why we don't have a good verdict on a lot of cards. For example, Lurrus have played in a few cEDH games, but we don't have a single pair with Lurrus yet, because Lurrus hasn't, I don't think I have 20 games with Lurrus yet. Let's actually take a look at that. So this is actually those 471 tournaments. Here is Lurrus. Yeah, so Lurrus currently has total game count 13. That means that there's not a single pair with Lurrus currently inside this system yet. Once Lurrus reaches seven more games, Lurrus will get an evaluation. So it's just a matter of time, but that's where the system stands currently.
45:49
Speaker A
And all of that, all those 471 games so far, have basically taken 108 gigabytes. Over time, I'm going to make my computer into a data center. But that's it. That's more or less how it all works. That's more or less how we acquired the different values on these cards and Commander decks currently right here.
46:53
Speaker A
Now, if we actually scroll down to that Azula casual, and let's go in and look at it. We currently have 70 cards in it, because it contains basic lands like swamps and mountains. So that decreases it from 100 to 70. Among the pairs that are possible from those 70, you have 2,415 possible. Among those, you have 195 found and 2,220 missing. However, you still get an average pair power. So you can theoretically use this system, kind of, and see how many cards you're playing in this casual Azula deck that are somewhat cEDH. For example, Hullbreaker Horror inside this build is not a casual card. That's a cEDH card. Past in Flames is a cEDH card. Dualcaster Mage, et cetera. But then you have things like Lightning Greaves, Aggravated Assault, and suddenly here there's a lot of cards we simply don't have any power ranking for whatsoever. But you could throw in any casual deck and get a small, tiny picture of it if you wanted to. And in theory, you could sit with two different casual decks and just look like, oh, look here, you're playing a bunch of cEDH cards that are really strong, and such.
48:56
Speaker A
That's it for this video, and thank you so much for watching. Once again, if you want to talk with me about it, write to me on Patreon or Discord, if you are a patron. Or if you just want to talk and philosophize and send some feedback, feel free to write a comment below this video. And if you want to actually see a pair here and there, feel free to write one too, and I will look into it. As this system is going to acquire more games, like I said, somewhere between 15 and 20 new games each week, it is naturally going to grow over time and become something with a stronger certainty. Once it does, I will definitely make update videos on it. But until then, have a great day, guys, and I'll see you all in the next one.
