The dangers of overanalyzing results.
I’m one of those guys who likes to have all of my results recorded. I want to know if I’m improving or not and whether changes to my game have had a positive influence.
I want to know which hero is best for me. I want to know everything. The numbers are going to tell me the answers, right?
Modernleper addressed this subject a few weeks ago with regard to Constructed win rates. I’m going to build on that with regard to Arena.
Headline win rate
The question everyone wants answered is “What’s your Arena average?”
It’s also a question that needs so much context that it’s virtually impossible to answer. This article deals with why.
I’ve played approximately 300 Arenas since GvG was released. In some I was trying hard, in others I was picking deliberately amusing combos. In yet others I was trying out things I didn’t have a good grip on to improve my game. I also play the classes I’m good at less often than the ones I’m bad at.
How do I account for these things to work out my true win rate?
Well, let’s do some of the obvious things. Firstly, I’ll only record results when I’m trying hard.
Already this presents some issues. For a start, I have to be absolutely honest with myself. I can decide whether to include a run before I start it. Okay, I can do that.
Secondly, I won’t record results when I pick up an amusing card on purpose rather than the strongest one. So when I take Hobgoblin to go with my Mana Wyrm/Annoyotron deck, I’ll just delete the run.
Wait. Am I not trying hard now? I was before though? Not so keen on this. The definition seems too hard to make consistent.
Okay, never mind. We can all agree that I can just rank my classes from best to worst. I'll take the average for each, work out how often each one will come up, and use that to work out how good I'll be if I just play my best classes.
Come with me, I want to show you something...
25 Arena runs is nothing.
This is a graph of all my streamed runs since GvG, plotted as consecutive 25-run averages.
That is to say that the first point on the graph is the average of runs numbered 1-25, the second point is the average of runs 2-26, and so on.
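For anyone who wants to build the same kind of line from their own records, here is a minimal Python sketch of that windowing. The function name rolling_averages and the example numbers are invented for illustration. They are not real results, and this is not necessarily how the graph itself was produced.

```python
def rolling_averages(wins, window=25):
    """Average every consecutive `window`-run slice of `wins`.

    `wins` is a list of wins per Arena run, oldest first.
    Point 0 covers runs 1..window, point 1 covers runs 2..window+1, etc.
    Returns an empty list if there are fewer runs than the window size.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(wins[start:start + window]) / window
        for start in range(len(wins) - window + 1)
    ]


# Hypothetical wins per run (not real data), with a small window so the
# output is easy to check by hand.
example_runs = [3, 7, 5, 12, 0, 4, 6, 8]
print(rolling_averages(example_runs, window=4))
# -> [6.75, 6.0, 5.25, 5.5, 4.5]
```

Changing window to 50 or 100 on the same list gives the smoother lines discussed later on.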
This was done to generate some smoothing and your instinct would probably be that 25 runs would be a decent number to smooth things out. As you can see, that instinct is rubbish.
Some of these runs weren’t serious, sometimes maybe I didn’t get heroes I liked in the 25 runs, sometimes I was testing stuff out, but this attempt to smooth it says that my average is between 4.6 and 7.6, which is about as unhelpful as you could hope for.
Some parts of the line I can explain. I was mailing it in at the start, trying to play safe, and then I had a period where I did a series of co-ops with some incredibly strong players (those results not included, obviously) and learned a lot. There was also a spike when mobile Hearthstone was launched, which is now known to have helped people’s averages. The final downswing feels beyond explanation. I couldn’t win for a week no matter what I did. I also lost some confidence and ended up reverting to the basics. This involved simply making sure I was picking good cards, ignoring curve, that kind of thing. It’s easy to blame bad luck, but I’m not crediting good luck for my upswing, and that was almost certainly a factor too. Let’s try a different chart to try to make some sense of everything...
So now we get a different story. I went downhill a little before starting a learning curve (this is normal), got better, and then forgot everything I learned and went back to almost where I was before. This doesn’t really make sense. I can even verbalise some of the stuff I learned, let alone the improvement in instincts over 200 Arenas. Also, I’ve completely smoothed out one of the most important downswings of the year. That downswing happened. It hurt me. I felt it. I came out of the other side back to a decent win rate, and although the downswing is showing, the recovery is not appearing yet. As the bad run filters out, the future line will go up, as with the Last 25 graph.
It’s good that variance is leaving the graph. That’s what we want to happen, but we're losing detail as a tradeoff. I’m also still somewhere between 4.8 and 6.8 average, which isn’t very helpful.
Let’s have one last go...
So now we’re getting somewhere. It shows a rapid improvement where I put in the work and is pretty stable after that, with a slight dropoff with my recent downturn.
There’s another problem now though.
This is ONE HUNDRED Arena runs. That’s a couple of months or more for most people.
Worse still, if I want an even bigger sample (for instance, all my games ever), then changes in skill level due to coaching, practice or breakthroughs take hundreds of games to impact the average. That’s useless.
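To put rough numbers on that lag, here is a small Python sketch. It ignores luck entirely and assumes a hypothetical player whose true average jumps from 5.0 to 6.0 overnight. The function names and every figure in it are illustrative assumptions, not measurements.

```python
def window_average_after_change(old_avg, new_avg, window, runs_since_change):
    """Luck-free rolling average some runs after a sudden skill change.

    Assumes every run before the change scored exactly `old_avg` and every
    run after it scores exactly `new_avg` (a deliberate simplification).
    """
    new_runs = min(runs_since_change, window)
    old_runs = window - new_runs
    return (old_runs * old_avg + new_runs * new_avg) / window


def lifetime_average_after_change(old_avg, new_avg, runs_before, runs_since_change):
    """Luck-free average over every run ever played, under the same assumption."""
    total_wins = runs_before * old_avg + runs_since_change * new_avg
    return total_wins / (runs_before + runs_since_change)


# Hypothetical player: 300 runs at 5.0, then 50 runs at a new level of 6.0.
for window in (25, 50, 100):
    shown = window_average_after_change(5.0, 6.0, window, runs_since_change=50)
    print(f"last {window:>3} runs: {shown:.2f}")
print(f"all runs ever: {lifetime_average_after_change(5.0, 6.0, 300, 50):.2f}")
```

Under these assumptions the Last 25 and Last 50 lines have fully caught up after 50 runs, the Last 100 line is only halfway there, and the lifetime average has barely moved.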
Back to the beginning
At the start of the article I was talking about getting a more pure sample. Eliminating games that don’t matter, or don’t fit the theory I want.
Hopefully I’ve shown since then that we need a sample of something like 100 runs before things begin to make any sense, and yet all the pruning I was talking about at the start shrinks the sample rather than growing it.
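A rough simulation gives a feel for why small samples wobble so much. This Python sketch uses the Arena rule that a run ends at 12 wins or 3 losses, and assumes every game is an independent coin flip with a fixed win chance, which real matchmaking is not. The 60% win chance, the function names and the sample counts are all assumptions for illustration.

```python
import random
import statistics


def simulate_run(win_prob, rng):
    """Play one simulated Arena run: stop at 12 wins or 3 losses."""
    wins = losses = 0
    while wins < 12 and losses < 3:
        if rng.random() < win_prob:
            wins += 1
        else:
            losses += 1
    return wins


def spread_of_averages(win_prob, runs_per_sample, samples=1000, seed=1):
    """Simulate many samples of `runs_per_sample` runs each.

    Returns (mean, standard deviation) of the per-sample averages, i.e. how
    much an N-run average moves around through luck alone.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(samples):
        total = sum(simulate_run(win_prob, rng) for _ in range(runs_per_sample))
        averages.append(total / runs_per_sample)
    return statistics.mean(averages), statistics.stdev(averages)


# Assumed fixed 60% chance to win any single game (illustrative only).
for n in (25, 50, 100):
    mean, sd = spread_of_averages(0.6, n)
    print(f"{n:>3}-run averages: mean {mean:.2f}, spread (1 sd) {sd:.2f}")
```

The spread shrinks as the sample grows, but only with the square root of the number of runs, so quadrupling the sample from 25 to 100 runs only halves the noise.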
Then look at all the excuses and justifications that have taken place throughout the three graphs. The 25 graph shows a pretty different story to the 50 graph, from the same sample of results! I justified the results in different ways. This feels dangerously like a whole bunch of confirmation bias.
The 100 sample makes more sense to me. I can logically justify the average being lower than I expect using the same excuses that I explained at the start. I don’t get to pick my hero, I have other things going on at the same time, I’m picking cards to try out new strategies and sometimes just to entertain.
This has come out to an average of six or so, and I’m happy to add a bit for that. How much I add is a matter of taste, as we can’t really get a big enough sample to determine it accurately, but we can definitely make a case for it being something. That’s not much use if I want to see whether I’m better than friends with similar records when we all add “a bit”, but I can see that I’m getting better at Hearthstone if I keep my recording conditions the same. I just don’t know exactly where I’ve gone from and to.
“Which is the best hero to choose?”
I’ve used this heading before, and most likely will do again. It’s a question that nearly everyone wants to know the answer to, and I’ll take every chance to answer it that comes up.
You can see from the article that if you average 4.33 with Hunter over 12 runs and 4.6 with Mage over 15 runs, you’ve really learned nothing at all. You could be anywhere on the Last 25 graph.
It’s incredibly tempting to read into your stats and declare which are the best and worst heroes for you based on differences of 0.2 per run or something.
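As a back-of-the-envelope check on the Hunter and Mage numbers above, this Python sketch compares the gap between the two averages with the standard error of that gap. The per-run standard deviation of 3 wins is an assumption chosen for illustration, not something measured from my results, and the function name is made up.

```python
import math


def standard_error_of_difference(run_sd, runs_a, runs_b):
    """Standard error of the difference between two sample averages.

    Assumes both heroes share the same per-run standard deviation `run_sd`
    and that runs are independent of each other.
    """
    return run_sd * math.sqrt(1 / runs_a + 1 / runs_b)


hunter_avg, hunter_runs = 4.33, 12
mage_avg, mage_runs = 4.6, 15
ASSUMED_RUN_SD = 3.0  # assumption: typical swing in wins from run to run

gap = mage_avg - hunter_avg
se = standard_error_of_difference(ASSUMED_RUN_SD, hunter_runs, mage_runs)
print(f"gap = {gap:.2f} wins, standard error = {se:.2f} wins")
# -> gap = 0.27 wins, standard error = 1.16 wins
```

With that assumed spread, the gap between the two heroes is a small fraction of the noise you would expect from luck alone, which is why it tells you nothing.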
What you should actually be doing is questioning your own understanding of each class. Imagine you’re teaching someone else how to play each class. Work out which ones you have the best understanding of, and the chances are that they’re the best for you.
You will find sitewide averages on some sites that give you an ordered list of the best heroes with proper sample sizes. Remember though that these aren’t YOU. Mage is probably played less badly than other classes by beginners. Rogue is probably played better compared to other classes by the best players in the world. Averaging these things once again blurs the reality that is YOUR personal Hearthstone skillset. I’ve written this paragraph before, but hopefully the context of this article helps make it really clear how important YOU are to your Hearthstone ability.
Conclusion
What I’m hoping to have shown in this article is that using stats is a very dangerous thing.
If you try to make the sample selective to make it accurate, then you make it too small. The best you can do (unless you can play 100 arenas in 2-3 weeks, which some people can) is to keep a rolling average and infer things from it.
If you’re looking for an exact answer, then sadly, you’re not getting one.
Stats can also be misleading. You can convince yourself you’re amazing with your 7.4 average if you use too small a sample. You can get overly paranoid if you panic about your 4.6 average. The best thing to do is to keep the stats but let them look after themselves.
Not only all that, but you’ll stifle the learning process if you’re not willing to tank your results from time to time to try out new things. The one time my average was remotely stable was when I was terrified of losing and repeated the same old stuff over and over again.
Use the numbers as a guide but no more. Don't let them control your game.
Neil "L0rinda" Bond