So I want this to be a quick one. You could easily write entire essays on this, but I would rather not go into that level of detail.
Let's have a quick look at testing methodology. If you've requested a team build example from me at one point you might have noticed that I've chosen some songs, and I've always wondered if those songs are the best songs to use as examples for teambuilding, so I decided to do a quick study on this. Hopefully this should also answer some questions and shine some light on the general teambuilding process when it comes to validating team power.
Let's have a look at the following simulation result. What kind of opinion does this result create?
It looks like the first team is significantly weaker than the second one, doesn't it? The second team looks about 8% stronger. Well, how about we have a look at a different song?
Wait, now the first team is better? The first team is 3.7% better in this example. What is going on here?
This is an example of the issues that come up when testing teams on songs that are not very timer neutral. In the first example, Valkyria is one of the most 11s biased Lv30 alltypes in the entire game. The first team is 7s, which has an estimated coverage of only 61.63% on that song, while the second team is 11s, which has an estimated coverage of 69.13%, a significant difference. Interestingly, the difference in coverage is 7.5 percentage points, very close to the roughly 8% difference we saw in the computed results.
The second example was computed with Hifi days, which is a 7s favored song. 7s coverage is about 65.85% while 11s coverage is 62.12%, which is about a 3.7 point difference. This is why timer neutrality is important when testing teams: it is hard to compare team vs team if each team has a different timer and each song has a different timer bias. This adds confounding factors that make it hard to reach an objective judgement of which team is actually better.
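To make the arithmetic explicit, here is a minimal Python sketch that reproduces the two coverage gaps from the figures quoted above. The `COVERAGE` table and the `coverage_gap` helper are names made up for illustration; they are not part of any simulator.

```python
# Estimated skill coverage (%) per timer, copied from the figures quoted above.
COVERAGE = {
    "Valkyria": {"7s": 61.63, "11s": 69.13},
    "Hifi days": {"7s": 65.85, "11s": 62.12},
}

def coverage_gap(song, timer_a, timer_b):
    """Return timer_a coverage minus timer_b coverage, in percentage points."""
    cov = COVERAGE[song]
    return round(cov[timer_a] - cov[timer_b], 2)

for song in COVERAGE:
    gap = coverage_gap(song, "11s", "7s")
    favored = "11s" if gap > 0 else "7s"
    print(f"{song}: {abs(gap):.2f} points in favor of {favored}")
```

A positive gap means the song favors the first timer passed in, a negative gap the second. The sign flips between the two songs, which is exactly why the team ranking flipped.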
Let's begin by having a look at the two teams again. One of the most popular songs of all time to test on is Samakani M+. That song is a very well known timer neutral Lv30 MASTER+ and you might have seen me use it a lot. In fact, here are the two example teams again in Samakani M+. Notice how small the difference is now.
Let's think about the most popular High timers. The vast majority of cards in this game are 7s, 9s and 11s. There are also less common high timers like 13s, and there's the rare and insanely good 4s. Because we only really need ONE chart to test, I'll just list the top few MASTER+ songs along with their bias%, which is the maximum coverage difference between any two timers.
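As a rough sketch of how bias% can be computed from per-timer coverage, the Python below takes the largest coverage minus the smallest across the timers being considered. The `bias` and `rank_by_neutrality` functions and the `exclude` parameter are hypothetical names for illustration, and the sample data only contains the 7s and 11s figures quoted earlier in this article, not the full data set behind the lists.

```python
def bias(coverage_by_timer, exclude=()):
    """Max coverage difference (percentage points) between any two timers."""
    values = [c for t, c in coverage_by_timer.items() if t not in exclude]
    if len(values) < 2:
        raise ValueError("need at least two timers to measure bias")
    return max(values) - min(values)

def rank_by_neutrality(songs, exclude=()):
    """Sort song names from most neutral (lowest bias) to least neutral."""
    return sorted(songs, key=lambda name: bias(songs[name], exclude))

# Only the 7s and 11s coverage quoted earlier; other timers are not listed here.
songs = {
    "Valkyria": {"7s": 61.63, "11s": 69.13},
    "Hifi days": {"7s": 65.85, "11s": 62.12},
}
print(rank_by_neutrality(songs))  # ['Hifi days', 'Valkyria']
```

Passing `exclude=("13s",)` would give the "WITHOUT 13s" style of ranking, assuming the coverage table included a 13s entry for each song.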
All Type WITHOUT 13s
All Type WITH 13s
An interesting fact to note is that Alltype songs generally rank poorly in terms of neutrality. Samakani does not even make it into the top 20 most neutral songs when comparing all timers. Including 13s also dramatically increases the maximum bias. In general, for the top of the meta, 13s is not a relevant timer in Alltype songs, but when considering any and all possible teams it is better to account for 13s as well.
Cute WITHOUT 13s
Cute WITH 13s
Cute seems to have a consistent winner, with Onedari in first place in both categories.
Cool WITHOUT 13s
Cool WITH 13s
Saite Jewel is actually the best chart ever made. It's the most neutral chart in the game and it's also the only chart in the game to have a bias of under 1% when including 13s timers!
Passion WITHOUT 13s
Passion WITH 13s
And finally, for Passion it seems like there are a lot of good Lv30 choices here.
So what songs am I going to use in the future for testing? Judging by the data on hand, I think I will stop using Evil Live M+ (2.8046% bias to 13s), Fascinate M+ (5.1532% bias to 13s) and Babel M+ (5.4633% bias to 13s) in favor of Twinkle Tail M+, Saite Jewel M+ and Onedari Shall We M+, which will provide greater accuracy when comparing teams. All in the name of science, to improve testing methodology and provide better analysis across the board.
PS: These values were computed with the settings set on (]
Is there a reason only MASTER+ was selected in this case? I think that going the MASTER+ route includes songs that generally have a higher note count, which gives most timers (not just the 4 stated earlier) a better shot at being fair. MASTER+ is also the only difficulty that carries the Slide rhythm icon. Outside of MASTER+ there are actually several songs better than Saite Jewel MASTER+ in terms of raw bias values.
Note density also matters in some cases. For example, skills with a healing effect benefit from a high note count, and note count also affects the viability of skills like life sparkle. While it is possible to peg the note count at the average of all songs and look for a fair chart around that note count, I don't think that gives you an idea of the full potential of the team being built.
This is also why I have a preference for choosing songs that are higher in difficulty level if possible. Higher difficulty means a higher multiplier, which makes differences in composition much easier to see.
What else is there to consider? The note type distribution is one of them. Too much of any particular note type will make the score favor specific act cards. This is a difficult confounder to control, as you don't really have a specific way of making sure the chart you're selecting is close to the game's average % of slides, flicks and so on. We simply have to accept that this is a problem that can exist, and prefer non-act cards in general, choosing act cards only when targeting specific charts.