A Multi-Armed Bandit PVP Script

WARriorer

Most casual PVP players probably don't know how their current strength in each minigame stacks up against the rest of the field. This script works that out automatically (to squeeze out that little extra swagger) so you don't have to.

The implementation is an epsilon-decreasing multi-armed bandit strategy (epsilon falls linearly from 0.9 to 0.05 over ~1000 minis) which estimates your best minigame and plays that mini (1-epsilon)x100% of the time, and otherwise chooses one of the other minis uniformly at random. (Advanced users can probably code up a more complex strategy if so desired [e.g. Thompson sampling or UCB, although a big hurdle would be getting Gaussian/beta distributions in an ASH script]; this is just a pretty simple implementation.)
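For anyone curious what an epsilon-decreasing bandit looks like in code, here is a minimal sketch. It is not the script's actual source: the type, the function names, the exact step count of 1000 and the 50% default for unplayed minis are all assumptions made for illustration.

```typescript
// Hypothetical names throughout; this is not the script's actual code.
type MiniStats = { name: string; wins: number; losses: number };

const EPS_START = 0.9;
const EPS_END = 0.05;
const DECAY_STEPS = 1000; // the post says "~1000"; the exact value is assumed

// Linear decay from EPS_START to EPS_END, then held constant.
function epsilonAt(step: number): number {
  const t = Math.min(Math.max(step, 0), DECAY_STEPS) / DECAY_STEPS;
  return EPS_START + (EPS_END - EPS_START) * t;
}

// Empirical win rate; unplayed minis are treated as 50% (an assumption).
function winRate(m: MiniStats): number {
  const played = m.wins + m.losses;
  return played === 0 ? 0.5 : m.wins / played;
}

// With probability 1 - epsilon play the best mini so far,
// otherwise pick one of the other minis uniformly at random.
function chooseMini(
  minis: MiniStats[],
  step: number,
  rng: () => number = Math.random
): MiniStats {
  if (minis.length === 0) throw new Error("no minis to choose from");
  const best = minis.reduce((a, b) => (winRate(b) > winRate(a) ? b : a));
  const others = minis.filter((m) => m !== best);
  if (others.length === 0 || rng() >= epsilonAt(step)) return best;
  return others[Math.floor(rng() * others.length)];
}
```

Early on (epsilon near 0.9) this explores almost all the time; once epsilon bottoms out at 0.05 it plays the current best mini about 95% of the time.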

The default implementation is the Bernoulli Thompson bandit, but users who prefer other bandits may run
Code:
PVP_MAB strategy=UCB|epsilonGreedy|gaussianThompson|Exp3
to use that instead.

This script has a hardcoded dependency on UberPVPOptimizer, but may otherwise be run out of the box (it currently accepts no arguments and will simply use all of your remaining pvp fights). It should reset the stats every time the season changes (unfortunately, this means that even if a mini is repeated in the future, we do not consider how it has performed historically in the previous PVP seasons), so unless there's a drastic change to the peevpee.php?place=fight page, there's pretty much no upkeep to be done on the script.

To install the script, simply type the following into the KoLMafia CLI
Code:
git checkout https://github.com/Pantocyclus/PVP_MAB.git release



Edit: It currently also does not track the difference between HC and non-HC stats, and simply assumes that they are one and the same thing.

Edit 2: When exploring (i.e. not playing the current statistically best minigame), the script now chooses non-uniformly: each mini is picked with probability proportional to its [padded] win percentage, so minis with better win:loss ratios are more likely to be chosen (e.g. a mini with a 20% win percentage is twice as likely to be chosen as one with a 10% win percentage).
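Picking in proportion to win percentage is usually done with roulette-wheel selection. A rough sketch, assuming the weights are the (padded) win percentages; the function name is made up and the script's real code may differ:

```typescript
// Roulette-wheel selection: returns index i with probability weights[i] / sum(weights).
// pickWeighted is a hypothetical helper, not a function from the script.
function pickWeighted(weights: number[], rng: () => number = Math.random): number {
  const total = weights.reduce((sum, w) => sum + w, 0);
  if (weights.length === 0 || total <= 0) {
    throw new Error("need at least one positive weight");
  }
  let r = rng() * total; // a point somewhere on the wheel
  for (let i = 0; i < weights.length; i++) {
    r -= weights[i];
    if (r < 0) return i; // the point fell inside slice i
  }
  return weights.length - 1; // guards against floating-point round-off
}
```

With weights [10, 20], index 1 covers two thirds of the wheel, which matches the "20% is twice as likely as 10%" example above.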

Edit 3: Updated to carry stats over into Post-Season since the minis haven't changed yet.

Edit 4: Fixed some errors in parsing pirate season minigame names. Arrrgh

Edit 5: Fixed a bug on line 139 - changed PVP_choice into pvp_choice

Edit 6: The script now tracks the winrate of each minigame encountered over the course of every single fite, which should give a more accurate estimate of the offensive strength of each minigame. (Previously, it tracked the winrate of the selected/guaranteed minigame for each fite, and recorded the overall win/loss of each fite instead of each minigame).

Edit 7: set PVP_MAB_reduced_verbosity=true to reduce the verbosity of the script.

Edit 8: It should now properly track your overall season's wins and losses, from the date you update to this newest version of the script (it does not parse your historical fights). Also added a pref PVP_MAB_use_meteoriteade to acquire and use meteorite-ades if they are available or acquirable for < 10k meat.

Edit 9: Reduced minimum epsilon to 0.05, since we already get significant amounts of data about the other minis since the previous update (which tracks individual minis each fite). Also refactored the script to print out the win rate for the day/session (across your multiple legs if you're looping).

Edit 10: Fixed bug trying to loot in HC. We will attack for fame instead.

Edit 11: Fixed error in tracking whether we won or lost a mini. (I would have liked to pin the blame on the fact that there exist two different HTML results depending on whether the user has enabled "Use compact PvP mode" under the vanilla KoL Interface Options, but it turns out that the parsing was actually broken in both.) Also, ties are now a thing.

Edit 12: You can now check out the script from GitHub to keep it updated (on the rare occasions it does get updated). The GitHub version has been converted to TypeScript, so do delete the ASH script if you intend to switch over (which you should).

Edit 13: Deprecated arbitrary parameter-dependent epsilon-greedy algorithm in favour of parameter-free UCB1 and Gaussian Thompson Sampling algorithms. Also fixed bug regarding Drunken Season having more than 12 minis.
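For reference, textbook UCB1 picks the arm with the highest "win rate plus exploration bonus", and needs no tuning parameter. The sketch below is the standard formula, not necessarily how the script applies it; the Arm type and ucb1Pick name are invented for illustration.

```typescript
// Hypothetical shape for per-mini stats; not taken from the script.
type Arm = { name: string; wins: number; plays: number };

// Textbook UCB1: score = mean + sqrt(2 * ln(totalPlays) / plays).
// Any arm that has never been played is tried first.
function ucb1Pick(arms: Arm[]): Arm {
  if (arms.length === 0) throw new Error("no arms to choose from");
  const unplayed = arms.find((a) => a.plays === 0);
  if (unplayed) return unplayed;
  const total = arms.reduce((sum, a) => sum + a.plays, 0);
  const score = (a: Arm): number =>
    a.wins / a.plays + Math.sqrt((2 * Math.log(total)) / a.plays);
  return arms.reduce((best, a) => (score(a) > score(best) ? a : best));
}
```

The bonus term shrinks as a mini gets played more, so rarely played minis keep getting revisited until their win rates are pinned down.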
 
Oh, wow. This sounds really neat. I just guess which pvp minigame will be the best for me each season, but this sounds so much better.
 
With the season just ending (post-season just started) I thought I would post my stats from running this script since I started using it.

> Season 60 minigames statistics:
> - Barely Dressed: 29/67 (43.2%)
> - Basket Reaver: 47/96 (48.9%)
> - Polar Envy: 43/78 (55.1%)
> - Maul Power: 2533/4108 (61.6%)
> - Grave Robbery: 36/78 (46.1%)
> - Most Things Eaten: 50/95 (52.6%)
> - Visiting the Cousins: 35/73 (47.9%)
> - Northern Digestion: 21/51 (41.1%)
> - Hibernation Ready: 67/135 (49.6%)
> - What's in the Basket?: 46/94 (48.9%)
> - Bearly Legal: 38/74 (51.3%)
> - Beary Famous: 49/86 (56.9%)
> This season's win rate: 2994/5035 (59.4%)

This was achieved without any preparation for any specific mini, and it's nice to see that my season win rate is (1) better than 50% and (2) pretty close to the win rate of my best mini [as it should be]. Naturally these win rates don't apply to anyone other than me, since each individual's win rate is a function of their character state (which is very likely different from mine).

It will be interesting to see how it fares in the upcoming season.
 
My numberology results:

Code:
> Season 63 minigames statistics:
> -  Rule 42:  44/49 (89.7%)
> -  Baker's Dozen:  33/36 (91.6%)
> -  I Like Pi:  24/29 (82.7%)
> -  80 Days and Counting:  21/26 (80.7%)
> -  Tea for 2, 3, 4 or More:  51/58 (87.9%)
> -  Back to Square One:  980/1141 (85.8%)
> -  15 Minutes of Fame:  31/34 (91.1%)
> -  Fahrenheit 451:  47/51 (92.1%)
> -  Most Murderous:  55/61 (90.1%)
> -  The Purity is Right:  32/38 (84.2%)
> -  Zero Tolerance:  44/48 (91.6%)
> -  That Britney Spears Number:  33/37 (89.1%)
> This season's win rate: 1395/1608 (86.7%)

This was fun to use! Interestingly it ended up choosing Back to Square One the majority of the time, even though I think Rule 42 was most likely the best choice (I always picked up a towel before running PVP)
 
Oh huh interesting - it may be because your win rates are pretty high (there is some padding that adds 7W and 7L to smooth out the variance slightly for minigames with fewer games played, so that does mean that in weird scenarios [like yours, where the win rate deviates far from 50%] it may not choose the optimal minigame)
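To make the padding effect concrete, here is a small worked example using three of the Season 63 figures quoted above. The formula (wins + 7) / (played + 14) is my reading of "adds 7W and 7L"; the script's exact calculation is an assumption, and paddedRate is a made-up name.

```typescript
// Padding described in the thread: 7 phantom wins and 7 phantom losses per mini.
const PAD_WINS = 7;
const PAD_LOSSES = 7;

// Padded win rate; the exact formula the script uses is assumed here.
function paddedRate(wins: number, played: number): number {
  return (wins + PAD_WINS) / (played + PAD_WINS + PAD_LOSSES);
}

// Season 63 figures quoted earlier in the thread: [name, wins, played].
const season63: [string, number, number][] = [
  ["Rule 42", 44, 49],
  ["Fahrenheit 451", 47, 51],
  ["Back to Square One", 980, 1141],
];

for (const [name, wins, played] of season63) {
  const raw = (wins / played).toFixed(3);
  const padded = paddedRate(wins, played).toFixed(3);
  console.log(`${name}: raw ${raw}, padded ${padded}`);
}
// Rule 42: raw 0.898, padded 0.810
// Fahrenheit 451: raw 0.922, padded 0.831
// Back to Square One: raw 0.859, padded 0.855
```

With only ~50 games each, the 14 padded games drag Rule 42 and Fahrenheit 451 well below Back to Square One, whose 1141 games barely notice the padding. That is consistent with the script sticking to Back to Square One, though it doesn't prove that is the whole story.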
 
It currently only uses data from your own attacks, and does not attempt to parse the win/loss result of each of the minigames played within each attack - it only looks at which minigame was selected as the guaranteed one, and parses whether the attack was a win or loss overall.

(These are definitely changes that can be made to improve the script, although I would expect that for a casual player your outfit when you're attacking would be different from your outfit when you're defending, so I'm not sure if it would be useful to track the stats of attacks against you)
 
Yeah, I don't think tracking defense is valuable for something like this.

Individual tracking of the minis could be cool though
 
I think individual tracking of the minis would give more accurate feedback than winning/losing the entire fite. You would also be getting feedback on 7 of the 11 minis every single fite, which is a lot of information. Honestly, with that, you could just choose your winningest mini every turn, and let the 6 other random choices do all the exploring you need.
 
Another season finished, another set of results:

Code:
> Season 64 minigames statistics:
> -  Least Bland:  42/53 (79.2%)
> -  Spice Farmer:  45/51 (88.2%)
> -  That's a Spicy Tomato:  32/45 (71.1%)
> -  Iron Palate:  45/52 (86.5%)
> -  Balanced Diet Comparison:  42/51 (82.3%)
> -  Fashion Show:  35/42 (83.3%)
> -  New Tastes:  39/44 (88.6%)
> -  Snootee Customer Loyalty:  42/49 (85.7%)
> -  Who Runs Bordertown?:  45/53 (84.9%)
> -  Briniest Liver:  46/56 (82.1%)
> -  Upward Mobility Contest:  46/54 (85.1%)
> -  Freshest Taste:  1428/1630 (87.6%)
> This season's win rate: 1887/2180 (86.5%)


It REALLY liked Freshest Taste this time, and it seems fairly accurate.
 
The script has been updated to decouple the minigames' winrates from the session's winrates. The session winrates now track the win/loss ratio of every fight, while the minigame winrates track the relevant win/loss ratio of each minigame within each fight (as per the feature requested above).

The season's win rate reports the sum of all the minigames won over the sum of all the minigames played over the entire season (as opposed to the sum of fights won over total fights). In hindsight that isn't too interesting, so I'll probably change it to track the winrate of fights across the season some other time.
 
Code:
Season 65 minigames statistics:
- Ice Hunter: 0/533 (0.0%)
- All Bundled Up: 2055/2262 (90.8%)
- Ready to Melt: 2592/3165 (81.8%)
- Purity: 2669/3605 (74.0%)
- Snow Patrol: 919/1888 (48.6%)
- Frostily Ephemeral: 817/945 (86.4%)
- Frozen Dinners: 0/2247 (0.0%)
- Foreigner Reference: 0/3081 (0.0%)
- Best Served Repeatedly: 3399/3841 (88.4%)
- Sharing the Love (to stay warm): 6218/6464 (96.1%)
- Burrowing Deep: 0/3592 (0.0%)
- A Nice Cold One: 3588/3771 (95.1%)
This season's win rate: 5545/7336 (75.5%)

Stats with the updated tracking feature. (Note that the season's win rate tracks the win/loss ratio of the overall fite, while the minigames' win rates track their own win/loss ratio [so the denominators of the minigames' win rates do not sum up to the denominator of the season's win rate])
 
I'm unsure but there might be something weird going on with the win/loss tracking? Perhaps I just don't understand how minis work in PVP

I won all but one mini according to this, but still lost the fight?

Code:
Chose mini: Scurvy Challenge
Preference availableSwagger changed from 5624 to 5625
You challenged Random Thief and lost the PvP fight, 3 to 4!
Shiverwarp lost 68 Muscleboundness
Shiverwarp lost 82 Magicalness
Shiverwarp lost 74 Chutzpah
We won the mini: Scurvy Challenge
Preference myCurrentPVPWins_7 changed from 15 to 16
Preference myCurrentPVPEpsilon changed from 0.7210022678375244 to 0.7200022807121277
We won the mini: Karmic Battle
Preference myCurrentPVPWins_4 changed from 20 to 21
Preference myCurrentPVPEpsilon changed from 0.7200022807121277 to 0.719002293586731
We lost the mini:
Preference myCurrentPVPLosses_0 changed from 26 to 27
Preference myCurrentPVPEpsilon changed from 0.719002293586731 to 0.7180023064613342
We won the mini: Most Unbalanced
Preference myCurrentPVPWins_6 changed from 18 to 19
Preference myCurrentPVPEpsilon changed from 0.7180023064613342 to 0.7170023193359375
We won the mini: Installation Wizard
Preference myCurrentPVPWins_8 changed from 19 to 20
Preference myCurrentPVPEpsilon changed from 0.7170023193359375 to 0.7160023322105408
We won the mini: What is it Good For?
Preference myCurrentPVPWins_9 changed from 22 to 23
Preference myCurrentPVPEpsilon changed from 0.7160023322105408 to 0.715002345085144
We won the mini: Nog Lover
Preference myCurrentPVPWins_10 changed from 23 to 24
Preference myCurrentPVPEpsilon changed from 0.715002345085144 to 0.7140023579597473
Random Thief beat us!

This is what the actual fight looked like: 1683166240788.png
 
It looks like there's an error in the Scurvy contest. It lists the name of the player who LOSES the scurvy contest, but PVP_MAB seems to think the player's name showing up means that they've won.
 
I can't seem to replicate the Scurvy bug in the latest script update. It's correctly reporting the losses on Scurvy for me.

Just confirming that you're currently on the latest update?
 
Shifted the script onto GitHub and ported it to TypeScript. If any existing users intend to switch over, remember to delete the ASH file after checking out the above script from GitHub.
 