Familiar Arena Parameters

Veracity · Jun 19, 2015

I've been wondering for years if our model for how familiars stack up in the arena is correct. We started out with Vladjimir's arena data at The Rye. That doesn't seem to exist any more, but ever since I added the "Learn Familiar Strengths" facility to the Familiar Trainer, we've gathered our own.

(For what it's worth, the Familiar Trainer was my first big project for KoLmafia. I was still new to Java and there are many things I'd do differently were I to do this project again, but, it is what it is.)

For each contest, a familiar has one of four levels: 0, 1, 2, 3.

0 means that it sucks at this contest; when you enter it into that contest, there is a special message to point that out. For example, if you enter a Steam-Powered Cheerleader in the Hide and Seek Contest, you are told "Steam-Powered Cheerleaders aren't great at hiding, what with the shouting and the rustling pom-poms." A familiar that sucks at a contest automatically loses against a familiar that does not also suck at that contest. Against one that also sucks at the contest, there is a 50% chance that either wins, regardless of comparative weight.

A familiar that is rated 1, 2, or 3 is Poor, Average, or Good at that contest. This was recently confirmed by Twitch, when we saw Jick add a familiar to the game and each arena contest had a dropdown with exactly those options in it. But, what, exactly, does it mean for a familiar to be "Good" at a contest?

I am sorry to say that I don't remember the name of the person who wrote FamiliarTool.java, which is used to determine the "best" opponent/contest/weight for training your familiar, given the current set of opponents. I tried looking at the SVN logs, but the original FamiliarTool/Familiar Trainer went in via CVS, before we migrated to SVN. But the model is this: each rank in a contest is the equivalent of 3 pounds of weight. If your familiar's buffed weight + rank * 3 is the same as a particular opponent's weight + rank * 3, it's 50/50, but if you exceed the other one by 3 (or more), you will win (almost) all the time.
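A minimal sketch of that model, for readers who want to play with the numbers. This is not the code in FamiliarTool.java; the function names, the `WIN_MARGIN` constant, and the result labels are made up for illustration. The rank-0 handling follows the "sucks at this contest" rule described earlier in this post, and the model as stated says nothing about differences other than 0 or at least 3, so the sketch reports those as unspecified.

```javascript
// Illustrative only: names and labels are hypothetical, not taken from KoLmafia.
const POUNDS_PER_RANK = 3; // the value that question 2 asks about
const WIN_MARGIN = 3;      // the value that question 3 asks about

// Buffed weight plus the rank bonus for one contest (rank is 1, 2 or 3).
function effectiveWeight(weight, rank, poundsPerRank = POUNDS_PER_RANK) {
  return weight + rank * poundsPerRank;
}

// Classify a matchup between two { weight, rank } records for one contest.
function classify(mine, theirs) {
  // Rank 0: automatic loss unless the opponent is also rank 0.
  if (mine.rank === 0 && theirs.rank === 0) return "coin flip";
  if (mine.rank === 0) return "automatic loss";
  if (theirs.rank === 0) return "automatic win";

  const diff =
    effectiveWeight(mine.weight, mine.rank) -
    effectiveWeight(theirs.weight, theirs.rank);
  if (diff >= WIN_MARGIN) return "near-certain win";
  if (diff === 0) return "coin flip";
  return "not stated by the model";
}

// A 20 lb. Good familiar against a 23 lb. Average one: 29 vs. 29.
console.log(classify({ weight: 20, rank: 3 }, { weight: 23, rank: 2 }));
```

Changing `POUNDS_PER_RANK` to 4 or 5 is the quickest way to see which matchups would distinguish the competing hypotheses.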

So, for years, I've wondered the following:

1) Are there really 3 non-zero ranks? (Twitch says YES)
2) Is each rank really the equivalent of 3 lbs.? Why not 4, or 5, or ...?
3) If not 3, what is the "optimal" weight to exceed an opponent by?

When I derive familiar parameters, I make sure that I have the following equipment in inventory:

Hat: crumpled felt fedora = +10
Hat: plexiglass pith helmet = +5
Accessory: tiny plastic mosquito = +1
Accessory: tiny plastic mosquito = +1
Accessory: tiny plastic mosquito = +1
Familiar: lead necklace = +3
Familiar: rat head balloon = -3
Familiar: das boot = -10
Familiar: little bitty bathysphere = -20

(plus my inherent Amphibian Sympathy, which is +5)

This allows the Familiar Trainer to adjust my familiar's weight quite nicely vs. opponents' weights to hit the "optimal" weight vs. today's opponents. I also run 12 iterations over each contest, which takes 144 turns, if the familiar has no "0" contests. And lastly, I start my familiar at 0 experience. That probably doesn't matter, depending on the particular set of opponents, since it is possible to add -20, -10, or -3 to the weight in order to get it down to the target familiar's weight, but it might be easier to have the option to go up or down...

The results that I see are good enough to let us set the parameters in familiars.txt such that the Familiar Trainer can get max XP for training that familiar, but do nothing to make me abandon questions 2 & 3...

Take a look at what I found today, when I trained Puck Man and Ms. Puck Man. Yes, I know that they are "the same" familiar when equipped and in action, but there is no reason to believe that the arena parameters have to be "the same".

Code:

Results for Puck Man after 12 trials using 144 turns:

Contest          XP[1]  XP[2]  XP[3]  Original Rank  Derived Rank
Cage Match          59     30      0              0             1
Scavenger Hunt      49     50     50              0             2
Obstacle Course     46     60     25              0             2
Hide and Seek       28     49     60              0             3

Results for Ms. Puck Man after 12 trials using 144 turns:

Contest          XP[1]  XP[2]  XP[3]  Original Rank  Derived Rank
Cage Match          53     15     10              0             1
Scavenger Hunt      52     54     30              0             2
Obstacle Course     38     52     40              0             2
Hide and Seek       27     45     58              0             3

The best possible score is 60 - earning 5 XP for all 12 contests.

Those don't really look the same, although they have the same end parameters. Look at the Scavenger Hunt.

Puck Man at Poor/Average/Good got 49/50/50 experience. It could be any of those.
Ms. Puck Man at Poor/Average/Good got 52/54/30. The first two were very similar, and both were better than the third.
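One way to read the tables above is that the derived rank is simply the assumed rank that earned the most XP. The sketch below encodes that reading, with ties going to the lower rank. That tie rule is an assumption chosen because it reproduces all eight rows above; it is not necessarily what the Familiar Trainer does, and `deriveRank` and `rankGap` are hypothetical names.

```javascript
// xpByRank is [XP[1], XP[2], XP[3]] for one contest.
// Returns the rank (1..3) with the highest XP; ties go to the lower rank (assumed).
function deriveRank(xpByRank) {
  let best = 0;
  for (let i = 1; i < xpByRank.length; i++) {
    if (xpByRank[i] > xpByRank[best]) best = i;
  }
  return best + 1;
}

// Gap between the best and second-best XP totals.
// A small gap means the trials do not really separate the candidate ranks.
function rankGap(xpByRank) {
  const sorted = [...xpByRank].sort((a, b) => b - a);
  return sorted[0] - sorted[1];
}

// Scavenger Hunt rows from the tables above.
console.log(deriveRank([49, 50, 50]), rankGap([49, 50, 50])); // Puck Man
console.log(deriveRank([52, 54, 30]), rankGap([52, 54, 30])); // Ms. Puck Man
```

With only 12 trials per cell, a gap of 0 or 2 XP is within a single fight's worth of noise, which is the ambiguity being pointed out here.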

What would it take to answer questions 2 & 3? I have the following assumptions:

- Poor/Average/Good are the equivalent of +X/+Y/+Z lbs. of weight.
--> X/Y/Z don't have to be the same
- Poor/Average/Good mean the same thing for all four contests
--> We can test using one contest, rather than testing all 4.

So, what to do? Something like the following, perhaps:

- Start with a single max-weight familiar
- Choose a single contest where today's opponents have at least one familiar with the same skill rank as your chosen familiar
- Iterate your familiar from weight -10 to weight +15 (say) and note the experience you earn
- Ponder the results.

Try the same thing matching your Average familiar vs. a Poor Familiar.
Ditto for Good vs. Poor.
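The bookkeeping for such a sweep could look something like the sketch below. It only tallies results that someone enters by hand and does not talk to the game. It assumes the -10 to +15 range is measured relative to the chosen opponent's weight, which is one reading of the step above, and all of the names are hypothetical.

```javascript
// Build one empty record per weight offset, from `from` to `to` inclusive.
function makeSweepPlan(opponentWeight, from = -10, to = 15) {
  const plan = [];
  for (let offset = from; offset <= to; offset++) {
    plan.push({ offset, targetWeight: opponentWeight + offset, xp: [] });
  }
  return plan;
}

// Note the XP earned (0 for a loss) in one fight at the given offset.
function recordResult(plan, offset, xpEarned) {
  const entry = plan.find((p) => p.offset === offset);
  if (!entry) throw new Error("offset not in plan: " + offset);
  entry.xp.push(xpEarned);
}

// Mean XP per offset, skipping offsets with no fights recorded yet.
function summarize(plan) {
  return plan
    .filter((p) => p.xp.length > 0)
    .map((p) => ({
      offset: p.offset,
      trials: p.xp.length,
      meanXp: p.xp.reduce((sum, x) => sum + x, 0) / p.xp.length,
    }));
}
```

Running the same plan for Same vs. Same, Average vs. Poor, and Good vs. Poor, then comparing where each summary peaks, would give a direct estimate of how many pounds each rank step is worth.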

Thoughts? Suggestions? Refutations?

lostcalpolydude · Jun 19, 2015

Veracity said:
Yes, I know that they are "the same" familiar when equipped and in action, but there is no reason to believe that the arena parameters have to be "the same".

I would consider it a KoL bug if they had different arena parameters.

heeheehee · Jun 19, 2015

For what it's worth: web archive has the old Rye info.

Code:

function runFight( mybuddy, enemy, event )
{
	var enemyStars = enemy.stars[event];
	var enemyAbility = enemyStars * 3 + enemy.weight;
	if ( enemyStars == 0 )
		enemyAbility = 5;

	var myStars = mybuddy.stars[event];
	var myAbility = myStars * 3 + mybuddy.weight;			
	if ( myStars == 0 )
		myAbility = 5;
	
	var losses = 0; var fives = 0; var fours = 0; var threes = 0; var twos = 0;

	// RNG best / worst case run through (both sides)
	for ( var n = 0; n < 5; n++ )
	{
		var myRNG = myAbility - n - 2;
		for ( var m = 0; m < 5; m++ )
		{
			var enemyRNG = enemyAbility - m - 2;
			var margin = myRNG - enemyRNG;
			if ( margin < 0 )
				losses++;
			else if ( margin <= 5 )
				fives++;
			else if ( margin == 6 )
				fours++;
			else if ( margin == 7 )
				threes++;
			else
				twos++;
		}
	}
	return { 'lose': Math.floor( (losses / 25) * 100 ), 'five': Math.floor( (fives / 25) * 100 ), 
		'four': Math.floor( (fours / 25) * 100 ), 'three': Math.floor( (threes / 25) * 100 ), 
		'two': Math.floor( (twos / 25) * 100 ) };
}

From some initial testing, I found that scores seem to be uniformly distributed; the above tool also suggests that they're uniformly distributed. I of course don't have a sizable dataset to back me up on this, though.

I would say that the arena trainer should also be updated to account for bonus familiar exp, since that should make you more risk-averse (as of 2009).
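To make the risk-aversion point concrete, here is a rough sketch built on the same 5x5 enumeration and XP buckets as the archived calculator above. It assumes that a loss earns nothing and that bonus experience is added only on a win; both are assumptions of this sketch, as are the function names, and the rank-0 special case is left out.

```javascript
// XP for a given winning margin, using the archived calculator's buckets.
function baseXp(margin) {
  if (margin < 0) return 0; // loss (assumed to earn nothing)
  if (margin <= 5) return 5;
  if (margin === 6) return 4;
  if (margin === 7) return 3;
  return 2;
}

// Expected XP per fight when myAbility - enemyAbility === diff.
// bonusXp is extra familiar experience, assumed to apply only to wins.
function expectedGain(diff, bonusXp = 0) {
  let total = 0;
  for (let n = 0; n < 5; n++) {
    for (let m = 0; m < 5; m++) {
      const margin = diff - n + m; // same margin as in runFight
      if (margin >= 0) total += baseXp(margin) + bonusXp;
    }
  }
  return total / 25;
}

// Ability difference with the highest expected gain (first one wins ties).
function bestDifference(bonusXp = 0, from = -4, to = 12) {
  let best = from;
  for (let d = from + 1; d <= to; d++) {
    if (expectedGain(d, bonusXp) > expectedGain(best, bonusXp)) best = d;
  }
  return best;
}

console.log(bestDifference(0), bestDifference(2));
```

Under these assumptions the best difference with no bonus is 3, which matches the "exceed by 3" rule in FamiliarTool, and with +2 bonus experience it moves up to 4, where the familiar never loses.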

edit: I guess they look triangular enough, now that I have some more data.

heeheehee · Jun 19, 2015

I took the liberty of creating a patch for this, and the results seem more or less as expected: if I increase familiar experience (e.g. via pulled blue taffy), then losing is seen as that much worse.

Since I'm still not totally convinced either way whether scores are triangularly or uniformly distributed, I included functions to compute both expectations.
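For illustration only (this is not the attached patch), the difference between the two hypotheses can be sketched by weighting the five per-side roll offsets used in the archived calculator. The triangular weights 1, 2, 3, 2, 1 are a guess at what "triangular" might look like on that five-point range, and `winProbability` is a hypothetical name.

```javascript
const UNIFORM = [1, 1, 1, 1, 1];
const TRIANGULAR = [1, 2, 3, 2, 1]; // hypothetical shape, not measured

// Probability that margin >= 0 when myAbility - enemyAbility === diff,
// with each side's roll offset (0..4) drawn according to `weights`.
function winProbability(diff, weights = UNIFORM) {
  let win = 0;
  let all = 0;
  for (let n = 0; n < weights.length; n++) {
    for (let m = 0; m < weights.length; m++) {
      const w = weights[n] * weights[m];
      all += w;
      if (diff - n + m >= 0) win += w; // a margin of 0 counts as a win, as in runFight
    }
  }
  return win / all;
}

for (const diff of [-3, 0, 3]) {
  console.log(diff, winProbability(diff, UNIFORM), winProbability(diff, TRIANGULAR));
}
```

At equal ability the two shapes are close (60% vs. 50/81, about 62%), but at a difference of -3 they give 12% vs. about 6%, so fights at lopsided differences are where real data would tell them apart.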
