Feature - Rejected (Big)Decimal DataType

Are you interested in a decimal DataType being added to ASH?

  • Yes
    Votes: 2 (22.2%)
  • No
    Votes: 7 (77.8%)
  • Total voters: 9

Fluxxdog

Active member
I see that two people voted "yes" for this. Now that Jason made ASH use longs and doubles internally and I fixed the I/O such that we can actually get such numbers into ASH with full precision, I believe this is sufficient for all the use cases people have presented.

We still need to store Meat as a long and change all the places where we parse or manipulate Meat internally so they don't use ints as intermediate values, but that's another feature request. See, for example, Ludwig von Miser's report in the G_D KoLmafia thread.
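To make the int-intermediate danger concrete, here's a hypothetical sketch in Java (mafia's implementation language; the class name and amounts are mine, not from the actual source): two perfectly legal Meat quantities overflow a 32-bit int the moment you add them.
Code:
public class MeatOverflow {
	public static void main(String[] args) {
		int meat = 2000000000;                  // a plausible Meat total
		System.out.println(meat + meat);        // -294967296: wrapped past 2^31
		System.out.println((long) meat + meat); // 4000000000: widened to long first, correct
	}
}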

Is there any reason I can't close/reject this Feature Request now? Got any realistic Use Cases that longs and doubles can't handle? :)
Code:
float sum, points, avg;
foreach x in $ints[2,2,3,4,5] {
	sum += x;
	points += 1;
	avg = sum / points;
	print("Total: " + sum + " - Points: " + points + " - Average: " + avg);
}

Total: 2.0 - Points: 1.0 - Average: 2.0
Total: 4.0 - Points: 2.0 - Average: 2.0
Total: 7.0 - Points: 3.0 - Average: 2.3333333333333335
Total: 11.0 - Points: 4.0 - Average: 2.75
Total: 16.0 - Points: 5.0 - Average: 3.2
That 7.0/3.0 is a tad off, but all the other values are as expected. Any idea why that last digit came out as a 5 instead of a 3? Is that due to a limitation?
 

Catch-22

Active member
Any idea why that last digit came out as a 5 instead of a 3? Is that due to a limitation?

It's the nearest decimal value that can be represented by the float; if you multiply it by 3, you'll still get 7.
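A quick check in Java (what mafia runs on; assuming IEEE 754 doubles with round-to-nearest, which Java guarantees) bears this out. The class name is mine:
Code:
public class SevenThirds {
	public static void main(String[] args) {
		double third = 7.0 / 3.0;
		System.out.println(third);              // 2.3333333333333335, the nearest double to 7/3
		System.out.println(third * 3.0 == 7.0); // true: the rounding error cancels on the way back
	}
}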

The best option is to update to revision 11076, which fixes what you correctly characterized as a "bug".

Code:
> ash ((1.1**2)==1.21)

Returned: false

Would this also be characterized as a bug?

Anyway, I believe the discussion on the (Big)Decimal DataType has ended. File bug reports accordingly :)
 

Veracity

Developer
Staff member
Any idea why that last digit came out as a 5 instead of a 3? Is that due to a limitation?
That's floating point arithmetic; there simply aren't that many digits of precision available in a double.

Here's your program in C:

Code:
#include <stdio.h>

double sum, points; 

void accum( int x )
{
	double avg; 
	sum += x;
	points += 1;
	avg = sum / points;
	printf("Total: %.17g - Points: %.17g - Average: %.17g\n", sum, points, avg);
}

int main()
{
	accum( 2 );
	accum( 2 );
	accum( 3 );
	accum( 4 );
	accum( 5 );

	return( 0 );
}
and here is the result of executing it:

Code:
$ ./foo
Total: 2 - Points: 1 - Average: 2
Total: 4 - Points: 2 - Average: 2
Total: 7 - Points: 3 - Average: 2.3333333333333335
Total: 11 - Points: 4 - Average: 2.75
Total: 16 - Points: 5 - Average: 3.2000000000000002
 

Veracity

Developer
Staff member
Code:
> ash ((1.1**2)==1.21)

Returned: false

Would this also be characterized as a bug?
Well, let's see what's going on.

Code:
> ash 1.1

Returned: 1.1

> ash 1.21

Returned: 1.21

> ash (1.1 * 1.1)

Returned: 1.2100000000000002
In C:

Code:
#include <stdio.h>

int main()
{
	double a = 1.1;
	double b = 1.21;
	printf( "%.17g * %.17g = %.17g, rather than %.17g\n", a, a, a * a, b );
}
yields:
Code:
$ ./foo2
1.1000000000000001 * 1.1000000000000001 = 1.2100000000000002, rather than 1.21
No, I would have to say that ASH's results stem from limitations in the floating point representation of 1.1 and 1.21, rather than from a bug in ASH.
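For anyone curious about the exact values behind the rounded printouts, Java's BigDecimal(double) constructor converts a double's exact bits, so it shows the number actually stored rather than the shortest decimal. A minimal sketch (the class name is mine):
Code:
import java.math.BigDecimal;

public class ExactValue {
	public static void main(String[] args) {
		System.out.println(new BigDecimal(1.1));
		// 1.100000000000000088817841970012523233890533447265625
		System.out.println(new BigDecimal(1.21));      // slightly below 1.21
		System.out.println(new BigDecimal(1.1 * 1.1)); // slightly above 1.21
	}
}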
 
Yes, one of the fun little quirks of computers. Nothing's actually precise. You need to round numbers off after a certain number of digits (at the point where you'd expect errors like this to show up in practice, or earlier), or test against small ranges instead of exact values. I once found myself with nothing to program in but Matlab, and I had to do exact integer calculations and comparisons, which forced me to do a lot of rounding and forced typecasting to deal with those .00000000000002 buggers. Hooray for tangential anecdotes!
 

xKiv

Active member

Not really, because I just *know* it's still curing symptoms instead of the cause. The floating point demon will still be waiting for unsuspecting coders; just fewer people will realize what's actually happening.

For example (yes, I am on 11076):
Code:
> ash float a=0.1; print(a+a+a+a+a+a+a+a+a+a);

0.9999999999999999
(apparently it would be 1.0000001 with floats - this is just to illustrate that testing a simple "<=" doesn't give a complete picture)
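Both figures check out in Java, for what it's worth (a quick sketch of mine, not from the thread):
Code:
public class TenTenths {
	public static void main(String[] args) {
		float f = 0.0f;  // single precision
		double d = 0.0;  // double precision
		for (int i = 0; i < 10; i++) {
			f += 0.1f;
			d += 0.1;
		}
		System.out.println(f); // 1.0000001
		System.out.println(d); // 0.9999999999999999
	}
}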

Or
Code:
> ash print(0.1+0.1+0.1==0.3)

false

There's also this, which is rounding, but maybe not as you think you know it:
Code:
> ash print(2.2999999999999996)

2.2999999999999994
Returned: void

> ash print(2.2999999999999997)

2.3
Returned: void

> ash print(2.2999999999999997==2.3)

true
Returned: void

The biggest problem I have with floats in programs is, as http://firstclassthoughts.co.uk/java/traps/java_double_traps.html puts it: "What you see is not what you get".

ETA: just so that we are on the same page: I am not asking anybody to implement BigDecimal in mafia. (but my offer to try to do so, provided there's a chance it gets in if not too horrible, still stands)
 

Fluxxdog

Active member
OK, so ultimately float will always have some problems with precision. Does BigDecimal eliminate or reduce this without introducing other problems? Can I, for example, calculate 24/9 out to 15 places and then multiply it by 3 to guarantee a result of 8?
 

heeheehee

Developer
Staff member
Arbitrary degrees of precision for 24/9 are possible via rational.ash, which handles (as the name suggests) all of your rational needs. Edit: well, unless the numbers get too big. I'm working on an in-ASH BigDecimal (BigNum) equivalent, which would solve that issue (BigDecimals wouldn't solve this problem directly; see the sketch below).
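To see why BigDecimal alone wouldn't answer the 24/9 question, here's a hedged Java sketch (my example, not rational.ash): once you round 24/9 to 15 decimal places, the exact value is already gone, and multiplying by 3 misses 8. A rational representation (a pair of integers, as in rational.ash) keeps 24/9 exact, so multiplying by 3 gives exactly 8.
Code:
import java.math.BigDecimal;
import java.math.RoundingMode;

public class TwentyFourNinths {
	public static void main(String[] args) {
		BigDecimal r = BigDecimal.valueOf(24)
				.divide(BigDecimal.valueOf(9), 15, RoundingMode.HALF_UP);
		System.out.println(r);                                 // 2.666666666666667
		System.out.println(r.multiply(BigDecimal.valueOf(3))); // 8.000000000000001, not 8
	}
}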

(The only irrationals in KoL, IIRC, are basement calculations and... square-rooting in some places like DA?)
 
Square roots and irrationals also appear in many familiar formulas, especially meat and item drops. You only need like 5 or 6 digits of accuracy to do those calculations as accurately as KoL would, though.
 

heeheehee

Developer
Staff member
Yep, in any actual use-cases (i.e. something in KoL), you'll be fine with double precision (since that's what KoL uses, I think, and you want the same amount of imprecision).
 

Catch-22

Active member
Yep, in any actual use-cases (i.e. something in KoL), you'll be fine with double precision (since that's what KoL uses, I think, and you want the same amount of imprecision).

KoL is built largely on PHP, which typically uses the IEEE 754 double precision format.

Please don't take this as a jab at you in any way because I love the work you do, but I do think it's a little naïve for people to apply the restrictions of KoL to all applications within ASH.

For example, if the maximum meat you can have on hand in KoL is 2^32, that doesn't mean we should assume people will only ever perform calculations on quantities of meat no larger than that (such as networth.ash calculating your total worth).

As another example, if we were to say "It's impossible to spend or acquire anything other than whole values of meat, so there should be no need to represent meat as anything other than a whole value." then we would be killing off the _meatpermp value used by Universal_recovery.ash.
 

Fluxxdog

Active member
As another example, if we were to say "It's impossible to spend or acquire anything other than whole values of meat, so there should be no need to represent meat as anything other than a whole value." then we would be killing off the _meatpermp value used by Universal_recovery.ash.
I gotta say that's a bad analogy. In UR you're not measuring meat, you're measuring meat per MP. A speedometer doesn't measure miles, it measures miles per hour. I get what you're saying, though. I think where 3hee is coming from is establishing a baseline standard. Since mafia is entirely dependent on KoL for its information, using their methods would give a minimum standard to work from. ... From which to work. In the case of meat and integers in general, they've elected to use a higher standard and allow integer storage in the quintillions instead of the couple billion that KoL uses.

The big issue between the two arguments seems to boil down to the level of precision that is wanted. BigDecimal is capable of theoretically infinite precision, but we have to apply a cutoff somewhere. On the other hand, float and double, if I understand this correctly, not only have preset levels of precision but are also approximate values due to the way the bits are recorded. The confusing thing to me is why float can't do simple additions without introducing error, like the .1 added 10 times. As a relatively large value compared to the .0...2, it shouldn't introduce so much error that you get false boolean results such as ==1 being false. Or is that caused by something else that's even further outside my understanding?
 

xKiv

Active member
The confusing thing to me is why float can't do simple additions without introducing error, like the .1 added 10 times. As a relatively large value compared to the .0...2, it shouldn't introduce so much error that you get false boolean results such as ==1 being false. Or is that caused by something else that's even further outside my understanding?

The *why* isn't confusing at all. Modern computer languages use only binary floats. That means you can only represent numbers that are some sum of powers of two.
Like 1/4=0.25. Or 1/2+1/4=0.75.
0.2 is 1/5, and as such won't ever be expressed exactly in binary (a fraction only terminates in binary if its denominator is a power of two, and 5 isn't). You can get very close, but never exact. What happens when you print(0.2) is that the computer stores the internal binary representation of 0.2 (which is not equal to our 0.2), then fools you by printing the shortest decimal number that would convert back to that binary representation (which is 0.2).
With doubles, this will also work for 1.2*5==6, because the result accidentally gets rounded the right way, or something like that (or the FPU's multiplication tables are actually *wrong*, in that they will say 1.199999999999992347615487652734 (or whatever that actually is) * 5 == 6. That's wrong if you look at the binary floats, but produces results that humans like better most of the time; but that would be cheating).

0.1 is even further from being exactly representable by a binary float (by exactly 1 bit of precision, incidentally), and 0.1d+0.1d is already potentially different from 0.2d (the double closest to 2*X is not necessarily the same as 2 times the double closest to X, because you have that 1 bit of precision difference) (... actually, 0.1d+0.1d==0.2d, but 0.1d+0.1d+0.1d!=0.3d, while 0.1d+0.1d+0.1d+0.1d==0.4d again ... the rounding ends up being different for different partial sums, so the error drifts in and out).

Also, *any* error greater than 0 will cause "false" comparison results. That's why you are supposed to always compare floats/doubles like "if (abs(a-b)<=tolerable_difference) { .... seems legit ... }"
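In Java terms, that comparison looks something like this (a sketch; the epsilon is an arbitrary choice of mine, not from the thread):
Code:
public class ApproxCompare {
	static boolean approxEquals(double a, double b, double epsilon) {
		return Math.abs(a - b) <= epsilon;
	}

	public static void main(String[] args) {
		double sum = 0.1 + 0.1 + 0.1;
		System.out.println(sum == 0.3);                   // false
		System.out.println(approxEquals(sum, 0.3, 1e-9)); // true: seems legit
	}
}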
 

slyz

Developer
That's why you are supposed to always compare floats/doubles like "if (abs(a-b)<=tolerable_difference) { .... seems legit ... }"
So THAT'S why all those old FORTRAN routines I had to use compared numbers like that. Knowing even a little bit about all this (and about exotic things like compilers/debuggers) would probably have spared me weeks of wondering why my material behavior model was failing so hard. It turned out that FORTRAN ints aren't the best place to store the results of exponentiation of big numbers.
 

Catch-22

Active member
The trouble with using a tolerable difference, or an epsilon value, is that across a wide range of values the tolerable difference can become intolerable, depending on how accurate you want your calculations to be.
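One common refinement (my sketch, not something proposed in the thread) is to scale the tolerance to the magnitude of the operands, so it stays proportionally the same across large and small values:
Code:
public class RelativeCompare {
	// Relative comparison: the tolerance scales with the larger operand.
	static boolean nearlyEqual(double a, double b, double relEps) {
		return Math.abs(a - b) <= relEps * Math.max(Math.abs(a), Math.abs(b));
	}

	public static void main(String[] args) {
		System.out.println(nearlyEqual(1e-9, 2e-9, 1e-12));  // false: relatively, these differ hugely
		System.out.println(nearlyEqual(1e9, 1e9 + 1, 1e-6)); // true: one part in a billion
	}
}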

Bruce Dawson from Valve Software has a great article on comparing floats, as well as many other articles on the peculiarities of floating point maths, if anyone is interested.
 