Feature - Rejected (Big)Decimal DataType

Are you interested in a decimal DataType being added to ASH?

  • Yes
    Votes: 2 (22.2%)
  • No
    Votes: 7 (77.8%)
  • Total voters: 9

xKiv

Active member
You want to see one piece of code so that you may tear it up, and fail to consider that the world of programming is infinite?

I think you *do* have a point somewhere; you're just expressing it in an extremely wrong way.
And I am not asking anybody to implement BigDecimals, I am just feeling extremely offended by your argumentation. Like you are dismissing anything you haven't experienced yourself, or something.
 

roippi

Developer
I think I'll just step out of this thread because I find myself arguing about a thing that I really really don't care about, either way.
 

Catch-22

Active member
I'm sorry, but anyone who uses the phrase "perfectly accurate" when describing computer calculations either has their tongue firmly in cheek or lacks some basic understanding of numerical analysis or whatever term one might use to describe computational mathematics using a computer.

Apologies if my phraseology is slightly out; to clarify what I mean by "perfectly accurate":

Code:
> ash (1.3*10)

Returned: 12.999999523162842
Not perfectly accurate (the current behaviour).

Code:
> ash (1.3*10)

Returned: 13
Perfectly accurate (what I'd want a decimal type to return).

Code:
> ash (99999993857.0 < 100000000000.0)

Returned: false
Not perfectly accurate (the current behaviour).

Code:
> ash (99999993857.0 < 100000000000.0)

Returned: true
Perfectly accurate (the desired behaviour).
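
For the curious, a plausible mechanism for the second pair: if values pass through 32-bit floats, both literals round to the same number. A minimal Java sketch of that effect (standard float/double behaviour, not mafia code):

Code:
public class FloatRounding {
	public static void main(String[] args) {
		// Both literals round to the same 32-bit float (99999997952),
		// so the "less than" comparison fails:
		float a = 99999993857.0f;
		float b = 100000000000.0f;
		System.out.println(a == b); // true
		System.out.println(a < b);  // false

		// 64-bit doubles represent both of these integers exactly:
		System.out.println(99999993857.0 < 100000000000.0); // true
	}
}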

I don't see how there can be any disagreement that floats and decimals are useful for different purposes.

What I'm asking for is a reason why this shouldn't be done (i.e., why it might be a bad idea).
 

Bale

Minion
What I'm asking for is a reason why this shouldn't be done (i.e., why it might be a bad idea).

People have already told you a reason: Big Decimals have MUCH slower calculation speed. If you want to counter with the argument that mafia's calculations don't need to be fast for its purpose, since server hits are the real bottleneck, then I'd have to counter with the fact that our purposes don't require many decimal places of accuracy either.

BigDecimals are a solution for which we do not have the matching problem.
 
Last edited:

Catch-22

Active member
BigDecimals are a solution for which we do not have the matching problem.

I suppose that depends upon how one would define "we". Some ASH scriptwriters may prefer accuracy over speed. Bear in mind, I'm not asking for anyone to use BigDecimals over floats or forcing them upon anyone who wants to retain their current data structures; I am merely proposing support for an alternative DataType.

According to the poll, at least one person other than myself would be interested in such an alternative. The feedback I am looking for is whether or not support for the new DataType would negatively impact existing users of KoLmafia (i.e., a bad idea) and how.

This thread should not be used for discussion as to which DataType is superior.
 

Bale

Minion
I suppose that depends upon how one would define "we".

Okay. Please tell us why you actually need that many decimal places of accuracy. What application do you have for which 7 decimal places isn't enough? If your only answer is that you find less accuracy to be aesthetically displeasing, then this is a "want," not a "need."

If you do have an application for this solution, then you are not part of the "we" I imagined. Please explain my misconception.
 

Catch-22

Active member
Use case: a banker bot built on ASH that loans meat, keeps track of players' debts and transaction history, provides weekly statements, and charges interest.
 

xKiv

Active member
Okay. Please tell us why you actually need that many decimal places of accuracy.

Why do you call 1 "that many"?

ETA: anyway, hypothetically, if I were to get off my ass, sit down, and, as an exercise, code BigDecimal support into mafia in a way that wouldn't affect anything that exists now (apart from adding some memory footprint to the .jar), what would be your response to that?
Who would object to adding that to mafia? (that is, assuming at least one dev would be inclined to code-review it)

Preliminary ideas:
- all operations would be new ASH functions/methods
- if possible, no new type; I am thinking about making it so that it would pretend it's a float everywhere except the new functions (by using BigDecimal.doubleValue() and BigDecimal.toString())
- the only way to create one would be to call one of the new functions (so something like "float big = new_big("0.0");", which I don't particularly like, but I want to avoid introducing new ASH types, at least for starters)
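
On the Java side, the idea might look something like this (a rough sketch only; the class and method names are hypothetical, not existing mafia code):

Code:
import java.math.BigDecimal;

// Hypothetical sketch: an exact decimal value that "pretends" to be a
// float everywhere except the new functions.
public class BigValue {
	private final BigDecimal value;

	private BigValue(BigDecimal value) {
		this.value = value;
	}

	// ASH: float big = new_big("0.1");
	public static BigValue newBig(String literal) {
		return new BigValue(new BigDecimal(literal));
	}

	// ASH: big_add(a, b) -- exact decimal addition
	public BigValue add(BigValue other) {
		return new BigValue(this.value.add(other.value));
	}

	// Everywhere else, the value quietly degrades to an ordinary double...
	public double toFloat() {
		return value.doubleValue();
	}

	// ...or to its exact decimal string.
	@Override
	public String toString() {
		return value.toString();
	}
}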


also
Bale said:
Big Decimals have MUCH slower calculation speed
Source please (with actual measurements, not just somebody saying "it's much slower").
Of course something that has the desired properties (in this case: being able to represent all finitely-long decimal numbers, which is the kind we put in our source files and data files; there will still be inaccuracy in any result that can't be represented as a finitely-long decimal) will be slower than something that doesn't have them.
Of course heap allocation and garbage collection will happen more often.
Will that matter in an application which is often bottlenecked by network performance and human interaction?
Are we *competing* on performance?



Of course, for many purposes, fixed-point is good enough:
Code:
long divisor = 1000; /* 3 places of fixed-point precision */
long fixed_a = 20;   /* actually is 0.02 */
long fixed_b = 5000; /* actually is 5 */
long fixed_c = fixed_a * fixed_b / divisor; /* == 100, which actually is 0.1 */

string fixedToString(long num) {
	/* properly (that means no floats!) convert num to string;
	   negatives are ignored for brevity */
	string frac = to_string(num % divisor);
	while (length(frac) < 3) frac = "0" + frac;
	return to_string(num / divisor) + "." + frac;
}
print(fixedToString(fixed_a) + "*" + fixedToString(fixed_b) + "=" + fixedToString(fixed_c));
 
Last edited:

Bale

Minion
Why do you call 1 "that many"?

1 decimal place? Please give me an example. Here's Catch-22's favorite example:

> ash (1.3*10)

Returned: 12.999999523162842

That answer is off by only 0.000000476837158, or in other words, it is accurate to within 6 decimal places. Do you have a counter example?


Use case: A banker bot built on ASH that loans meat, keeps track of players debts, transaction history, provides weekly statements and charges interest.

Nope. I'm not feeling it. 6 decimal places of accuracy! I suppose you might possibly be wishing for BigFloat, which would give twice as much accuracy without the dramatic slowdown in processing. With that much accuracy you cannot possibly have a real problem, but I'm not really feeling the need.
 

xKiv

Active member
1 decimal place? Please give me an example. Here's Catch-22's favorite example:
> ash (1.3*10)

Returned: 12.999999523162842
That answer is already wrong *before* the decimal point. Because 1.3 isn't 1.3: 1.3 was precise to the first decimal place; float(1.3) isn't.


Nope. I'm not feeling it.

Then stop using it as an argument, please.

6 decimal places of accuracy!

Nope. Accuracy to within 10**(-6), but that's not the same thing.
6 decimal places of accuracy means 6 decimal places that stay the same. Which is impossible with binary floats, at all, no matter how close you get.
"it's the same after rounding" isn't the same as "it's the same".
(umm ... ETA^2: I mean ... 1.3 is a number with 1 decimal place, 1.29999999999 is a number with 11 decimal places, and none of them are the same (that "decimal" in there is important; it's different in binary - in binary, 1.3 simply ... isn't). When I say "1 decimal place of precision" I mean I want all numbers with 1 (or fewer) decimal places to stay the same, without any loss of precision anywhere.)

I suppose you might possibly be wishing for BigFloat, which would give twice as much accuracy without the dramatic slowdown in processing. With that much accuracy you cannot possibly have a real problem, but I'm not really feeling the need.

BigFloat has exactly the same problem float (and double) has, namely that it's built for binary floats. You get subtle invisible changes to your numbers at compile/construction time, then spend fruitless hours trying to debug the *rest* of your code looking for bugs that aren't there.
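
To make that concrete, here is standard BigDecimal behaviour in Java; the double constructor captures the literal only after it has already been silently converted to binary, which is exactly the invisible change I mean:

Code:
import java.math.BigDecimal;

public class ConstructionDemo {
	public static void main(String[] args) {
		// The double literal was altered before BigDecimal ever saw it:
		System.out.println(new BigDecimal(1.3));
		// 1.3000000000000000444089209850062616169452667236328125

		// The String constructor keeps the decimal exact:
		System.out.println(new BigDecimal("1.3"));
		// 1.3
	}
}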

(what I *really* want (not need, just get excited about) is a MathNumber that would be able to precisely express any number expressible with chalk and a blackboard; for example all rational numbers, pi, e, all algebraic numbers, ... - but then we are talking about integrating something like Matlab, and a performance hit so big it would kill dinosaurs retroactively)
 
Last edited:

Catch-22

Active member
Just thought I would note that Zarqon seems to have asked for something similar in an earlier thread and the feature was rejected (seemingly because nobody could be bothered doing it?).

Let's be clear, in my use case we are talking about financial data. There are plenty of documented cases as to why you should not use approximate numeric data types when working with financial data but, as I've said before, this thread is not intended to be a discussion of floats vs. decimals.
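
(For anyone who hasn't read those cases, the textbook illustration, in plain Java; nothing mafia-specific is assumed:)

Code:
import java.math.BigDecimal;

public class MoneyDrift {
	public static void main(String[] args) {
		// Ten payments of 0.1 should total exactly 1.
		double d = 0.0;
		for (int i = 0; i < 10; i++) {
			d += 0.1;
		}
		System.out.println(d); // 0.9999999999999999

		BigDecimal b = BigDecimal.ZERO;
		BigDecimal tenth = new BigDecimal("0.1");
		for (int i = 0; i < 10; i++) {
			b = b.add(tenth);
		}
		System.out.println(b); // 1.0
	}
}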

Also, to be fair, I have only used ash (1.3*10) as an example once :D
 

xKiv

Active member
Benchmark time!

For addition:
Code:
package cz.xkiv;

import java.math.BigDecimal;
import java.text.DecimalFormat;

public class XTest {

	public static final long UPTO = 500000000L;
	public static void main(String[] args) {

		int pass = 0;
		
		for (pass = 1; pass <= 3; pass++) {
			System.out.println("pass " + pass);

			BigDecimal res = BigDecimal.ZERO;
			BigDecimal addd = new BigDecimal("1.0000000001");
			double dres = 0d;
			double daddd = 1.0000000001d;
			
			long ts_start = System.currentTimeMillis();
			
			for (long i = 0; i < UPTO; i++) {
				res = res.add(addd);
				if (0 == i % 10000000) {
					System.out.print(".");
				}
			}
			long ts_end = System.currentTimeMillis();
	
			System.out.println(res.toString() + " - " + (ts_end - ts_start));
	
			ts_start = System.currentTimeMillis();
			
			for (long i = 0; i < UPTO; i++) {
				dres = dres + daddd;
				if (0 == i % 10000000) {
					System.out.print(".");
				}
			}
			ts_end = System.currentTimeMillis();
	
			DecimalFormat df = new DecimalFormat("#.###########");
			System.out.println(df.format(dres) + " - " + (ts_end - ts_start));

			long lres = 0;
			long ladd = 10000000001L;
			ts_start = System.currentTimeMillis();
			
			for (long i = 0; i < UPTO; i++) {
				lres = lres + ladd;
				if (0 == i % 10000000) {
					System.out.print(".");
				}
			}
			ts_end = System.currentTimeMillis();
	
			System.out.println(Long.toString(lres) + " - " + (ts_end - ts_start));
		}
	}
}

results (three passes to rule out startup effects; numbers after "-" are times in milliseconds to finish; the DecimalFormat line prints with my locale's comma as the decimal separator):
Code:
pass 1
..................................................500000000.0500000000 - 41079
..................................................500000000,0001163 - 12671
..................................................5000000000500000000 - 12422
pass 2
..................................................500000000.0500000000 - 45891
..................................................500000000,0001163 - 12562
..................................................5000000000500000000 - 13938
pass 3
..................................................500000000.0500000000 - 45625
..................................................500000000,0001163 - 12171
..................................................5000000000500000000 - 15016

Addition with BigDecimal is roughly 3-4 times slower than doubles. This is even better than I expected, and much better than what some people would have me believe.
Note how fixed-point (with 64-bit longs) is slightly slower than floating point, *and* has better precision (because it has 64 bits for the entire signed integer, while double has to devote some bits to the exponent).

Same thing with multiplication (longs now need scaling with the fixed-point base, so one extra division per multiplication); results only, but the changed loop bodies look roughly like the sketch below.
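
A minimal reconstruction of the three loop bodies (my own sketch, not the benchmark code itself; I use a smaller 4-digit base here, because with the 10-digit base the long multiply overflows 64 bits immediately, which is why the fixed-point numbers in the "starting with one" results below are garbage):

Code:
import java.math.BigDecimal;

public class MulSketch {
	// Assumed fixed-point base: 4 decimal places.
	static final long BASE = 10000L;

	public static void main(String[] args) {
		BigDecimal res = BigDecimal.ONE;
		BigDecimal mul = new BigDecimal("1.0001");
		double dres = 1d, dmul = 1.0001d;
		long lres = BASE, lmul = 10001L; // 1.0 and 1.0001 in fixed-point

		for (int i = 0; i < 1000; i++) {
			res = res.multiply(mul);      // exact, but the digit count grows each step
			dres = dres * dmul;           // plain double multiply
			lres = lres * lmul / BASE;    // fixed-point: one extra division to rescale
		}
		// res and dres print roughly 1.1052 (== 1.0001**1000); lres prints
		// 11000, i.e. 1.1000: the truncating division drops a fraction every step.
		System.out.println(res.doubleValue() + " / " + dres + " / " + lres);
	}
}
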
starting with zero, so the result should be zero:
Code:
pass 1
..................................................0E-2147483647 - 59531
..................................................0 - 14063
..................................................0 - 28297
pass 2
..................................................0E-2147483647 - 65203
..................................................0 - 13235
..................................................0 - 29546
pass 3
..................................................0E-2147483647 - 58266
..................................................0 - 13063
..................................................0 - 27359
BigDecimal falls a bit further behind here (roughly 4-5 times slower than double), and is slightly confused about the exponent.

starting with one, so the multiplication actually does something:
...
well, now we are talking slow. I had to reduce the iterations significantly, to 10000. Apparently the default Java implementation is O(M*N), where M and N are the numbers of significant digits in the multiplicands, which is ... a very suboptimal algorithm?
Changing the multiplicand from 1.0000000001 to 1.1 ... the first few "." are much faster than the last ones.
Doing only one pass, because the tails at the end are long (and more passes garble my console). Each multiplication adds 1 decimal place to the result, so we are now talking about numbers with 50k decimal places. That means I am seeing a lot of performance lost to garbage collection too, and possibly swapping, because I am doing this on a computer that is already memory-starved before running another JVM.
Code:
pass 1
..................................................[insert very long number here, I don't want to spam the server] - 1890
..................................................∞ - 0
..................................................-921991187 - 0
Now we are talking infinitely slow :)

Next try, the same but rescaling the BigDecimals to 10 digits after every multiplication (10 is what I am using for the fixed-point longs, except that BigDecimals will handle that properly even with numbers much larger than 1), with 5k iterations:
(edit: these results are wrong; I rounded every number to 10 digits, but then threw away the rounded number and used the non-rounded one ...)
Code:
pass 1
.................................................. [insert very long number here] - 10859
.................................................. [insert another long number here, which is the same in the first 12 digits, different in 4 more digits, zeroed out in many many digits after that, and doesn't even *have* a decimal part] - 0
..................................................922152497 [completely wrong, but that's expected] - 0

Redoing this, actually using the rounded number:
Code:
pass 1
..................................................919233389885635058382647497107173754394058285838127213229214989249537223192967335883159291350381340682876744796509280825390008593162442569432185021507101782064811260496984702470941910972291462538984926091900.8425702751 - 47
..................................................919233389907432500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 - 0
..................................................922152497 - 0
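
(For anyone following along: the mistake is easy to make because BigDecimal is immutable; setScale returns a new, rounded value rather than changing the old one. A minimal sketch of the corrected pattern, with my own variable names:)

Code:
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RescaleSketch {
	public static void main(String[] args) {
		BigDecimal res = BigDecimal.ONE;
		BigDecimal mul = new BigDecimal("1.1");

		for (int i = 0; i < 50; i++) {
			// setScale returns a NEW value; calling it and discarding
			// the result silently keeps the unrounded number.
			res = res.multiply(mul).setScale(10, RoundingMode.HALF_UP);
		}
		// 1.1**50 kept to 10 decimal places; per-step rounding drifts a
		// little from the exact value, as the results above show.
		System.out.println(res);
	}
}
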
And 500k iterations:
Code:
pass 1
.................................................. [looong number again] - 36343
..................................................∞ - 15
..................................................-922149003 - 31

Let's do fewer iterations of the multiplication (shorter numbers, more comparable computations), but repeat them more times (to get the floating/fixed performance significantly above 0 ms): 50 multiplications, repeated 1M times.
Code:
pass 1
..................................................117.3908528769 - 13516
..................................................117,3908528797 - 1469
..................................................0000914651276 - 3422
There.
(aaand, I was doing the setScale wrong, and had to redo the last two, boooo!)
That's only 10 times slower, at slightly better precision (it tries to maintain decimal precision in all 10 decimal places, where doubles can't manage the last few).


So ... three orders of magnitude? Not a chance.
You get "three orders of magnitude" "worse" performance when you are doing something that doubles can't do at all. Like saying that a flight around the world is slower than a walk to your refrigerator, or something silly like that.
 

fronobulax

Developer
Staff member
I'm sorry, but anyone who uses the phrase "perfectly accurate" when describing computer calculations either has their tongue firmly in cheek or lacks some basic understanding of numerical analysis or whatever term one might use to describe computational mathematics using a computer.

Apologies if my phraseology is slightly out; to clarify what I mean by "perfectly accurate":

Your examples indicate to me that it is the latter - you really seem to have no understanding of what it means to do numerical computation on a computer.
 

Veracity

Developer
Staff member
Code:
> ash (1.3*10)

Returned: 12.999999523162842
Thanks for reporting this bug. Jason made ASH internally store data as longs and doubles, but he forgot to make the input methods parse into longs and doubles; instead they parse into ints and floats and are coerced into the larger data type.
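
In plain Java terms, the difference looks like this (an illustration of the coercion, not the actual parser code):

Code:
public class CoercionDemo {
	public static void main(String[] args) {
		// Old behaviour: parse into a 32-bit float, then widen to double.
		double viaFloat = Float.parseFloat("1.3");
		System.out.println(viaFloat * 10); // 12.999999523162842

		// Fixed behaviour: parse directly into a double.
		double direct = Double.parseDouble("1.3");
		System.out.println(direct * 10);   // 13.0
	}
}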

Revision 11076 fixes this.

> ash 1.3

Returned: 1.3

> ash (1.3 * 10)

Returned: 13.0
 

Fluxxdog

Active member
EDIT: This was started just before Veracity made the above post and finished after. Dang ninja :p

I have a question relevant to this: considering KoL uses integers up front, how many digits of precision do we practically need?

I admit I heavily dislike the use of an imprecise number; it just screams at the math nerd in me. However, I see the value of teaching children 3.14 as a quick substitute for pi. We're sacrificing great precision for ease of use. As time goes on, we learn pi is not only more than 3.14 but also has an infinite number of decimal places that can be calculated. In that vein, we only use the number of decimal places that are really relevant to the task at hand.

In KoL, we only get presented with integers: HP, MP, meat, adventures, power, familiar weight, etc. Calculations we then do on mafia's side use those integers and need to return numbers that are meaningful to those same integers. If we calculate that a monster can do 13.2 points of damage to us, we say that the monster does either 14 or 13 points of damage using floor(), ceil(), or round(), since that is the form in which the data is actually useful. If we have 14 hit points, decisions would be made differently. A monster attack would never leave us with .8 HP. So if we say 13 hit points, you can goof off for a round. If you have 14 hit points, you'd better OHKO that sucker or run.

The original problem arose from modifier_eval() having trouble handling scientific notation, specifically numbers that were very small. The new float that was introduced solved that problem, but introduced a new one: poor precision with larger numbers.
Code:
> ash (1.3*10>=13)

Returned: false
This is not just an issue of precision to a decimal place, but of boolean correctness. It is just flat-out wrong, and one that I'd easily call a bug. This is where my biggest issue with the new float comes in: since we generally do a lot of comparisons in scripts, especially with integers, having a boolean return false when it should return true will cause unexpected errors.

We now have positive integers that go all the way up to 2**63 - 1 (9,223,372,036,854,775,807, or almost 10 quintillion), which is more than enough to handle large amounts of meat, and pretty much just that. But how far out do we really need decimals to go? I find myself only really going out to the ten-thousandth (or one hundredth of a percent). After a certain point, too small a decimal is useless. As it stands now, the maximum unbuffed stat you can have is 65536 (total substat 2**32). In the same vein, that caps the basement as well, since there would then be a point where you could not proceed any further without new buffs being introduced, and that's not counting an estimated 5% random variance, which makes hundredths more important than billionths.

By the same token, I would argue our current length of 17 decimal places is more than we practically need (ash (length(.1)-2), with the -2 to eliminate the "0."), but length is irrelevant. What is relevant is the lack of precision in larger numbers, as in the code example above. We have a problem with the tenths. I think it was this comment that made me realize that the current system needs further repair:
roippi said:
You don't need infinite precision for that, just rounding and leading-zero trimming.
Catch-22 said:
Indeed, this is very true and I have already written a function in ASH that would handle it perfectly.
In other words, every person who plans to use floats in their calculations is going to need a way to work around at least the tenths digit to make sure their calculations come out right.

In essence, my question is this: what would be the best way to cause "ash (1.3*10>=13)" to return a value of true without adding to the script line itself?

 
Last edited:

Veracity

Developer
Staff member
EDIT: This was started just before Veracity made the above post and finished after. Dang ninja :p
:)

In essence, my question is this: what would be the best way to cause "ash (1.3*10>=13)" to return a value of true without adding to the script line itself?
The best way is to use revision 11076, which fixes what you correctly characterized as a "bug".

> ash (1.3*10>=13)

Returned: true
 

Fluxxdog

Active member
I swear to Krom, you had some 7th sense go off saying "Ooh, I could totally ninja someone right now!" Thanks ^^
 

Veracity

Developer
Staff member
I see that two people voted "yes" for this. Now that Jason made ASH use longs and doubles internally and I fixed the I/O such that we can actually get such numbers into ASH with full precision, I believe this is sufficient for all the use cases people have presented.

We still need to store Meat as a long and change all the places that we parse or manipulate Meat internally to not use ints as intermediate values and such, but that's another feature request. See, for example, Ludwig von Miser's report on the G_D KoLmafia thread.

Is there any reason I can't close/reject this Feature Request now? Got any realistic Use Cases that longs and doubles can't handle? :)
 

Catch-22

Active member
Considering the thread is going nowhere fast and some of the replies are becoming downright insulting, I'd say it's time for this to come to an end. Thanks, Veracity and Fluxxdog, for being open-minded enough to realize that there was a problem with the then-current implementation of float precision. Also thanks to xKiv for taking the time to actually run some proper benchmarks.
 