1 map or 2?

efilnikufecin · Oct 19, 2006

In some cases since the implementation or records we have the option of having 2 data maps or 1 data map of records. This thread is an example of such a case:
http://kolmafia.us/index.php/topic,527.0.html
With some of the massive scripts some of us have been writing, I begin to wonder which way is more efficient in ash?

In pascal, when an array is created, a location in memory is assigned to contain pointers (the physical addresses) to the location in memory of the actual data, and creating 2 arrays creates 2 of these address maps which makes it less efficient than to create 1 array returning a record.

Is this the case in ash?

Veracity · Oct 19, 2006

Your concern is memory usage? Here are some technical details of how ASH uses memory for holding data.

- ALL data is "boxed". That is to say, all data - even simple things like integers or booleans - are wrapped in an allocated piece of memory that tags them with a "type".

- Data in a map or a record is always pointers to such data wrappers

- ASH maps are not like Pascal arrays. You access them with similar syntax - square brackets and indices - but they are sparse. A Pascal array that can hold 100 integers will always contain 100 integers, even if you've only set 5 of them. An ASH map from integer to integer will contain 5 pointers to integers, as well as memory associated with the Java implementation of a map. That will include, at least, 5 pointers to the integers that are the keys.

Code:

int [int] map1;
int [int] map2;

map1[5] = 6;
map2[5] = 7;

map1[12] = 2;
map2[12] = 3;

Memory usage:

Code:

map1 ->   (Java map)
          +---------------+
          |      Key         |--->[ INT, 5 ]
          +---------------+
          |      Value       |--->[ INT, 6 ]
          +---------------+
          |      Key         |--->[ INT, 12 ]
          +---------------+
          |      Value       |--->[ INT, 2 ]
          +---------------+

map2 ->   (Java map)
          +---------------+
          |      Key         |--->[ INT, 5 ]
          +---------------+
          |      Value       |--->[ INT, 7 ]
          +---------------+
          |      Key         |--->[ INT, 12 ]
          +---------------+
          |      Value       |--->[ INT, 3 ]
          +---------------+

So, you have 2 java maps, 4 boxed INTs for the keys, and 4 boxed INTS
for the values.

Now, compare to a record implementation with the same data.

Code:

record {
    int val1;
    int val2;
} [int] map;

map[5].val1 = 6;
map[5].val2 = 7;

map[12].val1 = 2;
map[12].val2 = 3;

Memory usage:

Code:

map ->         (Java map)
          +---------------+
          |      Key         |--->[ INT, 5 ]
          +---------------+
          |      Value       |      (Record)
          |                    |      +--------+
          |                    |      |  val1    |--->[ INT, 6 ]
          |                    |--->+--------+
          |                    |      |  val2    |--->[ INT, 2 ]
          |                    |      +--------+
          +---------------+
          |      Key         |--->[ INT, 12 ]
          +---------------+
          |      Value       |      (Record)
          |                    |      +--------+
          |                    |      |  val1    |--->[ INT, 7 ]
          |                    |--->+--------+
          |                    |      |  val2    |--->[ INT, 3 ]
          |                    |      +--------+
          +---------------+

You have one Java map, 2 boxed INTS for the keys and 4 boxed INTS for the values - and 2 records. It's not a huge difference with only two fields in your rcord. The difference gets more pronounced with more fields.

Compare N maps to 1 map holding records with N fields. Ignoring the allocation of the maps themselves, if you have M keys:

N * M keys
N * M values

total of M * 2N memory allocations

M keys
M records
N * M values

total of M * N+2 memory allocations

For any N > 2, 2N > N+2, and therefore the record implementation will allocate less memory. Allocating one Java map also uses less memory than allocating N of them. Additionally, you only require one key lookup in the map to retrieve all the N data values, so less execution is needed to fetch the data. And finally, grouping related data values in a record makes your code easier to understand, in my opinion.

Go with the records.

1 map or 2?

efilnikufecin

Member

Veracity

Developer