Bug NPE and debug log when loading chit in relay browser with r27062.

Magus_Prime

Well-known member
With r27062 when I open the relay browser Chit tries to load and fails with a NPE. The error does not occur with r27061.

Here's what is in the gCLI:
(chit_brickGear.ash, line 20)
Script execution aborted (java.lang.NullPointerException: Cannot read field "right" because "l" is null): (chit_brickGear.ash, line 20)
And here's the debug.log:
 

Attachments

  • DEBUG_20230104.txt
    8.6 KB · Views: 4

heeheehee

Developer
Staff member
I think there's probably some code path that's not properly initializing Modifiers::strings and so we're overwriting some modifiers to null; down the line, things break horribly.
 

Ryo_Sangnoir

Developer
Staff member
(in particular, the part where we ripped out `this.strings[index] == null ||`)
It was ripped out because it was no longer possible. The value could be null only if it was never initialised, and the PR ensured it was always initialised.

This issue is an issue with DoubleModifiersCollection, not with the strings array. I think it might be a race condition around densify; failing that a race condition around set directly. If the issue is with densify the simplest option is to never set `sparseDoubles` to null; if with `set` the simplest option is to never call `remove` (thereby never setting any key to null -- only ever to 0).

We could also synchronise access to the map, but there's going to be a speed penalty to that I'd prefer to avoid.
 

Ryo_Sangnoir

Developer
Staff member
FWIW I also use chit; I've seen the error once but a refresh fixed it, and I've not seen it again.

The race condition around densify is:
* `set` confirms (this.sparseDoubles != null)
* densify is called on a different thread, runs through, sets sparseDoubles to null
* boom!

The race condition around set would be that if multiple threads attempt to modify the tree at the same time, it could get knackered because it's not thread-safe. I don't think there's actually a way around this without making it a synchronizedMap.
 

heeheehee

Developer
Staff member
Hm, that makes sense.

The problem is that you're doing two separate reads, right? and so the relay thread sees an inconsistent state?

The immediate NPE can be addressed via...

Java:
var sparseDoubles = this.sparseDoubles;
return sparseDoubles != null ? sparseDoubles.getOrDefault(index, 0 : this.doubles[index];

You'd also need something similar for set(), and forEach().

But, honestly you're tempting fate by reading/writing to a mutable data structure from multiple threads anyways. Either synchronize the map, delegate all reads / writes to a single thread, or make writes create a new map each time (RCU instead of locking).

(Also if it's really a sparse map I don't see why we're not just using a HashMap; probability of collision should be low)
 

Ryo_Sangnoir

Developer
Staff member
The problem is that you're doing two separate reads, right? and so the relay thread sees an inconsistent state?
The specific problem as seen in the stack trace seems to be that two separate threads are both modifying the map at the same time (I assume, as the crash is in TreeMap internals when remove is called), and as TreeMap isn't thread safe this can lead to issues. Separate reads might lead to reading inconsistent values, but I don't think it would crash.

I'm not sure why two separate threads are modifying the map, or even which threads they are.

When I was reading the PR I assumed it was a TreeMap because when it copies to an array (in densify) there's presumably a performance gain from having the keys in the right order. Thankfully we have some benchmarks we can use to confirm. It might be worth throwing these in-tree (in a separate project?).
 
Top