Bug - Fixed GUI stuck in gray mode

roippi

Developer
I'm not sure I can see how removing concurrency locks would help address this issue; if anything, it seems like it will result in more bug reports containing runtime ConcurrentModificationExceptions.
 

fronobulax

Developer
Staff member
To the extent that there is a case I made it over at the dev forums. I guess I know why there are crickets there. The short answer is that CharPaneRequest is a GenericRequest which in turn extends Foxtrot's Job which has an explicit comment that Job "should NOT be synchronized". Since CharPaneRequest is the ONLY request that is synchronized and since it seems to be implicated in some of the undesired behavior it seems worth a shot to throw something out there and get some empirical results while investigations continue.

Edit: I thought I was going to have to report that the change did not work for me but upon further investigation what happened was a script ended and left the pane gray. I was, however, able to do all sorts of things, including restart the script and have it work.
 
Last edited:

roippi

Developer
Huh, I should look over at the dev forums more often.

Synchronizing Job would definitely be a very bad thing. That would essentially kill all multithreaded behavior in mafia, as any thread that subclasses Job would have to wait on all other threads that subclass Job to finish. It would very likely end up in deadlock, too. Synchronizing subclasses of Job is different, though. All synchronization does is create a lock on a particular thread until it is finished executing. Now, if charpaneRequest never finishes... that lock would never be released...

Hm. So in writing this, I'm now not entirely sure if your patch wouldn't treat the symptom. If it does, though, the problem is that there's an underlying bug with RelayRequest that needs to be addressed. Basically, RelayRequest.run() SHOULD be done with the object and release its lock.
 

fronobulax

Developer
Staff member
My apology for not being precise. I was kind of thinking, based upon past experience, people would look at the code rather than taking my word for it ;-) The comment I was referring to in Job boils down to the statement that the run method implemented by classes that extend Job should not be synchronized. CharPaneRequest violated that. That, of course, begs the question - what problem was hola trying to solve when synchronization was added? Multiple concurrent CharPaneRequests and he needed a way to make sure they all got processed?
 

roippi

Developer
I feel like I just had a moment of clarity where I saw the whole problem and then lost it. But..

There have been scattered reports of ConcurrentModificationExceptions in various other threads. CharPaneRequest does:

Code:
	public Object run()
	{
		if ( CharPaneRequest.isRunning )
		{
			synchronized ( this )
			{
				return null;
			}
		}

		synchronized ( this )
		{
			CharPaneRequest.isRunning = true;
			super.run();
			CharPaneRequest.isRunning = false;
		}

		if ( this.responseCode == 200 )
		{
			CharPaneRequest.lastResponse = responseText;
		}

		return null;
	}

In order to avoid ConcurrentModificationExceptions from CharPaneRequests. BUT it calls super.run() which runs a RelayRequest - which is not synchronized. No concurrency lock is generated for it. So if an exception happens while super.run() is being called, the lock upstream in CharPaneRequest is never released. There is nothing stopping two RelayRequest threads from interfering and that could permanently lock down CharPaneRequest - and anything that subsequently calls it would wait on a hung thread that will never complete.

I'm increasingly convinced that your patch would solve this symptom, but there is an underlying cause that needs to be treated, otherwise more very strange bugs will arise. Thread interference is notoriously difficult to detect and solve, so we should be happy that we've even detected here.

A more robust solution needs to be implemented for all Requests. Concurrency should be expected, and handled gracefully. The current status quo is that we can have interleaving threads generated from the same object (i.e. concurrent UseSkillRequests can happen from moods and clicking the "up" arrow) or from two threads that call a parent's method (i.e. RelayRequest's run(), which is likely the problem here).

So you're on the right track here, I think. But the proper solution is not to remove all concurrency locks, but add more of them, make them better, with failsafes. Try.. finally() blocks that release the lock if something goes wrong. Etc.
 

fronobulax

Developer
Staff member
This is the discussion I hoped to spark elsewhere :)

Foxtrot was part of mafia before before I was privy to any design decisions (not that I am now) but my understanding is that many of the concurrency concerns were delegated to it. So I would prefer to understand that decision and resulting implementation before proposing too many changes. It may, for example, be that there are places that should use it that don't, for example. If I recall the docs correctly, you extend one Foxtrot class if you expect to thrown an exception and a different on if you know you will handle them yourself so it may be that Task should be extended in a few places where Job is being extended. Foxtrot's last update was circa 2008 and that is what our code is based upon. But there is always a chance that Foxtrot has an undocumented feature as well.

My sense is also that there is a deeper problem that the patch does not address
 

roippi

Developer
And again my ignorance arises - I'm totally unaware of how Foxtrot handles concurrency issues.

I'm going to read up before I comment further.
 

xKiv

Active member
Try.. finally() blocks that release the lock if something goes wrong.

I don't think that's how synchronization works in java.

If you have this
Code:
synchronized (this) {
  throw new Exception();
}
the lock is released as soon as the exception is propagated outside the synchronized() {} block.
 

roippi

Developer
Hum. True, exceptions cause threads to terminate and release locks acquired by synchronize(). Thought I was on to something there.

I'm wondering if RelayRequest threads are hanging/not terminating for some other reason.
 

holatuwol

Developer
In case it's relevant, Foxtrot was added because it implemented what was equivalent to a Swing worker without requiring we upgrade Java to get that functionality.

And I just read this thread after adding code which added a synchronized block to UseSkillRequest. (Edit: Synchronized block removed.)
 
Last edited:

Bale

Minion
In case it's relevant, Foxtrot was added because it implemented what was equivalent to a Swing worker without requiring we upgrade Java to get that functionality.

So is this a good reason to change target platform from Java 1.4 to something newer? Would swing be better than using foxtrot? (Considering that it has been so long since foxtrot was last updated in 2008 this sounds like it is probably a good thing.)
 
Last edited:

holatuwol

Developer
Not really, a better idea would be to move away from the whole SwingWorker concept altogether. There's really no additional UI processing needed from pushing a button that people wouldn't expect to have happen automatically anyway from the equivalent CLI command / relay browser request.
 

Tom Sawyer

Member
I will say again to everyone who help develop and maintain Mafia in advance. I see a huge flurry of updates in the daily builds. I am grabbing the latest one and will test the heck out of them on Win-XP and Win 7 Ultimate.

Thanks again for this awesome program!
 

roippi

Developer
Marking fixed; I believe the removal of the synchronized() block got it. Thanks everyone who worked on this, this was a very a-word bug.
 

fronobulax

Developer
Staff member
r9736

Not sure whether this is new or a symptom that things are unfixed.

I tried to load the Mystic Buff relay script. It took forever to load and never actually succeeded before I grew impatient and shut mafia down. The GUI was stuck in gray, the browser said it was waiting for mafia and I was able to adventure in the mini-browser.

Since the relay script clearly failed to load this may just be the expected behavior. The fact that the script has visit_url and the URL in question is not responding for me at the moment may mean this is just another instance of the URL time out discussed elsewhere.
 

matt.chugg

Moderator
sorry, I havn't been paying attention, i'll reload the same override that I know was causing problems and test it later!
 

Orbrisa

Member
Thanks to everyone who was involved in tracking down and fixing this bug. Its being gone massively improves my general KoL experience. :D
 
Top