Feature - Implemented Parallelize up-to-dateness check during svn update

xKiv

Active member
I have posted about this in the past already - when I "svn update", it takes a long time checking each script in turn, because I have many scripts. Often this ends up spending a minute or more just to say "nothing needed to be updated".
Updating in parallel would turn this into a few seconds, right?
But SVNKit as used by mafia isn't thread-safe. Well, we can always instantiate new SVNRepository for each thread ...

So I finally got off my lazy bum and coded a proof of concept.
This only parallelizes the initial "check which scripts' WC are at head", i.e. it splits the loop in SVNManager.doUpdate(), moving the check in a separate block (loop to create threads, then execute threads to see which scripts should be passed to the UpdateRunnable-executing loop).
Each thread gets its own instance of SVNRepository.
I didn't find an easy way to get new instances of SVNInfo, so I am calling SVNInfo.doInfo() sequentially. That shouldn't be a significant slowdown - the main slowdown is repeated waiting for internet replies.

I also put some slightly expanded diagnostic messages (in (my new class) CheckStatusRunnable):
1) I want to see the status of all scripts, not just the ones that are at-head
2) I want to see the name of the script for every updated script, even if SVN threw an exception. This is important, because mafia's current implementation SVN exceptions makes them seem to come from another script.
3) This should probably get a bit of reworking; perhaps put all messages in the a doReport() method, and call that sequentially from the main thread, in a known order (I currently print the messages when returning from the thread)

It *should* be possible to expand this so that the actual "update working copy from repository" part is also executed in the thread (parallelized). I didn't do this because
1) it's already in a different runnable, which is used in more places than this
2) I am not sure how to properly split-and-rework the code so that all the "fetch from repos to WCs" happens first, and *then* pushing *all* the updates at once (which has to be done sequentially, because it might push into the same file from different WCs, and might require users attention for each script separately).

(I can't rule out that it's as easy as new UpdateRunnable( f ).start(); at the end of CheckStatusRunnable, when the script should (and can) be updated)

My code is probably not mafia-standard-conforming, too.

So far, I have ran it once. It didn't destroy my svn directory. Then checking went quickly, and then it updated the three scripts that needed updating.

Patch attached:

View attachment 20160706-parallel-svn-update-wc-check.diff
 
Would any part of what you've done here make it easier to continue the svn update process when a script fails to check out, rather than silently failing? Or is that an entirely different part of the code?
 

xKiv

Active member
I didn't change anything with checkout.
I also didn't really change anything with the actualy "update" part of the process when a script needs updating, but I hope it would fail already when checking that script's repository revision, in which case I *think* (i.e. that's what I wanted to happen):
- the "checking" process will print a line saying that script so-and-so "not checked - exception:" + exception text
- message includes script's name *and* exception (previously there would be just the exception)
- all scripts are checked, regardless of whether some of them fail
- scripts that checked successfully and need to be updated are passed to the "actually update these" phase

i.e. everything that can be updated should end up being updated (unless there's another error later)
 

Veracity

Developer
Staff member
Bump from page 9 to page 1. Since you actually supplied code, perhaps I should look at it. ;)
 

Veracity

Developer
Staff member
Integrated it, changed to KoLmafia coding standards, and ran it.
Worked great and sure was fast.

To quote Agent Cooper, Damn good coffee ... and hot!

I released it into the wild in revision 17436.

Thanks!
 

Bale

Minion
Holy moley! I cannot believe my scripts were updated that fast!!

I've deleted the svn from quite a few scripts while leaving them intact in my script directory -- so that they don't slow down the automatic update process. It seems that's no longer a concern of mine.

This is the best update of KoLmafia v17.5 so far!!
 

lostcalpolydude

Developer
Staff member
After a bit of digging, I figured out that "svn update" doesn't update any scripts unless simpleSvnUpdate is true, since the code that figures out what to update is skipped completely. I can probably dig in and figure out why that is, but I could see it being a simple fix for someone that has worked with the code recently.
 

xKiv

Active member
I suspect the check for simpleSvnUpdate (and the preference) should be removed entirely, just call the "WCAtHead( f, false, checkingRunnables );" always.

I kind of don't get what its purpose was supposed to be, even. Looking at the code now, it used to be like this:

with simpleSvnUpdate:
1) for every script, ask the server for current version number (synchronously, so each script added 0.N seconds delay - but it's a read-only operation so maybe it has less lock contention somewhere? And not-at-head-ness can be cached perhaps?)
2) for every script that was determined to be not at head, update from server and push updates (synchronously, adding even more delay for each no-up-to-date script)

Benefit: separates the two phases

without simpleSvnUpdate:
1) for every script, update from server and push updates (synchronously), even if it already was up to date (which has to be determined by asking the server anyway)

Benefit: (if I understand this correctly) is consistent with pushing updates from .kolmafia/svn/$script to .kolmafia/scripts even for scripts that only have local modifications
 

Veracity

Developer
Staff member
Bumping back to first page.

If it doesn't work with SimpleSvnUpdate == false, we should eliminate the setting and always do the "true" behavior, as xKiv suggests.
 

Veracity

Developer
Staff member
I see the following comment in front of WCAtHead:

Code:
	/**
	 * For users who just want "simple" update behavior, check if the revision of the project root and the repo root are
	 * the same.
	 * <p>
	 * Users who have used <code>svn switch</code> on some of their project should not use this.
	 * 
	 * @param f the working copy
	 * @param quiet if <code>true</code>, suppresses RequestLogger output.
	 * @param runnables if present, enables multi-threaded behaviour, and adds a runnable to the list; otherwise executes the runnable
	 * @return <code>true</code> if the working copy is at HEAD, or runnable was successfully added to <code>runnables</code> 
	 */
I have no idea what "svn switch" does or how somebody who has used it would know to go and change the value of the setting from true to false.

Code:
global	simpleSvnUpdate	true
Which is to say, change from the default value.

We do let you configure it via the GUI in the SVN Panel of Preferences.

Code:
			tip = "<html>\"Simple\" behavior means that svn will just check the revision of your project's<br>"
				+ "root directory and compare it to that of the root directory in the repo.<br>" + "<br>"
				+ "This saves time and server hits over a full <i>svn update</i>, but advanced users with<br>"
				+ "mixed-revision working copies or the like may want to turn it off so that<br>"
				+ "a full <i>svn update</i> happens every time.</html>";
			this.queue( new PreferenceCheckBox(
				"simpleSvnUpdate", "Use simple SVN update behavior for faster updates", tip ) );
Apparently, somebody lost knows chose to disable it - and the new parallelized update behavior doesn't support it.
 

xKiv

Active member
svn switch is for "relocating" the remote repository url - i.e. if you have a script at, say, https://svn.code.sf.net/p/user/script/code which stops working, but you know that there's a working fork over at https://svn.user.great.cloud.org/code/script
the problem with that is that the fork doesn't need to have the same revision history, so revision numbers are useless for testing if you are up-to-head
but honestly, if you svn switch to somewhere that isn't an exact duplicate, you shouldn't be surprised when everything breaks.


Mixed revisions are a different thing, and dangerous for a different reason. It's possible to get those in two ways. One is relatively benign - you update (to HEAD) only parts of the WC. This means that the WC root is "stuck" on an older revision number, but only until the next update. Once you check the WC root's revision against the remote server, you will find that the remote has a newer revision (and mafia will do a full update).

The other is worse - you update part (file or directory) of the WC to an *earlier* revision. It is then *pinned* to that particular revision, and doesn't update. And different parts can be pinned to different revisions. And you can nest an "updated to HEAD" part within a pinned subdirectory.
I don't think this shouldn't be a problem here either, though, unless you somehow (manually) update the root to current HEAD but also leave some files/subdirectories *not* updated (and also not pinned). This is possible, but requires the user to be a bastard from hell. And if you tell svn to only update first two subdirectory levels, you should expect the third level to not update.
 
Top