visit_url and possibly the wiki.

fronobulax · May 11, 2010

Below is a fragment of code that visits a URL at Jicken Wings (the URL is stored in DCCASE).

Code:

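string caseRes;  // original declaration; the fix discussed below changes this to: buffer caseRes;
caseRes = visit_url(DCCASE);
if (length(caseRes) <= 0) {
	print("Bad return from Jicken Wings.");
	// retry once before giving up
	caseRes = visit_url(DCCASE);
	if (length(caseRes) <= 0) {
		abort("Jicken Wings failed twice.");
	}
}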

It routinely generated the failure message, suggesting a timeout or some other problem. When I mentioned this previously, someone suggested that buffers were more reliable than strings when parsing large pages. I did not follow up on that suggestion at the time because I thought my problem occurred before any parsing, and I wrote the test code above to confirm that. However, it appears that by changing the declaration to

Code:

buffer caseRes;

it now fetches reliably, first time, every time.

In light of this, should the wiki entry be changed to mention that a buffer is preferred under some circumstances, and should the code samples be tweaked to reflect that as well?

I suppose I could edit the wiki myself, but I am also looking for some confirmation that my experience does indeed generalize.

Thanks.

Bale · May 11, 2010

I'd gladly edit the wiki to add that, but I'm also hoping that someone will explain your experience.

heeheehee · May 11, 2010

That's generally the case; heck, that's why most, if not all, relay override scripts use buffers instead of strings.

Added at the end of the special section (the notes section, pretty much):
"Also, a buffer is often preferable under certain circumstances (e.g. when dealing with an especially large page); in this event, append() would be required in addition."

That good by you?

jasonharper · May 12, 2010

This is absurd - visit_url() ALWAYS returns a buffer. Whether you choose to keep the value as a buffer or convert it to a string CANNOT possibly have any effect on whether the function succeeded, because the value is already determined before any such conversion takes place.
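A toy Java model may make Jason's point concrete. It is not KoLmafia's code: `fetchPage` is a hypothetical stand-in for the request that fills the buffer. Whether the caller keeps the buffer or converts it with `toString()`, the content was fixed when the fetch returned. An empty string can therefore only come from an empty buffer.

```java
public class BufferThenString {
    // Hypothetical stand-in for the request: the page text is fully
    // written into the buffer before the caller ever sees it.
    static StringBuilder fetchPage(String body) {
        StringBuilder page = new StringBuilder();
        page.append(body);
        return page;
    }

    public static void main(String[] args) {
        // Like "buffer caseRes = visit_url(...)": keep the buffer as-is
        StringBuilder asBuffer = fetchPage("<html>case data</html>");
        // Like "string caseRes = visit_url(...)": same fetch, then a conversion
        String asString = fetchPage("<html>case data</html>").toString();

        // The conversion only copies characters; it cannot change the length
        System.out.println(asBuffer.length() == asString.length());
        System.out.println(asBuffer.toString().equals(asString));
    }
}
```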

It is at least conceivable that a timing issue is at fault. If a server limits how often it will fulfill requests from a particular IP address, then one technique or the other might make the program run fast enough to hit that limit.
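The rate-limit hypothesis can be sketched the same way. The `Throttle` class and its minimum interval are hypothetical, and nothing in the thread says Jicken Wings enforces such a rule. The sketch only shows that identical requests can succeed or fail depending on how close together they arrive.

```java
public class Throttle {
    // Hypothetical server rule: refuse a request that arrives less than
    // minIntervalMillis after the previous accepted request.
    private final long minIntervalMillis;
    private long lastAccepted = Long.MIN_VALUE; // sentinel: nothing accepted yet

    Throttle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    // Returns true if the request is served, false if it is refused
    // (which a script would see as an empty page).
    boolean allow(long nowMillis) {
        if (lastAccepted != Long.MIN_VALUE && nowMillis - lastAccepted < minIntervalMillis) {
            return false;
        }
        lastAccepted = nowMillis;
        return true;
    }

    public static void main(String[] args) {
        Throttle server = new Throttle(1000);
        System.out.println(server.allow(0));    // first request: served
        System.out.println(server.allow(200));  // quick retry 200 ms later: refused
        System.out.println(server.allow(1500)); // slower retry: served
    }
}
```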

fronobulax · May 12, 2010

I'm not sure I would have used "absurd" but there is definitely some head scratching going on...

With reference to the test snippet above, which accesses a non-KoL URL...
1) The string version of the code consistently "fails" the first time it is called in a session. As implemented, this means both calls to visit_url returned a non-positive length.
2) The second and subsequent calls in a session succeed.
3) In admittedly limited testing, the buffer version consistently succeeded, in contrast to 1).

Hitting the URL multiple times in a session succeeds with no obvious lag, so I would downplay the possibility that the server is throttling its responses to me.

I don't know what is cached, or where, but if anything between my client and the server is caching, calls after the first could respond "faster".

I have walked through the Java several times with the debugger and I can't reproduce the problem, which also suggests a timing issue.

There are numerous examples in Java where run-time performance differs significantly between code that uses strings and code that uses buffers, so at first blush it is plausible that the declaration makes a difference in timing.
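The best-known Java case of this kind is probably the one below. Building a large text by repeated `String` concatenation copies the accumulated text on every step, while `StringBuilder` appends in place. Both produce the same result and only the cost differs. This illustrates the general Java claim only. It does not show that ASH's string/buffer choice behaves this way, and that is exactly what the rest of the thread disputes. Timings vary by machine, so none are given.

```java
public class ConcatCost {
    // Each += creates a new String and copies everything built so far:
    // roughly quadratic work in the final length.
    static String withString(int pieces) {
        String s = "";
        for (int i = 0; i < pieces; i++) {
            s += "x";
        }
        return s;
    }

    // StringBuilder grows an internal array and appends in place:
    // amortized linear work.
    static String withBuilder(int pieces) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pieces; i++) {
            sb.append("x");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 20000;
        long t0 = System.nanoTime();
        String a = withString(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();
        System.out.println("same result: " + a.equals(b));
        System.out.println("String ms:  " + (t1 - t0) / 1_000_000);
        System.out.println("Builder ms: " + (t2 - t1) / 1_000_000);
    }
}
```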

It seems pretty likely that timing is involved, although the change in behavior between string and buffer could have been dumb luck.

StDoodle · May 12, 2010

But I believe what JH is saying is that visit_url() ALWAYS produces a buffer first. Assigning a call to a string just means an implicit conversion is done from the buffer, not that the result is a string in the first place. I think timing is much more likely (in fact, unless JH is wrong about the code, replace "likely" with "certain").

fronobulax · May 12, 2010

I believe we have a timing problem.

Because the problem seemed to go away when ASH used a buffer instead of a string, and because there are programming situations in Java where using a string instead of a buffer causes a tremendous performance hit, I speculated that the change and the timing might be related. JH believes such a relationship is absurd, and I have dug into the mafia code enough to agree with him. I would still like to understand more about how Java types relate to ASH types, and when and where (from a Java perspective) ASH variables are "allocated" and stored. However, I do not expect those answers to turn up anything that would introduce a timing problem.

Veracity · May 13, 2010

fronobulax said:
Because the problem seemed to go away when ASH used a buffer instead of a string...

You didn't listen to what jason said - or, perhaps, didn't look at the code.

The code NEVER "uses a buffer INSTEAD of a string" (emphasis mine). The code ALWAYS puts the result of the request into a buffer. In your example, it then converts that buffer into a string. Storing the buffer result into a buffer variable obviates the conversion. In both cases, the result of the request is in a buffer first.

fronobulax · May 13, 2010

Veracity said:
You didn't listen to what jason said - or, perhaps, didn't look at the code.

The code NEVER "uses a buffer INSTEAD of a string" (emphasis mine). The code ALWAYS puts the result of the request into a buffer. In your example, it then converts that buffer into a string. Storing the buffer result into a buffer variable obviates the conversion. In both cases, the result of the request is in a buffer first.

I wasn't clear. My reference to code using a buffer vs. a string was meant strictly in the context of the ASH code snippet in the original post. The change that made the problem appear to go away was in ASH, so I would clarify the statement to read "the ASH code uses a buffer INSTEAD of a string". I chose between an ASH buffer and an ASH string when I wrote the script.

If I understand you correctly, the underlying Java code uses a buffer throughout and finally converts the Java buffer to a Java string if the target of visit_url is an ASH string. That could give the two ASH versions different timing, but it would not explain why the first invocation of the script fails and the second invocation in a session "works".

Veracity · May 13, 2010

fronobulax said:
the underlying Java code uses a buffer throughout and finally converts the Java buffer to a Java string if the target of visit_url is an ASH string.

Yes. That is why Jason found it "absurd" that converting the buffer to a string would result in an empty string whereas passing it straight through as a buffer would work.

Unless there is a bug in converting buffers to strings, which you would expect to show up elsewhere. Can you reproduce that without visiting a url?
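In a Java model, the suggested check (buffer-to-string conversion with no URL involved) could look like the sketch below. The sizes are arbitrary. It exercises Java's `StringBuilder.toString()` rather than ASH's own conversion, so it only shows what a conversion bug would have to look like. It does not prove that ASH has or lacks one.

```java
public class ConversionCheck {
    // Build a buffer of the requested size, convert it, and report whether
    // the resulting String kept every character. No URL is visited.
    static boolean roundTrips(int size) {
        StringBuilder buffer = new StringBuilder(size);
        for (int i = 0; i < size; i++) {
            buffer.append((char) ('a' + i % 26));
        }
        String converted = buffer.toString();
        return converted.length() == buffer.length()
                && converted.contentEquals(buffer);
    }

    public static void main(String[] args) {
        // Sizes from empty up to a few megabytes ("large page" territory)
        int[] sizes = {0, 1, 1_000, 100_000, 5_000_000};
        for (int size : sizes) {
            System.out.println(size + " chars: " + (roundTrips(size) ? "ok" : "MISMATCH"));
        }
    }
}
```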
