Multiline strings in ASH (potential feature?)

heeheehee

Developer
Staff member
So, the other day, someone was asking in chat about how to use multi-line strings in ASH, and I noted yet again that it just can't be done. I can see this being potentially useful to relay scripts that generate large amounts of HTML (think CHIT, Guide).

I got around to looking at what sort of changes I'd need to make, and it dawned upon me that I don't actually know what syntax to use, so I thought I'd ask.

Some examples of what's done in various languages:
Perl:
Code:
"this
is
a
multiline string"
(Python is nearly identical but uses the special delimiter """ for this context.)

C:
Code:
"first"
"multiline string";

"second\
multiline string";
(second option also shows up in Javascript)

Alternately, a really fancy thing that we could do is extend ASH to have template literals, like in Perl / Ruby / (ES6) Javascript:
Code:
`The ${i+1}th element of the list is:
${arr[i]}`
but that's a more involved change than I want to make for now.

The easiest options to implement would be the Perl example and the second C example, as those would both be localized changes to Parser.parseString. Thoughts?
 

fronobulax

Developer
Staff member
No strong opinion, but the phrase "multi-line" conjures up all kinds of wrong ways to define/create/support it, depending upon character set and operating system. If we refuse to treat a file as an ASH file unless it uses one of an enumerated set of character sets, and let the Java environment define what constitutes a "line", then I think we should be OK.
 

heeheehee

Developer
Staff member
We're already implicitly using Java's notion of a "line" to throw error messages indicating that strings can't be multiline (No closing " found), so I think this is a non-issue.
 

Veracity

Developer
Staff member
Did you notice that we currently allow '\n' (and '\r', '\t', '\xA0', and '\uABCD') escape sequences in strings?

I don't especially like that we let you do '\r', since \n in a Java string should be all you need to specify a new-line in an OS-independent way.

I think I prefer this to simply allowing the Perl mechanism. It is easy to forget to close a string, and letting the string run on until the start of the NEXT string, then choking with a syntax error when we try to parse the contents of that next string, seems a whole lot less friendly than saying "unterminated string".

I think the second C mechanism - letting you escape an EOL character and having us translate it internally into \n, say - probably lets you read your multiple lines more easily than if you used embedded \n.

So, if you think the existing way to embed \n is not good enough, I vote for that.
 

Ulti

Member
With no functional gain other than readability, I'm in support of the idea since I feel Ash is lacking in its readability.
With downward compatibility in mind, I'm in support of the Perl/PHP multiline syntax used in your first snippet:
Code:
print("yay
world");//No downward compatibility issues
The above currently errors, so there would be no downward compatibility issues with currently functioning scripts: no existing script could have used the syntax for anything that doesn't error.
So if we implemented it as a syntax addition to Ash, the change would be rather seamless.

Regarding your second snippet's C-style concatenation idea for wrapping long lines without inserting a newline character,
I don't think it would help readability whatsoever. If anything, I think it could make Ash syntax errors, such as a missing comma between arguments, harder to debug. If speed is the concern here, to avoid unnecessary concatenations, that's already possible with one long line. I see no reason why Ash scripts can't be written with both a readable, well-commented development version and an optimized "minimalistic" production version.

Regarding your second snippet's C-style backslash idea, as far as I can tell, it would become redundant if we went with the first snippet's idea. But if we didn't go with the first snippet's syntax, backslashing could be useful. But honestly, I find the syntax very ugly and hard to manage. Imagine 100 lines of CSS and having to append a backslash to each of them. Not only does it make the CSS harder to read, but when you need to add a new rule to the CSS, you have to remember to add a backslash, which is very easy to forget, resulting in more debugging.

Regarding template literals, I love them in PHP, but even in PHP I find printf and sprintf easier to maintain when you're working with a lot of variables.
If we simply added Perl-style multi-line string support, then this becomes a reality:
Code:
printf('<!doctype html>
<html>
<head>
<title>%s</title>
</head>
<body>
<div>%s</div>
<script type="text/javascript">
%s
</script>
</body>
</html>',title,message,js);
Native printf/sprintf support would also be nice, but it might conflict with some already existing scripts. But as long as you define the function natively with a signature/prototype in such a way that there's no way we can conflict with it, then that would make life as an ash scripter a hella lot easier.
 

heeheehee

Developer
Staff member
fronobulax raised a similar point that \r and \n are not cross-platform. I'm well aware that one could simply embed \n and such (in fact, octal is also fine), but that doesn't help with length of lines in code (currently the best option on that front is to split up the string and concatenate over multiple lines).

That is a very good point re: errors. My initial thought for an implementation was via the explicit escaping of newlines, since multi-line should be something done intentionally.

Ulti: native printf format is inherently difficult to manage because printf takes a variable number of arguments, something ASH just doesn't support. You'd be better off requesting varargs as a feature (although if anything it'd probably be Java-style, and printf would have to take only strings, since we don't have a general catch-all datatype like Object in Java).
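To make the Java-style varargs point a bit more concrete, here is a minimal Java sketch (not KoLmafia code) of what a strings-only printf-like helper might look like. The class name StringOnlyFormat and the method sprintf are hypothetical; the sketch leans on java.lang.String.format and assumes the caller has already converted every argument to a string.

```java
public class StringOnlyFormat {
    // Hypothetical helper: Java-style varargs restricted to strings, since
    // ASH has no catch-all type comparable to java.lang.Object.
    public static String sprintf(String format, String... args) {
        // The cast passes each string as a separate format argument instead
        // of handing the whole array over as a single argument.
        return String.format(format, (Object[]) args);
    }

    public static void main(String[] args) {
        System.out.println(sprintf("<title>%s</title><div>%s</div>", "Hello", "world"));
    }
}
```

With this shape, only %s-style conversions make sense, which is the limitation described above.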

I included multiple examples of how other languages address this, since we ought to consider how other languages approach the same problem and decide what we do and don't like from there.
 

heeheehee

Developer
Staff member
Something that I've noticed now that I've made a short (10-15 line change) implementation: Parser.getNextLine automatically trims lines. Would this be desirable behavior? I think I can see arguments both ways.

Pro:
Code:
string s = "this is a \
            multiline \
            string";
(s == "this is a \nmultiline \nstring")

Con:
Code:
string s = "this is a \
multiline\
 string";
(s == "this is a \nmultiline\nstring")
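For anyone who wants to poke at the trimming behavior outside of mafia, the two examples above can be reproduced with a small standalone Java sketch. This is not the actual Parser change: the class ContinuationSketch and its join method are hypothetical, it only looks at the text between the quotes, and it assumes an escaped end-of-line becomes \n as in the examples.

```java
import java.util.Arrays;
import java.util.List;

public class ContinuationSketch {
    // Joins the physical lines that make up one string body. Each line is
    // trimmed first, mimicking the trimming described for Parser.getNextLine.
    // A trailing backslash is treated as an escaped end-of-line and replaced
    // by '\n' (an assumption). Simplification: a line ending in an escaped
    // backslash ("\\") is not handled specially here.
    public static String join(List<String> physicalLines) {
        StringBuilder result = new StringBuilder();
        for (String raw : physicalLines) {
            String line = raw.trim();
            if (line.endsWith("\\")) {
                result.append(line, 0, line.length() - 1).append('\n');
            } else {
                result.append(line);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // "Pro" case: the indentation used to align the source disappears.
        String pro = join(Arrays.asList("this is a \\", "            multiline \\", "            string"));
        // "Con" case: the intended leading space before "string" disappears too.
        String con = join(Arrays.asList("this is a \\", "multiline\\", " string"));
        System.out.println(pro.equals("this is a \nmultiline \nstring"));
        System.out.println(con.equals("this is a \nmultiline\nstring"));
    }
}
```

Whitespace before the backslash survives because the backslash, not the space, is the last character of the line when it gets trimmed.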

I'm not super concerned with the

edit: I have no idea what I was in the middle of saying. Sorry!
 

Veracity

Developer
Staff member
If you print a string to the gCLI, which is an HTML buffer, leading spaces will not be displayed, anyway. I'm OK with trimming strings.
 

Bale

Minion
If you're going to mention my script, then I'd like to say that I'm good. I don't feel any need for this feature. ChIT is fine without any multi-line strings.
 

Ulti

Member
What I don't understand is why the code that interprets Ash, and runs it as Java, can't handle function overloading in more of a C++ fashion.
In C++ the overload resolution sequence is in the following order:
1) Exact signature and argument type match
2) pointer or reference to constant conversion e.g. pointer to short gets promoted to pointer to const short
3) arithmetic promotion i.e. no precision loss e.g. short to int or float to double
4) arithmetic conversion e.g. int to char or float to int
5) user-defined conversion
6) Error e.g. ambiguous error (can't decide between two or more function versions) or an unacceptable conversion such as a pointer to const char cannot downgrade to pointer to char.

I don't see why Ash can't be handled in a similar fashion. When the code encounters "printf(...)" first check for an exact user-defined signature by trying to execute the code.
An easy way to do this might be with a try-catch block to see if it executes. If the catch is executed, then assume the function isn't defined by the user or alter the parsed statement to type cast all its parameters to an Object containing the parameter of the exact type. Then have the catch block use another try-catch block to attempt to call a natively defined function in Mafia which accepts a string followed by any number of Object arguments.
Then from within the native implementation have it work with whatever data the user tried to send printf. If this try-catch fails as well, then forward over to the usual error-handling routine and abort.
I believe with some creativity, it's definitely possible to have natively defined functions which the user can send any data type combination he wishes while also allowing them to define their own version of the function without conflicts by handling the overloading in more of a C++ manner. But I understand if this request is too difficult to implement.
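As a rough illustration of the order I have in mind (user-defined exact match first, native catch-all second, error last), here is a minimal Java sketch. It is not how Mafia's interpreter works: the class OverloadFallbackSketch, its registry and its signature strings are all hypothetical, and it uses a plain map lookup rather than the nested try-catch described above. It also assumes no argument is null.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class OverloadFallbackSketch {
    // Hypothetical registry of user-defined functions, keyed by a signature
    // string such as "printf(String,String)".
    private final Map<String, Function<Object[], String>> userFunctions = new HashMap<>();

    public void defineUserFunction(String signature, Function<Object[], String> body) {
        userFunctions.put(signature, body);
    }

    // Builds the lookup key from the function name and the runtime types
    // of the (non-null) arguments.
    private static String signatureOf(String name, Object... args) {
        StringBuilder key = new StringBuilder(name).append('(');
        for (int i = 0; i < args.length; i++) {
            if (i > 0) {
                key.append(',');
            }
            key.append(args[i].getClass().getSimpleName());
        }
        return key.append(')').toString();
    }

    // 1) exact user-defined match, 2) native variadic printf, 3) error.
    public String call(String name, Object... args) {
        Function<Object[], String> user = userFunctions.get(signatureOf(name, args));
        if (user != null) {
            return user.apply(args);
        }
        if (name.equals("printf") && args.length > 0 && args[0] instanceof String) {
            Object[] rest = Arrays.copyOfRange(args, 1, args.length);
            return String.format((String) args[0], rest);
        }
        throw new IllegalArgumentException("No matching function: " + signatureOf(name, args));
    }

    public static void main(String[] args) {
        OverloadFallbackSketch ash = new OverloadFallbackSketch();
        // No user definition yet, so the native fallback handles the call.
        System.out.println(ash.call("printf", "%s has %d items", "list", 3));
        // A user-defined exact match takes priority over the fallback.
        ash.defineUserFunction("printf(String,String)", a -> "user:" + a[1]);
        System.out.println(ash.call("printf", "%s", "x"));
    }
}
```

The point is only the resolution order; a user-defined printf never collides with the native one because the exact match is always tried first.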
 

heeheehee

Developer
Staff member
Since you clearly know what you're talking about, would you care to provide a patch that implements printf(string, ...) as an ASH primitive?
 

Bale

Minion
This feature was added in r16800.

Add multi-line string support to ASH.

In order to specify a multi-line string, you must explicitly escape the
end-of-line by placing a backslash (\) at the end of the line.

Note that as a consequence of how the parser is written, it will trim all
leading and trailing whitespace for each of these lines via Java's
String.trim() method. Previously, this was irrelevant in the context of ASH
strings, but now that ASH strings can span multiple lines, it is worth
mentioning. If you want to have whitespace at the beginning of the line,
just escape the first character of the line.

by heeheehee on 2016-03-23 12:55:11

M /src/net/sourceforge/kolmafia/textui/Parser.java
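The tip about escaping the first character can be pictured with a small standalone Java sketch. This is not the code from Parser.java: the class LeadingEscapeSketch and its readLine method are hypothetical, and the escape handling is a simplification that assumes "\n" means newline and any other escaped character stands for itself.

```java
public class LeadingEscapeSketch {
    // Trims one physical line the way the commit message describes
    // (String.trim()), then resolves backslash escapes in a simplified way.
    // Line continuation itself is out of scope here: a lone trailing
    // backslash is copied through unchanged.
    public static String readLine(String raw) {
        String line = raw.trim();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                char next = line.charAt(++i);
                // Assumption: unknown escapes yield the escaped character.
                out.append(next == 'n' ? '\n' : next);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Plain indentation is lost to trim().
        System.out.println("[" + readLine("    indented") + "]");
        // A leading backslash is not whitespace, so trim() stops there and
        // the escaped space plus the spaces after it are kept.
        System.out.println("[" + readLine("    \\    indented") + "]");
    }
}
```

In other words, the backslash acts as a guard: trim() cannot strip past it, so everything after it stays in the string.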
 

Veracity

Developer
Staff member
This thread. Which was never a Feature thread in the Bugs forum, so I didn't find it while searching there.

OK, thanks for this. This suits my needs.
 