Feature - Implemented file_to_string() or file_to_map(string,string) or session_logs(string,int,int)

yojimbos_law

New member
Basically, I just want file_to_map() to allow the second argument to be a string (or buffer if that's important for practical applications for reasons I don't fully understand) and have that string's value be the text of the specified file. I'm not sure if that's harder to implement than a file_to_string() function that'd behave similarly.

In case context helps, I'm writing a session log parser for sharing ascensions in ASH (i.e. ALV except not requiring an external program to use or a compiler to edit), but the inability to specify a starting point for a range of days in session_logs() makes it infeasible/inefficient to parse particularly old sessions. As far as I can tell, the only way to access old session logs in ASH currently is to fill a string[int] map with all session logs between the start of the ascension and the present day.

Maybe there's a way to do either of these currently (without pre-parsing the file in an external text editor to give it some mappy structure or another), but I haven't been able to figure one out after extensive testing, wikiing, asking around, and googling.
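
For reference, the workaround described above looks roughly like the sketch below. It assumes the one-argument session_logs( int ) form, and days_back is a made-up value used only for illustration.

```ash
// Sketch only: reaching one old session log means loading every log since then.
// days_back is a hypothetical value for illustration.
int days_back = 300;

// Ask for enough daily logs to cover the whole range back to the old session.
string[int] logs = session_logs( days_back + 1 );

// Only one of these entries is wanted, but all of them had to be read.
print( "loaded " + count( logs ) + " logs to reach one old session" );
```

The cost grows with the age of the session, which is why a starting point for the range would help.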
 

fronobulax

Developer
Staff member
have that string's value be the text of the specified file

I'm not sure I understand.

Would a version of session_logs (say session_logs(char, start, end)) that returned data from all logs for the character char between start and end inclusive be an alternative?
 

yojimbos_law

New member
I'm not sure I understand.

Would a version of session_logs (say session_logs(char, start, end)) that returned data from all logs for the character char between start and end inclusive be an alternative?

If that's easier, it'd be fine for my current application. The other two implementations I mentioned would essentially let you do that with arbitrary files in mafia's directories.

Lemme try to clarify my intent with the file_to_map() related request.

Suppose the file ~/data/butts.txt has as its text

Code:
butts is the first line of this file
we want a second line because nontrivial examples are good

and further that this is some ASH code using file_to_map() and referencing butts.txt:
Code:
string something;
string[int] something_but_split_at_linebreaks;

file_to_map("data/butts.txt", something);

something_but_split_at_linebreaks = split_string(something, "\\r");

foreach i in something_but_split_at_linebreaks{
	print("line "+i+": "+something_but_split_at_linebreaks[i]);
}

I would like that code to have this as its output (assuming I didn't make some stupid typo elsewhere and maybe with an extra linebreak or two because I don't know whether there're \n's or not) :

Code:
line 0: butts is the first line of this file
line 1: we want a second line because nontrivial examples are good

I don't believe I can make the above happen in an ASH script. Anything at all that makes it happen would be welcome; I don't really have strong feelings about the specifics, as long as it's possible to put an arbitrary, mafia-related text file into a map somehow. :)
 

fronobulax

Developer
Staff member
Code:
string something;
string[int] something_but_split_at_linebreaks;

file_to_map("data/butts.txt", something);

something_but_split_at_linebreaks = split_string(something, "\\r");

foreach i in something_but_split_at_linebreaks{
	print("line "+i+": "+something_but_split_at_linebreaks[i]);
}

Let's not split at line breaks.

Code:
string something;
file_to_map("data/butts.txt", something);
foreach i in something{
	print("line "+i+": "+something[i]);
}

That almost gives you what you want, but something needs to be an array. In the map, something[i] is line i-1 of the file.

Code:
string[int] something;
file_to_map("data/butts.txt", something);
foreach i in something{
	print("line "+i+": "+something[i]);
}

I have not tried this either but expect it to work provided KoLmafia, your OS and text editor all agree on what an End of Line is. If it doesn't work let's have the error message. If it doesn't do what you want then I'm not sure what you are asking.

What you might have missed is that using a string[int] as the map gives you one line per array entry, so file_to_map has already stripped out the EOL.
 

yojimbos_law

New member
That's not how file_to_map() works. It won't treat a line number as an implicit key for your map. It requires a file in which each line is precisely a tab-separated list of keys followed by a map value, which is then added to the specified map.

What your suggested script does (after declaring something as a "string[int]") is (1) find no lines in butts.txt that are precisely an int followed by a tab followed by a string, (2) iterate over all zero keys in a still-empty string[int] map, and (3) terminate successfully, having printed all zero statements that it was asked to print. It does not return an error message (because there are no errors). Here's a slightly modified version paired with its output to illustrate point (1). https://i.imgur.com/bOmCQcU.png
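
For comparison, here is a minimal sketch of a file that file_to_map() would accept for a string[int] map. The file name data/butts_keyed.txt is hypothetical, and <TAB> stands for a literal tab character.

```ash
// Assumed contents of the hypothetical file data/butts_keyed.txt,
// where <TAB> is a literal tab character:
//   0<TAB>butts is the first line of this file
//   1<TAB>we want a second line because nontrivial examples are good

string[int] keyed;

// Each line supplies an int key and a string value, so the map gets populated.
if ( file_to_map( "data/butts_keyed.txt", keyed ) ) {
	foreach i in keyed {
		print( "line " + i + ": " + keyed[i] );
	}
}
```

This is exactly the pre-parsing step the request is trying to avoid: the keys have to be written into the file by hand or by another tool.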
 

lostcalpolydude

Developer
Staff member
I did some digging into the history for file_to_map(), to see why that was added instead of something like file_to_string(). Apparently people were making files that resembled fullness.txt before mafia included that type of data internally, so it was a good way to make that type of data accessible. In addition, before visit_url() was added, using echo plus file_to_string() would have allowed for easier automation of new content probably, something holatuwol was clearly against.

At this point, I see no downside to adding something like this, given everything else that can already be done. I know I've seen Ezandora add random tabs to files just so file_to_map() can be used to verify that they exist.

I don't know if it would be better to return a single string/buffer (leaving the headache of variable line breaks to the scripter), or a string[int] split by line. The latter case probably copies file_to_map() but stops earlier, so without looking at the code it seems easy enough. I don't know what a good name for that function would be though.
 

Veracity

Developer
Staff member
file_to_array?

This is the kind of thing I'd take a personal interest in, but, as I announced over a year ago, unless it affects me personally, my new job precludes my being too involved.
 

fronobulax

Developer
Staff member
Sorry. Some days my only contribution to society is to serve as a bad example. I know I did a lot of work trying to use a string[int] in file_to_map so that result i was line i-1 of the file. Looking back, I never solved the problem in ash and resorted to introducing a new column in the data file. But all I remembered was my success at solving my problem and not what I finally had to do.

I did look at trying to have file_to_map act as if the line number was present so it could build the map I wanted. I was trying to inject a phantom column at some point so everything downstream remained unchanged. That ultimately failed because of code in common with reading versioned files and an unwillingness to change a lot of working code to support what (at the time) was a feature only of interest to me. Short of adding a new ASH function, I also stumbled over how to distinguish between a string[int] that was supposed to be populated with the line number and a string[int] where the int was in the file. file_to_array() addresses that.
 

fronobulax

Developer
Staff member
If we have file_to_array do we also want array_to_file? Is there any other case we need to support besides string[int]?
 

yojimbos_law

New member
If we have file_to_array do we also want array_to_file? Is there any other case we need to support besides string[int]?
I don't think there's any need for array_to_file(), since map_to_file() already provides a way to do that with minimal effort. As for the latter, my understanding (based on troubleshooting help I received with runtime issues when manipulating very large matrices indexed by strings) is that string interning will likely cause issues for people trying to repeatedly manipulate the values of string[int] maps and that those people should be able to avoid such issues by converting those maps to buffer[int] types; since that workaround already exists, I don't think this function would really need to support it further, but I could be missing something given my lack of expertise.
 

fronobulax

Developer
Staff member
I asked out of a sense of composition. To continue my series of error-prone pontifications, I was thinking that using map_to_file to write a string[int] would result in a file with line number, tab, string, whereas the original file was just string.

Not seeing the relevance of string interning and not seeing how the existence of array_to_file and file_to_array changes that but that could just be me needing to pay more attention.
 

lostcalpolydude

Developer
Staff member
I have string[int] file_to_array( string filename ) written, and it seems to work. I'll likely commit it tomorrow, in case I think of something to change.

This change also adds sessions to the list of folders that can generally be accessed for file reading/writing. That seems fine since session_logs() already provided access to read from there.

Unlike file_to_map(), if the file doesn't exist, the return value is an empty array. Since that's easy to check using count( array ) == 0, that seems fine.
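
A minimal sketch of that check, assuming the signature given above and reusing the example file from earlier in the thread:

```ash
// Sketch: read a file line by line with file_to_array() as described above.
string[int] lines = file_to_array( "data/butts.txt" );

if ( count( lines ) == 0 ) {
	// A missing file yields an empty array (an empty file presumably does too).
	print( "nothing to read" );
} else {
	foreach i, line in lines {
		print( "line " + i + ": " + line );
	}
}
```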
 

yojimbos_law

New member
I asked out of a sense of composition. To continue my series of error-prone pontifications, I was thinking that using map_to_file to write a string[int] would result in a file with line number, tab, string, whereas the original file was just string.

Not seeing the relevance of string interning and not seeing how the existence of array_to_file and file_to_array changes that but that could just be me needing to pay more attention.

Yeah, your understanding of map_to_file() applied to a string[int] here seems correct. The thing is, once you import the file with file_to_array(), you can manipulate it however you want. I suspect many of my applications will involve taking a string[int] given by file_to_array(), appending each string in there to a buffer, doing stuff and/or things to that buffer (most likely lots of group_string()-related parsing shenanigans), and then throwing the result or the buffer itself into a file with map_to_file(). Throwing the buffer into a file would result in a file that would be identical to the initial file with the exception of maybe a trailing or leading tab (if you, say, make it into a buffer[string] with the empty string as the only key) and no line breaks (unless you add those back in while appending). I guess what I'm trying to say is that aggregate manipulations that already exist in ASH can be used to make map_to_file() behave practically identically to the proposed array_to_file() with minimal effort on the user's end.

Regarding string interning, I'm just saying that people doing lots of manipulations on the string[int] resulting from file_to_array() might unwittingly run into issues that they wouldn't if they were doing those to a buffer[int] aggregate, speaking as someone who has had problems in that vein. I don't think the existence of either array_to_file() or file_to_array() would necessarily exacerbate that, just that people might be less likely to encounter it if file_to_array() could return a buffer[int]. As described above, it's very straightforward for a user to convert a string[int] to a buffer or buffer-based aggregate, so people who encounter issues already have tools at their disposal to address them, just slightly less intuitively than if file_to_array() could be told to populate a buffer[int]. I'm certainly not suggesting spending any time at all addressing this, just sharing a potential complication of a new feature encouraging people to muck around with large-scale string manipulations.
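
The conversion mentioned above can be sketched as follows. It assumes file_to_array() drops the line breaks when it splits the file, so they are added back while appending.

```ash
// Sketch: collapse the string[int] from file_to_array() into a single buffer.
string[int] lines = file_to_array( "data/butts.txt" );
buffer text;

foreach i, line in lines {
	text.append( line );
	// Assumption: the line break was stripped on read, so restore it here.
	text.append( "\n" );
}

print( "appended " + count( lines ) + " lines to the buffer" );
```

From there the buffer can be parsed in place rather than by repeatedly rewriting the strings in the map.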
 

Veracity

Developer
Staff member
A couple comments:

1) When I refer to an "array" of strings in ASH, I am referring to the thing you get from, for example:

string[10] array1; // elements indexed 0 - 9, an actual Java array underneath, not a map
string[] array2 = { "abc", "def", "ghi" }; // elements indexed 0 - 2, ditto

Which is to say an "array of strings" behaves programmatically like "string [int]", except it will throw a runtime exception if you don't stay within the bounds.

2) String values, as read by file_to_map, effectively intern strings: they are "unescaped" by storing them, character by character, into a buffer, turning \t and \n and \\ into tab and newline and backslash, respectively, and then getting buffer.toString(). That detaches each string from the line it was read in and the page it came from, and so on.

3) String values processed by map_to_file do the opposite transformation; newlines and tabs and backslashes turn into \n, \t, and \\.

4) Given that file_to_array stores one line per string, should I assume you'd like the same transformation done to allow those escape characters?

5) And both escape and unescape could be optimized to not bother going through a buffer if there are no characters that will be transformed, but in that case, string interning will have to be done explicitly, rather than as a side effect of moving to a buffer and back.
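
The transformations in points 2 and 3 can be observed from a script with a round trip like the sketch below. The file name escape-test.txt is hypothetical, and the comment about the on-disk form restates the description above rather than a checked result.

```ash
// Sketch: round-trip a value containing a tab, a newline, and a backslash.
// The file name escape-test.txt is hypothetical.
string[int] original;
original[0] = "tab:\there, newline:\nhere, backslash:\\here";

if ( map_to_file( original, "escape-test.txt" ) ) {
	// Per the description above, the value should sit on one line of the file
	// with the tab, newline, and backslash written as \t, \n, and \\.
	string[int] restored;
	file_to_map( "escape-test.txt", restored );

	if ( restored[0] == original[0] ) {
		print( "value survived the round trip" );
	} else {
		print( "value changed during the round trip" );
	}
}
```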
 

Veracity

Developer
Staff member
You actually did "file to string[int]" - i.e., file to map of strings indexed by integer.
Most programs will not notice the difference, although it is (minutely) less efficient under the hood.
Also no "escape/unescape" transformation for reading/writing. Whatever, that can be done in the user program, and yojimbos_law had nothing to say to my question about whether that was important to him.

Given that, I guess this is implemented.
 

Veracity

Developer
Staff member
I'm going to reopen this, since I personally want something like the original suggestion.

map_to_file, file_to_map, file_to_array, and load_html (whatever the heck that is; added 12(!) years ago by holatuwol) allow you to transfer data between KoLmafia local data directories and KoLmafia internal data structures.

I want this:

buffer file_to_buffer( string filepath )
int buffer_to_file( buffer data, string filepath )

I'm currently using map_to_file. My strings can be tens of thousands of characters long and contain newlines. KoLmafia escapes the newlines (and tabs and backslashes) on writing and unescapes them on reading. That's perfectly fine when the data is being written and read and manipulated by KoLmafia (per the OP's use case), but is less convenient when the file is being written with the intent of humans looking at it.

My use case is saving a page of HTML received from KoLmafia, complete with line breaks. Not having an index + tab, and not having line breaks, tabs, and backslashes escaped, will make manipulation by humans - as opposed to ASH scripts - easier.
 

Veracity

Developer
Staff member
Yeah. Revision 19439 adds:

buffer file_to_buffer( string filepath )
boolean buffer_to_file( buffer data, string filepath )

Code:
buffer source;

source.append( "Line 1" );
source.append( "\n" );
source.append( "Line 2" );
source.append( "\n" );
source.append( "Line 3" );
source.append( "\n" );

string filename = "test-buffer-file.txt";

if ( buffer_to_file( source, filename ) ) {
    buffer saved = file_to_buffer( filename );
    if ( source == saved ) {
        print( "buffer retrieved from file identical to source" );
    } else {
        print( "buffer retrieved from file differs from source" );
    }
} else {
    print( "failed to write buffer to '" + filename + "'" );
}
yields

Code:
> test-buffer-file.ash

buffer retrieved from file identical to source
 