Feature: Functions as first-class objects in ASH

Veracity

Developer
Staff member
Here's a proposal for adding "function" as a first-class data type in ASH.

I've put a fair amount of thought into it and see how I would find it useful - and how I could implement it - but now it's your turn to ponder it and offer suggestions and criticisms.

- What is missing that YOU would find useful?
- What is unclear or unspecified?
- What is wrong?

Please keep in mind that ASH is a strongly typed language and that function parameters and return values have types.
Please keep in mind that although I drew on both Java and Javascript for inspiration, this proposal is neither.
(Both of those languages have features that don't make sense in ASH - and ASH can allow things that neither allows.)

Please restrict suggestions to this specific aspect of ASH - the ability to use function objects in variables and parameters and invoke them with arguments.
I am aware, for example, that having GENERIC aggregates would make this even more useful.
It would be nice if ASH had a "list" type - and array (int[]) is sort of like one - but that is built on a Java array, rather than, say, an ArrayList, so you can't add or remove elements from it. (We could fix that by changing the internal implementation to ArrayList, which would not affect any existing programs, but which would add capabilities.)

Even so, I think this proposal would allow you to write things like:

MAP filter(MAP, PREDICATE)
MAP map(MAP, FUNCTION)
VALUE reduce(MAP, FUNCTION)
whereby you could accept a map variable, process its values, and produce another map or a reduced value.
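As a rough illustration of those three signatures, here is a JavaScript sketch with ASH maps modeled as plain objects. The helper names (filterMap, mapMap, reduceMap) and the explicit initial value passed to the reducer are assumptions made for this example, not part of the proposal.

```javascript
// Hypothetical helpers; an ASH map is modeled as a plain JS object.
function filterMap(map, predicate) {
  const result = {};
  for (const [key, value] of Object.entries(map)) {
    if (predicate(value)) result[key] = value; // keep entries that pass the test
  }
  return result;
}

function mapMap(map, fn) {
  const result = {};
  for (const [key, value] of Object.entries(map)) {
    result[key] = fn(value); // same keys, transformed values
  }
  return result;
}

function reduceMap(map, fn, initial) {
  let acc = initial;
  for (const value of Object.values(map)) {
    acc = fn(acc, value); // fold every value into one result
  }
  return acc;
}

// Made-up sample data.
const prices = { a: 10, b: 25, c: 40 };
const expensive = filterMap(prices, (v) => v > 20);       // { b: 25, c: 40 }
const doubled = mapMap(expensive, (v) => v * 2);          // { b: 50, c: 80 }
const total = reduceMap(doubled, (sum, v) => sum + v, 0); // 130
console.log(total);
```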

Let's get going. :)

----

Here are a couple of functions:

Code:
int plus(int x, int y) {
    return x + y;
}

int times(int x, int y) {
    return x * y;
}

What type are they?

function int (int, int)
typedef function int (int, int) binop;

What type is this?

Code:
void biconsumer(int x, int y)
{
    // Do something with the inputs that has a side effect
}

What type is that?

function void (int, int)
typedef function void (int, int) biconsumer;

I use typedefs in every ASH program I write - for aggregates, especially. I'm not sure I've seen them in anybody else's scripts, but I think they are incredibly useful - and equally so with function types. For clarity, I'm going to use the typedefs in further examples, but you don't HAVE to do that.

How can I declare some variables?

Code:
binop op1 = plus;
binop op2 = op1;
binop op3 = (x, y) -> x * 2 + y * 2;
binop op4 = (x, y) -> { return (x ** 2) + (2 * x * y) + y;};

(Oooh. Lambdas! Or are they Arrow Functions? Neither, precisely, but we'll talk more about them later.)

How can I use those?

Code:
print(op1(5, 6));
// 11
print(op3(2, 3));
// 10

int apply(binop op, int x, int y)
{
    return op(x, y);
}

print(apply((x, y) -> x * 2 - y * 2, 10, 5));
// 10

binop[string] ops = {
    "+": (x, y) -> x + y,
    "-": (x, y) -> x - y,
    "*": (x, y) -> x * y,
    "/": (x, y) -> x / y
};

int eval(string op, int x, int y)
{
    return apply(ops[op], x, y);
}

print(eval("*", eval("+", 2, 3), eval("+", 4, 5)));
// 45

OK, let's talk about lambdas.

They are either an "expression" or a "statement", depending on whether the function type returns a value or is void.
They will be used for defining the initial value of a variable which is a function type.

Code:
binop bo1;
binop bo2 = (x, y) -> INT_INIT_VALUE;
binop bo3 = (x, y) -> { return INT_INIT_VALUE; };

biconsumer bc1;
biconsumer bc2 = (x, y) -> {};

Looks like they take a statement or a block. Is that a scope?
Unlike Java or Javascript, yes, it is.
The expression is a single statement: "return VALUE";
The block is just like a block elsewhere in ASH.

The other languages allow you to "bind" local variables (if the lambda/arrow function is inside a function) as long as they are "final".
ASH does not have "final" variables.
And although ASH is perfectly happy to let you declare functions inside other functions (or any scope) (and I do that all the time, to clarify the function code and reduce global namespace bloat), we really cannot let lambdas have access to functions and vars in an inner scope.
Therefore, ASH lambdas will be in the global scope - and will have access to all global functions and variables that are lexically visible to them, just like other global functions.

Your turn.
 
There are a few ash functions that currently accept strings that are the names of functions. There is special handling to let js pass functions as those arguments instead. For example, in adv1, js users can supply a combat filter function as an actual function, so I can do
Code:
function handleCombat(page, foe, round) {
  return "Hello World";
}

adv1(Location.get("noob cave"), -1, handleCombat)
while in ash for the analogous code I would pass the string "handleCombat".

I assume that in a world where ash lets you pass handleCombat directly, we no longer need any of the special handling code for js.

Do we expect that if I defined handleCombat as an arrow function I would still be able to pass its name to adv1 and other functions like that? Or would that only work with traditional functions? I assume that they are more or less the same under the hood and so it would work with arrows or traditional functions, but I'm not sure.

Should you be able to overload an arrow function?
 

heeheehee

Developer
Staff member
The other languages allow you to "bind" local variables (if the lambda/arrow function is inside a function) as long as they are "final".
Maybe in Java, but that's actually atypical, as far as I'm aware.
JavaScript:
function makeCounter(a) {
    // `a` below is bound to the provided function argument
    return () =>  a += 1;
}

> counter = makeCounter(5)
() =>  a += 1
> counter()
6
> counter()
7

Closures have the ability to provide a degree of encapsulation and statefulness that can be used to emulate object-oriented programming even if the language does not otherwise support it. I don't know offhand of a compelling reason why I'd need that, though.
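For anyone who has not seen the pattern, a minimal sketch of closure-based encapsulation follows. The makeAccount name and its methods are invented for illustration; the point is only that the captured variable is reachable through the returned functions and nowhere else.

```javascript
// Hypothetical example: `balance` is private state captured by the closures.
function makeAccount(openingBalance) {
  let balance = openingBalance;
  return {
    deposit: (amount) => {
      balance += amount;
      return balance;
    },
    getBalance: () => balance,
  };
}

const account = makeAccount(100);
account.deposit(50);
console.log(account.getBalance()); // 150
console.log(account.balance);      // undefined: the variable itself is not exposed
```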

And although ASH is perfectly happy to let you declare functions inside other functions (or any scope) (and I do that all the time, to clarify the function code and reduce global namespace bloat), we really cannot let lambdas have access to functions and vars in an inner scope.
I'm guessing this is due to an implementation detail, where Scopes are defined on parse, and we don't create a new Scope for each function call? It seems like a reasonable decision on its face, but may be surprising behavior. It would be good to have clear error messages when that's tripped, although that would complicate the implementation.
 

Veracity

Developer
Staff member
There are a few ash functions that currently accept strings that are the names of functions. There is special handling to let js pass functions as those arguments instead; e.g., in adv1, js users can input a combat filter function as an actual function. e.g., I can do
Code:
function handleCombat(page, foe, round) {
  return "Hello World";
}
Fascinating. In my ASH scripts, I use combat filters defined like this:

Code:
string bust_ghost_filter( int round, monster mon, string page )

Which is to say, its type is function string (int, monster, string).

adv1 directs FightRequest to use it by calling

Code:
      Macrofier.setMacroOverride(filterFunction.toString(), controller);

and Macrofier has code to execute the ASH function in the context of the currently executing script.

I can envision adv1 overloads:

Code:
boolean adv1(location, int, string);
boolean adv1(location, int, function string (int, monster, string));

and allowing AshRuntime to accept a Function object and a list of parameters, rather than just a function name and a list of parameters.
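The "name or Function object" idea can be sketched as a small dispatcher. This is JavaScript, not KoLmafia code: scriptFunctions and invokeFilter are hypothetical stand-ins for the script's function table and the runtime entry point, and the returned strings are placeholders.

```javascript
// Hypothetical table standing in for "functions defined in the running script".
const scriptFunctions = new Map([
  ["handleCombat", (round, monster, page) => "attack"],
]);

// Hypothetical entry point: accepts either a function name or a function value.
function invokeFilter(filter, args) {
  const fn = typeof filter === "string" ? scriptFunctions.get(filter) : filter;
  if (typeof fn !== "function") {
    throw new Error(`No such filter function: ${String(filter)}`);
  }
  return fn(...args);
}

// Old style: pass the name. New style: pass the function itself.
const byName = invokeFilter("handleCombat", [1, "crate", "<html>"]);
const byObject = invokeFilter((round, monster, page) => `skill ${round}`, [2, "crate", "<html>"]);
console.log(byName, byObject);
```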

That would be a fun little feature - after we have Function objects.

I assume that in a world where ash lets you pass handleCombat directly, we no longer need any of the special handling code for js.

The special code is in javascript/LibraryFunctionStub and the RuntimeLibrary functions of interest are "adventure", "adv1", and "run_combat".
That calls Macrofier.setJavaScriptMacroOverride to have a BaseFunction (and scope and thisArg) such that it can callback your JS function.

Which is to say, for JS, it already sets up the Function to be invoked in the JS runtime.

Since invoking an ASH function in an AshRuntime is quite different than invoking a JS function in a JavascriptRuntime, the special code will still be necessary.

Do we expect that if I defined handleCombat as an arrow function I would still be able to pass its name to adv1 and other functions like that? Or would that only work with traditional functions? I assume that they are more or less the same under the hood and so it would work with arrows or traditional functions, but I'm not sure.

I am not proposing that arrow functions have names. Traditional functions with names can have the same function type. Either can be passed to functions that accept that function type.

For the combat filter functions, if I make versions that accept a function object rather than a function name, they will automatically accept a function with the expected function type, whether it is traditional or arrow.

Should you be able to overload an arrow function?
How would that work? Is the intent, say, to have one function which operates on ints and another on floats? Can you explain your use case a bit more, please?
 

Veracity

Developer
Staff member
Maybe in Java, but that's actually atypical, as far as I'm aware.
JavaScript:
function makeCounter(a) {
    // `a` below is bound to the provided function argument
    return () =>  a += 1;
}

> counter = makeCounter(5)
() =>  a += 1
> counter()
6
> counter()
7
We sort of have something similar to that.
Code:
int counter()
{
    static int count;
    return ++count;
}

print(counter());
print(counter());
print(counter());
yields
Code:
> static-scope

1
2
3
Given that, you could do:
Code:
function int () counter = () -> {
    static int count = 0;
    return ++count;
};
"static" variables are not "final", but they do retain their values after their scope exits.

I could inherit static non-top-level variables into the scope I create for an arrow function.
That would be tricky.

I'm guessing this is due to an implementation detail, where Scopes are defined on parse, and we don't create a new Scope for each function call? It seems like a reasonable decision on its face, but may be surprising behavior. It would be good to have clear error messages when that's tripped, although that would complicate the implementation.
Scopes are created on parse, as you say. They have access to all lexically visible functions and variables, since there is a "parent" scope available at parse time.

If I set the "parent" scope to the top level global scope, as proposed, you'll get compile errors trying to use functions or variables defined in what looks like it should be your parent scope. Java and Javascript have similar special errors when you try to reference a non-final variable.
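A toy model of that lookup rule, assuming a Scope that only tracks declared names and walks a parent chain. The class and variable names are hypothetical and this is not the actual KoLmafia Scope implementation.

```javascript
// Minimal model of parse-time scopes (hypothetical, for illustration only).
class Scope {
  constructor(parent = null) {
    this.parent = parent;
    this.names = new Set();
  }
  declare(name) {
    this.names.add(name);
  }
  resolve(name) {
    // Walk outward through the parent chain.
    for (let scope = this; scope !== null; scope = scope.parent) {
      if (scope.names.has(name)) return true;
    }
    throw new Error(`Unknown variable ${name}`);
  }
}

const globalScope = new Scope();
globalScope.declare("total");

const functionScope = new Scope(globalScope); // body of an ordinary function
functionScope.declare("local");

// Proposed rule: the lambda's parent is the global scope,
// not the function scope it is textually written inside.
const lambdaScope = new Scope(globalScope);
lambdaScope.declare("x");

lambdaScope.resolve("x");     // found: the lambda's own parameter
lambdaScope.resolve("total"); // found: a global

let lookupError = null;
try {
  lambdaScope.resolve("local"); // looks like the enclosing scope, but is not reachable
} catch (e) {
  lookupError = e.message;
}
console.log(lookupError);
```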

If I want to add a "closure" feature to arrow functions, that could be added later.
 
How would [overloading arrow functions] work? Is the intent, say, to have one function which operates on ints and another on floats? Can you explain your use case a bit more, please?
I don't actually have an intended use case here, admittedly, it was just an open question I had upon reading your post.

It was mostly motivated by the fact that ash (to my knowledge) lacks union types and generic types, and so I think you'd need an overload to specify a single function that can operate on, say, a skill, item, or familiar. But I think the lack of union and generic types also means there isn't really a scenario where you would need to overload an arrow function, which I guess closes that in a nice little bow.

I sort of imagined a situation where a script would have a list of skills and familiars, and want to check to see if the user has them all. But I don't think we can make a (Skill | Familiar)[int], so you can't actually end up in a scenario where you want a single callback function applied to multiple different data types.

I am not proposing that arrow functions have names. Traditional functions with names can have the same function type. Either can be passed to functions that accept that function type.

I think that makes sense. Only traditional functions can be passed using their names, and arrows can only be passed directly.
 

Ryo_Sangnoir

Developer
Staff member
All the functions coincidentally take two parameters, but I assume that's not necessary ;) A useful function can take any number of parameters, including zero.

A motivating use-case for JavaScript at the moment is chaining filter, sort, map on arrays for quick CLI queries. An example is:

Code:
js Monster.all().filter(m => itemDropsArray(m).some(x => x.type == "0")).sort((a, b) => a.id - b.id)

This gets all monsters with unknown, non-conditional drops. In ASH, a long form (as you don't have the helpers some, filter etc.) would be something like:

Code:
monster[int] unknownItemMonsters;
int ind = 0;
foreach m in $monsters[] {
  foreach j, ar in item_drops_array(m) {
    if (ar.type == "0") {
      unknownItemMonsters[ind++] = m;
      break;
    }
  }
}

int id(monster m) {
  return m.id;
}
sort unknownItemMonsters by id(value);
dump(unknownItemMonsters);

and this reveals that there's already something vaguely like a lambda -- the "sort" keyword takes a function of two variables named "index" and "value". It also reveals the benefit lambdas can give to such queries.
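To make the comparison concrete, here is a JavaScript sketch of a sort driven by a key function of (index, value). The sortBy helper and the sample records are made up for this example, and it only approximates what the ASH "sort ... by" statement does.

```javascript
// Hypothetical helper: compute a key per element from (index, value), then order by key.
function sortBy(values, keyOf) {
  return values
    .map((value, index) => ({ value, key: keyOf(index, value) }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0))
    .map((entry) => entry.value);
}

// Made-up sample data standing in for monsters.
const sample = [
  { name: "B", id: 30 },
  { name: "A", id: 10 },
  { name: "C", id: 20 },
];

const byId = sortBy(sample, (index, value) => value.id);
const sortedNames = byId.map((m) => m.name).join(",");
console.log(sortedNames); // A,C,B
```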
 

Veracity

Developer
Staff member
It was mostly motivated by the fact that ash (to my knowledge) lacks union types and generic types, and so I think you'd need an overload to specify a single function that can operate on, say, a skill, item, or familiar. But I think the lack of union and generic types also means there isn't really a scenario where you would need to overload an arrow function, which I guess closes that in a nice little bow.

You can roll your own unions using records.

Code:
record union
{
    string type;
    item item;
    skill skill;
    familiar familiar;
};

union to_union(item obj)
{
    union retval = { type : "item", item : obj };
    return retval;
}

union to_union(skill obj)
{
    union retval = { type : "skill", skill : obj };
    return retval;
}

union to_union(familiar obj)
{
    union retval = { type : "familiar", familiar : obj };
    return retval;
}

string to_string(union u)
{
    switch (u.type) {
    case "item": return u.item;
    case "skill": return u.skill;
    case "familiar": return u.familiar;
    default: return "";
    }
}

union uitem = $item[ seal tooth ];
union uskill = $skill[ Saucegeyser ];
union ufamiliar = $familiar[ Feather Boa Constrictor ];

print(uitem);
print(uskill);
print(ufamiliar);

yields

Code:
> union.ash

seal tooth
Saucegeyser
Feather Boa Constrictor

Notice that if you define A to_A(B), ASH will find your function and auto-coerce from B to A.
That is especially useful where at least one of A and B is a record or a typedef.
(I discovered that it did NOT auto-coerce the "return" statement; return { type : "familiar", familiar : obj } did not work. That seems like a bug.)
 

heeheehee

Developer
Staff member
I could inherit static non-top-level variables into the scope I create for an arrow function.
That would be tricky.

Scopes are created on parse, as you say. They have access to all lexically visible functions and variables, since there is a "parent" scope available at parse time.
I believe the traditional (unoptimized) interpreter implements lexically-scoped closures by creating a new scope on each function call (with the parent scope being where the function is defined), which allows you to capture the value of any provided arguments and local variables.

Given that, you could do:
Code:
function int () counter = () -> {
    static int count = 0;
    return ++count;
};
"static" variables are not "final", but they do retain their values after their scope exits.
Would I be able to, then, write:

Code:
function int() makeCounter() {
  return () -> {
    static int count = 0;
    return ++count;
  };
}
function int() counterA = makeCounter();
function int() counterB = makeCounter();

print(counterA()); // 1
print(counterA()); // 2
print(counterB()); // 1
print(counterA()); // 3
print(counterB()); // 2
?

To be clear: I don't have a specific use case in mind for this. And I suspect the answer with your proposed implementation is "no, only one Scope will exist, and will have a single static variable shared across all lambdas."
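The two possible behaviors can be contrasted in JavaScript. This is only a model: sharedStatics stands in for a single static slot owned by one parse-time scope, and both factory names are invented for the sketch.

```javascript
// Model 1: one scope created at parse time, so every lambda value shares one "static" slot.
const sharedStatics = { count: 0 };
function makeSharedCounter() {
  return () => ++sharedStatics.count;
}

// Model 2: a fresh scope per makeCounter call, so each lambda value owns its count.
function makeOwnCounter() {
  let count = 0;
  return () => ++count;
}

const sharedA = makeSharedCounter();
const sharedB = makeSharedCounter();
const sharedResults = [sharedA(), sharedA(), sharedB()]; // [1, 2, 3]

const ownA = makeOwnCounter();
const ownB = makeOwnCounter();
const ownResults = [ownA(), ownA(), ownB()]; // [1, 2, 1]

console.log(sharedResults, ownResults);
```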
 

Veracity

Developer
Staff member
I like closures. I think I can work my way there in several phases.

1) Arrow functions have a scope, built at compile time, with ASH's lexical scoping rules, which depend on strong typing and a single-pass compiler: functions and variables are available if they have been declared at the time the function scope is compiled (or, for functions, have a forward reference.)

ASH does not hoist variables or functions to the top of the scope they are declared in. I think that doing so depends on either a typeless language (where you don't care about the type of parameters or return values) or requires at least two passes (the first to get all the user defined data types and the signature of all functions.)

For simplicity of implementation, this phase will compile the arrow function as if it is at top level.

2) Arrow functions have a scope - and are at top level - but additional variables are added to it before the body is compiled. In particular, those that persist after the current scope finishes - like static variables - and also function parameters. Somehow. They are added as variables at compile time, but are "bound" to the current values when the function is created at execution time? That's not quite "creating a new scope" on function call, but it's similar to how we "bind" function parameters to values when you call the function.

3) Making a closure with statics declared inside the arrow function requires more thought. I reminded myself how I implemented statics: every static is a block, and if it is (or contains) a variable declaration, it marks it "static" and adds it to the parent scope's variables. Presumably step 2 will have given me an ArrowScope, or something, which will need to have additional handling for static variables. I don't know what or how. Yet.
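The single-pass, no-hoisting rule from phase 1 can be modeled with a toy checker. The statement shapes ("forward", "define", a list of callee names) are invented for this sketch and say nothing about how the ASH parser is actually structured.

```javascript
// Toy single-pass checker: a name is usable only if it was declared earlier in the file.
function checkSinglePass(statements) {
  const declared = new Set();
  const errors = [];
  for (const stmt of statements) {
    // Both forward references and definitions make the name known from here on.
    declared.add(stmt.name);
    for (const callee of stmt.calls || []) {
      if (!declared.has(callee)) {
        errors.push(`${stmt.name}: ${callee} is not declared yet`);
      }
    }
  }
  return errors;
}

// Mutually recursive functions fail without a forward reference...
const withoutForward = checkSinglePass([
  { kind: "define", name: "isEven", calls: ["isOdd"] },
  { kind: "define", name: "isOdd", calls: ["isEven"] },
]);

// ...and pass once the later function is forward-declared.
const withForward = checkSinglePass([
  { kind: "forward", name: "isOdd" },
  { kind: "define", name: "isEven", calls: ["isOdd"] },
  { kind: "define", name: "isOdd", calls: ["isEven"] },
]);

console.log(withoutForward, withForward);
```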

I think I'm ready to go on this project - with phase 1 arrow functions, which are still useful.
Phase 2 and phase 3 arrow functions can come later and successively add functionality.
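One possible reading of the phase 2 "bind at creation time" idea, sketched in JavaScript. It assumes captured variables are copied by value when the function value is created; createFunctionValue and the shape of the "compiled" lambda are hypothetical.

```javascript
// Hypothetical model: a compiled lambda lists the names it captures.
// Creating the function value copies their current values into its own bindings.
function createFunctionValue(compiled, enclosingValues) {
  const bound = {};
  for (const name of compiled.captures) {
    bound[name] = enclosingValues[name];
  }
  return (...args) => compiled.body(bound, ...args);
}

// Stands in for something like: (x) -> x + n, where n lives in the enclosing function.
const addN = {
  captures: ["n"],
  body: (bound, x) => x + bound.n,
};

const enclosing = { n: 5 };
const addFive = createFunctionValue(addN, enclosing);

enclosing.n = 100; // under copy-by-value binding, later changes are not seen
const boundResult = addFive(1);
console.log(boundResult); // 6
```

Binding by reference instead would make the last call return 101, which is closer to what JavaScript closures do.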
 