Zarqon's Challenging Regex Challenge Betwixt Challengingness!

zarqon

Ok, my non-regex-phobic scripting friends! The time has come.

I am issuing a challenge.

Write either 1) a matcher, or 2) a contains_text() wrapper function that accounts for the Sword Behind. This would have a wide variety of uses and would almost certainly be added to ZLib as a contains_text() alternative. Presently, scripters need to be careful to avoid matching strings that contain the prepositions listed here.

For example in the following text:

Beside the river and in the woods, of grandmother's house we go.

If you are wearing the sword, that.contains_text("Over the river and through the woods") will return false. But this wrapper function should account for the sword's swapping of prepositions and return true.

I'm pretty sure this is not solely a regex problem, but will undoubtedly involve regexes in the solving. It gets even more interesting when you consider that prepositions with punctuation following do not match! Good luck!
 
Basically you'd just take the full list of possible prepositions and strip them out of both the original text and the match text before comparing... right?
 
Good algorithm. But be careful. In

"Grandmother's house is where we are going to."

The "to" does not match because it has a punctuation mark. Which additional characters prevent the match? Is it only punctuation? Does capitalization matter? These things will take a little poking.
 
Since the sword works in chat you could possibly test a bunch of sentences and see what gets changed and how in order to figure that sort of thing out, right?
I would be willing to test things for you if needed, my clan chat is almost always empty anyway so I wouldn't bother anyone :)
 
Here's a start:
PHP:
string preposition_expression = "\\b(?:about|above|across|after|against|along|among|around|at|before|behind|below||beneath|beside|between|beyond|by|down|during|except|for|from|in|inside|into|like|near|of|off|on|onto|out|outside|over|past|through|throughout|to|under|up|upon|with|within|without)\\b";

//Arbitrary choice of "above" for designating prepositions
boolean hasGameText(string garbledText, string matchText)
{
    garbledText = create_matcher(preposition_expression, garbledText).replace_all("above");
    matchText = create_matcher(preposition_expression, matchText).replace_all("above");
    return contains_text(garbledText, matchText);
}

I hope passing a buffer to this function does not change its contents. Also, there has to be a better way than replacing every preposition with "above".
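
For anyone who wants to poke at the idea outside of mafia, here is a minimal Python sketch of the same normalize-then-compare approach. It is a port for experimenting, not the ASH code above: `has_game_text`, `PREPOSITIONS` and `PREP_RE` are names made up for this example, the empty alternative between "below" and "beneath" is left out, and Python's `re` is assumed to behave like mafia's regex engine for a pattern this simple.

```python
import re

# Same word list as the ASH expression, minus the empty alternative.
PREPOSITIONS = (
    "about above across after against along among around at before behind "
    "below beneath beside between beyond by down during except for from in "
    "inside into like near of off on onto out outside over past through "
    "throughout to under up upon with within without"
).split()

# \b(?:about|above|...)\b, case-sensitive like the original.
PREP_RE = re.compile(r"\b(?:" + "|".join(PREPOSITIONS) + r")\b")


def has_game_text(garbled_text, match_text):
    """Replace every lowercase preposition in both strings with "above",
    then do a plain substring test."""
    garbled = PREP_RE.sub("above", garbled_text)
    wanted = PREP_RE.sub("above", match_text)
    return wanted in garbled


garbled = "Beside the river and in the woods, of grandmother's house we go."
print(has_game_text(garbled, "the river and through the woods"))       # True
print(has_game_text(garbled, "Over the river and through the woods"))  # False
```

The second call comes out False because the pattern is case-sensitive, so neither "Over" nor "Beside" is treated as a preposition and the two strings still differ there.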
 
This should work:
Code:
boolean contains_text_sbip( string haystack , string needle )
{
	boolean[string] preps;
	preps["about"] = true;
	preps["above"] = true;
	preps["across"] = true;
	preps["after"] = true;
	preps["against"] = true;
	preps["along"] = true;
	preps["among"] = true;
	preps["around"] = true;
	preps["at"] = true;
	preps["before"] = true;
	preps["behind"] = true;
	preps["below"] = true;
	preps["beneath"] = true;
	preps["beside"] = true;
	preps["between"] = true;
	preps["beyond"] = true;
	preps["by"] = true;
	preps["down"] = true;
	preps["during"] = true;
	preps["except"] = true;
	preps["for"] = true;
	preps["from"] = true;
	preps["in"] = true;
	preps["inside"] = true;
	preps["into"] = true;
	preps["like"] = true;
	preps["near"] = true;
	preps["of"] = true;
	preps["off"] = true;
	preps["on"] = true;
	preps["onto"] = true;
	preps["out"] = true;
	preps["outside"] = true;
	preps["over"] = true;
	preps["past"] = true;
	preps["through"] = true;
	preps["throughout"] = true;
	preps["to"] = true;
	preps["under"] = true;
	preps["up"] = true;
	preps["upon"] = true;
	preps["with"] = true;
	preps["within"] = true;
	preps["without"] = true;
	
	boolean[string] punct;
	punct["!"] = true;
	punct["@"] = true;
	punct["#"] = true;
	punct["$"] = true;
	punct["%"] = true;
	punct["^"] = true;
	punct["&"] = true;
	punct["*"] = true;
	punct["("] = true;
	punct[")"] = true;
	punct["_"] = true;
	punct["+"] = true;
	punct["{"] = true;
	punct["}"] = true;
	punct["|"] = true;
	punct["["] = true;
	punct["]"] = true;
	punct["\\"] = true;
	punct[";"] = true;
	punct["'"] = true;
	punct[":"] = true;
	punct["\""] = true;
	punct["<"] = true;
	punct[","] = true;
	punct[">"] = true;
	punct["."] = true;
	punct["?"] = true;
	punct["/"] = true;
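	// Escape the backslash first, so the backslashes added for the other characters are not escaped a second time.
	needle = needle.replace_string( "\\" , "\\\\" ).replace_string( "." , "\\." ).replace_string( "[" , "\\[" ).replace_string( "]" , "\\]" ).replace_string( "*" , "\\*" ).replace_string( "+" , "\\+" ).replace_string( "^" , "\\^" ).replace_string( "$" , "\\$" ).replace_string( "(" , "\\(" ).replace_string( ")" , "\\)" ).replace_string( "?" , "\\?" ).replace_string( "{" , "\\{" ).replace_string( "}" , "\\}" ).replace_string( "|" , "\\|" );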
	boolean is_prep = false;
	boolean found_punct = false;
	foreach prep,val in preps
	{
		found_punct = false;
		is_prep = false;
		// Loop variable renamed to "mark" so it no longer shadows the punct map.
		foreach mark,val2 in punct
		{
			if( needle.contains_text( prep ) && !found_punct )
			{
				is_prep = true;
			}
			if( needle.contains_text( prep + mark ) )
			{
				is_prep = false;
				found_punct = true;
			}
			else if( needle.contains_text( mark + prep ) )
			{
				is_prep = false;
				found_punct = true;
			}
		}
		if( is_prep )
		{
			needle = needle.replace_string( prep , "\\w+" );
		}
	}
	matcher m = create_matcher( needle , haystack );
	return m.find();
}

It fails your test in the original post, but only because in game, any preposition with a capital letter in it is ignored. Only fully lowercase prepositions with no punctuation surrounding them are changed. I included all the punctuation I could type on my keyboard, but I'm sure more probably exist.

EDIT: Forgot "(" and ")" in my giant replace_string() block to get correct possible regex symbols
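
On the escaping step: the order of the replacements matters, because escaping the backslash after the other characters re-escapes the backslashes that were just inserted. A small Python sketch of the difference (Python rather than ASH; `escape_backslash_last` and `escape_backslash_first` are hypothetical helper names, and the sample strings are made up):

```python
import re


def escape_backslash_last(text):
    # Escapes "." first, then the backslash: the backslash added for "."
    # gets doubled, so the dot is no longer escaped.
    return text.replace(".", "\\.").replace("\\", "\\\\")


def escape_backslash_first(text):
    # Escapes the backslash first, then the other metacharacters.
    for ch in "\\.[]*+^$()?{}|":
        text = text.replace(ch, "\\" + ch)
    return text


needle = "we go."
bad = escape_backslash_last(needle)
good = escape_backslash_first(needle)

print(bad)
print(good)
# The broken pattern demands a literal backslash in the text, so it fails.
print(re.search(bad, "house we go.") is None)
print(re.search(good, "house we go.") is not None)
```

In Python itself `re.escape()` does the whole job in one call; the loop is only there to mirror the chain of replace_string() calls.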
 
Here is a version that ignores prepositions that have punctuation directly before or after them:

PHP:
string preposition_expression = "(?<![^\\w\\s])\\b(?:about|above|across|after|against|along|among|around|at|before|behind|below||beneath|beside|between|beyond|by|down|during|except|for|from|in|inside|into|like|near|of|off|on|onto|out|outside|over|past|through|throughout|to|under|up|upon|with|within|without)\\b(?![^\\w\\s])";

//Arbitrary choice of "above" for designating prepositions
boolean hasGameText(string garbledText, string matchText)
{
    garbledText = create_matcher(preposition_expression, garbledText).replace_all("above");
    matchText = create_matcher(preposition_expression, matchText).replace_all("above");
    return contains_text(garbledText, matchText);
}

Though I don't think fixing punctuation is necessary.
PHP:
hasGameText("Grandmother's house is where we are going to.", "house is where we are going to.");
This function call will return true whether or not the "to." is replaced. I'd rather not include lookaheads/lookbehinds in my regular expressions.
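
To see what the lookbehind/lookahead version actually changes, here is a rough Python sketch comparing the two expressions on a sentence ending in "to." It is a port for experimenting, not the ASH code: `PLAIN_RE`, `STRICT_RE` and `normalize` are made-up names, the sample strings are mine, and the word list drops the empty alternative.

```python
import re

PREPOSITIONS = (
    "about above across after against along among around at before behind "
    "below beneath beside between beyond by down during except for from in "
    "inside into like near of off on onto out outside over past through "
    "throughout to under up upon with within without"
).split()
WORDS = "|".join(PREPOSITIONS)

# Plain version: word boundaries only.
PLAIN_RE = re.compile(r"\b(?:" + WORDS + r")\b")
# Strict version: also refuses a match with punctuation directly before or after.
STRICT_RE = re.compile(r"(?<![^\w\s])\b(?:" + WORDS + r")\b(?![^\w\s])")


def normalize(pattern, text):
    """Replace every preposition the pattern accepts with "above"."""
    return pattern.sub("above", text)


text = "we are going to."
print(normalize(PLAIN_RE, text))   # we are going above.
print(normalize(STRICT_RE, text))  # we are going to.

haystack = "Grandmother's house is where we are going to."
needle = "where we are going to."
for pattern in (PLAIN_RE, STRICT_RE):
    # Both sides go through the same pattern, so both lines print True.
    print(normalize(pattern, needle) in normalize(pattern, haystack))

# A needle cut off before the full stop behaves differently, though:
short_needle = "we are going to"
print(normalize(PLAIN_RE, short_needle) in normalize(PLAIN_RE, haystack))    # True
print(normalize(STRICT_RE, short_needle) in normalize(STRICT_RE, haystack))  # False
```

In this sketch the strict expression only changes the outcome in the last case, where the needle stops right before the punctuation: the needle's "to" is swapped but the haystack's "to." is not.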

@Alhifar: I'm afraid your code has some problems; using "\\w+" to match arbitrary prepositions is not a good idea.
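
A quick illustration of the "\\w+" concern, as a Python sketch (not ASH; the strings are made up for the example). Two things can go wrong: `\w+` accepts any word rather than only prepositions, and a plain substring replace also hits prepositions hiding inside longer words.

```python
import re

# Needle after swapping the preposition "over" for \w+.
needle = r"\w+ the river"

# Intended: any preposition in that slot matches.
print(re.search(needle, "beside the river") is not None)  # True
# Unintended: so does any other word.
print(re.search(needle, "cross the river") is not None)   # True

# Substring replacement does not respect word boundaries:
# the "at" inside "that" is rewritten too.
mangled = "that house".replace("at", r"\w+")
print(mangled)
```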
 
Silly Zarqon and his regex-phobia, it was as simple as:
PHP:
string preposition_expression = "\\b(?:about|above|across|after|against|along|among|around|at|before|behind|below||beneath|beside|between|beyond|by|down|during|except|for|from|in|inside|into|like|near|of|off|on|onto|out|outside|over|past|through|throughout|to|under|up|upon|with|within|without)\\b";

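So: \b is a "word boundary" (a zero-width position between a word character and a non-word character, which covers punctuation as well as whitespace and the start/end of the string), (?: ... ) groups without creating a capture group, and the OR syntax is clear.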
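
One thing worth checking with \b: punctuation creates a word boundary just as whitespace does, so the expression on its own still matches a preposition that sits next to a punctuation mark. A tiny Python sketch, with a two-word pattern standing in for the full list:

```python
import re

pattern = re.compile(r"\b(?:to|in)\b")

print(pattern.findall("going to."))     # ['to']
print(pattern.findall("(in) the box"))  # ['in']
print(pattern.findall("into tin"))      # []
```

The last line shows the part \b does handle: "in" and "to" inside longer words are left alone.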

I think you might have answered this already, but why don't you like/trust RegEx?

EDIT: oh well, I see the regex was edited...
 
Question: Why are there 2 | between below and beneath?
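
As far as I can tell the doubled | is a typo, and it is not harmless: it puts an empty alternative into the group. Alternatives are tried left to right, so at a word boundary where none of the words listed before it fit, the empty alternative matches nothing and succeeds, and the words listed after it never get tried at that position. A small Python sketch with a shortened list (`with_gap` and `without_gap` are made-up names; how a replace-all then treats those empty matches depends on the regex engine, so that part is not shown):

```python
import re

with_gap = re.compile(r"\b(?:below||beneath|beside)\b")
without_gap = re.compile(r"\b(?:below|beneath|beside)\b")

# "beneath" comes after the empty alternative, so the match is empty.
print(repr(with_gap.search("beneath the tree").group()))     # ''
print(repr(without_gap.search("beneath the tree").group()))  # 'beneath'
# "below" comes before it and still matches normally.
print(repr(with_gap.search("below the tree").group()))       # 'below'
```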
 
When I was a small child, I was bitten by a very mean regex (which the owner had probably mistreated). Ever since then, I've never been able to trust them.

The other reason is that I've never taken the time to really try to understand them. And from a certain point of view, they obfuscate code. The advantages are usually worth it, but mafia scripting isn't one of those areas where a few ms of processing time or a few bytes of memory really matter.

Thank you all, my fearless friends. This is lovely, and I will shortly insert this into excise() using tongs and thick-ply rubber gloves. I will possibly also add a contains_game_text().
 