RegEx fun

Winterbay

Active member
In an attempt to learn a bit more about regex and have some fun with the Crimbo stuff, I whipped up the following scriptlet.
Code:
record godis
{
	int itemid;
	string itemname;
	int amount;
	int value;
	int worth;
};
godis[int] sweets;
int i = 0;

string url = visit_url("crimbo11.php?place=tradeincandy");
matcher candies = create_matcher("<option value=\\\"(\\d{1,4})\\\">(\\w{1,20}|\\w{1,20}\\s\\w{1,20}|\\w{1,20}\\s\\w{1,20}\\s\\w{1,20}|\\w{1,20}\\s\\w{1,20}\\s\\w{1,20}\\s\\w{1,20}|\\w{1,20}\\s\\w{1,20}\\s\\w{1,20}\\s\\w{1,20}\\s\\w{1,20})\\s\\((\\d{1,4})\\)\\s\\((\\d{1,4})\\scandy", url);
while(find(candies))
{
	sweets[i].itemid = to_int(group(candies,0));
	sweets[i].itemid = to_int(group(candies,1));
	sweets[i].itemname = to_string(group(candies,2));
	sweets[i].amount = to_int(group(candies,3));
	sweets[i].value = to_int(group(candies,4));
	sweets[i].worth = sweets[i].value * sweets[i].amount;
	i = i + 1;
}

for j from 0 to i - 1
{
	print("Your " + sweets[j].itemname + " are worth " + sweets[j].worth + " candy credits.");
}

Which gives the following output:
Code:
> candies

The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
The string "
Your Angry Farmer candy are worth 490 candy credits.
Your bananagate are worth 2300 candy credits.
Your bazookafish bubble gum are worth 35 candy credits.
Your black candy heart are worth 500 candy credits.
Your brown sugar cane are worth 15 candy credits.
Your candied kobold are worth 70 candy credits.
Your candy brain are worth 10 candy credits.
Your candy cane are worth 90 candy credits.
Your candy kneecapping stick are worth 100 candy credits.
Your candy knuckles are worth 300 candy credits.
Your candy stake are worth 250 candy credits.
Your chocolate cigar are worth 7500 candy credits.
Your chocolate filthy lucre are worth 470 candy credits.
Your chocolate turtle totem are worth 350 candy credits.
Your Cold Hots candy are worth 20 candy credits.
Your crazy little Turkish delight are worth 70 candy credits.
Your CRIMBCOIDS mints are worth 200 candy credits.
Your Crimbo fudge are worth 25 candy credits.
Your Daffy Taffy are worth 185 candy credits.
Your Elvish delight are worth 20 candy credits.
Your fancy but probably evil chocolate are worth 750 candy credits.
Your fancy chocolate are worth 500 candy credits.
Your fancy chocolate car are worth 1500 candy credits.
Your fudge bunny are worth 1000 candy credits.
Your fudge spork are worth 100 candy credits.
Your Fudgie Roll are worth 75 candy credits.
Your green candy heart are worth 14 candy credits.
Your green gummi ingot are worth 120 candy credits.
Your gummi ammonite are worth 4000 candy credits.
Your gummi belemnite are worth 3000 candy credits.
Your gummi trilobite are worth 4000 candy credits.
Your Gummy Brains are worth 900 candy credits.
Your honey stick are worth 20 candy credits.
Your jawbruiser are worth 500 candy credits.
Your kumquartz are worth 1900 candy credits.
Your lavender candy heart are worth 12 candy credits.
Your licorice garrote are worth 300 candy credits.
Your licorice root are worth 300 candy credits.
Your Lobos Mints are worth 900 candy credits.
Your marzipan skull are worth 120 candy credits.
Your Necbro wafers are worth 550 candy credits.
Your Now and Earlier are worth 100 candy credits.
Your Nuclear Blastball are worth 200 candy credits.
Your orange candy heart are worth 83 candy credits.
Your pack of chewing gum are worth 5 candy credits.
Your pair of pearidot earrings are worth 250 candy credits.
Your peanut brittle shield are worth 150 candy credits.
Your pearidot are worth 2200 candy credits.
Your peppermint sprout are worth 750 candy credits.
Your pile of candy are worth 240 candy credits.
Your pink candy heart are worth 96 candy credits.
Your pixellated candy heart are worth 80 candy credits.
Your red gummi ingot are worth 50 candy credits.
Your Rock Pops are worth 165 candy credits.
Your Senior Mints are worth 10 candy credits.
Your Steal This Candy are worth 40 candy credits.
Your strawberyl are worth 1300 candy credits.
Your Sugar Cog are worth 70 candy credits.
Your sugar shard are worth 150 candy credits.
Your sugar shirt are worth 25 candy credits.
Your Sweet Sword are worth 900 candy credits.
Your Tasty Fun Good rice candy are worth 45 candy credits.
Your tourmalime are worth 2200 candy credits.
Your tourmalime tourniquet are worth 250 candy credits.
Your vitachoconutriment capsule are worth 1250 candy credits.
Your white candy heart are worth 107 candy credits.
Your white chocolate chips are worth 240 candy credits.
Your yellow candy heart are worth 144 candy credits.
Your Yummy Tummy bean are worth 25 candy credits.

So... Where does the fifth group come from? I am almost certain I have only 4 in the matcher.
Also, the message "The string "" isn't very helpful. What is it supposed to mean?

Also, also, is there a better way to match 1-5 words of unknown length?
 

StDoodle

Minion
If you don't know how many matches you're going to make, I recommend trimming off as much as you can that isn't a match first (with another matcher, maybe), then using while(matcher.find()) to loop through the rest. If you need more help I'd be happy to look closer tonight after work, but the nature of this means I'd really need to look at the source for the page being visited.
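The two-step approach described above could be sketched roughly as follows. This is only an illustration: it assumes the candy list sits inside a single <select> element (the actual page markup has not been checked), and the names select_block and options are made up for the example.

```ash
string page = visit_url("crimbo11.php?place=tradeincandy");

// Step 1: trim the page down to the drop-down only.
// Assumption: the options live in one <select>...</select> block.
// (?s) lets . match newlines in case the block spans several lines.
matcher select_block = create_matcher("(?s)<select[^>]*>(.*?)</select>", page);

if (select_block.find())
{
	// Step 2: loop over every option inside the trimmed text.
	matcher options = create_matcher("<option[^>]+>(.+?) \\((\\d+)\\) \\((\\d+) candy credit each\\)", select_block.group(1));
	while (options.find())
	{
		// group 1 = item name, group 2 = amount owned, group 3 = credits per item
		print(options.group(1) + ": " + options.group(2) + " x " + options.group(3));
	}
}
```

Trimming first keeps the second pattern from accidentally matching similar text elsewhere on the page.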
 

slyz

Developer
Code:
(\\w{1,20}|\\w{1,20}\\s\\w{1,20}|\\w{1,20}\\s\\w{1,20}\\s\\w{1,20}|\\w{1,20}\\s\\w{1,20}\\s\\w{1,20}\\s\\w{1,20}|\\w{1,20}\\s\\w{1,20}\\s\\w{1,20}\\s\\w{1,20}\\s\\w{1,20})
:(

EDIT: the "The string "" message is because of an extra slash after "value=".
Here is the regex I used, you can modify it to suit your needs:
Code:
string html = visit_url( "crimbo11.php?place=tradeincandy" );
matcher opt_matcher = create_matcher( "<option[^>]+>(.+?) \\((?:\\d+)\\) \\((\\d+) candy credit each\\)</option>", html );

record candyinfo {
	item it;
	float credit;
	int price;
};

candyinfo [ int ] candy_credit;

int c = 0;
while ( opt_matcher.find() )
{
	candy_credit[ c ].it = opt_matcher.group( 1 ).to_item();
	candy_credit[ c ].credit = opt_matcher.group( 2 ).to_int();
	candy_credit[ c ].price = candy_credit[ c ].it.get_price();
	c += 1;
}
 

Winterbay

Active member
The problem is that if I remove that \ (and make it just two instead of the three) I get an error message:
Code:
Expected ), found ( (candies.ash, line 13)

Ok, so refined a bit:
Code:
matcher candies = create_matcher("(\\d{1,4})\">.*?\\s\\((\\d{1,4})\\)\\s\\((\\d{1,4})\\scandy credit each\\)", url);

It still gives me one group more than I think I should get (i.e. 4 instead of 3), with the first group being the entire match. So that:
Code:
group 0: 617">Angry Farmer candy (98) (5 candy credit each)
group 1: 617
group 2: 98
group 3: 5
 

Veracity

Developer
Staff member
group 0 is defined to be "the entire matched string". groups 1 - n are the various groups within your pattern.
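A small sketch of that numbering, using the sample text from the group listing earlier in the thread rather than the live page. The variable names are made up for the example; group_count() should report only the capturing groups in the pattern, so group 0 is not included in that count.

```ash
// Sample text copied from the "group 0" line posted above.
string sample = "617\">Angry Farmer candy (98) (5 candy credit each)";
matcher m = create_matcher("(\\d+)\">.*?\\s\\((\\d+)\\)\\s\\((\\d+)\\scandy credit each\\)", sample);

if (m.find())
{
	// Group 0 is the whole matched text, not one of the parenthesised groups.
	print("whole match: " + m.group(0));
	print("capturing groups: " + m.group_count());

	// Groups 1..n are the parenthesised parts, in order of their opening parenthesis.
	for g from 1 to m.group_count()
	{
		print("group " + g + ": " + m.group(g));
	}
}
```

In other words, when pulling values out of a match, start reading from group 1.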
 

xKiv

Active member
Isn't group "0" supposed to be the entire matched part of the string? (which it is, in your example: '617">Angry Farmer candy (98) (5 candy credit each)')

A way to match an unknown number of words is probably something like
Code:
((\\w+\\s)*\\w+)
and then ignore the inner group while getting results.
Also, why are you limiting word-length to 20?
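One way to avoid having to ignore the inner group is to make it non-capturing with (?:...), the same construct used in slyz's pattern above. A minimal sketch, with a made-up sample string (the amounts are invented for illustration):

```ash
// Hypothetical sample line; the numbers are not real data.
string sample = ">crazy little Turkish delight (7) (10 candy credit each)";

// (?:\w+\s)* repeats "word + whitespace" without creating a group,
// so the full name is group 1 and the amount is group 2.
matcher m = create_matcher(">((?:\\w+\\s)*\\w+)\\s\\((\\d+)\\)", sample);

if (m.find())
{
	print("name: " + m.group(1));
	print("amount: " + m.group(2));
}
```

Note that \w only covers letters, digits and underscore, so names containing apostrophes or hyphens would need a looser pattern.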
 

Winterbay

Active member
I limited it to 20 because I couldn't find a word in the candy list longer than 20 characters, so I felt that would cover all of them :)
 

slyz

Developer
Since you know there is always a number in parentheses after the item name, and a > just before it, you can just do:
Code:
>(.+?) \\((?:\\d+)\\)

EDIT: added the >
 