getGroup: Difference between revisions
Revision as of 11:39, 23 July 2015
getGroup() Function
Usage
getGroup(id, match, group)
Where
id
- is the id returned by strfind()
match
- is the number of the match found by strfind()
group
- is the number of the capture group found by strfind()
Example
[h: id = strfind("this is a test", "(\\S+)\\s(\\S+)\\s*")]
match 1, group 0 = [getGroup(id, 1, 0)]<br>
match 1, group 1 = [getGroup(id, 1, 1)]<br>
match 1, group 2 = [getGroup(id, 1, 2)]<br>
match 2, group 0 = [getGroup(id, 2, 0)]<br>
match 2, group 1 = [getGroup(id, 2, 1)]<br>
match 2, group 2 = [getGroup(id, 2, 2)]<br>
Returns:
match 1, group 0 = this is
match 1, group 1 = this
match 1, group 2 = is
match 2, group 0 = a test
match 2, group 1 = a
match 2, group 2 = test
Example explained
First off, an escape ("\") makes the character that follows it NOT be what it usually is. E.g. "d" is the alphabetical character "d"; "\d", however, is NOT "d" but takes on a regex meaning, in this case 'digit': 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9. It works the other way round too: "." means "any character", so if you actually want to find a "." in the text you use "\.", which is NOT the regex "any character" but just a ".".
Now the tricky bit: in MapTool ALL escapes ("\") are eaten by the MapTool parser UNLESS they are themselves preceded by an escape. This happens BEFORE the regex is handed to the regex parser. THUS ALL ESCAPES MUST BE ESCAPED! So in the above examples "\d" becomes "\\d" and "\." becomes "\\.". It gets really tricky when you want to find the "\" character itself. It is a regex symbol, so it needs to be escaped: "\\". But since this is MapTool, every escape must itself be escaped, so it becomes "\\\\"!
In short: normally you only need one \ in a regex statement, but because MT preparses the statement you need to double escape it, so \\.
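The doubling rule can be tried outside MapTool. The following is a minimal sketch in Python, not MapTool macro code: it assumes only that Python's ordinary (non-raw) string literals consume one level of backslashes before the regex engine sees the pattern, which is analogous to the preparsing described above. The sample text and variable names are made up for illustration.

```python
import re

# In a regular (non-raw) Python string literal, "\\" is ONE backslash
# character, so the regex engine receives one escape less than you typed.
digit_pattern = "\\d"        # the regex engine receives: \d   (a digit)
dot_pattern = "\\."          # the regex engine receives: \.   (a literal dot)
backslash_pattern = "\\\\"   # the regex engine receives: \\   (a literal backslash)

# Hypothetical sample text; "\\" in the literal is a single backslash in the text.
text = "v1.2 lives in C:\\maps"

digits = re.findall(digit_pattern, text)
dots = re.findall(dot_pattern, text)
backslashes = re.findall(backslash_pattern, text)

print(len(digit_pattern), digits)
print(len(backslash_pattern), backslashes)
```

Four typed backslashes become two characters in the pattern, and those two characters match a single backslash in the text.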
- \\S = 'everything that is NOT a whitespace'
- \\s = 'whitespace'
- + = '1 or more'
- * = '0 or more'
Have a look here for an overview.
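To get a feel for these four symbols, here is a small Python sketch (an analogue, not MapTool code). The doubled backslashes are kept so the patterns look the same as they would in MT; the "a b" test strings are invented for the illustration.

```python
import re

text = "this is a test"

# \S on its own matches one non-whitespace character at a time.
non_space_chars = re.findall("\\S", text)

# \S+ matches runs of 1 or more non-whitespace characters, i.e. "words".
words = re.findall("\\S+", text)

# \s matches each single whitespace character.
spaces = re.findall("\\s", text)

# + needs at least one occurrence, * is also satisfied with none.
plus_needs_one = re.fullmatch("a\\s+b", "ab") is None
star_allows_zero = re.fullmatch("a\\s*b", "ab") is not None
plus_accepts_many = re.fullmatch("a\\s+b", "a   b") is not None

print(words)
print(len(non_space_chars), len(spaces))
```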
The second important thing to know is that a group is defined by '(' parentheses ')': (group1)(group2)(etc.), where group '0' returns the entire match.
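The effect of the parentheses can be seen by running the same pattern with and without them. This is a sketch in Python rather than MT, so it assumes Python's group numbering, which follows the same convention of group 0 being the whole match.

```python
import re

text = "this is a test"

# Without parentheses there is only group 0, the entire match.
plain = re.search("\\S+\\s\\S+", text)
plain_group_count = plain.re.groups   # number of capture groups in the pattern
plain_whole = plain.group(0)

# With parentheses, each parenthesized part becomes a numbered group.
grouped = re.search("(\\S+)\\s(\\S+)", text)
grouped_group_count = grouped.re.groups
grouped_whole = grouped.group(0)
captured = grouped.groups()           # groups 1..n as a tuple

print(plain_group_count, plain_whole)
print(grouped_group_count, grouped_whole, captured)
```

Both patterns match the same text; only the grouped version lets you pull out the individual words.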
So \\S means grab the first non-whitespace character you encounter, and \\S+ means grab the first non-whitespace character you encounter AND ALL characters after it until you encounter a whitespace.
Hence the regex statement looks for (word)whitespace(word)0 or more whitespace, where every 'parenthesized part' (in this case the 2 \\S+, aka "word") is a group. This will deliver 2 matches: 'this is' and 'a test'. The first is match 1, the second is match 2. Each match in turn consists of 3 groups: group '0' returns the entire match, group '1' returns the first (\\S+) part, and group '2' returns the second (\\S+) part of the regex statement. For the first match, groups 1 and 2 are respectively 'this' and 'is'.
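The match and group numbering described above can be reproduced with a short Python sketch. It is only an analogue of the strfind()/getGroup() pair: re.finditer stands in for strfind(), matches are numbered from 1 by hand to mirror MT, and m.group(n) plays the role of getGroup(id, match, n).

```python
import re

pattern = "(\\S+)\\s(\\S+)\\s*"
text = "this is a test"

lines = []
# Number the matches from 1, as getGroup() does.
for match_number, m in enumerate(re.finditer(pattern, text), start=1):
    # Group 0 is the whole match; groups 1..n are the parenthesized parts.
    for group_number in range(0, m.re.groups + 1):
        value = m.group(group_number)
        lines.append(f"match {match_number}, group {group_number} = {value}")

print("\n".join(lines))
```

In this Python version, group 0 of the first match also contains the trailing space consumed by \\s*. In rendered HTML output that space is not visible.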
In summary: a search result can have multiple matches, and each match can consist of 1 or more groups:
- The first group, '0', returns the ENTIRE match.
- Every group after that returns the partial match within the corresponding ().
Note that in standard regex an escape is a single \ while in MT you need \\.