Revision as of 09:50, 3 July 2013
getGroup() Function
Usage
getGroup(id, match, group)
Where
- id: the id returned by strfind()
- match: the number of the match found by strfind()
- group: the number of the capture group found by strfind()
Example
[h: id = strfind("this is a test", "(\\S+)\\s(\\S+)\\s*")]
match 1, group 0 = [getGroup(id, 1, 0)]<br>
match 1, group 1 = [getGroup(id, 1, 1)]<br>
match 1, group 2 = [getGroup(id, 1, 2)]<br>
match 2, group 0 = [getGroup(id, 2, 0)]<br>
match 2, group 1 = [getGroup(id, 2, 1)]<br>
match 2, group 2 = [getGroup(id, 2, 2)]<br>
Returns:
match 1, group 0 = this is
match 1, group 1 = this
match 1, group 2 = is
match 2, group 0 = a test
match 2, group 1 = a
match 2, group 2 = test
Example explained
First off, normally you only need one \ in a regex statement, but because MT uses regex itself and the statement is preparsed, you need to double-escape it, so \\.
- \\S = any character that is NOT whitespace
- \\s = whitespace
- + = 1 or more
- * = 0 or more
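Have a look at this regular expressions cheat sheet (http://www.addedbytes.com/download/regular-expressions-cheat-sheet-v2/png/) for an overview.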
The second important thing to know is that a 'group' is defined by parentheses: "(group1)(group2)(etc.)", where group 0 returns the entire match.
So \\S means grab the first non-whitespace character you encounter, and \\S+ means grab the first non-whitespace character you encounter AND all characters after it until you encounter whitespace.
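The difference between \\S and \\S+ can be checked outside MapTool. The following is a minimal sketch in Python, which is an assumption made purely for illustration: Python's re module stands in for MapTool's regex handling, and the sample string is made up. It also shows the escaping point from above by analogy: in an ordinary Python string the backslash has to be doubled, while a raw string takes a single one.

```python
import re

# Made-up sample text with leading whitespace (not from the MapTool example).
sample = "  hello world"

# \S matches exactly one non-whitespace character: the first one found.
first_char = re.search(r"\S", sample).group(0)

# \S+ keeps consuming characters until whitespace is reached: a whole word.
first_word = re.search(r"\S+", sample).group(0)

# A doubled backslash in an ordinary string is the same pattern as a
# single backslash in a raw string.
same_pattern = ("\\S+" == r"\S+")

print(repr(first_char), repr(first_word), same_pattern)
```

This only mirrors the idea of double escaping; the exact way MT preparses a macro is not modelled here.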
Hence the regex statement looks for (word)whitespace(word) followed by 0 or more whitespace, where every parenthesized part (in this case the two '\\S+', aka 'word') is a group. This delivers 2 matches: 'this is' and 'a test'. The first is match 1, the second is match 2. Each match in turn consists of 3 groups (0, 1 and 2). Group 1 returns the first (\\S+) part and group 2 returns the second (\\S+) part of the regex statement. For the first match these are 'this' and 'is' respectively.
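To see the match and group numbering from the walkthrough in one place, here is a minimal sketch in Python. Python and the helper name find_groups are assumptions for illustration and not part of MapTool. The pattern is the same as in the example, written with single backslashes because it is a raw string.

```python
import re

# Same pattern and text as the MapTool example, single backslashes (raw string).
PATTERN = r"(\S+)\s(\S+)\s*"
TEXT = "this is a test"


def find_groups(pattern, text):
    """Hypothetical helper: one list per match, [group 0, group 1, group 2, ...]."""
    results = []
    for m in re.finditer(pattern, text):
        # group(0) is the entire match; groups() holds the parenthesized parts.
        results.append([m.group(0), *m.groups()])
    return results


matches = find_groups(PATTERN, TEXT)
for match_no, groups in enumerate(matches, start=1):
    for group_no, value in enumerate(groups):
        print(f"match {match_no}, group {group_no} = {value!r}")
```

In this sketch, group 0 of the first match comes out as 'this is ' with a trailing space, because the final \s* also consumes the space before 'a'. Rendered as HTML, that trailing space would not be visible.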
In summary: a search result can have multiple matches, and each match can consist of 1 or more groups:
- The first group, '0', returns the ENTIRE match.
- Every group after that returns the partial match that is within its ().
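The lookup by match number and group number can be mimicked as follows. This is a sketch under stated assumptions: it is written in Python rather than as a MapTool macro, and get_group is a hypothetical helper that only imitates the numbering described above (matches counted from 1, groups counted from 0). It is not how MapTool implements getGroup().

```python
import re


def get_group(text, pattern, match, group):
    """Hypothetical helper: 1-based match number, 0-based group number."""
    found = list(re.finditer(pattern, text))
    if not 1 <= match <= len(found):
        raise IndexError(f"no match number {match}")
    # group 0 is the entire match, group 1 and up are the parenthesized parts.
    return found[match - 1].group(group)


text = "this is a test"
pattern = r"(\S+)\s(\S+)\s*"

second_word_of_first_match = get_group(text, pattern, 1, 2)
first_word_of_second_match = get_group(text, pattern, 2, 1)
print(second_word_of_first_match, first_word_of_second_match)
```

Unlike the MapTool version, this helper reruns the search on every call instead of reusing an id from strfind(); that keeps the sketch short.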
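Here is a link to test your regex statements: http://www.gskinner.com/RegExr/ (remember that for this applet you only use one \ while in MT you need \\).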