getGroup: Difference between revisions

From RPTools Wiki

Revision as of 11:21, 3 July 2013

getGroup() Function

Introduced in version 1.3b48
Returns the specified capture group for the specified match that was found using strfind().

Usage

getGroup(id, match, group)

Where

  • id - is the id returned by strfind()
  • match - is the number of the match found by strfind()
  • group - is the number of the capture group found by strfind()
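
To make the three parameters concrete, here is a minimal Python sketch of the same kind of lookup. The names str_find and get_group are hypothetical stand-ins written for this illustration only; they are not MapTool functions. The sketch assumes that matches are numbered from 1 and groups from 0, which is how the example on this page uses them.

```python
import re

def str_find(text, pattern):
    """Hypothetical stand-in for strfind(): collect every match of pattern in text."""
    return list(re.finditer(pattern, text))

def get_group(found, match, group):
    """Hypothetical stand-in for getGroup(): match counts from 1, group counts from 0."""
    return found[match - 1].group(group)

# Python raw strings need only one backslash per regex token.
found = str_find("this is a test", r"(\S+)\s(\S+)\s*")
first_word = get_group(found, 1, 1)  # first capture group of the first match
last_word = get_group(found, 2, 2)   # second capture group of the second match
```

Here the list returned by str_find plays the role of the id: it is the handle that later lookups refer to.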

Example

[h: id = strfind("this is a test", "(\\S+)\\s(\\S+)\\s*")]
match 1, group 0 = [getGroup(id, 1, 0)]<br>
match 1, group 1 = [getGroup(id, 1, 1)]<br>
match 1, group 2 = [getGroup(id, 1, 2)]<br>
match 2, group 0 = [getGroup(id, 2, 0)]<br>
match 2, group 1 = [getGroup(id, 2, 1)]<br>
match 2, group 2 = [getGroup(id, 2, 2)]<br>

Returns:

match 1, group 0 = this is
match 1, group 1 = this 
match 1, group 2 = is 
match 2, group 0 = a test
match 2, group 1 = a 
match 2, group 2 = test 

Example explained

First off, normally you only need one \ in a regex statement, but because MT (MapTool) uses regex itself and the statement is preparsed, you need to double-escape it, so \\.

  • \\S = 'anything that is NOT whitespace'
  • \\s = 'whitespace'
  • + = '1 or more'
  • * = '0 or more'
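
The escaping rule and the four tokens above can be tried outside MapTool. The following Python sketch is only an analogy: Python doubles backslashes in ordinary string literals for its own reasons, which resembles, but is not the same mechanism as, MapTool's preparsing. All variable names are made up for the illustration.

```python
import re

# In an ordinary Python string literal the backslash must be doubled,
# much like in a MapTool macro.
doubled = "\\S+"
# A raw string passes the backslash through unchanged, so one is enough.
raw = r"\S+"
same_pattern = doubled == raw  # both spell the regex \S+

# \S+ : runs of 1 or more non-whitespace characters
words = re.findall(r"\S+", "this is a test")
# \s : each single whitespace character
gaps = re.findall(r"\s", "a  b")
# \s* : 0 or more whitespace characters, so an empty string also matches
empty_ok = re.fullmatch(r"\s*", "") is not None
```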

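Have a look at this regular expressions cheat sheet for an overview: http://www.addedbytes.com/download/regular-expressions-cheat-sheet-v2/png/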

The second important thing to know is that a group is defined by parentheses: (group1)(group2)(etc.), where group '0' returns the entire match.

So \\S means grab the first non-whitespace character you encounter, and \\S+ means grab the first non-whitespace character AND ALL characters after it until you encounter whitespace. Hence the regex statement looks for (word)whitespace(word)0 or more whitespace, where every parenthesized part (in this case the two \\S+, aka "word") is a group. This delivers 2 matches: 'this is' and 'a test'. The first is match 1, the second is match 2. Each match in turn consists of 3 groups: group '0' returns the entire match, group '1' returns the first (\\S+) part, and group '2' returns the second (\\S+) part of the regex statement. For the first match, groups '1' and '2' are 'this' and 'is' respectively.
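
The walk-through above can be checked with a short Python sketch. It uses Python's re module rather than MapTool, so it only shows how this pattern splits the text into matches and groups; the variable names are illustrative.

```python
import re

# Same pattern as in the example; a raw string needs only one backslash.
pattern = re.compile(r"(\S+)\s(\S+)\s*")

# One row per match: (group 0, group 1, group 2)
rows = [
    (m.group(0), m.group(1), m.group(2))
    for m in pattern.finditer("this is a test")
]
```

In Python, group 0 of the first match is 'this is ' including the trailing space, because the final \s* consumes the whitespace before 'a'. That space is easy to overlook in rendered output.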

In summary: a search result can have multiple matches, and each match can consist of 1 or more groups:

  • The first group '0' returns the ENTIRE match.
  • Every group after that will return partial matches that are within ().

Here is a link to test your regex statements (remember that for this applet you only use one \, while in MT you need \\).