Character Pattern

Pat Abort

Syntax: Pat Abort()

Description: Generates a pattern value that causes the entire match to fail immediately, with no further backing up or retrying.

JMP Version Added: Before version 14


source = "xxxxx";
n = 0;
pattern = Pat Succeed() + Pat Arb() >> xs + Expr(
    Show( xs );
    n = n + 1;
    If( n > 16,
        Pat Abort(),
        Pat Fail()
    );
);
rc = Pat Match( source, pattern, NULL, FULLSCAN );

Pat Altern

Syntax: Pat Altern( pat1, pat2, ... )

Description: Generates a pattern value that matches any one of the supplied patterns. Generally written as pat1 | pat2 | ....

JMP Version Added: Before version 14


Pat Match(
    "123456789",
    ((Pat Pos( 2 ) + "1") | (Pat Pos( 1 ) + "2") | (Pat Pos( 0 ) + "3")) >> result
);
result;

Pat Any

Syntax: Pat Any( string )

Description: Generates a pattern value that will match any one character in the string.

JMP Version Added: Before version 14


operators = Pat Any( "*+-/" );
text = "abc+def";
Pat Match( text, operators >> op );
op;

Pat Arb

Syntax: Pat Arb()

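Description: Generates a pattern value that matches zero or more arbitrary characters. It tries the shortest match first and takes one more character each time the matcher backs up into it.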

JMP Version Added: Before version 14


Pat Match(
    "123nonnumeric456",
    Pat Span( "0123456789" ) + Pat Arb() >> result + Pat Span( "0123456789" )
);
result;

Pat Arb No

Syntax: Pat Arb No( pattern )

Description: Generates a pattern value that matches its argument zero or more times, as few times as possible. Same as Pat Repeat( pattern, 0, infinity, RELUCTANT ), which corresponds to *? in a regular expression.

JMP Version Added: Before version 14


Pat Match(
    "xyz aaaaabbbbbb@ccc no c is matched because reluctant",
    Pat Arb No( "a" ) >> a + Pat Arb No( "b" ) >> b + "@" + Pat Arb No( "c" ) >> c
);
" a=" || a || " b=" || b || " c=" || c;

Pat At

Syntax: Pat At( variable )

Description: Generates a pattern value that matches zero characters and assigns the current cursor position to variable. Generally written as Pat Pos() >> variable.

JMP Version Added: Before version 14


Pat Match( "123456789", Pat Len( 2 ) + Pat At( result ) );
result;

Pat Break

Syntax: Pat Break( string )

Description: Generates a pattern value that matches zero or more characters not in the string, stopping before a (required) character in the string.

JMP Version Added: Before version 14


b = "- ";
Pat Match( "one two three-", Pat Repeat( Pat Break( b ) >> word + Pat Any( b ) ) );
word;

Pat Concat

Syntax: Pat Concat( pat1, pat2, ... )

Description: Generates a pattern value that matches each of the supplied patterns in turn. Generally written as pat1 + pat2 + ....

JMP Version Added: Before version 14


num = Pat Break( "," );
sep = ",";
Pat Match( "1.3,7.9,8.66", num + sep + num >> result + sep + num );
result;

Pat Conditional

Syntax: Pat Conditional( pattern, variable )

Description: Generates a pattern value that matches the supplied pattern and stores the matched text in variable only after the overall match succeeds. Generally written as pattern >? variable.

JMP Version Added: Before version 14


a = "unchanged";
b = "unchanged";
Pat Match( "123456789", (Pat Len( 2 ) >? a | Pat Len( 1 ) >? b) + "2" );
" a=" || a || " b=" || b;

Pat Fail

Syntax: Pat Fail()

Description: Generates a pattern value that always fails to match going forward, forcing the matcher to retry alternatives.

JMP Version Added: Before version 14


source = "xxxxx";
n = 0;
pattern = Pat Succeed() + Pat Arb() >> xs + Expr(
    Show( xs );
    n = n + 1;
    If( n > 16,
        Pat Abort(),
        Pat Fail()
    );
);
rc = Pat Match( source, pattern, NULL, FULLSCAN );

Pat Fence

Syntax: Pat Fence()

Description: Generates a pattern value that matches zero characters going forwards and fails when backing up, causing the match to fail. Also used to trim down the pattern backup stack.

JMP Version Added: Before version 14


rc = Pat Match( "123456789", (Pat Len( 1 ) | Pat Len( 2 )) >> result + Pat Fence() + "3" );
"rc=" || Char( rc ) || " result=" || result;
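
For contrast, here is a minimal sketch of the same match with Pat Fence() removed, so the matcher is free to back up into the alternation. The variable names follow the example above. The expected value in the comment is derived from the alternation and backup behavior described in this section, not from a recorded run.

```jsl
// Same pattern as the Pat Fence() example, but without the fence: backing up is allowed.
rc = Pat Match( "123456789", (Pat Len( 1 ) | Pat Len( 2 )) >> result + "3" );
// Pat Len( 1 ) takes "1", then "3" fails against "2", so the matcher backs up,
// tries Pat Len( 2 ) = "12", and "3" then matches.
"rc=" || Char( rc ) || " result=" || result; // expected: "rc=1 result=12"
```

With the fence in place, that second alternative is never tried, because backing up through Pat Fence() fails the match.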

Pat Immediate

Syntax: Pat Immediate( pattern, variable )

Description: Generates a pattern value that matches the supplied pattern and stores the matched text in variable as soon as the pattern matches, even if the overall match later backs up or fails. Generally written as pattern >> variable.

JMP Version Added: Before version 14


a = "unchanged";
b = "unchanged";
Pat Match( "123456789", (Pat Len( 2 ) >> a | Pat Len( 1 ) >> b) + "2" );
" a=" || a || " b=" || b;

Pat Len

Syntax: Pat Len( n )

Description: Generates a pattern value that matches n characters.

JMP Version Added: Before version 14


Pat Match( "123456789", Pat Len( 2 ) + Pat Len( 3 ) >> result );
result;

Pat Look Ahead

Syntax: Pat Look Ahead( pattern, <0|1> )

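Description: Generates a zero-width pattern that tests the text after the current position without moving the cursor. The optional second argument defaults to 0, which requires the pattern to match. A value of 1 makes it a negative match, which succeeds only if the pattern does not match.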

JMP Version Added: Before version 14

Example 1


Test = "These are Bob's sons' nails.";
While( /* repeat the match until it fails */Pat Match(
        Test,
        "s" + Pat Look Ahead( "'" ),
        "z"
    ), /* find an s that IS followed by an apostrophe and replace it with z */
    Print( test )
);

Example 2


Test = "These are Bob's sons' nails.";
While( /* repeat the match until it fails */Pat Match(
        Test,
        "s" + Pat Look Ahead( "'", 1 ),
        "z"
    ), /* find an s that is NOT followed by an apostrophe and replace it with z */
    Print( test )
);

Example 3


Test = "a bb ccc dddd";
While( /* keep repeating the match until it won't match */
    Pat Match(
        Test,
        Pat Len( 1 ) >> xxx/* find any character */
        + Pat Look Behind( Expr( xxx ) + Expr( xxx ) ) /* back up 2 positions, which includes the character just found */
        + Pat Look Ahead( Expr( xxx ) /* and look ahead one position */ ),
        "@" /* replacement for the middle character of a triple */
    ),
    Print( test ) /* show each intermediate result */
);

Pat Look Behind

Syntax: Pat Look Behind( pattern, <0|1> )

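Description: Generates a zero-width pattern that tests the text before the current position without moving the cursor. The optional second argument defaults to 0, which requires the pattern to match. A value of 1 makes it a negative match, which succeeds only if the pattern does not match.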

JMP Version Added: Before version 14

Example 1


Test = "These are Bob's sons' nails.";
While( /* repeat the match until it fails */Pat Match(
        Test,
        Pat Look Behind( "'" ) + "s",
        "z"
    ), /* find an s that IS preceded by an apostrophe and replace it with z */
    Print( test )
);

Example 2


Test = "These are Bob's sons' nails.";
While( /* repeat the match until it fails */Pat Match(
        Test,
        Pat Look Behind( "'", 1 ) + "s",
        "z"
    ), /* find an s that is NOT preceded by an apostrophe and replace it with a z */
    Print( test )
);

Example 3


Test = "a bb ccc dddd";
While( /* keep repeating the match until it won't match */
    Pat Match(
        Test,
        Pat Len( 1 ) >> xxx/* find any character */
        + Pat Look Behind( Expr( xxx ) + Expr( xxx ) ) /* back up 2 positions, which includes the character just found */
        + Pat Look Ahead( Expr( xxx ) /* and look ahead one position */ ),
        "@" /* replacement for the middle character of a triple */
    ),
    Print( test ) /* show each intermediate result */
);

Pat Match

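Syntax: Pat Match( source, pattern, <replacement | NULL>, <ANCHOR>, <MATCHCASE>, <FULLSCAN> )

Description: Matches the pattern against the string in source and returns 1 if the match succeeds or 0 if it fails. If replacement text is supplied, it replaces the matched text, so source must then be a variable. Use NULL in place of the replacement when no replacement is wanted but a later option such as FULLSCAN follows.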

JMP Version Added: Before version 14


string = "John Smith";
Pat Match(
    string,
    Pat Break( " " ) >> first + Pat Span( " " ) + Pat Rem() >> last,
    last || ", " || first
);
string;

Pat Not Any

Syntax: Pat Not Any( string )

Description: Generates a pattern value that will match any one character not in the string.

JMP Version Added: Before version 14


delimiter = ";,-";
text = "fish,dog,cat,";
Pat Match( text, Pat Repeat( Pat Not Any( delimiter ) ) >> word + Pat Any( delimiter ) );
word;

Pat Pos

Syntax: Pat Pos( n )

Description: Generates a pattern value that matches zero characters if the cursor is at position n. With no argument, the Pat Pos() function returns the cursor position for >> or >? assignment: Pat Pos() >> variable.

JMP Version Added: Before version 14


Pat Match(
    "ab3defghi",
    Pat Pos( 2 ) + Pat Len( 1 ) >> v /* v=3 */ + Expr( Pat Len( v ) )
    + Pat Pos( /* no argument returns current position = 6 */ ) >> result
);
result;

Pat R Pos

Syntax: Pat R Pos( n )

Description: Generates a pattern value that matches zero characters if the cursor is n characters from the end.

JMP Version Added: Before version 14


Pat Match( "quick brown fox", Pat R Pos( 3 ) + Pat Rem() >> result );
result;

Pat R Tab

Syntax: Pat R Tab( n )

Description: Generates a pattern value that matches zero or more characters to move the cursor forward to n characters before the end.

JMP Version Added: Before version 14


Pat Match( "123456789", "23" + Pat R Tab( 2 ) >> result );
result;

Pat Regex

Syntax: Pat Regex( string )

Description: Generates a pattern value that matches the regular expression in the string.

JMP Version Added: Before version 14


string = "John Smith";
Regex Match( string, Pat Regex( "([^ ]+)([ ]+)([^ ]+)" ), "\3, \1" );
string;

Pat Rem

Syntax: Pat Rem()

Description: Generates a pattern value that matches the remainder of the text.

JMP Version Added: Before version 14


Pat Match( "the quick fox", Pat R Pos( 3 ) + Pat Rem() >> result );
result;

Pat Repeat

Syntax: Pat Repeat( pattern, <min=1>, <max=infinity>, <GREEDY or RELUCTANT=GREEDY> )

Description: Generates a pattern value that matches the supplied pattern between min and max times.

JMP Version Added: Before version 14


Pat Match(
    "xyz aaaaabbbbbbccc 3 c is matched because greedy",
    Pat Repeat( "a" ) >> a + Pat Repeat( "b" ) >> b + Pat Repeat( "c" ) >> c
);
" a=" || a || " b=" || b || " c=" || c;
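
The optional min, max, and RELUCTANT arguments are not exercised by the example above. The following is a minimal sketch of how they change what is matched. The variable names text, longest, and shortest are illustrative only, and the expected value is derived from the syntax and description above rather than from a recorded run.

```jsl
// Bounded repetition: at least 1 and at most 3 copies of "a".
text = "xyz aaaaabbb";
Pat Match( text, Pat Repeat( "a", 1, 3 ) >> longest ); // GREEDY is the default
Pat Match( text, Pat Repeat( "a", 1, 3, RELUCTANT ) >> shortest );
" longest=" || longest || " shortest=" || shortest;
// expected: " longest=aaa shortest=a"
```

The greedy form stops at max even though five a characters are available, and the reluctant form stops at min because nothing after it in the pattern forces it to take more.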

Pat Span

Syntax: Pat Span( string )

Description: Generates a pattern value that matches one or more characters in the string.

JMP Version Added: Before version 14


sp = Pat Span( "0123456789.-" );
Pat Match( "junk=-33.44e33", sp >> result );
result;

Pat String

Syntax: Pat String( string )

Description: Generates a pattern value that matches the string. Generally the string can be used without using the Pat String() function.

JMP Version Added: Before version 14


x = Pat String( "a" || "b" );
Pat Match(
    "acbdbababc",
    Pat Arb() >> before + Pat Repeat( x ) >> match + Pat Rem() >> after
);
"before=" || before || " match=" || match || " after=" || after;

Pat Succeed

Syntax: Pat Succeed()

Description: Generates a pattern value that always matches zero characters, even when backing up.

JMP Version Added: Before version 14


source = "xxxxx";
n = 0;
pattern = Pat Succeed() + Pat Arb() >> xs + Expr(
    Show( xs );
    n = n + 1;
    If( n > 16,
        Pat Abort(),
        Pat Fail()
    );
);
rc = Pat Match( source, pattern, NULL, FULLSCAN );

Pat Tab

Syntax: Pat Tab( n )

Description: Generates a pattern value that matches zero or more characters to move the cursor forward to position n.

JMP Version Added: Before version 14


Pat Match( "123456789", "23" + Pat Tab( 6 ) >> result );
result;

Pat Test

Syntax: Pat Test( expression )

Description: Generates a pattern value that matches zero characters if the expression is nonzero. The expression is re-evaluated during each test, as if Expr() were used.

JMP Version Added: Before version 14


nCats = 0;
whichCat = 3;
string = "catch a catnapping cat in a catsup factory";
rc = Pat Match(
    string,
    "cat" + Pat Test(
        nCats = nCats + 1;
        nCats == whichCat;
    ),
    "dog"
);
string;

Regex Match

Syntax: Regex Match( source, pattern, <replacement | NULL>, <MATCHCASE> )

Description: Executes a regular expression match and returns a list of the entire matched text and the matches for each back reference created by an open parenthesis. Optionally, the third argument specifies a replacement string for the entire match; the replacement string can use back references.

JMP Version Added: Before version 14



source = "believe";
// [aeiou] matches exactly one vowel
// .*? is a reluctant (vs greedy) match. try it without the ? to see the greedy behavior
// \1 is a back reference to the first ( group -- [aeiou] is inside the first ( group
matches = Regex Match(
    source, // a variable allows updating some text
    "([aeiou])(.*?)(\1)", // a regex with parens makes back references
    ">\2<" // the match is replaced by text that uses a back reference
);
Show( source, matches );
// results:
// source = "b>li<ve";
// matches = {"elie", "e", "li", "e"};
// notes:
// matches[1] is the entire match AND the part that will be replaced
// matches[2] is back ref \1  this is the letter e matched by [aeiou]
// matches[3] is back ref \2  these are the letters li matched by .*?
// matches[4] is back ref \3  this is another letter e matched by \1, which was an e
//
// the * operator is greedy by default, taking as many characters as it can, and
// only backing up if required. Adding the ? makes it reluctant, taking characters
// one at a time and allowing the remaining pattern to have a chance earlier.
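
The comments above suggest trying the pattern without the ? to see the greedy behavior. A minimal sketch of that variation follows, using the same source text and replacement. The expected results in the comments are worked out by hand from the regular expression rules described above, not copied from a recorded run.

```jsl
source = "believe";
matches = Regex Match(
    source,
    "([aeiou])(.*)(\1)", // same regex, but .* is greedy
    ">\2<"
);
Show( source, matches );
// expected results:
// source = "b>liev<";
// matches = {"elieve", "e", "liev", "e"};
// the greedy .* first takes everything after the first e, then backs up only
// as far as the last e so that \1 can still match
```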