
What are Regular Expressions?

Products

Webtrends Analytics 8.x
Webtrends Analytics 9.x

Cause

Regular expressions provide a powerful means for matching patterns of characters. Regular expressions (REs) are understood by a number of commands including ed, ex, sed, awk, grep, egrep, expr and even vi.

Resolution

Building Regular Expressions

Most regular expressions you will ever need to use are very simple, often
consisting of a few basic elements.

Example 1:
If you wanted to match all of the values that begin with “couch,” your regular expression would be as follows:

^couch

Example 2:
If you wanted to match all the values that end with “couch,” your regular expression would be as follows:

couch$
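
As a minimal sketch, the two anchors from Examples 1 and 2 can be tried out with Python's re module. Python is an assumption here (the article does not name a language, and the Webtrends expression engine may differ in details), and the sample values are made up for illustration.

```python
import re

# Hypothetical sample values, not taken from any real report.
values = ["couch_55", "big_couch", "couch", "blue_chair"]

# ^couch: the value must begin with "couch".
begins = [v for v in values if re.search(r"^couch", v)]

# couch$: the value must end with "couch".
ends = [v for v in values if re.search(r"couch$", v)]

print(begins)  # ['couch_55', 'couch']
print(ends)    # ['big_couch', 'couch']
```

The value "couch" appears in both lists because it both begins and ends with the pattern.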

Example 3:
In some cases, you may have an either/or situation. In this case you would use the pipe symbol (|) to combine two regular expressions.
For example, couch|chair would match any value containing either couch or chair, e.g. blue_chair, chair_55, or big_couch_55.
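
The either/or behavior can be checked with a short sketch, again assuming Python's re module rather than the Webtrends engine itself. The value oak_table is a made-up non-matching case added for contrast.

```python
import re

# couch|chair matches a value that contains either word anywhere.
pattern = re.compile(r"couch|chair")

# The first three values come from the example text; "oak_table" is hypothetical.
values = ["blue_chair", "chair_55", "big_couch_55", "oak_table"]

matched = [v for v in values if pattern.search(v)]
print(matched)  # ['blue_chair', 'chair_55', 'big_couch_55']
```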

Example 4:
In this example, you are trying to match three months of your product news. You might use the following regular expression to define a qualifying page URL that contains any product news HTML pages from January, February or March:
/product/news/(jan|feb|mar)/.+\.htm

Literally, this reads:

Match any item (most likely a URL), containing the following:

/product/news/, followed by either jan, feb, or mar, followed by / and one
or more of any character (.+), followed by .htm.

This would return the following URLs:

/product/news/jan/chair.htm
/product/news/feb/mirror.htm
/product/news/mar/couch.htm
/product/news/jan/table.htm
/product/news/jan/table.html

but not:

/product/news/jan/chair.asp
/product/news/jan/chair.gif
/product/news/jan/.htm
/product/news/apr/chair.htm
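
The two URL lists above can be reproduced with a small sketch. It assumes Python's re module, and it writes the period before htm as \. so that it matches a literal period, which is what the description "followed by .htm" implies.

```python
import re

# Example 4 pattern, with the period before "htm" escaped as a literal period.
pattern = re.compile(r"/product/news/(jan|feb|mar)/.+\.htm")

urls = [
    "/product/news/jan/chair.htm",
    "/product/news/feb/mirror.htm",
    "/product/news/mar/couch.htm",
    "/product/news/jan/table.htm",
    "/product/news/jan/table.html",
    "/product/news/jan/chair.asp",
    "/product/news/jan/chair.gif",
    "/product/news/jan/.htm",
    "/product/news/apr/chair.htm",
]

# search() looks for the pattern anywhere in the URL, so "table.html"
# qualifies because it contains ".htm".
matched = [u for u in urls if pattern.search(u)]
rejected = [u for u in urls if not pattern.search(u)]

print(matched)
print(rejected)
```

The URL /product/news/jan/.htm is rejected because .+ requires at least one character between the slash and the period.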

Example 5:
In this example, we want to match all URLs that indicate that an individual product in the furniture category has been registered. We would use the following regular expression to define our qualifying page URL:
^/product/furniture/.+/register\.htm

Literally, this reads:
Match all URLs that begin with /product/furniture/, followed by one or more occurrences of any character, followed by /register.htm.

The following URLs would be matched:

/product/furniture/couch/register.htm
/product/furniture/chair/register.htm
/product/furniture/bedroom/armoire/register.htm

but not:

/product/furniture/index.htm
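
A short sketch of Example 5, assuming Python's re module. The period in register.htm is escaped as \. so that it matches a literal period.

```python
import re

# ^ anchors the match at the start of the URL; .+ needs at least one
# character (here, the product path) before /register.htm.
pattern = re.compile(r"^/product/furniture/.+/register\.htm")

urls = [
    "/product/furniture/couch/register.htm",
    "/product/furniture/chair/register.htm",
    "/product/furniture/bedroom/armoire/register.htm",
    "/product/furniture/index.htm",
]

matched = [u for u in urls if pattern.search(u)]
print(matched)
```

Because .+ also matches slashes, the nested path bedroom/armoire qualifies as well.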

Comparing Regular Expressions with Wildcards

Refer to the table below to see how you might use a wildcard or a regular expression to accomplish the same thing.

Wildcard (*)           Regular Expression    Meaning
*chair*                chair                 contains chair
*chair                 chair$                ends with chair
chair*                 ^chair                begins with chair
chair (no wildcard)    ^chair$               is exactly chair
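
The equivalences in the table can be spot-checked with a sketch that compares each wildcard with its regular expression. It assumes Python, using fnmatch.fnmatchcase as a stand-in for the product's wildcard matching (shell-style wildcards, which may not behave identically), and made-up sample values.

```python
import re
from fnmatch import fnmatchcase

# (wildcard, regular expression) pairs from the table.
pairs = [
    ("*chair*", r"chair"),
    ("*chair", r"chair$"),
    ("chair*", r"^chair"),
    ("chair", r"^chair$"),
]

# Hypothetical sample values.
values = ["chair", "armchair", "chair_55", "blue_chair_55", "couch"]

results = {}
for wildcard, regex in pairs:
    by_wildcard = [v for v in values if fnmatchcase(v, wildcard)]
    by_regex = [v for v in values if re.search(regex, v)]
    # Both approaches should select the same values.
    assert by_wildcard == by_regex
    results[wildcard] = by_regex

for wildcard, selected in results.items():
    print(wildcard, selected)
```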

Matching Order Rules

Several rules govern how regular expression matching occurs:
1. If more than one part of the input string matches, the first match found takes priority over the others.
2. The left-most match takes priority in a list of concatenated expressions.
3. The matches found using *, +, and ? are considered longest first.
4. Nested constructs are evaluated from the outside in.
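
Rules 2 and 3 can be observed in a small sketch. It assumes Python's re module, whose engine may resolve some cases differently from the one described here, so treat it as an illustration of the left-most and longest-first ideas only. The input strings are made up.

```python
import re

# Left-most match: both words occur, but "chair" starts earlier in the
# string, so it is the match that search() reports.
first = re.search(r"couch|chair", "blue_chair_and_couch")
print(first.group(), first.start())  # chair 5

# Longest first: .+ takes as many characters as it can while still
# letting the final "/" match, so the match runs to the last slash.
greedy = re.search(r"/product/.+/", "/product/furniture/couch/register.htm")
print(greedy.group())  # /product/furniture/couch/
```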

See More Information for a list of regular expressions.

More Information

Regular Expression Syntax

The basic element of a regular expression can be any of the following:

Basic element / Example

a single character
    Matches anything containing the single character to be matched. For example, a would match cause, bat, fan, and ant. You can also combine several characters together, in which case a match would be anything containing those characters in that combination. For instance, ball would match basketball, ballerina, and ballroom.

\ followed by a single character
    Allows a special character to be used as an ordinary single character. For example, "." has special meaning. The only way you can use it to mean just "period" is to precede it with \. This is especially useful when describing paths, e.g. \.html$ (anything ending in .html). Other characters that need to be preceded by \ if they are to be used without special meaning are the following: \, ., $, *, ?, +, [, ], (, ), |.

$
    Matches anything that ends with the value; e.g., cause$ would match cause and because.

^
    Matches anything that begins with the value; e.g., ^couch would match couches and couch.

.
    Matches any single character; e.g., cou.h would match couth, couch, and cough.

range [ ]
    Matches any single character from the set enclosed in brackets "[ ]"; e.g., [0-9] would match any decimal digit. If the set is preceded by a caret (^), it matches any single character not in the set. For example, [^a-z] would match any character that is not a lowercase letter of the alphabet.

|
    Joins multiple expressions