Regular Expressions are a powerful pattern matching language that is part of many modern programming languages. Regular Expressions allow you to apply a pattern to an input string and return a list of the matches within the text. Regular expressions also allow text to be replaced using replacement patterns. It is a very powerful version of find and replace.
There are two parts to learning Regular Expressions: learning the Regex syntax, and learning how to work with Regex in your programming language.
This article introduces you to the Regular Expression syntax. After learning the syntax for Regular Expressions, you can use it in many different languages, as the syntax is fairly similar between languages.
Microsoft's .NET Framework contains a set of classes for working with Regular Expressions in the System.Text.RegularExpressions namespace.
When learning Regular Expressions, it helps to have a tool that you can use to test Regex patterns. Rad Software has a Free Regular Expression Tool available for download that will help as you go through the article.
Regular Expressions are similar to find and replace in that ordinary characters match themselves. If I want to match the word "went", the Regular Expression pattern would be "went".
Text: Anna Jones and a friend went to lunch
Regex: went
Matches: "went"
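The same literal match can be tried in code. The sketch below uses Python's re module rather than .NET; that is an assumption made only to have a small runnable example, and for the patterns in this article the syntax is the same. re.sub stands in for the replace operation mentioned at the start, and the replacement word "drove" is purely illustrative.

```python
import re

# Python's re module is used here as a stand-in for .NET (an assumption);
# the pattern syntax is the same for this example.
text = "Anna Jones and a friend went to lunch"

# Ordinary characters match themselves.
literal_matches = re.findall(r"went", text)
print(literal_matches)  # ['went']

# Replacement: each match of the pattern is swapped for the replacement text.
replaced = re.sub(r"went", "drove", text)
print(replaced)  # Anna Jones and a friend drove to lunch
```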
The following are special characters when working with Regular Expressions. Theywill be discussed throughout the article.
. $ ^ { [ ( | ) * + ? \
The full stop or period character (.) is known as dot. It is a wildcard that will match any character except a new line (\n). For example, the following Regular Expression matches the 'a' character followed by any two characters.
Text: abc def ant cow
Regex: a..
Matches: "abc", "ant"
If the Singleline option is enabled, a dot matches any character including the new line character.
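A minimal sketch of the dot, again assuming Python's re module in place of .NET. In Python the DOTALL flag plays the role of .NET's Singleline option; the two-character string "a\nb" is a made-up input chosen to show the difference.

```python
import re

# Python's re module (assumed stand-in for .NET); re.DOTALL corresponds
# to .NET's Singleline option.
text = "abc def ant cow"
dot_matches = re.findall(r"a..", text)
print(dot_matches)  # ['abc', 'ant']

multi = "a\nb"
without_dotall = re.findall(r"a.", multi)         # dot does not match \n
with_dotall = re.findall(r"a.", multi, re.DOTALL)  # dot matches \n as well
print(without_dotall, with_dotall)  # [] ['a\n']
```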
Backslash and a lowercase 'w' (\w) is a character class that will match any word character (such as letters, digits and the underscore). The following Regular Expression matches 'a' followed by two word characters.
Text: abc anaconda ant cow apple
Regex: a\w\w
Matches: "abc", "ana", "ant", "app"
Backslash and an uppercase 'W' (\W) will match any non-word character.
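Both character classes can be checked with a short sketch. It assumes Python's re module rather than .NET; the input "a-b c" for \W is an illustrative string, not from the article.

```python
import re

# Python's re module is assumed here in place of .NET.
text = "abc anaconda ant cow apple"
word_matches = re.findall(r"a\w\w", text)
print(word_matches)  # ['abc', 'ana', 'ant', 'app']

# \W matches anything that is not a word character, such as '-' and ' '.
non_word = re.findall(r"\W", "a-b c")
print(non_word)  # ['-', ' ']
```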
White-space can be matched using \s (backslash and 's'). The following Regular Expression matches the letter 'a' followed by two word characters, then a white-space character.
Text: "abc anaconda ant"
Regex: a\w\w\s
Matches: "abc "
Note that "ant" was not matched because it is not followed by a white-space character.
White-space includes the space character, new line (\n), form feed (\f), carriage return (\r), tab (\t) and vertical tab (\v). Be careful using \s, as it can lead to unexpected behaviour by matching line breaks (\n and \r). Sometimes it is better to specify explicitly the characters to match instead of using \s. For example, to match only tab and space, use [\t\x20], where \x20 is the space character written as a two-digit hexadecimal escape.
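The sketch below, assuming Python's re module instead of .NET, shows \s in the example above and then contrasts it with an explicit tab-or-space set. The string mixed is a hypothetical input containing one new line and one tab.

```python
import re

# Python's re module (assumed stand-in for .NET).
ws_matches = re.findall(r"a\w\w\s", "abc anaconda ant")
print(ws_matches)  # ['abc ']

# \s also matches line breaks, which may be unwanted.
mixed = "abc\ndef abc\tghi"
any_ws = re.findall(r"c\s", mixed)
print(any_ws)  # ['c\n', 'c\t']

# An explicit set matches only tab and space.
tab_or_space = re.findall(r"c[\t\x20]", mixed)
print(tab_or_space)  # ['c\t']
```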
The digits zero to nine can be matched using \d (backslash and lowercase 'd'). For example, the following Regular Expression matches any three digits in a row.
Text: 123 12 843 8472
Regex: \d\d\d
Matches: "123", "843", "847"
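A one-line check of the digit example, written with Python's re module as an assumed substitute for .NET. Matches do not overlap, which is why "8472" yields only "847".

```python
import re

# Python's re module is used as a stand-in for .NET (assumption).
# "12" is too short, and the trailing "2" of "8472" is left over after "847".
digit_matches = re.findall(r"\d\d\d", "123 12 843 8472")
print(digit_matches)  # ['123', '843', '847']
```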
The square brackets are used to specify a set of single characters to match. Any single character within the set will match. For example, the following Regular Expression matches any three characters where the first character is either 'd' or 'a'.
Text: abc def ant cow
Regex: [da]..
Matches: "abc", "def", "ant"
The caret (^) can be added to the start of the set of characters to specify that none of the characters in the character set should be matched. The following Regular Expression matches any three characters where the first character is neither 'd' nor 'a'.
Text: abc def ant cow
Regex: [^da]..
Matches: "bc ", "ef ", "nt ", "cow"
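The set and negated-set examples side by side, as a sketch that assumes Python's re module rather than .NET. The negated set can start a match on a space or any other character that is not 'd' or 'a'.

```python
import re

# Python's re module (assumed stand-in for .NET).
text = "abc def ant cow"

# Character set: the first character is 'd' or 'a'.
in_set = re.findall(r"[da]..", text)
print(in_set)  # ['abc', 'def', 'ant']

# Negated set: the first character is neither 'd' nor 'a'.
not_in_set = re.findall(r"[^da]..", text)
print(not_in_set)  # ['bc ', 'ef ', 'nt ', 'cow']
```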
Ranges of characters can be matched using the hyphen (-). The following Regular Expression matches any three characters where the second character is either 'a', 'b', 'c' or 'd'.
Text: abc pen nda uml
Regex: .[a-d].
Matches: "abc", "nda"
Ranges of characters can also be combined. The following Regular Expression matches any of the characters from 'a' to 'z' or any digit from '0' to '9', followed by two word characters.
Text: abc no 0aa i8i
Regex: [a-z0-9]\w\w
Matches: "abc", "0aa", "i8i"
The pattern could be written more simply as [a-z\d]\w\w, using \d in place of the 0-9 range.
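Both range examples can be reproduced with a short sketch, assuming Python's re module in place of .NET. For this plain ASCII text, the 0-9 range and \d give the same result.

```python
import re

# Python's re module (assumed stand-in for .NET).
second_char = re.findall(r".[a-d].", "abc pen nda uml")
print(second_char)  # ['abc', 'nda']

text = "abc no 0aa i8i"
combined = re.findall(r"[a-z0-9]\w\w", text)
shorter = re.findall(r"[a-z\d]\w\w", text)  # \d replaces the 0-9 range
print(combined, shorter)  # ['abc', '0aa', 'i8i'] ['abc', '0aa', 'i8i']
```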
Quantifiers let you specify the number of times that an expression must match. The most frequently used quantifiers are the asterisk character (*) and the plus sign (+). Note that the asterisk (*) is usually called the star when talking about Regular Expressions.
The star tells the Regular Expression to match the character, group, or character class that immediately precedes it zero or more times. This means that the character, group, or character class is optional: it can be matched any number of times, including not at all. The following Regular Expression matches the character 'a' followed by zero or more word characters.
Text: Anna Jones and a friend owned an anaconda
Regex: a\w*
Options: IgnoreCase
Matches: "Anna", "and", "a", "an", "anaconda"
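A sketch of the star example, assuming Python's re module rather than .NET. Python's re.IGNORECASE flag is used as the counterpart of the IgnoreCase option.

```python
import re

# Python's re module (assumed); re.IGNORECASE corresponds to .NET's IgnoreCase.
text = "Anna Jones and a friend owned an anaconda"
star_matches = re.findall(r"a\w*", text, re.IGNORECASE)
print(star_matches)  # ['Anna', 'and', 'a', 'an', 'anaconda']
```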
The plus sign tells the Regular Expression to match the character, group, or character class that immediately precedes it one or more times. This means that the character, group, or character class must be found at least once. After the first match, it keeps matching for as long as it is repeated. The following Regular Expression matches the character 'a' followed by at least one word character.
Text: Anna Jones and a friend owned an anaconda
Regex: a\w+
Options: IgnoreCase
Matches: "Anna", "and", "an", "anaconda"
Note that "a" was not matched as it is not followed by any word characters.
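The difference between the star and the plus shows up on the lone "a". The sketch below assumes Python's re module in place of .NET and checks both quantifiers on the single-character string "a".

```python
import re

# Python's re module (assumed stand-in for .NET).
text = "Anna Jones and a friend owned an anaconda"
plus_matches = re.findall(r"a\w+", text, re.IGNORECASE)
print(plus_matches)  # ['Anna', 'and', 'an', 'anaconda']

# A lone 'a' satisfies the star (zero word characters) but not the plus.
star_on_a = re.findall(r"a\w*", "a")
plus_on_a = re.findall(r"a\w+", "a")
print(star_on_a, plus_on_a)  # ['a'] []
```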
To specify an optional match, use the question mark (?). The question mark matches zero or one times. The following Regular Expression matches the character 'a' optionally followed by the character 'n'.
Text: Anna Jones and a friend owned an anaconda
Regex: an?
Options: IgnoreCase
Matches: "An", "a", "an", "a", "an", "an", "a", "a"
Note that only "An" is matched at the start of "Anna", because the question mark allows at most one 'n'.
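The optional match can be verified with a sketch, again assuming Python's re module instead of .NET. Every 'a' produces a match, with or without a following 'n'.

```python
import re

# Python's re module (assumed); re.IGNORECASE corresponds to .NET's IgnoreCase.
text = "Anna Jones and a friend owned an anaconda"
optional = re.findall(r"an?", text, re.IGNORECASE)
print(optional)  # ['An', 'a', 'an', 'a', 'an', 'an', 'a', 'a']
```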
An exact number of matches for a character, group, or character class can be specified with the curly brackets ({n}). The following Regular Expression matches the character 'a' followed by exactly two 'n' characters. There must be two 'n' characters for a match to occur.
Text: Anna Jones and Anne owned an anaconda
Regex: an{2}
Options: IgnoreCase
Matches: "Ann", "Ann"
A range of matches can be specified by curly brackets with two numbers inside ({n,m}). The first number (n) is the minimum number of matches required, and the second (m) is the maximum number of matches permitted. Leaving out the second number ({n,}) specifies a minimum with no maximum. This Regular Expression matches the character 'a' followed by a minimum of two 'n' characters and a maximum of three 'n' characters.
Text: Anna and Anne lunched with an anaconda annnnnex
Regex: an{2,3}
Options: IgnoreCase
Matches: "Ann", "Ann", "annn"
The Regex stops matching 'n' characters once the maximum of three has been reached, so only "annn" of "annnnnex" is matched.
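The three forms of curly-bracket quantifier can be compared in one sketch, assuming Python's re module rather than .NET. The last line uses an{2,} (at least two, no maximum) on the word "annnnnex" to show that it takes all five 'n' characters.

```python
import re

# Python's re module (assumed stand-in for .NET).
exact = re.findall(r"an{2}", "Anna Jones and Anne owned an anaconda", re.IGNORECASE)
print(exact)  # ['Ann', 'Ann']

ranged = re.findall(r"an{2,3}", "Anna and Anne lunched with an anaconda annnnnex",
                    re.IGNORECASE)
print(ranged)  # ['Ann', 'Ann', 'annn']

# {2,} sets only a minimum, so every following 'n' is consumed.
at_least = re.findall(r"an{2,}", "annnnnex")
print(at_least)  # ['annnnn']
```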
To specify that a match must occur at the beginning of a string, use the caret character (^). For example, the following Regular Expression pattern matches the beginning of the string followed by the character 'a'.
Text: an anaconda ate Anna Jones
Regex: ^a
Matches: "a" (the first character of the string)
The pattern above only matches the 'a' in "an"; the other 'a' characters are not at the beginning of the string.
Note that the caret (^) has different behaviour when used inside the square brackets, where it negates the character set.
If the Multiline option is on, the caret (^) will match the beginning of each line in a multiline string rather than only the start of the string.
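A sketch of the caret with and without multiline mode. It assumes Python's re module, where the re.MULTILINE flag corresponds to .NET's Multiline option; the three-line string lines is a made-up input.

```python
import re

# Python's re module (assumed); re.MULTILINE corresponds to .NET's Multiline option.
start_only = re.findall(r"^a", "an anaconda ate Anna Jones")
print(start_only)  # ['a']

lines = "an anaconda\nate Anna\nJones"
first_char = re.findall(r"^\w", lines)                 # start of the string only
each_line = re.findall(r"^\w", lines, re.MULTILINE)    # start of every line
print(first_char, each_line)  # ['a'] ['a', 'a', 'J']
```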
To specify that a match must occur at the end of a string, use the dollar character ($). If the Multiline option is on, the pattern will match at the end of each line in a multiline string. This Regular Expression pattern matches the word at the end of each line in a multiline string.
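Text:
an anaconda
ate Anna
Jones
Regex: \w+$
Options: Multiline, IgnoreCase
Matches: "anaconda", "Anna", "Jones"
In .NET, $ matches before \n but not before the \r\n combination, so with Windows line endings only "Jones" is matched. To match the CR/LF combination, include \r?$ in the pattern.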
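A sketch of the dollar anchor, assuming Python's re module in place of .NET. Python's $ behaves like the .NET behaviour described above in that, in multiline mode, it matches before \n, so a preceding \r stops \w+$ from matching on the earlier lines.

```python
import re

# Python's re module (assumed stand-in for .NET).
text = "an anaconda\nate Anna\nJones"

end_only = re.findall(r"\w+$", text, re.IGNORECASE)  # end of the string only
each_line_end = re.findall(r"\w+$", text, re.MULTILINE | re.IGNORECASE)
print(end_only)       # ['Jones']
print(each_line_end)  # ['anaconda', 'Anna', 'Jones']

# With \r\n line endings, the \r before each \n blocks the match on earlier lines.
crlf = text.replace("\n", "\r\n")
crlf_matches = re.findall(r"\w+$", crlf, re.MULTILINE)
print(crlf_matches)   # ['Jones']
```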
Microsoft have an online reference for Regex in .NET: Regular Expression Syntax on MSDN
To learn more about Regular Expression syntax see the next article: C# Regular Expression (Regex) Examples in .NET