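regexp: a Java package for regular expressions

Although Apache regards Jakarta ORO as the more complete regular-expression package, regexp is also very widely used, probably because it is so simple. These are my study notes on regexp.

1. Download and installation

Download the source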

cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic login
password: anoncvs
cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic checkout jakarta-regexp

Or download a prebuilt package

wget http://apache.linuxforum.net/dist/jakarta/regexp/binaries/jakarta-regexp-1.3.tar.gz

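2. Basics

1) Regexp is a 100% pure Java regular-expression package that Jonathan Locke donated to the Apache Software Foundation. He first wrote it in 1996, and it has stood the test of time remarkably well :). It ships with complete Javadoc documentation and a simple applet for visual debugging and compatibility testing.

2) RE is a very important class in the regexp package: an efficient, lightweight regular-expression evaluator/matcher. RE is short for "regular expression". A regular expression is a pattern that can perform complex string matching, and when a string matches a pattern you can extract the parts that matched, which is very useful in text parsing. The syntax is discussed below.
To compile a regular expression, simply construct an RE matcher object with the pattern as its argument. You can then call any of the RE.match methods to test a string; they return true if the string matches and false otherwise. For example: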

RE r = new RE("a*b");
boolean matched = r.match("aaaab");

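RE.getParen returns the matched character sequence, or one part of it if the pattern contains the corresponding parentheses. The related methods getParenStart, getParenEnd and getParenLength return its position and length. For example: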

RE r = new RE("(a*)b"); // Compile expression
boolean matched = r.match("xaaaab"); // Match against "xaaaab"

String wholeExpr = r.getParen(0); // wholeExpr will be 'aaaab'
String insideParens = r.getParen(1); // insideParens will be 'aaaa'

int startWholeExpr = r.getParenStart(0); // startWholeExpr will be index 1
int endWholeExpr = r.getParenEnd(0); // endWholeExpr will be index 6
int lenWholeExpr = r.getParenLength(0); // lenWholeExpr will be 5

int startInside = r.getParenStart(1); // startInside will be index 1
int endInside = r.getParenEnd(1); // endInside will be index 5
int lenInside = r.getParenLength(1); // lenInside will be 4
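
The offsets in the comments above can be checked without the jakarta-regexp jar. The sketch below is only a stand-in: it uses the JDK's java.util.regex rather than org.apache.regexp, on the assumption that Matcher.find() (which searches anywhere in the input) is the closest equivalent of RE.match, and that group/start/end correspond to getParen/getParenStart/getParenEnd. The class name ParenOffsets and the "text@start-end" format are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenOffsets {
    // Renders one group as "text@start-end", or returns null when the
    // pattern is not found anywhere in the input.
    static String group(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) { // find() scans the whole input for the first match
            return null;
        }
        return m.group(group) + "@" + m.start(group) + "-" + m.end(group);
    }

    public static void main(String[] args) {
        System.out.println(group("(a*)b", "xaaaab", 0)); // aaaab@1-6
        System.out.println(group("(a*)b", "xaaaab", 1)); // aaaa@1-5
    }
}
```

The end index is exclusive in both cases, which is why the whole match of length 5 starting at index 1 ends at 6.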

RE supports backreferences in regular expressions, for example:

([0-9]+)=\1
matches strings of the form n=n (such as 0=0 or 2=2)
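
A small runnable sketch of the backreference pattern follows. It uses java.util.regex from the JDK as a stand-in for org.apache.regexp (an assumption made so that it runs without the jar); the class and method names are hypothetical. It also shows a point that is easy to miss: an unanchored search can succeed on a substring, so "12=2" is found (the trailing "2=2"), whereas the anchored form rejects it.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // Unanchored: succeeds if any substring has the form n=n.
    private static final Pattern ANYWHERE = Pattern.compile("([0-9]+)=\\1");
    // Anchored: the whole input must have the form n=n.
    private static final Pattern WHOLE = Pattern.compile("^([0-9]+)=\\1$");

    static boolean foundAnywhere(String input) {
        return ANYWHERE.matcher(input).find();
    }

    static boolean wholeInput(String input) {
        return WHOLE.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(foundAnywhere("2=2"));  // true
        System.out.println(foundAnywhere("1=2"));  // false
        System.out.println(foundAnywhere("12=2")); // true, via the substring "2=2"
        System.out.println(wholeInput("12=2"));    // false
        System.out.println(wholeInput("12=12"));   // true
    }
}
```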

3) The regular-expression syntax supported by RE is as follows:
Characters

unicodeChar	Matches any identical unicode character
\	Used to quote a meta-character (like '*')
\\	Matches a single '\' character
\0nnn	Matches a given octal character
\xhh	Matches a given 8-bit hexadecimal character
\uhhhh	Matches a given 16-bit hexadecimal character
\t	Matches an ASCII tab character
\n	Matches an ASCII newline character
\r	Matches an ASCII return character
\f	Matches an ASCII form feed character

Character classes

[abc]	Simple character class
[a-zA-Z]	Character class with ranges
[^abc]	Negated character class

Standard POSIX character classes

[:alnum:]	Alphanumeric characters.
[:alpha:]	Alphabetic characters.
[:blank:]	Space and tab characters.
[:cntrl:]	Control characters.
[:digit:]	Numeric characters.
[:graph:]	Characters that are printable and are also visible. (A space is printable, but not visible, while an 'a' is both.)
[:lower:]	Lower-case alphabetic characters.
[:print:]	Printable characters (characters that are not control characters.)
[:punct:]	Punctuation characters (characters that are not letters, digits, control characters, or space characters).
[:space:]	Space characters (such as space, tab, and formfeed, to name a few).
[:upper:]	Upper-case alphabetic characters.
[:xdigit:]	Characters that are hexadecimal digits.

Non-standard POSIX-style character classes

[:javastart:]	Start of a Java identifier
[:javapart:]	Part of a Java identifier

Predefined character classes

.	Matches any character other than newline
\w	Matches a "word" character (alphanumeric plus "_")
\W	Matches a non-word character
\s	Matches a whitespace character
\S	Matches a non-whitespace character
\d	Matches a digit character
\D	Matches a non-digit character

Boundary matchers

^	Matches only at the beginning of a line
$	Matches only at the end of a line
\b	Matches only at a word boundary
\B	Matches only at a non-word boundary
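
To make the difference between \b and \B concrete, here is a minimal sketch. It relies on java.util.regex from the JDK as a stand-in for org.apache.regexp (an assumption, so that it runs without the jar), and the helper name indexOf is made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Start index of the first match, or -1 when there is none.
    static int indexOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String text = "concat cat";
        // \bcat\b needs a word boundary on both sides: only the standalone word qualifies.
        System.out.println(indexOf("\\bcat\\b", text)); // 7
        // \Bcat needs a non-boundary before "cat": the tail of "concat" qualifies.
        System.out.println(indexOf("\\Bcat", text));    // 3
    }
}
```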

Greedy quantifiers

A*	Matches A 0 or more times (greedy)
A+	Matches A 1 or more times (greedy)
A?	Matches A 1 or 0 times (greedy)
A{n}	Matches A exactly n times (greedy)
A{n,}	Matches A at least n times (greedy)
A{n,m}	Matches A at least n but not more than m times (greedy)

Reluctant (non-greedy) quantifiers

A*?	Matches A 0 or more times (reluctant)
A+?	Matches A 1 or more times (reluctant)
A??	Matches A 0 or 1 times (reluctant)
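
The practical difference between the greedy and reluctant forms shows up when more than one match length is possible. The sketch below illustrates it with java.util.regex from the JDK, used here as a stand-in for org.apache.regexp so that it runs without the jar; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Text of the first match, or null when the pattern is not found.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        // Greedy: .+ takes as much as it can, so the match runs to the last '>'.
        System.out.println(firstMatch("<.+>", html));  // <b>bold</b>
        // Reluctant: .+? takes as little as it can, so the match stops at the first '>'.
        System.out.println(firstMatch("<.+?>", html)); // <b>
    }
}
```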

Logical operators

AB	Matches A followed by B
A|B	Matches either A or B
(A)	Used for subexpression grouping
(?:A)	Used for subexpression clustering (just like grouping but no backrefs)
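
Grouping and clustering differ only in whether the subexpression is numbered and remembered. A minimal sketch, assuming java.util.regex from the JDK as a stand-in for org.apache.regexp (so that it runs without the jar) and using made-up helper names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClusterDemo {
    // Number of capturing groups in the pattern (group 0 is not counted).
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text captured by group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capturingGroups("(ab)+(c)"));   // 2
        System.out.println(capturingGroups("(?:ab)+(c)")); // 1
        // With clustering, (c) becomes group 1 because (?:ab) is not numbered.
        System.out.println(firstGroup("(?:ab)+(c)", "ababc")); // c
    }
}
```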

Backreferences

\1	Backreference to 1st parenthesized subexpression
\2	Backreference to 2nd parenthesized subexpression
\3	Backreference to 3rd parenthesized subexpression
\4	Backreference to 4th parenthesized subexpression
\5	Backreference to 5th parenthesized subexpression
\6	Backreference to 6th parenthesized subexpression
\7	Backreference to 7th parenthesized subexpression
\8	Backreference to 8th parenthesized subexpression
\9	Backreference to 9th parenthesized subexpression

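The program that RE runs is first compiled by the RECompiler class. For efficiency, the RE matcher does not itself contain the regular-expression compiler. In fact, if you want to precompile one or more regular expressions, you can run the 'recompile' class from the command line, for example: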

java org.apache.regexp.recompile a*b

This produces compiled output similar to the following (the last line is not part of the output; it shows how the result is used):

// Pre-compiled regular expression "a*b"
char[] re1Instructions =
{
0x007c, 0x0000, 0x001a, 0x007c, 0x0000, 0x000d, 0x0041,
0x0001, 0x0004, 0x0061, 0x007c, 0x0000, 0x0003, 0x0047,
0x0000, 0xfff6, 0x007c, 0x0000, 0x0003, 0x004e, 0x0000,
0x0003, 0x0041, 0x0001, 0x0004, 0x0062, 0x0045, 0x0000,
0x0000,
};
REProgram re1 = new REProgram(re1Instructions);
RE r = new RE(re1);

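Building the RE matcher object from the precompiled program (re1 above) avoids the cost of compiling at run time. If you need to construct regular expressions dynamically, you can create a single RECompiler object and use it to compile each expression. Note that neither RE nor RECompiler is thread-safe (for efficiency reasons), so in a multithreaded program you need to create a separate compiler and matcher for each thread.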
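
The "one matcher per thread" advice can be sketched with a ThreadLocal. This is only an illustration of the pattern, not the regexp package's own API: it uses java.util.regex from the JDK as a stand-in, where the compiled Pattern is shareable but each Matcher holds mutable match state, which is assumed here to be analogous to an RE instance. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerThreadMatcher {
    // Each thread lazily gets its own Matcher, so match state is never shared.
    private static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> Pattern.compile("a*b").matcher(""));

    static boolean contains(String input) {
        Matcher m = MATCHER.get();
        m.reset(input); // reuse this thread's matcher on the new input
        return m.find();
    }

    // Checks every input on its own thread and counts how many contain a match.
    static int countMatchesInParallel(String[] inputs) {
        AtomicInteger hits = new AtomicInteger();
        Thread[] threads = new Thread[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            final String s = inputs[i];
            threads[i] = new Thread(() -> {
                if (contains(s)) {
                    hits.incrementAndGet();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return hits.get();
    }

    public static void main(String[] args) {
        System.out.println(countMatchesInParallel(
                new String[] {"aaab", "xyz", "b", "ca"})); // 2
    }
}
```

With the regexp package itself, the same shape would presumably hold an RE (and, for dynamic patterns, an RECompiler) per thread instead of a Matcher.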

3. Examples

1) The regexp package includes a small demo program written as an applet. Run it as follows:

java org.apache.regexp.REDemo

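2) Jeffrey Hunter has written an example program, which can be downloaded.
3) The test program bundled with regexp is also a useful reference. It keeps all the regular expressions, the strings they are applied to, and the expected results in a single file, $REGEXPHOME/docs/RETest.txt. Naturally, the test has to be run from the $REGEXPHOME directory.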

cd $REGEXPHOME
java org.apache.regexp.RETest

References
1. Jeffrey Hunter's README_regular_expressions.txt
http://www.idevelopment.info/topics/topics.cgi?LEVEL=programming

2. The Jakarta Site – CVS Repository
http://jakarta.apache.org/site/cvsindex.html
