明天的筆記本‧Tomorrow notebook

Notes on regular expressions and Python's re module

Common regular expression notes:

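Regular expressions use three kinds of brackets:
  Square brackets "[" and curly braces "{":
  square brackets "[" enclose a set of characters to match (each match is one character from the set); curly braces "{" specify how many times the preceding item repeats.
  Parentheses "(" are used for grouping.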
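A minimal Python sketch of the three bracket types; the sample strings are made up for illustration:

```python
import re

text = "aa1 bb22 cc333"

# [...]: a character class; each match is ONE character from the set
letters = re.findall(r"[abc]", "cab4")

# {...}: how many times the preceding item repeats (exactly three digits here)
three_digits = re.findall(r"[0-9]{3}", text)

# (...): groups; with groups, findall returns the captured parts as tuples
pairs = re.findall(r"([a-z]{2})([0-9]+)", text)

print(letters)       # ['c', 'a', 'b']
print(three_digits)  # ['333']
print(pairs)         # [('aa', '1'), ('bb', '22'), ('cc', '333')]
```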

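  .         any single character except a newline; "." stops at \n. To match any character including a newline, write [\s\S].
  *         zero or more of the preceding item (any length)
  ^         anchors the pattern at the start of the string (of each line with re.MULTILINE)
  $         anchors the pattern at the end of the string (of each line with re.MULTILINE)
  |         or (alternation)
  a?        zero or one a (to match a literal ?, use \?)
  a+        one or more a (to match a literal +, use \+)
  a*        zero or more a (to match a literal *, use \*)
  a{4}      exactly four a
  a{5,10}   five to ten a
  a{5,}     at least five a
  a{,3}     at most three a
  a.{5}b    a and b with exactly five (non-newline) characters between them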
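A small Python check of the table above, using re.fullmatch so each pattern must cover the whole sample string; the samples are illustrative assumptions:

```python
import re

# pattern -> a string it should match in full
cases = {
    r"a?": "a",              # zero or one a
    r"a+": "aaa",            # one or more a
    r"a*": "",               # zero or more a; the empty string counts
    r"a{4}": "aaaa",         # exactly four a
    r"a{5,10}": "aaaaaaa",   # five to ten a
    r"a{5,}": "a" * 12,      # at least five a
    r"a{,3}": "aa",          # at most three a (Python treats a missing lower bound as 0)
    r"a.{5}b": "a12345b",    # five non-newline characters between a and b
    r"cat|dog": "dog",       # alternation
}
results = {p: re.fullmatch(p, s) is not None for p, s in cases.items()}
print(all(results.values()))  # True

# ^ and $ anchor at the start and end of the string
print(re.search(r"^abc", "abcdef") is not None)  # True
print(re.search(r"^def", "abcdef") is not None)  # False

# "." stops at "\n", while [\s\S] also matches it
print(re.fullmatch(r"a.b", "a\nb"))                   # None
print(re.fullmatch(r"a[\s\S]b", "a\nb") is not None)  # True
```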

Non-greedy matching
    Put ? after a quantifier, e.g.:
    \$_SESSION\['(.*?)'\]
    More details: https://disp.cc/b/11-2q1S
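A sketch contrasting greedy and non-greedy matching on a made-up PHP line with two session keys:

```python
import re

php = "$_SESSION['user'] . $_SESSION['role']"

# Greedy: (.*) runs to the LAST "']", swallowing the second key
greedy = re.findall(r"\$_SESSION\['(.*)'\]", php)

# Non-greedy: (.*?) stops at the FIRST "']" that lets the match succeed
lazy = re.findall(r"\$_SESSION\['(.*?)'\]", php)

print(greedy)  # ["user'] . $_SESSION['role"]
print(lazy)    # ['user', 'role']
```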
		


=================== Common examples ====================
Any text, including line breaks
    [\s\S]+
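In Python, [\s\S] is one way to cross line breaks; the re.DOTALL flag, which makes "." match "\n" as well, is another. A short comparison on a made-up two-line string:

```python
import re

text = "line one\nline two"

with_class = re.search(r"one[\s\S]+two", text)         # [\s\S] also matches "\n"
with_dot = re.search(r"one.+two", text)                # "." stops at "\n": no match
with_dotall = re.search(r"one.+two", text, re.DOTALL)  # DOTALL lets "." match "\n"

print(repr(with_class.group()))   # 'one\nline two'
print(with_dot)                   # None
print(repr(with_dotall.group()))  # 'one\nline two'
```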

Finding HTML tags
    <a> tag whose content may contain line breaks
        Pattern: <a\s+[\s\S]*?>([\s\S]*?)<\/a\s*>
        Matches: <a href="#" target="_blank">xxxxxx(text with line breaks)xxxxx</a>

    <b> tag
        Pattern: <b(?:>|\s+([\s\S]*?)>)([\s\S]*?)<\/b\s*>
        Matches: <b>xxx</b>
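Running both tag patterns with re.findall on a made-up HTML snippet. Note that the (?:>|\s+...) part keeps "<b" from matching tags such as "<br>" or "<body>". Regexes like these are fine for quick searches; for real HTML, a parser such as Python's html.parser is more robust.

```python
import re

html = ('<p><a href="#" target="_blank">first\nline</a> '
        'and <b>bold</b> <b class="x">more</b></p>')

a_pattern = r"<a\s+[\s\S]*?>([\s\S]*?)<\/a\s*>"
b_pattern = r"<b(?:>|\s+([\s\S]*?)>)([\s\S]*?)<\/b\s*>"

# One group -> list of strings (the link text)
a_texts = re.findall(a_pattern, html)
# Two groups -> list of (attributes, content) tuples; '' when <b> has no attributes
b_parts = re.findall(b_pattern, html)

print(a_texts)  # ['first\nline']
print(b_parts)  # [('', 'bold'), ('class="x"', 'more')]
```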
        


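Replace col-xs-offset-1 with offset-xs-1
    Search pattern:  \s{0}col?\-?(.\S)?\-offset?-(.\D{0})?
    Replace with:    offset-$1-$2   ($1/$2 is editor syntax; Python's re.sub uses \1/\2)
    Explanation:
        \s{0}        matches zero whitespace characters, i.e. nothing; it can be dropped
        col?         "co" followed by an optional "l" (? makes the "l" optional here; it is not a non-greedy modifier)
        (.\S)        group 1: any character followed by one non-whitespace character, e.g. "xs"
        \-offset?-   "-offse", an optional "t", then "-"
        (.\D{0})     group 2: \D{0} matches nothing, so this is effectively (.), a single character such as "1"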
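The same replacement in Python, on a made-up class attribute. The second pattern is a more explicit alternative that assumes the breakpoint is letters and the offset is digits:

```python
import re

css = 'class="col-xs-offset-1 col-md-offset-3"'

# The note's search pattern, with Python's \1 and \2 in place of the editor's $1 and $2
from_note = re.sub(r"\s{0}col?\-?(.\S)?\-offset?-(.\D{0})?", r"offset-\1-\2", css)

# More explicit variant (assumption: breakpoint = letters, offset = digits)
explicit = re.sub(r"col-([a-z]+)-offset-(\d+)", r"offset-\1-\2", css)

print(from_note)  # class="offset-xs-1 offset-md-3"
print(explicit)   # class="offset-xs-1 offset-md-3"
```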

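([\d\n\s\w\"\'\`\$\,\(\)\.]*)   matches a run of digits, whitespace, word characters, quotes, backticks, $, commas, parentheses and dots, so it never includes a semicolon ; (a broader "anything but ;" is ([^;]*))
	sprintf\(([\d\n\s\w\"\'\`\$\,\(\)\.]*)\)\;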
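A sketch of the sprintf pattern on a made-up PHP line, next to the broader [^;]* form. The whitelist rejects the second call because "%" is not one of its allowed characters; both are heuristics and would break on a ";" inside a string literal.

```python
import re

php = 'a = sprintf("Hello $name", $id); b = sprintf("%d items", $n);'

# The note's whitelist: only the listed character kinds may appear inside sprintf(...)
whitelist = r"sprintf\(([\d\n\s\w\"\'\`\$\,\(\)\.]*)\)\;"
# Broader alternative: anything except ";"
not_semicolon = r"sprintf\(([^;]*)\);"

args_whitelist = re.findall(whitelist, php)
args_any = re.findall(not_semicolon, php)

print(args_whitelist)  # ['"Hello $name", $id']
print(args_any)        # ['"Hello $name", $id', '"%d items", $n']
```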
	
Date and time format: 2019-01-02 03:04 (24-hour clock: hours 0-23, minutes 00-59)
	\d{4}[\/\-](0?[1-9]|1[012])[\/\-](0?[1-9]|[12][0-9]|3[01])\s+([01]?[0-9]|2[0-3]):([0-5][0-9])
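A sketch that validates timestamps with re.fullmatch, assuming a 24-hour clock where the hour part is written ([01]?[0-9]|2[0-3]) and the minute part ([0-5][0-9]). The function name parse_stamp is hypothetical.

```python
import re

# Year, month 1-12, day 1-31, hour 0-23, minute 00-59; "/" or "-" as date separator
TIME_RE = re.compile(
    r"\d{4}[\/\-](0?[1-9]|1[012])[\/\-](0?[1-9]|[12][0-9]|3[01])"
    r"\s+([01]?[0-9]|2[0-3]):([0-5][0-9])"
)

def parse_stamp(s):
    """Return (month, day, hour, minute) strings if s matches in full, else None."""
    m = TIME_RE.fullmatch(s)
    return m.groups() if m else None

print(parse_stamp("2019-01-02 03:04"))  # ('01', '02', '03', '04')
print(parse_stamp("2019-01-02 25:00"))  # None
```

The regex only checks the shape, so it still accepts 2019-02-31; if real calendar validation is needed, datetime.datetime.strptime(s, "%Y-%m-%d %H:%M") is one option.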

Python's re module:

import re

=======================
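Example 1
text = 'https://news.google.com/?hl=zh-TW&gl=TW&ceid=TW:zh-Hant'
# re.search(pattern, string): scans string for the first place pattern matches
# Dots are escaped so they match a literal "." instead of any character
match_object = re.search(r"news\.google\.com/", text)
if match_object:
    print("Match found")
else:
    print("No match")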

=======================
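Example 2
regex = r"hello"         # hypothetical pattern
content = "hello world"  # hypothetical text
if re.match(regex, content) is not None:
    print("content starts with a match for regex")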
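re.match only tries the start of the string, while re.search scans the whole string and re.fullmatch requires the entire string to match. A minimal comparison on a made-up sentence:

```python
import re

content = "say hello world"

at_start = re.match(r"hello", content)             # None: match only tries position 0
anywhere = re.search(r"hello", content)            # finds "hello" starting at index 4
whole = re.fullmatch(r"say hello world", content)  # the entire string must match

print(at_start)          # None
print(anywhere.start())  # 4
print(whole.group())     # say hello world
```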


=======================
Getting help
help(re.compile(r''))  # prints the documentation of the compiled Pattern object

Replacing strings with Python's re module:

=======================
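str.replace method: plain substring replacement, does not accept regular expressions
# Replace a string:
text1 = 'aaaabbbbcccc'  # original string
text = text1.replace("bbbb", "---")  # original.replace('search string', 'replacement')
print(text)
#-> aaaa---cccc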

=======================
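re.sub method:
Replace a string:
text1 = 'aaaa/bbbb/cccc/dddd/eeee/ffff'  # original string
pattern = re.compile("bbbb.*dddd")  # pattern to search for
text = pattern.sub("----", text1)  # pattern.sub(replacement, original string)
print(text)
#-> aaaa/----/eeee/ffff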
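A few more re.sub options worth knowing, sketched on a made-up string: group back-references in the replacement, the count limit, a function as the replacement, and re.subn, which also reports how many replacements were made.

```python
import re

text1 = "aaaa/bbbb/cccc/dddd/eeee/ffff"

# Back-references: \1 and \2 in the replacement refer to the pattern's groups
swapped = re.sub(r"(\w+)/(\w+)", r"\2/\1", "aaaa/bbbb")
# count limits how many matches are replaced (left to right)
first_two = re.sub(r"\w{4}", "x", text1, count=2)
# The replacement can be a function that receives each Match object
upper = re.sub(r"[ce]+", lambda m: m.group().upper(), text1)
# subn returns (new_string, number_of_replacements)
piped, n = re.subn(r"/", "|", text1)

print(swapped)    # bbbb/aaaa
print(first_two)  # x/x/cccc/dddd/eeee/ffff
print(upper)      # aaaa/bbbb/CCCC/dddd/EEEE/ffff
print(piped, n)   # aaaa|bbbb|cccc|dddd|eeee|ffff 5
```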

=======================
re.match method:
match_object = re.match(r'(\w*) (\w*)', 'hello world!')
print(match_object.expand(r'\2 \1'))  # expand() fills a template with the matched groups
#-> world hello

match_object.group([group1, ...]): returns the text matched by the given group(s); group() and group(0) return the whole match
match_object = re.match(r'(\w*) (\w*)(?P<tt>.*)', 'hello world!!!')
print(match_object.group())
#-> hello world!!!
print(match_object.group(0))
#-> hello world!!!
print(match_object.group(1))
#-> hello
print(match_object.group(3))
#-> !!!
print(match_object.group('tt'))
#-> !!!

match_object.groups(default=None): returns all groups as a tuple
match_object = re.match(r'(\w*) (\w*)(?P<tt>.*)', 'hello world!!!')
print(match_object.groups())
#-> ('hello', 'world', '!!!')

match_object.groupdict(default=None): returns the named groups as a dict (unnamed groups are left out)
match_object = re.match(r'(\w*) (\w*)(?P<tt>.*)', 'hello world!!!')
print(match_object.groupdict())
#-> {'tt': '!!!'}

match_object.start([group]): index of the first character matched by group n (default 0, the whole match)
match_object = re.match(r'(\d*) (\d*)(?P<tt>.*)', '012345 789!!')
print(match_object.start())
#-> 0
print(match_object.start(2))
#-> 7

match_object.end([group]): index just past the last character matched by group n
match_object = re.match(r'(\d*) (\d*)(?P<tt>.*)', '012345 789!!')
print(match_object.end())
#-> 12
print(match_object.end(1))
#-> 6

match_object.span([group]): returns (start index, end index)
match_object = re.match(r'(\d*) (\d*)(?P<tt>.*)', '012345 789!!')
print(match_object.span())
#-> (0, 12)
print(match_object.span(3))
#-> (10, 12)
print(match_object.span('tt'))
#-> (10, 12)

match_object.lastindex: integer index of the last matched group (None if no group matched)
match_object = re.search(r'.(\d)(\d)(\d)(\d)(a)', '1234a1234a1234')
print(match_object.lastindex)
#-> 5
match_object = re.search(r'.(\d+)(a)', '1234a1234a1234')
print(match_object.lastindex)
#-> 2
match_object.lastgroup: name (key) of the last matched group (None if that group has no name)
match_object = re.search(r'(?P<first>\d*)(?P<second>a)', '1234a1234a1234')
print(match_object.lastgroup)
#-> second

match_object.string: returns the string that was searched
match_object = re.search(r'(?P<first>\d*)(?P<second>a)', '1234a1234a1234')
print(match_object.string)
#-> 1234a1234a1234
print(match_object.group())
#-> 1234a
print(match_object.groups())
#-> ('1234', 'a')



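=================== Common Python examples ====================
# Check that text consists only of digits and whitespace
text = '2019 0102'  # hypothetical input
match_object = re.match(r'^([\s\d]+)$', text)
if match_object:
    print("Match found")
else:
    print("No match")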

Commonly used re functions:

compile(pattern)-------------------------------compiles pattern into a regular expression object (re.Pattern) and returns it for reuse.
search(pattern, string, flags=0)---------------finds the first place where pattern matches in string; returns None if there is none.
match(pattern, string, flags=0)----------------checks whether pattern matches at the start of string; returns None if not.
fullmatch(pattern, string, flags=0)------------checks whether pattern matches the entire string; returns None if not.
split(pattern, string, maxsplit=0, flags=0)----splits string at every match of pattern and returns the pieces as a list.
findall(pattern, string, flags=0)--------------returns a list of all non-overlapping matches in string (the groups instead, if pattern has groups).
finditer(pattern, string, flags=0)-------------returns an iterator of Match objects for all non-overlapping matches in string.
sub(pattern, repl, string, count=0, flags=0)---replaces matches of pattern in string with repl and returns the new string.
subn(pattern, repl, string, count=0, flags=0)--same as sub, but returns a tuple (new_string, number_of_substitutions).
escape(pattern)--------------------------------backslash-escapes the special characters in pattern and returns the new string.
purge()----------------------------------------clears the regular expression cache.
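A compact tour of the functions listed above on one made-up string:

```python
import re

text = "a1, b22, c333"
pattern = re.compile(r"([a-z])(\d+)")  # compile once, reuse below

first = pattern.search(text)                         # first match: "a1"
parts = re.split(r",\s*", text)                      # split at commas
pairs = pattern.findall(text)                        # two groups -> list of tuples
spans = [m.span() for m in pattern.finditer(text)]   # Match objects, one per match
is_full = re.fullmatch(r"\w+", "abc") is not None    # whole string is word characters
not_full = re.fullmatch(r"\w+", "abc!")              # None: "!" is left over
escaped = re.escape("1+1=2?")                        # "+" and "?" get backslashes
re.purge()                                           # clears the compiled-pattern cache

print(first.group())  # a1
print(parts)          # ['a1', 'b22', 'c333']
print(pairs)          # [('a', '1'), ('b', '22'), ('c', '333')]
print(spans)          # [(0, 2), (4, 7), (9, 13)]
print(is_full, not_full)  # True None
print(re.fullmatch(escaped, "1+1=2?") is not None)  # True
```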

Useful links:

Wikipedia