Regular Expressions in Python [Python Basics]

Z时代
2024-01-10
Category: General

1. What is a regular expression?

A regular expression is an expression that concisely describes a set of strings.

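2. What is it used for?

①Describing the characteristics of a type of text. ②Finding or replacing a whole set of strings at once. ③Matching all or part of a string.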

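3. Common operators

Operator  Description                                                      Example
.         Any single character (except a newline, by default)
[]        Character set: the allowed values for a single character         [abc] matches a, b, or c; [a-z] matches one character from a to z
[^]       Negated character set: the excluded values for a single character [^abc] matches one character other than a, b, or c
*         The previous character repeated zero or more times               abc* matches ab, abc, abcc, abccc, and so on
+         The previous character repeated one or more times                abc+ matches abc, abcc, abccc, and so on
?         The previous character present zero times or once                abc? matches ab, abc
|         Either the expression on the left or the one on the right        abc|def matches abc, def
{m}       The previous character repeated exactly m times                  ab{4}c matches abbbbc
{m,n}     The previous character repeated m to n times, inclusive          ab{1,2}c matches abc, abbc
^         The start of the string                                          ^abc matches abc at the start of the string
$         The end of the string                                            abc$ matches abc at the end of the string
()        Grouping marker; | is the operator typically used inside it      (abc) matches abc; (abc|def) matches abc, def
\d        A digit, equivalent to [0-9] for ASCII text
\w        A word character, equivalent to [A-Za-z0-9_] for ASCII text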
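The operators above can be checked one by one with a short script. This is a minimal sketch: the sample strings are illustrative choices rather than part of the original article, and it uses re.fullmatch (a standard re function not otherwise covered here) so that each pattern has to match the whole string.

```python
import re

# Each entry: (pattern, a string it should match in full, a string it should reject).
# The sample strings are illustrative assumptions, not taken from the article.
cases = [
    (r"a.c", "abc", "ac"),            # .      any single character
    (r"[abc]", "b", "d"),             # []     character set
    (r"[^abc]", "d", "a"),            # [^]    negated character set
    (r"abc*", "ab", "a"),             # *      zero or more of the previous character
    (r"abc+", "abcc", "ab"),          # +      one or more of the previous character
    (r"abc?", "ab", "abcc"),          # ?      zero or one of the previous character
    (r"abc|def", "def", "abd"),       # |      left or right expression
    (r"ab{4}c", "abbbbc", "abbc"),    # {m}    exactly m repetitions
    (r"ab{1,2}c", "abbc", "abbbc"),   # {m,n}  m to n repetitions
    (r"(abc|def)", "abc", "abe"),     # ()     grouping
    (r"\d", "7", "x"),                # \d     a digit
    (r"\w", "_", "-"),                # \w     a word character
]

results = []
for pattern, good, bad in cases:
    accepted = re.fullmatch(pattern, good) is not None
    rejected = re.fullmatch(pattern, bad) is None
    results.append((pattern, accepted, rejected))
    print(f"{pattern!r}: accepts {good!r}: {accepted}, rejects {bad!r}: {rejected}")
```

Every line should report True twice; a False would mean the pattern does not behave as the table describes.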

4. Some examples of regular expression syntax

Regular expression     Matching strings
P(Y|YT|YTH|YTHO)?N     "PN", "PYN", "PYTN", "PYTHN", "PYTHON"
PYTHON+                "PYTHON", "PYTHONN", "PYTHONNN", ...
PY[TH]ON               "PYTON", "PYHON"
PY[^TH]?ON             "PYON", "PYAON", "PYBON", "PYCON", ...
PY{,3}N                "PN", "PYN", "PYYN", "PYYYN"

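As a quick sanity check, the patterns in this section can be tested against the strings they are meant to match. This is only a sketch: the dictionary name and the extra negative checks are assumptions added for illustration, and the last pattern is written with the {,3} quantifier, which Python's re reads as "zero to three times".

```python
import re

# Pattern -> strings it is expected to match in full.
examples = {
    r"P(Y|YT|YTH|YTHO)?N": ["PN", "PYN", "PYTN", "PYTHN", "PYTHON"],
    r"PYTHON+": ["PYTHON", "PYTHONN", "PYTHONNN"],
    r"PY[TH]ON": ["PYTON", "PYHON"],
    r"PY[^TH]?ON": ["PYON", "PYAON", "PYBON", "PYCON"],
    r"PY{,3}N": ["PN", "PYN", "PYYN", "PYYYN"],
}

all_match = all(
    re.fullmatch(pattern, s) is not None
    for pattern, strings in examples.items()
    for s in strings
)
print(all_match)  # True

# [TH] stands for exactly one character, so "PYTHON" itself is not matched.
print(re.fullmatch(r"PY[TH]ON", "PYTHON"))  # None

# {,3} allows at most three Y characters.
print(re.fullmatch(r"PY{,3}N", "PYYYYN"))   # None
```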
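5. Classic regular expression examples

Regular expression        Meaning
^[A-Za-z]+$               A string made up only of the 26 letters
^[A-Za-z0-9]+$            A string made up only of the 26 letters and digits
^-?\d+$                   A string in integer form
^[0-9]*[1-9][0-9]*$       A string in positive-integer form
[1-9]\d{5}                A postal code in mainland China
[\u4e00-\u9fa5]           A Chinese character
\d{3}-\d{8}|\d{4}-\d{7}   A phone number in mainland China, e.g. 010-12345678
[1-9]?\d                  0-99
1\d{2}                    100-199
2[0-4]\d                  200-249
25[0-5]                   250-255

Combining the four numeric ranges gives a pattern that matches an IP address:

(([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])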
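The IP address pattern is easier to read when it is built from a single octet sub-pattern. The sketch below assumes a hypothetical helper named is_ipv4 and uses re.fullmatch with the re.ASCII flag so that the entire string must be an address; neither the helper nor those two choices come from the original article.

```python
import re

# One octet: 0-99, 100-199, 200-249, or 250-255.
OCTET = r"([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])"

# Three "octet." groups followed by a final octet.
IP_PATTERN = rf"({OCTET}\.){{3}}{OCTET}"


def is_ipv4(text):
    """Return True if the whole string is a dotted-quad IPv4 address (hypothetical helper)."""
    return re.fullmatch(IP_PATTERN, text, flags=re.ASCII) is not None


print(is_ipv4("192.168.0.1"))      # True
print(is_ipv4("255.255.255.255"))  # True
print(is_ipv4("256.1.1.1"))        # False: 256 is out of range
print(is_ipv4("1.2.3"))            # False: only three octets

# Without anchoring, re.search can still find an address inside invalid input.
print(re.search(IP_PATTERN, "256.1.1.1").group())  # 56.1.1.1
```

The last line shows why validation needs the whole string to match: searching alone accepts a fragment of an out-of-range address.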

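6. Basic use of the re library

Main functions of the re library

Function        Description
re.search()     Searches a string for the first position where the regular expression matches; returns a match object, or None if there is no match
re.match()      Matches the regular expression from the beginning of a string; returns a match object, or None if there is no match
re.findall()    Searches a string and returns all matching substrings as a list
re.split()      Splits a string at the places where the regular expression matches; returns a list
re.finditer()   Searches a string and returns an iterator over the matches; each element is a match object
re.sub()        Replaces every substring that matches the regular expression; returns the resulting string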

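①search(pattern, string, flags=0)

pattern: the regular expression, given as a string or a raw string
string: the string to be searched
flags: control flags that modify how the regular expression is applied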

import re

# Find the first six-digit postal code anywhere in the string.
match = re.search(r"[1-9]\d{5}", "haha 723300")
if match:
    print(match.group())

Output:

723300
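The flags parameter is described above but not demonstrated. A minimal sketch with two standard flags, re.IGNORECASE and re.MULTILINE, follows; the sample strings and variable names are assumptions chosen for illustration.

```python
import re

text = "Hello PYTHON"

# By default matching is case-sensitive, so nothing is found.
no_flag = re.search(r"python", text)
print(no_flag)  # None

# re.IGNORECASE lets letters match regardless of case.
ignore_case = re.search(r"python", text, flags=re.IGNORECASE)
print(ignore_case.group())  # PYTHON

lines = "hello\nworld"

# By default ^ matches only at the very start of the string.
start_only = re.search(r"^world", lines)
print(start_only)  # None

# re.MULTILINE makes ^ match at the start of every line as well.
multi_line = re.search(r"^world", lines, flags=re.MULTILINE)
print(multi_line.group())  # world
```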

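②match(pattern, string, flags=0)

Note that match only tries to match at the beginning of the string. If the beginning does not match, it does not look any further. It returns a match object when a match is found and None otherwise.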

import re

# No match: the string does not begin with a six-digit number.
match = re.match(r"[1-9]\d{5}", "haha 723300")
print(type(match))

# Match: the six-digit number is at the beginning of the string.
match = re.match(r"[1-9]\d{5}", "723300 haha")
if match:
    print(match.group())

Output:

<class 'NoneType'>
723300

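This shows the difference between search and match:
match requires the matching substring to be at the very beginning of the string and finds nothing otherwise, while search has no such requirement.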
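One way to see the relationship is to anchor the search pattern with ^, which then behaves like match on a single-line string. The variable names in this sketch are assumptions made for illustration.

```python
import re

pattern = r"[1-9]\d{5}"
text = "haha 723300"

by_match = re.match(pattern, text)             # only tries position 0
by_search = re.search(pattern, text)           # scans the whole string
anchored_search = re.search("^" + pattern, text)  # ^ pins the search to the start

print(by_match)           # None
print(by_search.group())  # 723300
print(anchored_search)    # None
```

The two differ once re.MULTILINE is used: search with ^ can then match at the start of any line, whereas match still only looks at the start of the string.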

③findall(pattern, string, flags=0)

import re

# Collect every six-digit number in the string.
c = re.findall(r"[1-9]\d{5}", "haha723300 xixi612203")
print(type(c))
print(c)

Output:

<class 'list'>
['723300', '612203']

④split(pattern, string, maxsplit=0, flags=0)

maxsplit: the maximum number of splits; the remainder of the string is returned as the last element

import re

# Split at every six-digit number.
a = re.split(r"[1-9]\d{5}", "haha723300 xixi612203")
print(type(a))
print(a)

# Split only once; the rest of the string stays in the last element.
a = re.split(r"[1-9]\d{5}", "haha723300 xixi612203", maxsplit=1)
print(a)

# Split on either a colon or a comma.
str1 = "name: hpl, age: 18"
b = re.split(r":|,", str1)
print(b)

Output:

<class 'list'>
['haha', ' xixi', '']
['haha', ' xixi612203']
['name', ' hpl', ' age', ' 18']

⑤finditer(pattern, string, flags=0)

import re

# Every element produced by finditer is a match object.
for m in re.finditer(r"[1-9]\d{5}", "haha723300 xixi612203"):
    print(m.group())

Output:

723300
612203

⑥sub(pattern, repl, string, count=0, flags=0)

repl: the string that replaces each matched substring
count: the maximum number of replacements (0 means replace all matches)

import re

# Replace every six-digit number with "love".
m = re.sub(r"[1-9]\d{5}", "love", "haha723300 xixi612203")
print(m)

Output:

hahalove xixilove
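The count parameter is not exercised in the example above. A minimal sketch of its effect, reusing the same sample string (the variable names are assumptions):

```python
import re

text = "haha723300 xixi612203"

# count=0 (the default) replaces every match.
replace_all = re.sub(r"[1-9]\d{5}", "love", text)

# count=1 replaces only the first match and leaves the rest untouched.
replace_first = re.sub(r"[1-9]\d{5}", "love", text, count=1)

print(replace_all)    # hahalove xixilove
print(replace_first)  # hahalove xixi612203
```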

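7. The match object in the re library

Attributes:
string    the text being matched
re        the pattern object (regular expression) used for the match
pos       the position in the text where the search starts
endpos    the position in the text where the search ends

Methods:
group()   returns the matched string
start()   the start position of the match in the original string
end()     the end position of the match in the original string
span()    returns the tuple (start(), end())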

import re

match = re.search(r"[1-9]\d{5}", "haha723300 xixi612203")
# Attributes of the match object
print(match.string)
print(match.re)
print(match.pos)
print(match.endpos)
# Methods of the match object
print(match.group())
print(match.start())
print(match.end())
print(match.span())

Output:

haha723300 xixi612203
re.compile('[1-9]\\d{5}')
0
21
723300
4
10
(4, 10)
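The positions reported by span() index into the original string, so slicing match.string with them gives back the same text as group(). A short sketch using finditer (the list name spans is an assumption):

```python
import re

text = "haha723300 xixi612203"

spans = []
for m in re.finditer(r"[1-9]\d{5}", text):
    start, end = m.span()
    # Slicing the original string with the span reproduces the matched text.
    assert m.string[start:end] == m.group()
    spans.append((m.group(), start, end))

print(spans)  # [('723300', 4, 10), ('612203', 15, 21)]
```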

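8. Greedy and minimal matching in the re library

①The re library uses greedy matching by default, that is, it returns the longest substring that matches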

import re

# Greedy: .* consumes as much as possible, so the match runs to the last N.
match = re.search(r"PY.*N", "PYANBNCNDN")
print(match.group())

Output:

PYANBNCNDN

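②To get a minimal match, append ? to the repetition operator

Minimal-match operators

Operator   Description
*?         The previous character repeated zero or more times, minimal match
+?         The previous character repeated one or more times, minimal match
??         The previous character repeated zero times or once, minimal match
{m,n}?     The previous character repeated m to n times (inclusive), minimal match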

import re

# Minimal: .*? consumes as little as possible, so the match stops at the first N.
match = re.search(r"PY.*?N", "PYANBNCNDN")
print(match.group())

Output:

PYAN
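The same comparison can be made for all four minimal-match operators. In this sketch the pattern pairs are illustrative assumptions built around the sample string used above.

```python
import re

text = "PYANBNCNDN"

# (greedy pattern, minimal pattern) for each repetition operator.
pairs = [
    (r"PY.*N", r"PY.*?N"),
    (r"PY.+N", r"PY.+?N"),
    (r"PYA?", r"PYA??"),
    (r"PY.{1,5}N", r"PY.{1,5}?N"),
]

greedy_results = [re.search(g, text).group() for g, _ in pairs]
minimal_results = [re.search(m, text).group() for _, m in pairs]

print(greedy_results)   # ['PYANBNCNDN', 'PYANBNCNDN', 'PYA', 'PYANBNCN']
print(minimal_results)  # ['PYAN', 'PYAN', 'PY', 'PYAN']
```

In each pair the minimal form stops as soon as the rest of the pattern can match, while the greedy form extends as far as the rest of the pattern allows.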

That is all for Regular Expressions in Python. Source link: utcz.com/z/538027.html
