Python - How to use a regular expression in string.replace?
I need some help writing a regular expression. My input looks like this:
this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files
with<[3> such tags </[3>
The required output is:
this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. and there are many other lines in the txt files
with such tags
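This is what I have tried:
#!/usr/bin/python
import os, sys, re, glob

for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    # Open each matching file and read it line by line.
    with open(infile) as reader:
        for line in reader:
            # Hard-coded for tag number 1 only, with and without a trailing space.
            line2 = line.replace('<[1> ', '')
            line = line2.replace('</[1> ', '')
            line2 = line.replace('<[1>', '')
            line = line2.replace('</[1>', '')
            print(line)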
I also tried this (but it seems I am using the wrong regex syntax):
line2 = line.replace('<[*> ', '')
line = line2.replace('</[*> ', '')
line2 = line.replace('<[*>', '')
line = line2.replace('</[*>', '')
I don't want to hard-code a replace call for every number from 1 to 99...
Answer:
This tested snippet should do it:
import re
line = re.sub(r"</?\[\d+>", "", line)
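As a quick check, the pattern can be run against the sample input from the question. This is only a sketch: the names `TAG_PATTERN`, `sample`, and `cleaned` are made up for illustration. It also shows why the second attempt in the question had no effect: `str.replace` treats its argument as literal text, so `'<[*>'` only matches those exact five characters.

```python
import re

# Hypothetical names; the pattern itself is the one from the answer.
TAG_PATTERN = r"</?\[\d+>"

sample = (
    "this is a paragraph with<[1> in between</[1> and then there are cases ... "
    "where the<[99> number ranges from 1-100</[99>. and there are many other "
    "lines in the txt files\n"
    "with<[3> such tags </[3>"
)

# re.sub interprets the first argument as a regular expression.
cleaned = re.sub(TAG_PATTERN, "", sample)
print(cleaned)

# str.replace does not: '*' is an ordinary character here, so nothing changes.
unchanged = sample.replace("<[*>", "")
print(unchanged == sample)
```

Only the tags are removed. The spaces around them are left alone, so the last line keeps the space that preceded `</[3>`; a call to `rstrip()` would drop it if that matters.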
Edit: Here is a commented version explaining how it works:
line = re.sub(r"""(?x)  # Use free-spacing mode.
    <    # Match a literal '<'
    /?   # Optionally match a '/'
    \[   # Match a literal '['
    \d+  # Match one or more digits
    >    # Match a literal '>'
    """, "", line)
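To connect this back to the file loop in the question, here is one possible way to apply the same pattern to every line of every `.txt` file. It is a sketch rather than the answerer's code: the helper names `strip_tags` and `clean_txt_files` are hypothetical, and it assumes the files can be read with the platform's default text encoding.

```python
import glob
import os
import re

# Same pattern as in the answer, compiled once and reused for every line.
TAG_RE = re.compile(r"</?\[\d+>")


def strip_tags(lines):
    """Yield each line with every <[N> and </[N> tag removed."""
    for line in lines:
        yield TAG_RE.sub("", line)


def clean_txt_files(directory):
    """Print every line of every .txt file in `directory`, tags removed."""
    for infile in glob.glob(os.path.join(directory, "*.txt")):
        with open(infile) as reader:
            for line in strip_tags(reader):
                # Lines already end in a newline, so do not add another.
                print(line, end="")


# In-memory check that needs no files on disk.
print(list(strip_tags(["with<[3> such tags </[3>\n", "no tags here\n"])))
```

Calling `clean_txt_files(os.getcwd())` mirrors the `glob` call in the question.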
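Regular expressions are fun! But I strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: the "metacharacters" that must be escaped (that is, preceded by a backslash), and note that the rules differ inside and outside character classes. There is an excellent online tutorial at www.regular-expressions.info. The time you spend there will pay for itself many times over.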
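A small sketch of those escaping rules, using only the standard `re` module. The sample strings and variable names are invented for illustration.

```python
import re

# Outside a character class, '[' opens a class, so a literal '[' needs a backslash.
outside = re.findall(r"\[", "a[b]c")
print(outside)

# Inside a character class, '.', '*' and '+' are ordinary characters.
inside_plain = re.findall(r"[.*+]", "1.5*2+3")
print(inside_plain)

# Inside a class, ']' and a '-' between characters are still special, as is a
# leading '^'. Here ']' and '-' are escaped and '^' is kept away from the start.
inside_special = re.findall(r"[\]\-^]", "a]b-c^d")
print(inside_special)

# re.escape adds whatever backslashes are needed to match a string literally.
literal = re.escape("<[1>")
stripped = re.sub(literal, "", "with<[1> in between")
print(stripped)
```

`re.escape` is handy when the text to match comes from data rather than being typed by hand, since it removes the need to remember which characters are special.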
Source link: utcz.com/qa/436338.html