Python: How to input a regex in string.replace?

一尘不染

python

I need some help writing a regular expression. My input looks like this:

this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. 
and there are many other lines in the txt files
with<[3> such tags </[3>

The required output is:

this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. 
and there are many other lines in the txt files
with such tags

I have tried this:

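#!/usr/bin/python
import os, glob
for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
    # Open each .txt file and iterate over its lines.
    with open(infile) as reader:
        for line in reader:
            # str.replace matches its argument literally, so only tag 1 is removed.
            line2 = line.replace('<[1> ', '')
            line = line2.replace('</[1> ', '')
            line2 = line.replace('<[1>', '')
            line = line2.replace('</[1>', '')

            print(line)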

I have also tried this (but it seems I am using the wrong regex syntax):

    line2 = line.replace('<[*> ', '')
    line = line2.replace('</[*> ', '')
    line2 = line.replace('<[*>', '')
    line = line2.replace('</[*>', '')

I don't want to hard-code a replace call for every number from 1 to 99...

2020-02-19

1 answer

一尘不染

This tested snippet should do it:

import re
line = re.sub(r"</?\[\d+>", "", line)
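To check the snippet against the sample input from the question, a minimal sketch might look like the following. The names `TAG_PATTERN` and `strip_tags` are hypothetical and introduced here only for illustration; the pattern itself is the one given above.

```python
import re

# Same pattern as in the answer: an optional '/', a literal '[', one or more digits.
# TAG_PATTERN and strip_tags are illustrative names, not part of the original answer.
TAG_PATTERN = re.compile(r"</?\[\d+>")


def strip_tags(text):
    """Return text with every <[N> and </[N> tag removed."""
    return TAG_PATTERN.sub("", text)


sample = (
    "this is a paragraph with<[1> in between</[1> and then there are cases "
    "... where the<[99> number ranges from 1-100</[99>.\n"
    "with<[3> such tags </[3>\n"
)

print(strip_tags(sample))
```

Compiling the pattern once is optional; `re.sub` with the pattern string behaves the same way. Note that a space that sits before a closing tag, as in `tags </[3>`, is left in place because the pattern removes only the tag itself.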

Edit: here is a commented version explaining how it works:

line = re.sub(r"""(?x) # Use free-spacing mode (the flag must come first in the pattern).
  <    # Match a literal '<'
  /?   # Optionally match a '/'
  \[   # Match a literal '['
  \d+  # Match one or more digits
  >    # Match a literal '>'
  """, "", line)

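Regular expressions are fun! But I strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: the "metacharacters" that must be escaped, that is, preceded by a backslash. The rules differ inside and outside character classes. There is an excellent online tutorial at www.regular-expressions.info. The time you spend there will pay for itself many times over.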
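As a rough illustration of the escaping rules mentioned above, the sketch below contrasts metacharacters inside and outside a character class using Python's `re` module. The variable names and the sample strings are assumptions made up for this example.

```python
import re

# Outside a character class, '[' opens a class, so a literal '[' needs a backslash.
# Without it, the pattern below is an unterminated class and fails to compile.
try:
    re.compile(r"<[\d+>")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# Inside a character class, '.', '*' and '+' are ordinary characters.
inside_class = re.findall(r"[.*+]", "a.b*c+d")

# Outside a class, an unescaped '.' matches any character except a newline,
# while an escaped dot matches only a literal dot.
any_char = re.findall(r".", "a.")
literal_dot = re.findall(r"\.", "a.")

# re.escape() adds whatever backslashes are needed to match a string literally.
only_tag_one = re.sub(re.escape("<[1>"), "", "with<[1> in between<[2>")

print(unescaped_ok, inside_class, any_char, literal_dot, only_tag_one)
```

`re.escape` is handy when the text to match comes from user input or a file and should never be interpreted as a pattern.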

2020-02-19