小能豆

Is there a way to handle bracketing of overlapping keywords in Python?

python

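Suppose I have a list of keywords and an input string called section. I want my code to find those keywords in section and wrap each one in [] square brackets. However, my keywords sometimes overlap one another.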

keywords = ["alpha", "alpha beta", "alpha beta charlie", "alpha beta charlie delta"]

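To work around this, I sort them by length so that longer keywords take priority. However, when I run the code I sometimes get double or nested brackets (I think this is because it still detects the shorter ones as valid keywords).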

I tried this:

import re

keywords = ["alpha", "alpha beta", "alpha beta charlie", "alpha beta charlie delta"]

keywords.sort(key=len, reverse=True)

section = "alpha alpha beta alpha beta charlie alpha beta charlie delta"
section = section.replace("'", "’").replace("\"", "”")
section_lines = section.split('\n')
for i, line in enumerate(section_lines):
    if not line.startswith('#'):
        section_lines[i] = re.sub(r'-',' ',line)
        for x in range(4):
            section_lines[i] = re.sub(r'\b' + f"{keywords[x]}" + r'\b', f"[{keywords[x]}]", section_lines[i], flags=re.IGNORECASE)
            section_lines[i] = section_lines[i].replace("[[", "[").replace("]]", "]")

section = '\n'.join(section_lines)
section = section.replace("   "," ").replace("  "," ")

print(section)

Never mind the line splitting; it is there for another part of the code that handles multiple lines.

What I want: [alpha] [alpha beta] [alpha beta charlie] [alpha beta charlie delta]

But what I get is: [alpha] [alpha] beta] [alpha] beta] charlie] [alpha] beta] charlie] delta]


2023-11-16

1 answer

小能豆

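Instead of running four separate substitutions, one per keyword, where each pass can re-match text that an earlier pass has already bracketed, you can join the length-sorted keywords into a single alternation pattern and substitute once: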

import re

keywords = ["alpha", "alpha beta", "alpha beta charlie", "alpha beta charlie delta"]
keywords.sort(key=len, reverse=True)
section = "alpha alpha beta alpha beta charlie alpha beta charlie delta"
print(re.sub(rf"\b({'|'.join(keywords)})\b", r'[\1]', section))

This outputs:

[alpha] [alpha beta] [alpha beta charlie] [alpha beta charlie delta]
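
If you want to keep the per-line handling from the question (skipping lines that start with '#', turning hyphens into spaces, matching case-insensitively), the same single-pass idea could be wrapped up roughly as follows. This is only a sketch: the function name bracket_keywords is made up for illustration, and the re.escape call is an extra precaution for keywords that contain regex metacharacters, which the original example does not need.

```python
import re

def bracket_keywords(text, keywords):
    """Wrap each keyword occurrence in [] using one combined pattern."""
    # Longest first: alternation tries alternatives left to right, so the
    # longest keyword has to come first to win at a given position.
    ordered = sorted(keywords, key=len, reverse=True)
    # re.escape is an added precaution (not in the original answer) for
    # keywords that contain regex metacharacters such as '.' or '+'.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    lines = []
    for line in text.split("\n"):
        # Lines starting with '#' are left untouched, as in the question.
        if not line.startswith("#"):
            line = pattern.sub(r"[\1]", line.replace("-", " "))
        lines.append(line)
    return "\n".join(lines)

keywords = ["alpha", "alpha beta", "alpha beta charlie", "alpha beta charlie delta"]
section = "alpha alpha beta alpha beta charlie alpha beta charlie delta"
print(bracket_keywords(section, keywords))
```

Because every position in the line is scanned only once, text that has already been bracketed is never matched again, so the "[[" and "]]" cleanup from the question is no longer needed. Note that \b only behaves as expected when each keyword starts and ends with a word character.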
2023-11-16