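Suppose I have a list of keywords and an input string called a section. I want my code to find those keywords in the section and wrap each one in [] square brackets. However, my keywords sometimes overlap one another: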
```python
keywords = ["alpha", "alpha beta", "alpha beta charlie", "alpha beta charlie delta"]
```
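To deal with this, I sort them by length so that longer keywords take priority. But when I run the code, I sometimes get double or nested brackets (I think this is because the shorter keywords are still detected as valid matches inside text that has already been bracketed).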
I tried this:
```python
import re

keywords = ["alpha", "alpha beta", "alpha beta charlie", "alpha beta charlie delta"]
keywords.sort(key=len, reverse=True)

section = "alpha alpha beta alpha beta charlie alpha beta charlie delta"
section = section.replace("'", "’").replace("\"", "”")
section_lines = section.split('\n')
for i, line in enumerate(section_lines):
    if not line.startswith('#'):
        section_lines[i] = re.sub(r'-', ' ', line)
        for x in range(4):
            section_lines[i] = re.sub(r'\b' + f"{keywords[x]}" + r'\b', f"[{keywords[x]}]", section_lines[i], flags=re.IGNORECASE)
            section_lines[i] = section_lines[i].replace("[[", "[").replace("]]", "]")
section = '\n'.join(section_lines)
section = section.replace(" ", " ").replace(" ", " ")
print(section)
```
Never mind the line splitting; it is there for another part of the code that handles multiple lines.
What I want: [alpha] [alpha beta] [alpha beta charlie] [alpha beta charlie delta]
But what I get is: [alpha] [alpha] beta] [alpha] beta] charlie] [alpha] beta] charlie] delta]
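To see where the stray `]` comes from, here is a minimal sketch that replays just the first two passes of the loop above on a single occurrence of the longest keyword (the two-element `keywords` list is trimmed down for illustration). Because `[` and a space are both non-word characters, `\b` still matches around the shorter keyword inside the brackets added by the previous pass, and collapsing `[[` afterwards leaves an unmatched `]` behind.

```python
import re

# Longest first, as in the question; trimmed to two keywords for illustration.
keywords = ["alpha beta charlie delta", "alpha beta charlie"]
text = "alpha beta charlie delta"

for kw in keywords:
    # Each pass scans the output of the previous pass, brackets included.
    text = re.sub(r"\b" + kw + r"\b", f"[{kw}]", text)
    print(text)
# [alpha beta charlie delta]
# [[alpha beta charlie] delta]

# The question's cleanup step removes one "[" but not the matching "]".
text = text.replace("[[", "[").replace("]]", "]")
print(text)  # [alpha beta charlie] delta]
```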
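Instead of substituting into the string four times with a different keyword each time, where every pass can match inside the replacement text produced by an earlier pass, you can join the sorted keywords into a single alternation pattern so that the string is scanned only once: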
```python
import re

keywords = ["alpha", "alpha beta", "alpha beta charlie", "alpha beta charlie delta"]
# Longest first: alternation tries alternatives left to right.
keywords.sort(key=len, reverse=True)
section = "alpha alpha beta alpha beta charlie alpha beta charlie delta"
print(re.sub(rf"\b({'|'.join(keywords)})\b", r'[\1]', section))
```
This outputs:
[alpha] [alpha beta] [alpha beta charlie] [alpha beta charlie delta]
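If you also need the question's case-insensitive matching and its skipping of lines that start with `#`, the same single-pass idea can be wrapped up as below. This is a sketch, not part of the answer above: `bracket_keywords` and `bracket_section` are hypothetical helper names, `re.escape` is added on the assumption that keywords might contain regex metacharacters, and it assumes each keyword begins and ends with a word character (otherwise `\b` behaves differently). The last line shows why the length sort matters: with `alpha` listed first, the regex accepts it and never tries `alpha beta`.

```python
import re


def bracket_keywords(text, keywords):
    """Wrap each keyword occurrence in [...] using one regex pass."""
    # Longest first so the alternation prefers longer keywords.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, ordered)) + r")\b",
        flags=re.IGNORECASE,
    )
    # \1 re-inserts the matched text as written, so its original case is kept.
    return pattern.sub(r"[\1]", text)


def bracket_section(section, keywords):
    """Apply bracket_keywords line by line, leaving lines starting with '#' alone."""
    return "\n".join(
        line if line.startswith("#") else bracket_keywords(line, keywords)
        for line in section.split("\n")
    )


keywords = ["alpha", "alpha beta", "alpha beta charlie", "alpha beta charlie delta"]
print(bracket_section("# alpha beta\nAlpha alpha beta ALPHA beta charlie", keywords))
# # alpha beta
# [Alpha] [alpha beta] [ALPHA beta charlie]

# Without sorting, the shorter alternative wins at the same position.
print(re.sub(r"\b(alpha|alpha beta)\b", r"[\1]", "alpha beta"))  # [alpha] beta
```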