Python: regular expression to detect semicolon-terminated C++ for & while loops

一尘不染

python

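In my Python application, I need to write a regular expression that matches a C++ for or while loop that has been terminated with a semicolon (;). For example, it should match this: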

for (int i = 0; i < 10; i++);

…but not this:

for (int i = 0; i < 10; i++)

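This looks trivial at first glance, until you realise that the text between the opening and closing parentheses may itself contain other parentheses, for example: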

for (int i = funcA(); i < funcB(); i++);

I'm using the python.re module. Right now my regular expression looks like this (I've left my comments in so you can follow it more easily):

# match any line that begins with a "for" or "while" statement:
^\s*(for|while)\s*
\(  # match the initial opening parenthesis
    # Now make a named group 'balanced' which matches a balanced substring.
    (?P<balanced>
        # A balanced substring is either something that is not a parenthesis:
        [^()]
        | # …or a parenthesised string:
        \( # A parenthesised string begins with an opening parenthesis
            (?P=balanced)* # …followed by a sequence of balanced substrings
        \) # …and ends with a closing parenthesis
    )*  # Look for a sequence of balanced substrings
\)  # Finally, the outer closing parenthesis.
# must end with a semi-colon to match:
\s*;\s*

This works for all of the cases above, but it breaks as soon as the third part of the for loop contains a function call, like so:

for (int i = 0; i < 10; doSomethingTo(i));

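I think it breaks because, as soon as you put some text between the opening and closing parentheses, the "balanced" group matches that specific text, so the (?P=balanced) part no longer works: it fails to match because the text inside the nested parentheses is different.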
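The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the regular expression from the question: the pattern and the group name `word` are made up for illustration. It shows that `(?P=name)` re-matches the exact text a group captured rather than re-applying the group's pattern.

```python
import re

# (?P=word) is a backreference: it matches the same *text* that the group
# "word" captured, not anything that fits the group's *pattern*.
# The pattern and group name here are illustrative only.
pattern = re.compile(r"(?P<word>[a-z]+)-(?P=word)$")

# The second half repeats the captured text exactly, so this matches.
same_text = pattern.match("abc-abc")

# "xyz" would fit [a-z]+, but it is not the captured text, so this fails.
different_text = pattern.match("abc-xyz")

print(same_text is not None)
print(different_text is not None)
```

Python's built-in re module has no construct for recursing into a group's pattern, which is why a named backreference cannot express "another balanced substring".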

In my Python code I use the VERBOSE and MULTILINE flags, and create the regular expression like so:

import re

REGEX_STR = r"""# match any line that begins with a "for" or "while" statement:
^\s*(for|while)\s*
\(  # match the initial opening parenthesis
    # Now make a named group 'balanced' which matches
    # a balanced substring.
    (?P<balanced>
        # A balanced substring is either something that is not a parenthesis:
        [^()]
        | # …or a parenthesised string:
        \( # A parenthesised string begins with an opening parenthesis
            (?P=balanced)* # …followed by a sequence of balanced substrings
        \) # …and ends with a closing parenthesis
    )*  # Look for a sequence of balanced substrings
\)  # Finally, the outer closing parenthesis.
# must end with a semi-colon to match:
\s*;\s*"""

REGEX_OBJ = re.compile(REGEX_STR, re.MULTILINE | re.VERBOSE)

Can anyone suggest an improvement to this regular expression? It is getting too complicated for me to get my head around.

2020-02-22

1 answer

一尘不染

You could write a very simple routine that does this without using a regular expression:

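Set a position counter pos so that it points to just before the opening bracket after your for or while.
Set an open-bracket counter openBr to 0.
Now keep incrementing pos, reading the character at each position; increment openBr when you see an opening bracket and decrement it when you see a closing bracket. This increments it once at the start, for the first opening bracket in "for (", increments and decrements it some more for any brackets in between, and sets it back to 0 when the bracket of your for closes.
So, stop when openBr is 0 again.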

The stopping position is the closing bracket of for(...). Now you can check whether a semicolon follows it.

2020-02-22