小能豆

Python regex pipe doesn't seem to work. What's wrong and how do I fix it?

py

I think I’m missing something obvious, and I’m hoping that a second pair of eyes will see it.

I want to check that a particular value is NOT a string that looks like a time on a 12-hour clock, so I’m trying to check for the presence of the character strings “AM” or “PM” in the string. Here’s a simplification of the regex with which I’m working: print(re.match('.*AM|PM.*', '1:00 PM')).

When I run it I get the returned value None. That’s definitely not what I’m trying to get.

I’ve tried re.match('[AM|PM]', 'PM') and re.match('AM|PM$', 'PM'). Those return None too.

I can get a match with re.match('AM|PM', 'PM'). If I use re.match('.*AM|PM.*', '11:00 AM') then it returns a match and everything is fine. Similarly, re.match('.*PM|AM.*', '2:00 PM') returns a match.

What do I have to do to get the “OR” section of my regex to work so that the first thing in this question will match?


2023-12-23

1 answer

小能豆

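In regular expressions, the | (pipe) operator has lower precedence than concatenation, so the pattern .*AM|PM.* is interpreted as (.*AM)|(PM.*) rather than .*(AM|PM).* as you intended. Because re.match only matches at the start of the string, the second alternative, PM.*, succeeds only when the string begins with “PM”. That is why '1:00 PM' returns None while '11:00 AM' matches through the first alternative.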
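
To see the precedence rule in action, the following sketch compares the question's pattern with an explicitly grouped version, (?:.*AM)|(?:PM.*), which is how the regex engine reads it. The sample string 'PM 1:00' is an assumption added for illustration; it is not from the question.

```python
import re

# The pattern from the question and an explicitly grouped equivalent.
original = re.compile(r'.*AM|PM.*')
explicit = re.compile(r'(?:.*AM)|(?:PM.*)')

# 'PM 1:00' is a made-up sample that exercises the second alternative.
samples = ['1:00 PM', '11:00 AM', 'PM 1:00']

# Record whether re.match succeeds for each sample.
results = {text: bool(original.match(text)) for text in samples}

for text in samples:
    # Both patterns give the same answer for every sample.
    print(repr(text), results[text], bool(explicit.match(text)))
```

Only the string that contains “AM” and the string that starts with “PM” match; '1:00 PM' satisfies neither alternative.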

To fix this, you should use parentheses to explicitly indicate the scope of the alternation. Here’s the corrected regex:

import re

result = re.match('.*(AM|PM).*', '1:00 PM')
print(result.group(1) if result else "No match")

In this regex, .*(AM|PM).*, the parentheses (AM|PM) form a capturing group that matches either “AM” or “PM”. This limits the alternation to those two strings, so the leading and trailing .* apply to both alternatives.
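
If the captured text is not needed, a non-capturing group, (?:AM|PM), scopes the alternation in the same way without storing anything. This is a small sketch of the difference, not a required change:

```python
import re

# Capturing group: the matched "AM"/"PM" is stored as group 1.
capturing = re.match(r'.*(AM|PM).*', '1:00 PM')

# Non-capturing group: same scoping of the alternation, nothing stored.
non_capturing = re.match(r'.*(?:AM|PM).*', '1:00 PM')

print(capturing.groups())      # ('PM',)
print(non_capturing.groups())  # ()
```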

Now, when you run the code with '1:00 PM', it should match correctly, and result.group(1) will give you the matched “AM” or “PM”. If there’s no match, result will be None.
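
Since the goal in the question is only to detect whether “AM” or “PM” appears somewhere in the value, re.search may be a simpler fit: it scans the whole string, so no .* padding or grouping is needed. The helper name mentions_am_or_pm below is hypothetical and the sample values are assumptions for illustration.

```python
import re

def mentions_am_or_pm(value):
    """Return True if value contains 'AM' or 'PM' anywhere (hypothetical helper)."""
    # re.search looks for the pattern at every position, unlike re.match.
    return re.search(r'AM|PM', value) is not None

for value in ['1:00 PM', '11:00 AM', '13:00']:
    print(value, mentions_am_or_pm(value))
```

Note that this is a plain substring check, so it would also flag a word such as “SPAM”; a stricter pattern would be needed if that matters for your data.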

2023-12-23