I’m writing a regex that will run inside the OpenTelemetry Collector against log data, and I need named capture groups to pull out groups of data from the log.
I want to follow the OpenTelemetry Semantic Conventions for processes. These conventions, though, use “.” as a separator, for example “process.command”, “process.executable.name”, etc.
This is where I’m having difficulty: RegEx101, which has been a reliable tool in my experience, doesn’t seem to like a “.” in a named capture group, so I’m inclined to believe it.
Is there a way to allow this?
Example RegEx:
^(?P<process.executable.name>[^\[\]]+)\[?(?P<process.pid>\d+)?\]?$
Example with dots (not working):
Note that escaping the dots with a backslash (“\.”) doesn’t work; I’ve already tried it.
To demonstrate the RegEx itself does actually work without the breaking dots, using underscores instead of dots:
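In regular expressions, dots (`.`) are not allowed in the names of named capture groups. This is not because `.` means “any character” inside a pattern; it is because most regex flavors only accept letters, digits, and underscores in group names. That includes Python’s `re` and Go’s RE2-based `regexp` package, which the OpenTelemetry Collector (written in Go) uses for its regex parsing. Escaping the dot doesn’t help either, because a backslash is not a valid name character.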
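To check this outside regex101, here is a minimal Go sketch. It assumes the Collector’s regex parsing goes through Go’s standard `regexp` package, and it only tests which variants of the question’s pattern compile: dotted group names, backslash-escaped dots, and underscores. The `compiles` helper is a hypothetical name for illustration, not part of any Collector API.

```go
package main

import (
	"fmt"
	"regexp"
)

// The question's regex with dotted group names, with backslash-escaped dots,
// and with underscores in place of the dots.
var (
	dotted     = `^(?P<process.executable.name>[^\[\]]+)\[?(?P<process.pid>\d+)?\]?$`
	escaped    = `^(?P<process\.executable\.name>[^\[\]]+)\[?(?P<process\.pid>\d+)?\]?$`
	underscore = `^(?P<process_executable_name>[^\[\]]+)\[?(?P<process_pid>\d+)?\]?$`
)

// compiles reports whether Go's regexp package (RE2 syntax) accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	for _, p := range []string{dotted, escaped, underscore} {
		fmt.Printf("compiles=%v  %s\n", compiles(p), p)
	}
}
```

Only the underscore variant should compile; the other two are rejected with an “invalid named capture” error.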
However, you can work around it by giving the groups dot-free names and mapping them to the dotted attribute names after the match. The regex below still uses named capturing groups (`process`, `executable`, `pid`), plus a non-capturing group `(?: ... )` for an optional dotted prefix. Here’s an example based on your provided regex:
^(?:(?P<process>\S+)\.)?(?P<executable>\S+?)\[(?P<pid>\d+)?\]?$
In this regex:
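- `(?: ... )?` is an optional non-capturing group: it matches a prefix ending in a dot (e.g., `process.`) at the start of the log line without adding a capture of its own.
- `(?P<process>\S+)\.` captures that prefix without the trailing dot.
- `(?P<executable>\S+?)` captures the executable part.
- `(?P<pid>\d+)?` captures the optional pid part.
- `\[` and `\]?` match the brackets around the pid; note that the opening `[` is required here, whereas your original made it optional with `\[?`.

You can then use the captured groups `process`, `executable`, and `pid` in your logic.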
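As a rough sketch of that last step, the Go program below runs the regex above and renames the captured groups to semantic-convention keys. It is not Collector configuration: the `semconvKeys` table and the `parseLine` function are hypothetical, and only `executable` and `pid` are mapped (to `process.executable.name` and `process.pid`), leaving the optional `process` prefix unmapped.

```go
package main

import (
	"fmt"
	"regexp"
)

// The answer's regex, with dot-free group names.
var lineRe = regexp.MustCompile(`^(?:(?P<process>\S+)\.)?(?P<executable>\S+?)\[(?P<pid>\d+)?\]?$`)

// Hypothetical mapping from group names to OpenTelemetry semantic-convention keys.
// The optional "process" prefix group is deliberately left out.
var semconvKeys = map[string]string{
	"executable": "process.executable.name",
	"pid":        "process.pid",
}

// parseLine returns the renamed attributes, or nil if the line does not match.
func parseLine(line string) map[string]string {
	m := lineRe.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	attrs := make(map[string]string)
	for i, name := range lineRe.SubexpNames() {
		key, ok := semconvKeys[name]
		// Skip the unnamed whole-match entry, unmapped names,
		// and groups that captured nothing.
		if !ok || m[i] == "" {
			continue
		}
		attrs[key] = m[i]
	}
	return attrs
}

func main() {
	for _, line := range []string{"sshd[1234]", "myapp.worker[42]", "sshd"} {
		fmt.Printf("%q -> %v\n", line, parseLine(line))
	}
}
```

With this pattern, `sshd[1234]` yields `process.executable.name` = `sshd` and `process.pid` = `1234`. For `myapp.worker[42]`, the optional prefix group consumes `myapp.`, so the executable comes out as `worker`; keep that in mind if your executable names can contain dots.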
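Feel free to adjust the regex to your specific needs. It should capture the required information; to adhere to the OpenTelemetry Semantic Conventions, map the captured groups to the dotted attribute names (for example, `executable` to `process.executable.name` and `pid` to `process.pid`) after the match, since the group names themselves can’t contain dots.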