I’m writing a RegEx that will run inside the OpenTelemetry Collector against log data, and I need named capture groups to pull fields out of each log line.
I want to adhere to the OpenTelemetry Semantic Conventions for processes. However, these conventions use “.” as a separator, for example “process.command” and “process.executable.name”.
This is where I’m having difficulty: RegEx101 doesn’t accept a “.” in the name of a capture group. It has been a reliable tool in my experience, so I’m inclined to believe it.
Is there a way to allow this?
Example RegEx:
^(?P<process.executable.name>[^\[\]]+)\[?(?P<process.pid>\d+)?\]?$
Example with dots (not working):
Note that escaping the dots with a “\” doesn’t work - already tried it.
To demonstrate the RegEx itself does actually work without the breaking dots, using underscores instead of dots:
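In regular expressions, dots (.) are not supported in the names of capture groups. This is not because the dot is the “any character” metacharacter inside a pattern, but because group names are restricted to letters, digits and underscores in most regex flavours, including Go’s RE2-based regexp package (the Collector is written in Go). That is also why escaping the dot does not help.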
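To illustrate the naming restriction, here is a minimal sketch in Python, whose re module applies the same kind of rule to group names. It is not the Collector’s own engine, and the sample value “sshd[1234]” and the helper compiles are made up for the demonstration.

```python
import re

# Dotted group names, exactly as in the question: rejected at compile time.
DOTTED = r"^(?P<process.executable.name>[^\[\]]+)\[?(?P<process.pid>\d+)?\]?$"
# The same pattern with underscores in the group names.
UNDERSCORED = r"^(?P<process_executable_name>[^\[\]]+)\[?(?P<process_pid>\d+)?\]?$"


def compiles(pattern: str) -> bool:
    """Return True if the regex engine accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(DOTTED))       # the dotted names are a syntax error
print(compiles(UNDERSCORED))  # the underscore names are accepted

# Hypothetical sample value, not taken from a real log.
match = re.match(UNDERSCORED, "sshd[1234]")
print(match.groupdict())
```

Only the group names differ between the two patterns, so the failure comes from the names rather than from the rest of the expression.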
However, you can get close to the desired result by splitting the dotted name into separate named groups whose names contain no dots, and wrapping the optional dotted prefix in a non-capturing group (?: ... ) so that it is matched without adding a group of its own. Here’s an example based on your provided regex:
^(?:(?P<process>\S+)\.)?(?P<executable>\S+?)\[(?P<pid>\d+)?\]?$
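In this regex:
(?P<process>\S+)\. matches an optional “process.”-style prefix and captures the text before the dot in the group process.
(?P<executable>\S+?) captures the executable name in the group executable.
(?P<pid>\d+)? captures the optional PID in the group pid.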
You can then use the captured groups process, executable, and pid in your logic.
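As a rough sketch of what using those groups could look like, the snippet below runs the regex above in Python. The function name parse and the sample values are assumptions made for the example, not part of any Collector API.

```python
import re

# The regex from the answer, unchanged.
PATTERN = re.compile(
    r"^(?:(?P<process>\S+)\.)?(?P<executable>\S+?)\[(?P<pid>\d+)?\]?$"
)


def parse(value: str):
    """Return the named groups as a dict, or None if the value does not match."""
    match = PATTERN.match(value)
    return match.groupdict() if match else None


# Hypothetical sample values.
print(parse("sshd[1234]"))
print(parse("systemd.journald[42]"))
print(parse("sshd"))
```

For “sshd[1234]” the group process is empty (None), executable is “sshd” and pid is “1234”. Note that, as written, this pattern requires the opening bracket, so a value without a PID such as “sshd” does not match at all; changing \[ to \[? would restore the behaviour of the regex in the question.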
Here’s how you might modify your regex to work with the OpenTelemetry Semantic Conventions:
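One possible sketch, assuming the renaming happens in a step after the regex has run: keep the underscore group names from the question and map them to the dotted Semantic Convention keys afterwards. It is written in Python purely for illustration; the mapping table SEMCONV_KEYS and the function extract_attributes are hypothetical, and inside the Collector the equivalent renaming would have to be done with whatever processing step your pipeline provides.

```python
import re

# The working underscore variant of the regex from the question.
PATTERN = re.compile(
    r"^(?P<process_executable_name>[^\[\]]+)\[?(?P<process_pid>\d+)?\]?$"
)

# Hypothetical mapping from regex-safe group names to Semantic Convention keys.
SEMCONV_KEYS = {
    "process_executable_name": "process.executable.name",
    "process_pid": "process.pid",
}


def extract_attributes(value: str) -> dict[str, str]:
    """Match the value and return the captures keyed by their dotted names."""
    match = PATTERN.match(value)
    if match is None:
        return {}
    return {
        SEMCONV_KEYS[name]: captured
        for name, captured in match.groupdict().items()
        if captured is not None  # skip optional groups that did not take part
    }


print(extract_attributes("sshd[1234]"))
print(extract_attributes("cron"))
```

An explicit mapping is used rather than blindly replacing “_” with “.”, because some attribute names may legitimately contain underscores.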
Feel free to adjust it based on your specific needs. The regex captures the required information; because the group names themselves cannot contain dots, the captured fields still need to be renamed to the dotted OpenTelemetry Semantic Convention names afterwards.