小能豆

Is there any way to have a literal "." in the name of a RegEx named capture group?

go

I’m doing some RegEx, which will run inside the OpenTelemetry Collector against log data, where I require named capture groups to pull out groups of data from the log.

I wanted to adhere to the OpenTelemetry Semantic Conventions around processes. These Semantic Conventions though require the use of “.” as a separator - for example “process.command”, “process.executable.name” etc.

This is where I’m having difficulty… RegEx101, which has been a reliable tool in my own experience, doesn’t seem to accept a “.” in the name of a capture group, so I’m inclined to believe it.

Is there a way to allow this?

Example RegEx:

^(?P<process.executable.name>[^\[\]]+)\[?(?P<process.pid>\d+)?\]?$

Example with dots (not working):

[Image: Dots Not Working]

Note that escaping the dots with a “\” doesn’t work; I already tried it.

To demonstrate the RegEx itself does actually work without the breaking dots, using underscores instead of dots:

[Image: Underscores Working]


2023-12-19

1 answer

小能豆

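No, a literal dot in the name of a capture group is not supported. In Go's regexp package (RE2 syntax), a group name may contain only ASCII letters, digits, and underscores, so a pattern containing (?P<process.pid>...) is rejected when it is compiled. Escaping the dot does not help either: the group name is an identifier, not part of the pattern, so the backslash is simply another invalid character in the name.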
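A quick way to confirm this is to compile both variants of the pattern from the question. The sketch below assumes the pattern is evaluated by Go's standard regexp package (the question is tagged go); the helper name compiles is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper, used only for this demonstration.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Pattern from the question, with dots in the group names.
	dotted := `^(?P<process.executable.name>[^\[\]]+)\[?(?P<process.pid>\d+)?\]?$`
	// Same pattern with underscores instead of dots.
	underscored := `^(?P<process_executable_name>[^\[\]]+)\[?(?P<process_pid>\d+)?\]?$`

	fmt.Println(compiles(dotted))      // false: invalid group name
	fmt.Println(compiles(underscored)) // true
}
```

Printing the error returned by regexp.Compile instead of discarding it shows which group name the parser objected to.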

However, you can achieve the desired result by keeping the group names free of dots, for example by using underscores as in your working version, and renaming the captured fields to the dotted Semantic Convention names after the match. Here’s an example based on your provided regex:

^(?P<process_executable_name>[^\[\]]+)\[?(?P<process_pid>\d+)?\]?$

In this regex:

  • (?P<process_executable_name>[^\[\]]+) captures the executable name: one or more characters that are not square brackets.
  • \[? and \]? match the optional brackets around the pid.
  • (?P<process_pid>\d+)? captures the optional pid.

You can then use the captured groups process_executable_name and process_pid in your logic and rename them to process.executable.name and process.pid in a later step.

Feel free to adjust it based on your specific needs. The group names themselves cannot follow the OpenTelemetry Semantic Conventions, so the dotted attribute names have to be applied after the regex has matched.

2023-12-19