一尘不染

How to split CamelCase in Python

python

What I am trying to achieve is something like this:

>>> camel_case_split("CamelCaseXYZ")
['Camel', 'Case', 'XYZ']
>>> camel_case_split("XYZCamelCase")
['XYZ', 'Camel', 'Case']

So I searched and found this perfect regular expression:

(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])
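
To make it easier to see what this pattern matches, here is a small sketch that annotates the two alternatives and lists the positions where a zero-width boundary is found. The names `BOUNDARY` and `boundary_positions` are hypothetical and only used for illustration; they do not come from the linked question.

```python
import re

# The same pattern, written in verbose mode so each alternative can be commented.
BOUNDARY = re.compile(r"""
    (?<=[a-z])(?=[A-Z])         # between a lowercase letter and an uppercase letter
  | (?<=[A-Z])(?=[A-Z][a-z])    # between an uppercase run and the capital that starts a new word
""", re.VERBOSE)


def boundary_positions(text):
    """Return the string indices at which the pattern matches (all matches are empty)."""
    return [m.start() for m in BOUNDARY.finditer(text)]


print(boundary_positions("CamelCaseXYZ"))  # [5, 9]
print(boundary_positions("XYZCamelCase"))  # [3, 8]
```

Every match is empty: the pattern only identifies positions between characters and consumes nothing.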

As the next logical step I tried:

>>> re.split("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", "CamelCaseXYZ")
['CamelCaseXYZ']

Why does this not work, and how do I achieve the result from the linked question in Python?

Edit: Solution summary

I tested all provided solutions with a few test cases:

string:                 ''
AplusKminus:            ['']
casimir_et_hippolyte:   []
two_hundred_success:    []
kalefranz:              string index out of range # with modification: either [] or ['']

string:                 ' '
AplusKminus:            [' ']
casimir_et_hippolyte:   []
two_hundred_success:    [' ']
kalefranz:              [' ']

string:                 'lower'
all algorithms:         ['lower']

string:                 'UPPER'
all algorithms:         ['UPPER']

string:                 'Initial'
all algorithms:         ['Initial']

string:                 'dromedaryCase'
AplusKminus:            ['dromedary', 'Case']
casimir_et_hippolyte:   ['dromedary', 'Case']
two_hundred_success:    ['dromedary', 'Case']
kalefranz:              ['Dromedary', 'Case'] # with modification: ['dromedary', 'Case']

string:                 'CamelCase'
all algorithms:         ['Camel', 'Case']

string:                 'ABCWordDEF'
AplusKminus:            ['ABC', 'Word', 'DEF']
casimir_et_hippolyte:   ['ABC', 'Word', 'DEF']
two_hundred_success:    ['ABC', 'Word', 'DEF']
kalefranz:              ['ABCWord', 'DEF']

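In conclusion, the solution by @kalefranz does not match the question (see the last case), and the solution by @casimir et hippolyte swallows a single space, which violates the idea that a split should not change the individual parts. The only difference between the remaining two alternatives is that my solution returns a list containing the empty string for an empty input string, while the solution by @200_success returns an empty list. I don't know where the Python community stands on that issue, so I am fine with either one. Since 200_success's solution is simpler, I accepted it as the correct answer.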


2020-12-20

1 answer

一尘不染

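As @AplusKminus explained, re.split() never splits on an empty pattern match (this was the behavior before Python 3.7). Therefore, instead of splitting, you should try to find the components you are interested in.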
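
For readers on newer interpreters: since Python 3.7, re.split() does split on empty matches, so the original attempt from the question should work there as written. The following is a minimal sketch under that assumption; `split_on_boundaries` is a hypothetical helper name, and the version check is only there to make the assumption explicit.

```python
import re
import sys

# Zero-width boundary pattern quoted in the question.
BOUNDARY = r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"


def split_on_boundaries(identifier):
    """Split at CamelCase boundaries using re.split() (assumes Python 3.7+)."""
    if sys.version_info < (3, 7):
        # Older versions ignore empty matches in re.split().
        raise RuntimeError("re.split() only splits on empty matches since Python 3.7")
    return re.split(BOUNDARY, identifier)


print(split_on_boundaries("CamelCaseXYZ"))  # ['Camel', 'Case', 'XYZ']
print(split_on_boundaries("XYZCamelCase"))  # ['XYZ', 'Camel', 'Case']
```

Note that this variant returns [''] for an empty input string, whereas a finditer-based approach returns [].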

Here is a solution using re.finditer() that emulates splitting:

from re import finditer

def camel_case_split(identifier):
    # Lazily match up to the next case boundary, or to the end of the string.
    matches = finditer('.+?(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)', identifier)
    return [m.group(0) for m in matches]
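
As a quick check, the function can be run against the test strings from the question's summary. This is only a sketch: the function is repeated so that the snippet is self-contained, and the `cases` dictionary is a hypothetical name holding the inputs and the results listed for this solution in the summary.

```python
from re import finditer


def camel_case_split(identifier):
    # Same function as above, repeated so this snippet runs on its own.
    matches = finditer('.+?(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)', identifier)
    return [m.group(0) for m in matches]


# Inputs and expected outputs taken from the question's summary table.
cases = {
    '': [],
    ' ': [' '],
    'lower': ['lower'],
    'UPPER': ['UPPER'],
    'Initial': ['Initial'],
    'dromedaryCase': ['dromedary', 'Case'],
    'CamelCase': ['Camel', 'Case'],
    'ABCWordDEF': ['ABC', 'Word', 'DEF'],
}

for text, expected in cases.items():
    result = camel_case_split(text)
    print(repr(text), '->', result)
    assert result == expected
```

The empty string yields an empty list because `.+?` needs at least one character to match.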
2020-12-20