Split col by any delimiter and Uppercase values

小能豆

I’m trying to split a column at the ' - ' that precedes the trailing all-uppercase part of each string.

Below, I have a df whose Value column contains various combinations. I want to split it into two separate columns: First should hold everything before that ' - ', and Last should hold the uppercase part, which may itself contain ' - '.

I’ve got Last column correct but not First column.

df = pd.DataFrame({
   'Value': [
        'Juan-Diva - HOLLS',
        'Carlos - George - ESTE BAN - BOM',
        'Javier Plain - Hotham Ham - ALPINE',
        'Yul - KONJ KOL MON'],
   })

option 1)

df[['First', 'l']] = df['Value'].str.split(' - ', n=1, expand=True)

df['Last'] = df['Value'].str.split('- ').str[-1]

option 2)

# Regular expression pattern
pattern = r'^(.*) - ([A-Z\s]+)$'

# Extract groups into two new columns
df[['First', 'Last']] = df['Value'].str.extract(pattern)

option 3)

df[["First", "Last"]] = df["Value"].str.rsplit(" - ", n=1, expand=True)

None of these options return the intended output.

intended output:

                       First            Last
0                  Juan-Diva           HOLLS
1            Carlos - George  ESTE BAN - BOM
2  Javier Plain - Hotham Ham          ALPINE
3                        Yul    KONJ KOL MON

2023-12-12

1 answer

小能豆

A right split with n=1 is not enough here. rsplit(' - ', n=1) always cuts at the last ' - ', so 'Carlos - George - ESTE BAN - BOM' becomes 'Carlos - George - ESTE BAN' and 'BOM'. The pattern in option 2 fails for a similar reason: the greedy (.*) consumes as much as it can, so it also stops at the last ' - '. Making the first group lazy and allowing the uppercase group to contain ' - ' itself gives the intended split:

import pandas as pd

df = pd.DataFrame({
    'Value': [
        'Juan-Diva - HOLLS',
        'Carlos - George - ESTE BAN - BOM',
        'Javier Plain - Hotham Ham - ALPINE',
        'Yul - KONJ KOL MON'],
})

# Lazy prefix, then ' - ', then one or more uppercase chunks joined by ' - '
pattern = r'^(.*?) - ([A-Z\s]+(?: - [A-Z\s]+)*)$'

df[['First', 'Last']] = df['Value'].str.extract(pattern)

print(df[['First', 'Last']])

This gives the intended output:

                       First            Last
0                  Juan-Diva           HOLLS
1            Carlos - George  ESTE BAN - BOM
2  Javier Plain - Hotham Ham          ALPINE
3                        Yul    KONJ KOL MON

The key is the lazy (.*?): it makes the match split at the earliest ' - ' after which the rest of the string consists only of uppercase letters, whitespace, and further ' - ' separators. Rows that do not match the pattern get NaN in both columns.
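To check the split on individual strings outside pandas, here is a minimal sketch using only the standard re module. The helper name split_trailing_upper is hypothetical, and the pattern assumes the uppercase part consists of A-Z and whitespace, optionally joined by ' - '. The last line shows, for comparison, where a plain right split cuts the second sample value.

```python
import re

# Lazy prefix, then ' - ', then one or more all-uppercase chunks joined by ' - '.
PATTERN = re.compile(r'^(.*?) - ([A-Z\s]+(?: - [A-Z\s]+)*)$')


def split_trailing_upper(value):
    """Return (first, last), or (None, None) if there is no uppercase tail."""
    match = PATTERN.match(value)
    if match is None:
        return (None, None)
    return (match.group(1), match.group(2))


values = [
    'Juan-Diva - HOLLS',
    'Carlos - George - ESTE BAN - BOM',
    'Javier Plain - Hotham Ham - ALPINE',
    'Yul - KONJ KOL MON',
]

for value in values:
    print(split_trailing_upper(value))

# A plain right split always cuts at the last ' - ':
print('Carlos - George - ESTE BAN - BOM'.rsplit(' - ', 1))
```

The right split returns ['Carlos - George - ESTE BAN', 'BOM'] for that value, which is why a single rsplit cannot produce the intended Last column.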

2023-12-12