小能豆

Merge columns inside 2d numpy array

py

I have a 2D numpy array from an Orange data table. How can I merge the text from three of its columns into one column? All the solutions I have found so far merge multiple arrays, not columns inside a single array.

My current attempt:

import numpy as np

out_data = np.hstack(in_data[:, 'ContextBefore'], in_data[:, 'Hit'], in_data[:, 'ContextAfter'])

print(out_data[:10])

It fails with “TypeError: hstack() takes 1 positional argument but 3 were given”.
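The TypeError is about the call signature: np.hstack expects a single sequence of arrays, so the columns would have to be wrapped in a tuple. Fixing that still would not merge any text, because stacking only rearranges array elements. A small sketch with hypothetical two-row columns (before, hit and after are made-up stand-ins for the three CSV columns):

```python
import numpy as np

# Hypothetical stand-ins for the three text columns (two rows each)
before = np.array(["a1", "a2"])
hit = np.array(["b1", "b2"])
after = np.array(["c1", "c2"])

# np.hstack takes ONE argument: a sequence (here a tuple) of arrays
stacked = np.hstack((before, hit, after))
print(stacked.shape)  # (6,): the columns are laid end to end

# np.column_stack keeps one row per record, but the text stays in three cells
side_by_side = np.column_stack((before, hit, after))
print(side_by_side.shape)  # (2, 3)
```

Neither result contains one combined string per row; that needs element-wise string concatenation.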

The input data is a CSV file whose first rows look like this:

"No.","Date","Genre","Bibl","URL","ContextBefore","Hit","ContextAfter"

"1","2018-01-18","Zeitung","Die Zeit, 18.01.2018, Nr. 01","http://www.zeit.de/2018/01/opernsaenger-erfolg-musikgeschaeft-karriere","Nun habe ich sie wiedergetroffen, in Leipzig, Karlsruhe, Paris und Berlin.","Es ist eine Reise rund um die Frage, wovon Erfolg abhängt.","Da war Benedikt Zeitner, Bariton, der so von sich überzeugt war, dass wir uns manchmal fragten, ob er je eine Träne vergoss."

"2","2018-01-18","Zeitung","Die Zeit, 18.01.2018, Nr. 01","http://www.zeit.de/2018/01/opernsaenger-erfolg-musikgeschaeft-karriere","Er lässt einen nicht los.","Und ich, auf dieser Reise, die auch eine Reise in meine eigene Vergangenheit ist, denke:","Es fehlt mir auch."

The CSV is parsed in Orange by the “CSV File Import” widget, and the Python code I’m writing lives in a “Python Script” widget. The last three text columns (ContextBefore, Hit, ContextAfter) should be merged.
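On a plain NumPy array, indexing by name such as arr[:, 'Hit'] raises an IndexError; since the call above got as far as hstack, in_data there is probably an Orange Table rather than a bare array. If the data is handled as a plain 2-D array, one option is to keep the header row and build a name-to-index lookup. The sketch below reads the sample rows with Python's csv module instead of Orange, so it assumes a plain array in the CSV's column order; inside Orange, text columns may be stored separately as meta attributes, so the real layout can differ. SAMPLE, col and text_cols are illustrative names.

```python
import csv
import io

import numpy as np

# The question's sample CSV (header plus two rows), embedded for illustration
SAMPLE = '''"No.","Date","Genre","Bibl","URL","ContextBefore","Hit","ContextAfter"
"1","2018-01-18","Zeitung","Die Zeit, 18.01.2018, Nr. 01","http://www.zeit.de/2018/01/opernsaenger-erfolg-musikgeschaeft-karriere","Nun habe ich sie wiedergetroffen, in Leipzig, Karlsruhe, Paris und Berlin.","Es ist eine Reise rund um die Frage, wovon Erfolg abhängt.","Da war Benedikt Zeitner, Bariton, der so von sich überzeugt war, dass wir uns manchmal fragten, ob er je eine Träne vergoss."
"2","2018-01-18","Zeitung","Die Zeit, 18.01.2018, Nr. 01","http://www.zeit.de/2018/01/opernsaenger-erfolg-musikgeschaeft-karriere","Er lässt einen nicht los.","Und ich, auf dieser Reise, die auch eine Reise in meine eigene Vergangenheit ist, denke:","Es fehlt mir auch."
'''

# csv.reader handles the commas inside quoted fields such as Bibl
reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # first line holds the column names
data = np.array(list(reader), dtype=object)  # one row per record, shape (2, 8)

# Map each column name to its integer position; NumPy indexes by integers only
col = {name: i for i, name in enumerate(header)}
text_cols = [col["ContextBefore"], col["Hit"], col["ContextAfter"]]

# Join the three text cells of every row, separated by a space
merged = np.array([" ".join(row) for row in data[:, text_cols]])
print(merged[1])
```

The lookup avoids hard-coding positions 5, 6 and 7 if the column order changes.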



2023-12-16

1 answer

小能豆

You can use numpy.char.add (the public alias of numpy.core.defchararray.add) to concatenate strings element-wise, which merges the columns row by row. Here’s how you can modify your code:

import numpy as np

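# Assuming in_data is a plain 2D numpy array whose columns follow the CSV header,
# so ContextBefore, Hit and ContextAfter are columns 5, 6 and 7.
# NumPy arrays only accept integer indices, not column names.
before = in_data[:, 5].astype(str)
hit = in_data[:, 6].astype(str)
after = in_data[:, 7].astype(str)

# np.char.add concatenates element-wise; the " " keeps the sentences apart
out_data = np.char.add(np.char.add(before, " "), hit)
out_data = np.char.add(np.char.add(out_data, " "), after)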

# Print the first 10 elements
print(out_data[:10])

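This code uses np.char.add to concatenate the strings from the three columns element-wise into a single column (out_data). The .astype(str) calls convert each column to a NumPy string dtype first, and the added spaces stop the sentences from running together. Adjust the column indices to match your actual data.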
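The out_data above is a 1-D array holding only the merged text. If the goal is to merge inside the original array, so that one text column replaces the last three, the result can be stacked back next to the remaining columns. A minimal sketch with placeholder values, assuming the question's eight-column layout (positions 5 to 7 for ContextBefore, Hit and ContextAfter):

```python
import numpy as np

# Placeholder rows in the question's column order:
# No., Date, Genre, Bibl, URL, ContextBefore, Hit, ContextAfter
data = np.array(
    [["1", "2018-01-18", "Zeitung", "Bibl 1", "url1", "Before 1.", "Hit 1.", "After 1."],
     ["2", "2018-01-18", "Zeitung", "Bibl 2", "url2", "Before 2.", "Hit 2.", "After 2."]],
    dtype=object,
)

# Merge columns 5-7 row by row, with a space between the parts
text = data[:, 5:8].astype(str)
merged = np.char.add(np.char.add(text[:, 0], " "), text[:, 1])
merged = np.char.add(np.char.add(merged, " "), text[:, 2])

# Keep the first five columns and append the merged text as the sixth
out = np.column_stack((data[:, :5], merged.astype(object)))
print(out.shape)  # (2, 6)
print(out[0, 5])  # Before 1. Hit 1. After 1.
```

Converting the merged column back to object dtype keeps the result consistent with the object array it is stacked onto.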

2023-12-16