I have a 2D numpy array from an Orange data table. How can I merge the text from three columns into one column? All the solutions I have seen so far are about merging multiple arrays, not about merging columns inside a single array.
My current attempt:
import numpy as np
out_data = np.hstack(in_data[:, 'ContextBefore'], in_data[:, 'Hit'], in_data[:, 'ContextAfter'])
print(out_data[:10])
It fails with “TypeError: hstack() takes 1 positional argument but 3 were given”.
The input data is a CSV file whose first rows look like this:
"No.","Date","Genre","Bibl","URL","ContextBefore","Hit","ContextAfter"
"1","2018-01-18","Zeitung","Die Zeit, 18.01.2018, Nr. 01","http://www.zeit.de/2018/01/opernsaenger-erfolg-musikgeschaeft-karriere","Nun habe ich sie wiedergetroffen, in Leipzig, Karlsruhe, Paris und Berlin.","Es ist eine Reise rund um die Frage, wovon Erfolg abhängt.","Da war Benedikt Zeitner, Bariton, der so von sich überzeugt war, dass wir uns manchmal fragten, ob er je eine Träne vergoss."
"2","2018-01-18","Zeitung","Die Zeit, 18.01.2018, Nr. 01","http://www.zeit.de/2018/01/opernsaenger-erfolg-musikgeschaeft-karriere","Er lässt einen nicht los.","Und ich, auf dieser Reise, die auch eine Reise in meine eigene Vergangenheit ist, denke:","Es fehlt mir auch."
Parsing the CSV takes place in Orange with the “CSV file import” module and the Python code I’m writing resides in a “Python script” module. The last three columns with text should be merged.
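You can use the numpy.core.defchararray.add function (also available under the shorter public name np.char.add) to concatenate strings element-wise. Your hstack call fails because np.hstack expects a single sequence of arrays rather than several separate arguments, and even when called correctly it joins arrays end to end instead of joining the strings inside them. Here’s how you can modify your code: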
import numpy as np

# Assuming in_data is your 2D numpy array
# Replace 'ContextBefore', 'Hit', 'ContextAfter' with the actual column indices if needed
out_data = np.core.defchararray.add(
    np.core.defchararray.add(in_data[:, 'ContextBefore'], in_data[:, 'Hit']),
    in_data[:, 'ContextAfter']
)

# Print the first 10 elements
print(out_data[:10])
This code uses np.core.defchararray.add to concatenate the strings from the specified columns into a single column (out_data). Adjust the column names or indices accordingly based on your actual data.
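Since the three text fields are separate sentences, you will probably want a space between them, which plain concatenation does not add. Below is a minimal sketch of that idea. It assumes the three columns have already been extracted into a plain 2D NumPy array of strings (getting them out of the Orange table is not covered here); the `rows` array and the `merge_columns` helper are hypothetical names made up for illustration, and `np.char.add` is used as the shorter name for the same function.

```python
import numpy as np

# Hypothetical stand-in for the three text columns; not the real Orange table.
rows = np.array([
    ["Er lässt einen nicht los.", "Und ich denke:", "Es fehlt mir auch."],
    ["before", "hit", "after"],
])


def merge_columns(text, sep=" "):
    """Join the columns of a 2D string array row by row, separated by `sep`."""
    merged = text[:, 0]
    for col in range(1, text.shape[1]):
        # Append the separator first, then the next column, element-wise.
        merged = np.char.add(np.char.add(merged, sep), text[:, col])
    return merged


merged = merge_columns(rows)
print(merged[1])  # before hit after

# For comparison: np.hstack takes ONE sequence of arrays (note the tuple),
# and it joins the arrays end to end rather than merging the strings.
stacked = np.hstack((rows[:, 0], rows[:, 1], rows[:, 2]))
print(stacked.shape)  # (6,)
```

The result `merged` is a 1D array with one combined string per row, whereas the corrected `hstack` call only produces a longer array containing the original, unmerged strings.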