I have a pandas DataFrame column duration containing timedelta64[ns] values, as shown below. How can I convert them to seconds?
0   00:20:32
1   00:23:10
2   00:24:55
3   00:13:17
4   00:18:52
Name: duration, dtype: timedelta64[ns]
I tried the following:
print df[:5]['duration'] / np.timedelta64(1, 's')
but got this error:
Traceback (most recent call last):
  File "test.py", line 16, in <module>
    print df[0:5]['duration'] / np.timedelta64(1, 's')
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 130, in wrapper
    "addition and subtraction, but the operator [%s] was passed" % name)
TypeError: can only operate on a timedeltas for addition and subtraction, but the operator [__div__] was passed
I also tried:
print df[:5]['duration'].astype('timedelta64[s]')
but received this error:
Traceback (most recent call last):
  File "test.py", line 17, in <module>
    print df[:5]['duration'].astype('timedelta64[s]')
  File "C:\Python27\lib\site-packages\pandas\core\series.py", line 934, in astype
    values = com._astype_nansafe(self.values, dtype)
  File "C:\Python27\lib\site-packages\pandas\core\common.py", line 1653, in _astype_nansafe
    raise TypeError("cannot astype a timedelta from [%s] to [%s]" % (arr.dtype,dtype))
TypeError: cannot astype a timedelta from [timedelta64[ns]] to [timedelta64[s]]
This works properly in the current version of pandas (version 0.14):
In [132]: df[:5]['duration'] / np.timedelta64(1, 's')
Out[132]:
0    1232
1    1390
2    1495
3     797
4    1132
Name: duration, dtype: float64
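For readers who want to reproduce this without the original data, here is a minimal sketch. It assumes a reasonably recent pandas and rebuilds the column from the five sample values in the question with pd.to_timedelta. The .dt.total_seconds() call is not part of the original answer; it is an alternative available in newer pandas releases and is shown only for comparison.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the values shown in the question.
df = pd.DataFrame({
    "duration": pd.to_timedelta(
        ["00:20:32", "00:23:10", "00:24:55", "00:13:17", "00:18:52"]
    )
})

# Dividing by a one-second timedelta yields the durations as float seconds.
seconds = df["duration"] / np.timedelta64(1, "s")

# Alternative (not in the original answer): the .dt accessor in newer pandas.
seconds_dt = df["duration"].dt.total_seconds()

print(seconds.tolist())     # [1232.0, 1390.0, 1495.0, 797.0, 1132.0]
print(seconds_dt.tolist())  # [1232.0, 1390.0, 1495.0, 797.0, 1132.0]
```

Both expressions return a float64 Series, so cast with .astype(int) if whole seconds are needed.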
Here is a workaround for older versions of pandas/NumPy:
In [131]: df[:5]['duration'].values.view('<i8')/10**9
Out[131]: array([1232, 1390, 1495,  797, 1132], dtype=int64)
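timedelta64 and datetime64 data are stored internally as 8-byte integers (dtype '<i8'). The expression above therefore views the timedelta64 values as 8-byte integers and then does integer division by 10**9 to convert nanoseconds to seconds. (Under Python 3, / is true division, so use // to keep the result integral.)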
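The reinterpretation can be seen with NumPy alone. The following is an illustrative sketch, not the answer's exact code: the array is built by hand from the sample values in seconds, and the native-endian dtype string "i8" is used, which matches '<i8' on little-endian machines.

```python
import numpy as np

# Hand-built timedelta64[ns] array; the sample values (in seconds) come from the question.
durations = np.array(
    [1232, 1390, 1495, 797, 1132], dtype="timedelta64[s]"
).astype("timedelta64[ns]")

# Reinterpret the same 8 bytes per element as int64: these are nanosecond counts.
nanos = durations.view("i8")

# Floor division keeps the result integral under both Python 2 and Python 3.
seconds = nanos // 10**9

print(nanos[0])          # 1232000000000
print(seconds.tolist())  # [1232, 1390, 1495, 797, 1132]
```

Because view() only relabels the existing buffer, no data is copied; the division by 10**9 is correct only when the underlying unit is nanoseconds.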
Note that you need NumPy 1.7 or newer to work with datetime64/timedelta64 values.