I have two numpy arrays, `light_points` and `time_points`, and would like to apply some time series analysis methods to this data.
I then tried this:
```python
import statsmodels.api as sm
import pandas as pd

tdf = pd.DataFrame({'time': time_points[:]})
rdf = pd.DataFrame({'light': light_points[:]})
rdf.index = pd.DatetimeIndex(freq='w', start=0, periods=len(rdf.light))
#rdf.index = pd.DatetimeIndex(tdf['time'])
```
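This works, but it doesn't do the right thing: the measurements are not evenly spaced in time, and if I simply declare the `time_points` pandas DataFrame as the index of my frame, I get an error: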
```python
rdf.index = pd.DatetimeIndex(tdf['time'])
decomp = sm.tsa.seasonal_decompose(rdf)
```

```
    elif freq is None:
        raise ValueError("You must specify a freq or x must be a pandas object with a timeseries index")
ValueError: You must specify a freq or x must be a pandas object with a timeseries index
```
I don't know how to fix this. Also, it seems that `TimeSeries` is deprecated in pandas.
I tried this:

```python
rdf = pd.Series({'light': light_points[:]})
rdf.index = pd.DatetimeIndex(tdf['time'])
```

But this gives me a length mismatch:

```
ValueError: Length mismatch: Expected axis has 1 elements, new values have 122 elements
```

However, I don't understand where it comes from, since `rdf['light']` and `tdf['time']` have the same length...
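The mismatch comes from how `pd.Series` treats a dict: the keys become the index, so `{'light': light_points}` produces a Series with a single element (the whole array) under the label `'light'`, hence "Expected axis has 1 elements". A minimal sketch of building the Series directly from the two arrays, using small made-up arrays as stand-ins for the real data and assuming `time_points` holds elapsed time in days (the `unit='D'` argument is an assumption; use whatever unit the data is actually in):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's arrays (irregular times).
time_points = np.array([0.0, 1.5, 3.0, 7.25, 8.0])
light_points = np.array([10.2, 11.0, 9.8, 12.1, 10.5])

# A dict maps label -> value; the whole array is ONE value, so length 1.
wrong = pd.Series({'light': light_points})
print(len(wrong))  # 1

# Pass the array as the data and the converted times as the index.
# unit='D' assumes time_points counts days since some origin.
rdf = pd.Series(light_points,
                index=pd.to_datetime(time_points, unit='D'),
                name='light')
print(len(rdf), rdf.index.inferred_freq)  # 5 None
```

Even when built this way the index has no regular frequency (`inferred_freq` is `None`), so `seasonal_decompose` still cannot derive a period from it.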
Finally, I tried defining `rdf` as a pandas Series:

```python
rdf = pd.Series(light_points[:], index=pd.DatetimeIndex(time_points[:]))
```

And I get this:

```
ValueError: You must specify a freq or x must be a pandas object with a timeseries index
```
Then I tried using `pd.TimeSeries(time_points[:])` instead, which gives me an error on the `seasonal_decompose` line:

```
AttributeError: 'Float64Index' object has no attribute 'inferred_freq'
```
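How should I handle unevenly spaced data? I was thinking of creating an approximately evenly spaced time array by adding many unknown values between the existing ones and "estimating" those points by interpolation, but I suspect there is a cleaner and easier solution.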
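The interpolation idea above can be sketched with pandas: resample the irregular series onto a regular grid, then fill the empty slots by interpolating on the time index. This is a minimal sketch with made-up values, assuming a daily grid (`'D'`) and linear-in-time interpolation are acceptable for the data:

```python
import pandas as pd

# Made-up irregular observations (timestamps are not evenly spaced).
idx = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-05',
                      '2015-01-06', '2015-01-10'])
s = pd.Series([1.0, 2.0, 5.0, 6.0, 10.0], index=idx)

# Put the data on a regular daily grid; days without data become NaN.
regular = s.resample('D').mean()

# Fill the gaps, weighting by the actual time distance between points.
filled = regular.interpolate(method='time')

print(filled.index.freqstr)  # D
print(filled.isna().sum())   # 0
```

On an already regular grid `method='time'` gives the same result as the default `'linear'`; the difference matters if you interpolate before resampling, because `'linear'` ignores the index and treats the points as equally spaced.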
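`seasonal_decompose()` requires a `freq` that is either provided as part of the `DatetimeIndex` meta information, can be inferred by `pandas.DatetimeIndex.inferred_freq`, or else is supplied by the user as an integer giving the number of periods per cycle, e.g. 12 for monthly data (from the docstring of `seasonal_mean`):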
```python
def seasonal_decompose(x, model="additive", filt=None, freq=None):
    """
    Parameters
    ----------
    x : array-like
        Time series
    model : str {"additive", "multiplicative"}
        Type of seasonal component. Abbreviations are accepted.
    filt : array-like
        The filter coefficients for filtering out the seasonal component.
        The default is a symmetric moving average.
    freq : int, optional
        Frequency of the series. Must be used if x is not a pandas
        object with a timeseries index.
```
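A quick way to tell which case applies is to check `inferred_freq` on the index: pandas can infer a frequency for a regularly spaced `DatetimeIndex`, but returns `None` once points are missing. A minimal sketch with made-up dates:

```python
import pandas as pd

# A regular weekly index: pandas can infer its frequency.
regular = pd.date_range(start='2015-01-01', periods=10, freq='W')
print(regular.inferred_freq)    # W-SUN

# Drop a few entries (by position) to make the spacing uneven.
irregular = regular.delete([2, 5, 6])
print(irregular.inferred_freq)  # None
```

In the irregular case nothing can be read from the index, so the number of periods per cycle has to be passed explicitly as an integer.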
To illustrate, using random sample data:
```python
from datetime import datetime

import numpy as np
import pandas as pd
import statsmodels.api as sm

length = 400
x = np.sin(np.arange(length)) * 10 + np.random.randn(length)
df = pd.DataFrame(data=x,
                  index=pd.date_range(start=datetime(2015, 1, 1), periods=length, freq='W'),
                  columns=['value'])
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 400 entries, 2015-01-04 to 2022-08-28
Freq: W-SUN
```

```python
decomp = sm.tsa.seasonal_decompose(df)
data = pd.concat([df, decomp.trend, decomp.seasonal, decomp.resid], axis=1)
data.columns = ['series', 'trend', 'seasonal', 'resid']
data.info()
```

```
Data columns (total 4 columns):
series      400 non-null float64
trend       348 non-null float64
seasonal    400 non-null float64
resid       348 non-null float64
dtypes: float64(4)
memory usage: 15.6 KB
```
So far, so good. Now randomly remove elements from the `DatetimeIndex` to create unevenly spaced data:
```python
# size must be an integer; length * .8 alone is a float
df = df.iloc[np.unique(np.random.randint(low=0, high=length, size=int(length * .8)))]
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 222 entries, 2015-01-11 to 2022-08-21
Data columns (total 1 columns):
value    222 non-null float64
dtypes: float64(1)
memory usage: 3.5 KB
```

```python
print(df.index.freq)           # None
print(df.index.inferred_freq)  # None
```
`seasonal_decompose` still "works" on this data when the frequency is passed explicitly:
```python
# In newer statsmodels releases (0.11 onward) this argument is named `period`.
decomp = sm.tsa.seasonal_decompose(df, freq=52)

data = pd.concat([df, decomp.trend, decomp.seasonal, decomp.resid], axis=1)
data.columns = ['series', 'trend', 'seasonal', 'resid']
data.info()
```

```
DatetimeIndex: 224 entries, 2015-01-04 to 2022-08-07
Data columns (total 4 columns):
series      224 non-null float64
trend       172 non-null float64
seasonal    224 non-null float64
resid       172 non-null float64
dtypes: float64(4)
memory usage: 8.8 KB
```
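The question is how useful the result is. Even setting aside gaps in the data, which complicate the inference of seasonal patterns (see the example use of `.interpolate()` in the release notes), `statsmodels` itself qualifies this procedure as follows: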
```
Notes
-----
This is a naive decomposition. More sophisticated methods should
be preferred.

The additive model is Y[t] = T[t] + S[t] + e[t]

The multiplicative model is Y[t] = T[t] * S[t] * e[t]

The seasonal component is first removed by applying a convolution
filter to the data. The average of this smoothed series for each
period is the returned seasonal component.
```
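The additive recipe in these notes can be sketched in plain NumPy for a regularly spaced series with an integer period: a centered moving average gives the trend, averaging the detrended values at each position in the cycle gives the seasonal component, and whatever is left is the residual. This is a simplified sketch of the technique, not statsmodels' own code; the function name `naive_additive_decompose` is hypothetical, and it assumes no missing values and uses the common "2 x period" centered filter for even periods:

```python
import numpy as np


def naive_additive_decompose(y, period):
    """Split y into trend, seasonal and residual parts (y = T + S + e).

    y must be regularly spaced with no missing values; period is the
    number of observations per cycle (e.g. 12 for monthly data).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Centered moving-average weights; for an even period use the
    # "2 x period" filter with half weights at both ends.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2

    # 'valid' convolution only yields points with a full window, so the
    # first and last `half` values of the trend stay NaN.
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode='valid')

    # Average the detrended values at each position of the cycle
    # (ignoring the NaN ends), then center those averages on zero.
    detrended = y - trend
    cycle_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    cycle_means -= cycle_means.mean()
    seasonal = np.tile(cycle_means, n // period + 1)[:n]

    resid = y - trend - seasonal
    return trend, seasonal, resid


# Usage: a linear trend plus a pure 12-step cycle.
t = np.arange(96)
y = 0.1 * t + np.sin(2 * np.pi * t / 12)
trend, seasonal, resid = naive_additive_decompose(y, period=12)
print(np.isnan(trend).sum())  # 12
```

The NaN ends explain the non-null counts in the outputs above: with a period of 52, the filter leaves 26 undefined trend (and residual) values at each end, which is why 400 points yield 348 and 224 yield 172. Note that the sketch, like the naive method it follows, assumes evenly spaced observations.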