Setup
import pandas as pd

df = pd.DataFrame(
    [
        [5777, 5385, 5419, 4887],
        [4849, 3759, 4539, 3381],
        [4971, 3824, 4645, 3424],
        [4827, 3459, 4552, 3153],
        [5207, 3670, 4876, 3358],
    ],
    index=pd.to_datetime(['2001-01-01',
                          '2002-01-01',
                          '2003-01-01',
                          '2004-01-01',
                          '2005-01-01']),
    columns=pd.MultiIndex.from_tuples(
        [('Total nonfarm', 'Hires'), ('Total nonfarm', 'Job Openings'),
         ('Total private', 'Hires'), ('Total private', 'Job Openings')]
    )
)
print(df)
           Total nonfarm              Total private
                   Hires Job Openings         Hires Job Openings
2001-01-01          5777         5385          5419         4887
2002-01-01          4849         3759          4539         3381
2003-01-01          4971         3824          4645         3424
2004-01-01          4827         3459          4552         3153
2005-01-01          5207         3670          4876         3358
Try:
df.T.groupby(level=0).diff(-1).dropna().T
           Total nonfarm Total private
                   Hires         Hires
2001-01-01         392.0         532.0
2002-01-01        1090.0        1158.0
2003-01-01        1147.0        1221.0
2004-01-01        1368.0        1399.0
2005-01-01        1537.0        1518.0
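Why the transpose is needed: `df.T` moves the column MultiIndex onto the rows, so `groupby(level=0)` gets one group per top-level label, and `diff(-1)` subtracts each row's successor within its group (Hires - Job Openings). The Job Openings row has no successor, becomes NaN, and is removed by `dropna`. The sketch below reproduces the result and cross-checks it against a transpose-free version built with `DataFrame.xs`; `t`, `via_groupby` and `via_xs` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Same data as in the setup
df = pd.DataFrame(
    [[5777, 5385, 5419, 4887],
     [4849, 3759, 4539, 3381],
     [4971, 3824, 4645, 3424],
     [4827, 3459, 4552, 3153],
     [5207, 3670, 4876, 3358]],
    index=pd.to_datetime(['2001-01-01', '2002-01-01', '2003-01-01',
                          '2004-01-01', '2005-01-01']),
    columns=pd.MultiIndex.from_tuples(
        [('Total nonfarm', 'Hires'), ('Total nonfarm', 'Job Openings'),
         ('Total private', 'Hires'), ('Total private', 'Job Openings')]),
)

# After transposing, each top-level label owns two rows: Hires, Job Openings
t = df.T
for name, group in t.groupby(level=0):
    print(name, list(group.index.get_level_values(1)))

# Within each group, diff(-1) computes row - next row, i.e. Hires - Job Openings;
# the last row of each group has no successor, so it is NaN and gets dropped.
via_groupby = t.groupby(level=0).diff(-1).dropna().T

# Transpose-free equivalent: select the second-level labels with xs and subtract.
# Both operands have columns ['Total nonfarm', 'Total private'], so they align.
via_xs = (df.xs('Hires', axis=1, level=1)
          - df.xs('Job Openings', axis=1, level=1))
print(via_xs)
```

The `xs` route keeps integer dtype and drops the `Hires` label from the columns, which can be convenient when only the difference is needed.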
To apply other transformations (for example, ratios), you can do the following:
import numpy as np
print(df.T.groupby(level=0, group_keys=False).apply(lambda x: np.exp(np.log(x).diff(-1))).dropna().T)
           Total nonfarm Total private
                   Hires         Hires
2001-01-01      1.072795      1.108860
2002-01-01      1.289971      1.342502
2003-01-01      1.299948      1.356600
2004-01-01      1.395490      1.443704
2005-01-01      1.418801      1.452055
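The log/exp round trip works because log(a) - log(b) = log(a/b): taking `diff(-1)` in log space and exponentiating gives Hires / Job Openings. It only makes sense for strictly positive values, since the log of zero or a negative number is not finite. A minimal NumPy check on the Total nonfarm figures from the setup (the array names are illustrative):

```python
import numpy as np

# Total nonfarm columns from the setup data
hires = np.array([5777, 4849, 4971, 4827, 5207], dtype=float)
openings = np.array([5385, 3759, 3824, 3459, 3670], dtype=float)

# Difference of logs, then exp, versus plain division
ratio_via_logs = np.exp(np.log(hires) - np.log(openings))
ratio_direct = hires / openings

print(np.allclose(ratio_via_logs, ratio_direct))
print(np.round(ratio_direct, 6))
```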
Or:
print(df.T.groupby(level=0, group_keys=False).apply(lambda x: x.div(x.shift(-1))).dropna().T)
           Total nonfarm Total private
                   Hires         Hires
2001-01-01      1.072795      1.108860
2002-01-01      1.289971      1.342502
2003-01-01      1.299948      1.356600
2004-01-01      1.395490      1.443704
2005-01-01      1.418801      1.452055
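Depending on the pandas version, `groupby(...).apply` with a function that returns a like-indexed result may prepend the group key to the index (pandas 2.0 made `group_keys=True` apply to such results, hence `group_keys=False` above). A sketch that sidesteps `apply` altogether, assuming a reasonably recent pandas, shifts each group's rows by one and divides; `t` and `ratio` are illustrative names:

```python
import pandas as pd

# Same data as in the setup
df = pd.DataFrame(
    [[5777, 5385, 5419, 4887],
     [4849, 3759, 4539, 3381],
     [4971, 3824, 4645, 3424],
     [4827, 3459, 4552, 3153],
     [5207, 3670, 4876, 3358]],
    index=pd.to_datetime(['2001-01-01', '2002-01-01', '2003-01-01',
                          '2004-01-01', '2005-01-01']),
    columns=pd.MultiIndex.from_tuples(
        [('Total nonfarm', 'Hires'), ('Total nonfarm', 'Job Openings'),
         ('Total private', 'Hires'), ('Total private', 'Job Openings')]),
)

t = df.T
# shift(-1) within each top-level group puts the Job Openings row next to
# the Hires row; the Job Openings row itself becomes all-NaN.
shifted = t.groupby(level=0).shift(-1)

# Element-wise Hires / Job Openings; dropna removes the all-NaN rows.
ratio = (t / shifted).dropna().T
print(ratio)
```

`shift` is a transform, so the result keeps the original index regardless of the `group_keys` setting.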
To rename the columns and combine the result with the original DataFrame, you can do:
df2 = df.T.groupby(level=0).diff(-1).dropna().T
df2.columns = pd.MultiIndex.from_tuples(
[('Total nonfarm', 'difference'),
('Total private', 'difference')])
pd.concat([df, df2], axis=1).sort_index(axis=1)
Which gives:
           Total nonfarm                         Total private
                   Hires Job Openings difference         Hires Job Openings difference
2001-01-01          5777         5385      392.0          5419         4887      532.0
2002-01-01          4849         3759     1090.0          4539         3381     1158.0
2003-01-01          4971         3824     1147.0          4645         3424     1221.0
2004-01-01          4827         3459     1368.0          4552         3153     1399.0
2005-01-01          5207         3670     1537.0          4876         3358     1518.0
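If the goal is only to add a `difference` column under each top-level label, a sketch that assigns the new columns directly, without the transpose/concat round trip, could look like the following (`groups` and `diffs` are illustrative names). The differences are computed before any assignment so that lookups happen on the original, sorted columns. Unlike the `diff` route, the results stay integers here.

```python
import pandas as pd

# Same data as in the setup
df = pd.DataFrame(
    [[5777, 5385, 5419, 4887],
     [4849, 3759, 4539, 3381],
     [4971, 3824, 4645, 3424],
     [4827, 3459, 4552, 3153],
     [5207, 3670, 4876, 3358]],
    index=pd.to_datetime(['2001-01-01', '2002-01-01', '2003-01-01',
                          '2004-01-01', '2005-01-01']),
    columns=pd.MultiIndex.from_tuples(
        [('Total nonfarm', 'Hires'), ('Total nonfarm', 'Job Openings'),
         ('Total private', 'Hires'), ('Total private', 'Job Openings')]),
)

# One (group, 'difference') column per top-level label
groups = df.columns.get_level_values(0).unique()
diffs = {(g, 'difference'): df[(g, 'Hires')] - df[(g, 'Job Openings')]
         for g in groups}
for key, values in diffs.items():
    df[key] = values

# Sort so each group's columns sit together, as in the concat version
df = df.sort_index(axis=1)
print(df)
```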