Using rename_axis + reset_index + melt:
df.rename_axis('Source')\
.reset_index()\
.melt('Source', value_name='Weight', var_name='Target')\
.query('Source != Target')\
.reset_index(drop=True)
   Source Target  Weight
0       B      A     1.0
1       C      A     0.8
2       D      A     0.0
3       A      B     0.5
4       C      B     0.0
5       D      B     0.0
6       A      C     0.5
7       B      C     0.0
8       D      C     1.0
9       A      D     0.0
10      B      D     0.0
11      C      D     0.2
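For anyone who wants to run the chain end to end, here is a minimal sketch. The input frame is not shown in this answer, so the one below is reconstructed from the output above; the diagonal values are an assumption (set to 0 here) and are dropped by the query step anyway.

```python
import pandas as pd

labels = ['A', 'B', 'C', 'D']

# Assumed input: a square weight matrix, rows = Source, columns = Target.
# Off-diagonal values are read back from the output above; the diagonal
# is a guess and gets filtered out below.
df = pd.DataFrame(
    [[0.0, 0.5, 0.5, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.8, 0.0, 0.0, 0.2],
     [0.0, 0.0, 1.0, 0.0]],
    index=labels,
    columns=labels,
)

edges = (
    df.rename_axis('Source')       # name the index so reset_index yields a 'Source' column
      .reset_index()
      .melt('Source', value_name='Weight', var_name='Target')
      .query('Source != Target')   # drop self-pairs (the diagonal)
      .reset_index(drop=True)
)
print(edges)
```

Wrapping the chain in parentheses avoids the trailing backslashes but is otherwise the same sequence of calls.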
melt was introduced as a DataFrame method in 0.20; for older versions, you need pd.melt instead:
v = df.rename_axis('Source').reset_index()
df = pd.melt(
v,
id_vars='Source',
value_name='Weight',
var_name='Target'
).query('Source != Target')\
.reset_index(drop=True)
Timings
x = np.random.randn(1000, 1000)
np.fill_diagonal(x, 0)  # zero the diagonal; indexing with a list of arrays no longer does this in recent NumPy
df = pd.DataFrame(x)
%%timeit
df.index.name = 'Source'
df.reset_index()\
.melt('Source', value_name='Weight', var_name='Target')\
.query('Source != Target')\
.reset_index(drop=True)
1 loop, best of 3: 139 ms per loop
# Wen's solution
%%timeit
np.fill_diagonal(df.values, np.nan)  # NaN on the diagonal; same intent as the original list-of-arrays index
df.stack().reset_index()
10 loops, best of 3: 45 ms per loop
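To compare the two approaches on a small labelled frame, the stack-based variant can be sketched roughly as follows. This is not Wen's exact code: the sample frame, the explicit dropna (added because newer pandas versions may keep NaN rows when stacking), and the final renaming step are my assumptions. Note that the rows come out ordered by Source rather than by Target, unlike the melt version.

```python
import numpy as np
import pandas as pd

labels = ['A', 'B', 'C', 'D']

# Assumed sample frame (same shape as the one used in the melt example).
df = pd.DataFrame(
    [[0.0, 0.5, 0.5, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.8, 0.0, 0.0, 0.2],
     [0.0, 0.0, 1.0, 0.0]],
    index=labels,
    columns=labels,
)

# Work on a copy of the values so the original frame is left untouched.
arr = df.to_numpy(copy=True)
np.fill_diagonal(arr, np.nan)      # mark self-pairs as missing
masked = pd.DataFrame(arr, index=df.index, columns=df.columns)

stacked = (
    masked.stack()                 # long format, indexed by (row label, column label)
          .dropna()                # make sure the NaN diagonal is gone
          .rename_axis(['Source', 'Target'])
          .reset_index(name='Weight')
)
print(stacked)
```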