I have this PySpark DataFrame:
import numpy as np
import pandas as pd

# Note: np.array coerces mixed values to strings, so movie and rating become string columns.
df = pd.DataFrame(np.array([
    ["[email protected]", 2, 3], ["[email protected]", 5, 5],
    ["[email protected]", 8, 2], ["[email protected]", 9, 3]
]), columns=['user', 'movie', 'rating'])
# sqlContext is assumed to be an existing SQLContext (e.g. the one the PySpark shell provides).
sparkdf = sqlContext.createDataFrame(df, samplingRatio=0.1)
user               movie  rating
[email protected]  2      3
[email protected]  5      5
[email protected]  8      2
[email protected]  9      3
I need to add a new column that contains a rank by user.
I want this output:
user               movie  rating  Rank
[email protected]  2      3       1
[email protected]  5      5       1
[email protected]  8      2       2
[email protected]  9      3       3
How can I do this?