grouped = GROUP table BY userid;
X = FOREACH grouped GENERATE group as userid,
table.clickcount as clicksbag,
table.pagenumber as pagenumberbag;
X will now be:
{(155,{(2),(3),(1)},{(12),(133),(144)}),
(156,{(6),(7)},{(1),(5)})}
Now you need to use the built-in UDF BagToTuple (http://pig.apache.org/docs/r0.11.1/api/org/apache/pig/builtin/BagToTuple.html):
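-- "output" is a reserved keyword in Pig, so the alias is named result here
result = FOREACH X GENERATE userid,
BagToTuple(clicksbag) as clickcounts,
BagToTuple(pagenumberbag) as pagenumbers;
result should now contain what you want. You can also fold this step into the FOREACH that follows the GROUP, skipping the intermediate relation X: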
-- alias named result because "output" is a reserved keyword in Pig
result = FOREACH grouped GENERATE group as userid,
BagToTuple(table.clickcount) as clickcounts,
BagToTuple(table.pagenumber) as pagenumbers;
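For completeness, here is a rough end-to-end sketch of the merged version, from loading the data to dumping the result. The file name clicks.tsv, the tab delimiter, the int column types, and the alias result are all assumptions for illustration; adjust them to match your actual input.

```pig
-- Assumed input: tab-separated rows of (userid, clickcount, pagenumber).
-- 'clicks.tsv' is a hypothetical path, not taken from the question.
table = LOAD 'clicks.tsv' USING PigStorage('\t')
        AS (userid:int, clickcount:int, pagenumber:int);

-- One row per userid, with a bag holding all of that user's rows.
grouped = GROUP table BY userid;

-- Project each column out of the bag and flatten it into a tuple.
result = FOREACH grouped GENERATE group AS userid,
         BagToTuple(table.clickcount) AS clickcounts,
         BagToTuple(table.pagenumber) AS pagenumbers;

DUMP result;
```

For user 156 in the example above, the dumped row should have the shape (156,(6,7),(1,5)). Bags are unordered in Pig, so do not rely on the order of the values inside each tuple.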