I have two data.frames of different sizes, and I'm looking for the most efficient way to match strings from one data.frame against the other and extract some related information.
Here is an example with two initial data.frames, `a` and `b`, and the desired result:
```r
a = data.frame(term = c("red", "salad", "rope", "ball", "tent", "plane", "gift", "meat"),
               age = c(30, 24, 52, 44, 73, 44, 33, 12),
               visits = c(5, 1, 3, 2, 8, 5, 19, 3))
b = data.frame(string = c("the red ball went over the fence",
                          "sorry to see that your tent fell down",
                          "the ball fell into the red salad",
                          "serious people eat peanuts on Sundays"))
desired_result = data.frame(string = b$string,
                            num_matches = c(2, 1, 3, 0),
                            avg_age = c(37, 73, 32.66667, NA),
                            avg_visits = c(3.5, 8, 2.66667, NA))
```
Here are the data.frames in a more readable format:
```
> a
   term age visits
1   red  30      5
2 salad  24      1
3  rope  52      3
4  ball  44      2
5  tent  73      8
6 plane  44      5
7  gift  33     19
8  meat  12      3
> b
                                 string
1      the red ball went over the fence
2 sorry to see that your tent fell down
3      the ball fell into the red salad
4 serious people eat peanuts on Sundays
> desired_result
                                 string num_matches  avg_age avg_visits
1      the red ball went over the fence           2 37.00000    3.50000
2 sorry to see that your tent fell down           1 73.00000    8.00000
3      the ball fell into the red salad           3 32.66667    2.66667
4 serious people eat peanuts on Sundays           0       NA         NA
```
- `num_matches` is the number of terms from `term` that appear in `string`
- `avg_age` is the mean `age` of the terms found in `string`
- `avg_visits` is the mean `visits` of the terms found in `string`
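A minimal base R sketch of these three definitions, assuming that a term counts as found only when it appears as a whole, space-separated word (no punctuation handling), and that a term repeated within one string is counted once. It splits each string into words, looks up which rows of `a` match, and averages over those rows. Strings with no matches get `NA`, as in `desired_result`:

```r
# Sample data from the question (stringsAsFactors = FALSE keeps strings as character)
a <- data.frame(term = c("red", "salad", "rope", "ball", "tent", "plane", "gift", "meat"),
                age = c(30, 24, 52, 44, 73, 44, 33, 12),
                visits = c(5, 1, 3, 2, 8, 5, 19, 3),
                stringsAsFactors = FALSE)
b <- data.frame(string = c("the red ball went over the fence",
                           "sorry to see that your tent fell down",
                           "the ball fell into the red salad",
                           "serious people eat peanuts on Sundays"),
                stringsAsFactors = FALSE)

# Split each string on whitespace into words (assumes no punctuation to strip)
words <- strsplit(b$string, "\\s+")

# For each string, the row indices of `a` whose term occurs as a whole word
hits <- lapply(words, function(w) which(a$term %in% w))

# mean() of an empty vector is NaN, so return NA_real_ explicitly when nothing matched
avg_of <- function(col) {
  vapply(hits, function(i) if (length(i)) mean(col[i]) else NA_real_, numeric(1))
}

result <- data.frame(string      = b$string,
                     num_matches = vapply(hits, length, integer(1)),
                     avg_age     = avg_of(a$age),
                     avg_visits  = avg_of(a$visits),
                     stringsAsFactors = FALSE)
result
```

`strsplit()` and `%in%` are vectorised, but `lapply()` still visits each string once, so for a large `b` a matrix-based formulation may be faster.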
Any ideas on how to implement this efficiently?
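One possibly more efficient formulation, sketched here as an unbenchmarked assumption rather than a definitive answer, builds a strings × terms logical matrix with `grepl()`. It loops only over the terms and uses `\b` word boundaries so that, for example, `tent` would not match inside `content`. A single `rowSums()` call then gives the counts, and one matrix product per column gives the sums of `age` and `visits`. The sketch assumes the terms contain no regex metacharacters; otherwise they would need escaping:

```r
# Sample data from the question
a <- data.frame(term = c("red", "salad", "rope", "ball", "tent", "plane", "gift", "meat"),
                age = c(30, 24, 52, 44, 73, 44, 33, 12),
                visits = c(5, 1, 3, 2, 8, 5, 19, 3),
                stringsAsFactors = FALSE)
b <- data.frame(string = c("the red ball went over the fence",
                           "sorry to see that your tent fell down",
                           "the ball fell into the red salad",
                           "serious people eat peanuts on Sundays"),
                stringsAsFactors = FALSE)

# One regex per term; \b restricts matches to whole words
# (assumes the terms contain no regex metacharacters)
pattern <- paste0("\\b", a$term, "\\b")

# Logical matrix: one row per string, one column per term.
# grepl() is vectorised over the strings, so the loop runs over terms only;
# wrapping in matrix() keeps the shape even when b has a single row.
m <- matrix(vapply(pattern, grepl, logical(nrow(b)),
                   x = b$string, perl = TRUE, USE.NAMES = FALSE),
            nrow = nrow(b))
w <- m * 1  # convert to 0/1 numeric weights

num_matches <- rowSums(w)
avg_age     <- drop(w %*% a$age) / num_matches     # 0/0 gives NaN for no match...
avg_visits  <- drop(w %*% a$visits) / num_matches
# ...so strings without any match are set to NA explicitly
avg_age[num_matches == 0]    <- NA
avg_visits[num_matches == 0] <- NA

result <- data.frame(string = b$string, num_matches, avg_age, avg_visits,
                     stringsAsFactors = FALSE)
result
```

The matrix has `nrow(b) * nrow(a)` cells, so for very large inputs memory, rather than speed, could become the limiting factor.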
Thanks.