Is there a Google Scholar API we can use in our research application?


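I am working on a research publication and collaboration project that includes a literature search feature. Google Scholar seems like it would work, since it is an open-source tool, but when I researched Google Scholar I could not find any information about it having an API.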

Is there an API for Google Scholar?

There's no official Google Scholar API.

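There are third-party solutions, such as the free scholarly Python package, which supports profile, author, cite and organic results (search_pubs seems to be the method for getting organic results, although the method name confuses me).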
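search_pubs yields publication results lazily. The sketch below flattens them into simple rows. It assumes each result is a dict with a nested "bib" dict holding "title", "author" and "pub_year", plus a top-level "num_citations"; that matches scholarly's publication layout as far as I know, but check it against your installed version. summarize_pubs is a hypothetical helper, not part of scholarly.

```python
from typing import Any, Iterable


def summarize_pubs(pubs: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten publication dicts into title/authors/year/citations rows.

    Assumes the nested "bib" layout described above (an assumption, not a
    guaranteed schema); missing keys fall back to None or 0 instead of raising.
    """
    rows = []
    for pub in pubs:
        bib = pub.get("bib", {})
        rows.append({
            "title": bib.get("title"),
            "authors": bib.get("author"),
            "year": bib.get("pub_year"),
            "citations": pub.get("num_citations", 0),
        })
    return rows
```

For example, summarize_pubs(itertools.islice(scholarly.search_pubs("biology"), 10)) would flatten the first ten organic results without paginating further.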

Note that if you use scholarly persistently without rate-limiting your requests, Google may block your IP address, as @RadioControlled mentioned. Use it wisely.
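One way to follow that advice is to slow down how fast you pull results from scholarly's lazy iterators. The helper below is a hypothetical sketch, not part of scholarly: it keeps at least min_interval seconds between successive pulls from any iterator. scholarly only sends a request when it needs the next page, so throttling every item is coarser than strictly necessary, but it is simple and works with any of its search generators.

```python
import time
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def throttled(items: Iterable[T], min_interval: float) -> Iterator[T]:
    """Yield from items, keeping at least min_interval seconds between pulls."""
    iterator = iter(items)
    last_pull: Optional[float] = None
    while True:
        if last_pull is not None:
            # sleep off whatever is left of the interval before asking for the next item
            remaining = min_interval - (time.monotonic() - last_pull)
            if remaining > 0:
                time.sleep(remaining)
        last_pull = time.monotonic()
        try:
            item = next(iterator)
        except StopIteration:
            return
        yield item
```

For example, for author in throttled(scholarly.search_keyword("biology"), 5.0): ... pulls at most one author every 5 seconds; that value is an arbitrary placeholder, not a known-safe rate.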

There's also the scrape-google-scholar-py module, which lets you extract data from almost all Google Scholar pages.

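Alternatively, there's the Google Scholar API from SerpApi. It's a paid API with a free plan that supports organic, cite, profile and author results and bypasses all blocks on SerpApi's backend, so your IP won't get blocked; it also takes care of the legal side of scraping.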
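Under the hood SerpApi is a plain HTTP API: the search parameters go into the query string of its search endpoint and the response is JSON. The sketch below only builds such a URL with the standard library and does not send it. build_serpapi_url is a hypothetical helper; it assumes the https://serpapi.com/search.json endpoint and the q query parameter of the google_scholar engine, and actually fetching the URL needs a valid API key.

```python
from urllib.parse import urlencode

# SerpApi's JSON search endpoint (assumed here; check the SerpApi docs)
SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_serpapi_url(params: dict[str, str]) -> str:
    """Return the GET URL for a SerpApi search; params are URL-encoded as given."""
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


url = build_serpapi_url({
    "engine": "google_scholar",  # organic Google Scholar results
    "q": "biology",              # search query
    "api_key": "YOUR_API_KEY",   # placeholder, not a real key
})
print(url)
```

The official client libraries wrap this same request/response cycle, so the URL is mainly useful for debugging or for calling SerpApi from environments without a client library.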

Example code to parse profile results using scholarly with the search_keyword method:

import json
from scholarly import scholarly

# will paginate to the next page by default
authors = scholarly.search_keyword("biology")

for author in authors:
    print(json.dumps(author, indent=2))

# part of the output:
{
  "container_type": "Author",
  "filled": [],
  "scholar_id": "LXVfPc8AAAAJ",
  "url_picture": "",
  "name": "Eric Lander",
  "affiliation": "Broad Institute",
  "email_domain": "",
  "interests": [...],
  "citedby": 552013
}
... other author results
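Because search_keyword keeps paginating, the loop above runs until the results (or your rate limit) run out. A sketch that stops after a fixed number of authors with itertools.islice and ranks them by the citedby field seen in the output above; top_authors is a hypothetical helper and the default limit of 10 is arbitrary.

```python
from itertools import islice
from typing import Any, Iterable


def top_authors(authors: Iterable[dict[str, Any]], limit: int = 10) -> list[tuple[str, str, int]]:
    """Take at most `limit` author dicts and return (name, affiliation, citedby)
    tuples sorted by citation count, highest first."""
    rows = [
        (a.get("name", ""), a.get("affiliation", ""), a.get("citedby", 0))
        # islice stops pulling from the lazy iterator (and thus stops paginating) after `limit` items
        for a in islice(authors, limit)
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

With scholarly this would be top_authors(scholarly.search_keyword("biology"), limit=10).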


from google_scholar_py import CustomGoogleScholarProfiles
import json

parser = CustomGoogleScholarProfiles()
data = parser.scrape_google_scholar_profiles(
print(json.dumps(data, indent=2))


# part of the output:
[
  {
    "name": "Adam Lobel",
    "link": "",
    "affiliations": "Blizzard Entertainment",
    "interests": [
      "Emotion regulation"
    ],
    "email": "Verified email at",
    "cited_by_count": 3593
  }, # other results...
]
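For a collaboration tool, a natural next step is finding who works on what. Assuming data is a list of profile dicts shaped like the output above (a "name" string and an "interests" list of strings), the hypothetical helper below builds an index from each interest, lower-cased so differently capitalized tags match, to the researchers who list it.

```python
from collections import defaultdict
from typing import Any


def index_by_interest(profiles: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each lower-cased interest to the names of the profiles listing it."""
    index: dict[str, list[str]] = defaultdict(list)
    for profile in profiles:
        # profiles without an "interests" key simply contribute nothing
        for interest in profile.get("interests", []):
            index[interest.lower()].append(profile["name"])
    return dict(index)
```

Looking up index_by_interest(data).get("emotion regulation", []) then lists every scraped researcher tagged with that interest.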

Example code to parse profile results using the Google Scholar Profiles API from SerpApi:

import json
from serpapi import GoogleScholarSearch  # from the google-search-results package

# search parameters
params = {
    "api_key": "Your SerpApi API key",
    "engine": "google_scholar_profiles",   # Google Scholar Profiles engine
    "hl": "en",                            # language
    "mauthors": "biology"                  # search query
}

search = GoogleScholarSearch(params)
results = search.get_dict()

# only first page results
for result in results.get("profiles", []):
    print(json.dumps(result, indent=2))

# part of the output:
{
  "name": "Masatoshi Nei",
  "link": "",
  "serpapi_link": "",
  "author_id": "VxOmZDgAAAAJ",
  "affiliations": "Laura Carnell Professor of Biology, Temple University",
  "email": "Verified email at",
  "cited_by": 384074,
  "interests": [
    {
      "title": "Evolution",
      "serpapi_link": "",
      "link": ""
    },
    {
      "title": "Evolutionary biology",
      "serpapi_link": "",
      "link": ""
    },
    {
      "title": "Molecular evolution",
      "serpapi_link": "",
      "link": ""
    },
    {
      "title": "Population genetics",
      "serpapi_link": "",
      "link": ""
    },
    {
      "title": "Phylogenetics",
      "serpapi_link": "",
      "link": ""
    }
  ],
  "thumbnail": ""
}
... other results
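Each profile's "interests" is a list of objects rather than plain strings, so it helps to flatten a result before storing or displaying it. A minimal sketch, assuming the keys shown in the output above; profile_row is a hypothetical helper, not part of SerpApi's client.

```python
from typing import Any


def profile_row(profile: dict[str, Any]) -> dict[str, Any]:
    """Reduce one SerpApi profile result to a flat row, joining interest titles."""
    titles = [interest.get("title", "") for interest in profile.get("interests", [])]
    return {
        "author_id": profile.get("author_id"),
        "name": profile.get("name"),
        "affiliations": profile.get("affiliations"),
        "cited_by": profile.get("cited_by", 0),
        "interests": "; ".join(titles),  # e.g. "Evolution; Phylogenetics"
    }
```

Usage: rows = [profile_row(p) for p in results.get("profiles", [])].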

My dedicated blog post at SerpApi, Scrape historic Google Scholar results using Python, shows how to scrape historic 2017-2021 organic and cite Google Scholar results to CSV and SQLite.
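The blog post has its own code; as a rough stand-in, here is a minimal standard-library sketch of the final step, saving already-parsed profile rows to CSV and SQLite. save_profiles, the profiles table and its columns are hypothetical names chosen for illustration; the keys follow the SerpApi profile output above.

```python
import csv
import sqlite3
from typing import Any, TextIO

FIELDS = ("author_id", "name", "affiliations", "cited_by")


def save_profiles(rows: list[dict[str, Any]], conn: sqlite3.Connection, csv_file: TextIO) -> None:
    """Write profile rows to a CSV stream and to an SQLite table named profiles."""
    # extra keys (thumbnail, links, ...) are ignored; missing keys become empty cells
    writer = csv.DictWriter(csv_file, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "author_id TEXT PRIMARY KEY, name TEXT, affiliations TEXT, cited_by INTEGER)"
    )
    # named placeholders; re-running with the same author_id overwrites the old row
    conn.executemany(
        "INSERT OR REPLACE INTO profiles VALUES (:author_id, :name, :affiliations, :cited_by)",
        [{field: row.get(field) for field in FIELDS} for row in rows],
    )
    conn.commit()
```

Using author_id as the primary key keeps repeated runs from duplicating researchers.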

There's also a blog post on scraping Google Scholar with R, if you're not a Python person.

Disclaimer: I work for SerpApi.


