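Here are a few different possible approaches; use whichever works for you. All my code examples below use requests (http://docs.python-requests.org/en/master/) for HTTP requests to the API; you can install requests with pip install requests if you have pip.
They also all use the MediaWiki API (https://www.mediawiki.org/wiki/API:Main_page), and two use the query endpoint (https://www.mediawiki.org/wiki/API:Query); follow those links if you want documentation.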
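1. Get a plain text representation of either the entire page or a page "extract" straight from the API with the extracts prop
Note that this approach only works on MediaWiki sites with the TextExtracts extension (https://www.mediawiki.org/wiki/Extension:TextExtracts). This notably includes Wikipedia, but not some smaller MediaWiki sites like, say, http://www.wikia.com/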
You want to hit https://en.wikipedia.org/w/api.php with a query string carrying the following parameters (documented at https://www.mediawiki.org/wiki/Extension:TextExtracts#query+extracts):
- action=query, format=json, and titles=Bla_Bla_Bla are all standard MediaWiki API parameters
- prop=extracts makes us use the TextExtracts extension
- exintro limits the response to content before the first section heading
- explaintext makes the extract in the response plain text instead of HTML
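For reference, the parameters above can be assembled into a full URL with the standard library alone. This is only a sketch: build_extract_url is a hypothetical helper name, and passing 1 for exintro and explaintext assumes that MediaWiki treats a boolean parameter as set whenever it is present in the query string.

```python
from urllib.parse import urlencode

API_ENDPOINT = 'https://en.wikipedia.org/w/api.php'


def build_extract_url(title, endpoint=API_ENDPOINT):
    """Return a query URL asking TextExtracts for a page's plain-text intro."""
    params = {
        'action': 'query',
        'format': 'json',
        'titles': title,
        'prop': 'extracts',
        'exintro': 1,      # assumption: presence alone switches the flag on
        'explaintext': 1,  # same assumption as above
    }
    # urlencode percent-encodes the values and turns spaces into '+'
    return endpoint + '?' + urlencode(params)


print(build_extract_url('Bla Bla Bla'))
```

In practice requests builds this query string for you from its params argument, so a helper like this is mostly useful for pasting the URL into a browser while experimenting.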
Then parse the JSON response and extract the extract:
>>> import requests
>>> response = requests.get(
... 'https://en.wikipedia.org/w/api.php',
... params={
... 'action': 'query',
... 'format': 'json',
... 'titles': 'Bla Bla Bla',
... 'prop': 'extracts',
... 'exintro': True,
... 'explaintext': True,
... }
... ).json()
>>> page = next(iter(response['query']['pages'].values()))
>>> print(page['extract'])
"Bla Bla Bla" is the title of a song written and recorded by Italian DJ Gigi D'Agostino. It was released in May 1999 as the third single from the album, L'Amour Toujours. It reached number 3 in Austria and number 15 in France. This song can also be heard in an added remixed mashup with L'Amour Toujours (I'll Fly With You) in its US radio version.
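2. Get the full HTML of the page using the parse endpoint, parse it, and extract the first paragraph
MediaWiki has a parse endpoint (https://www.mediawiki.org/wiki/API:Parsing_wikitext#parse) that you can hit with action=parse and a page parameter to get the HTML of a page. You can then parse it with an HTML parser like lxml (http://lxml.de/; install it first with pip install lxml) to extract the first paragraph.
For example: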
>>> import requests
>>> from lxml import html
>>> response = requests.get(
... 'https://en.wikipedia.org/w/api.php',
... params={
... 'action': 'parse',
... 'page': 'Bla Bla Bla',
... 'format': 'json',
... }
... ).json()
>>> raw_html = response['parse']['text']['*']
>>> document = html.document_fromstring(raw_html)
>>> first_p = document.xpath('//p')[0]
>>> intro_text = first_p.text_content()
>>> print(intro_text)
"Bla Bla Bla" is the title of a song written and recorded by Italian DJ Gigi D'Agostino. It was released in May 1999 as the third single from the album, L'Amour Toujours. It reached number 3 in Austria and number 15 in France. This song can also be heard in an added remixed mashup with L'Amour Toujours (I'll Fly With You) in its US radio version.
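3. Parse wikitext yourself
You can use the query API to get the page's wikitext, parse it using mwparserfromhell (install it first using pip install mwparserfromhell), then reduce it down to human-readable text using strip_code (http://mwparserfromhell.readthedocs.io/en/latest/api/mwparserfromhell.html#mwparserfromhell.wikicode.Wikicode.strip_code). strip_code doesn't work perfectly at the time of writing (as the example below shows), but will hopefully improve.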
>>> import requests
>>> import mwparserfromhell
>>> response = requests.get(
... 'https://en.wikipedia.org/w/api.php',
... params={
... 'action': 'query',
... 'format': 'json',
... 'titles': 'Bla Bla Bla',
... 'prop': 'revisions',
... 'rvprop': 'content',
... }
... ).json()
>>> page = next(iter(response['query']['pages'].values()))
>>> wikicode = page['revisions'][0]['*']
>>> parsed_wikicode = mwparserfromhell.parse(wikicode)
>>> print(parsed_wikicode.strip_code())
{{dablink|For Ke$ha's song, see Blah Blah Blah (song). For other uses, see Blah (disambiguation)}}
"Bla Bla Bla" is the title of a song written and recorded by Italian DJ Gigi D'Agostino. It was released in May 1999 as the third single from the album, L'Amour Toujours. It reached number 3 in Austria and number 15 in France. This song can also be heard in an added remixed mashup with L'Amour Toujours (I'll Fly With You) in its US radio version.
Background and writing
He described this song as "a piece I wrote thinking of all the people who talk and talk without saying anything". The prominent but nonsensical vocal samples are taken from UK band Stretch's song "Why Did You Do It"''.
Music video
The song also featured a popular music video in the style of La Linea. The music video shows a man with a floating head and no arms walking toward what appears to be a shark that multiplies itself and can change direction. This style was also used in "The Riddle", another song by Gigi D'Agostino, originally from British singer Nik Kershaw.
Chart performance
Chart (1999-00)PeakpositionIreland (IRMA)Search for Irish peaks23
References
External links
Category:1999 singles
Category:Gigi D'Agostino songs
Category:1999 songs
Category:ZYX Music singles
Category:Songs written by Gigi D'Agostino
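Since the stripped output still contains leftover template text, section headings, and category lines, some post-processing is needed to get just the intro. The following is a rough sketch and not part of mwparserfromhell: first_prose_line is a hypothetical helper that assumes the intro is the first non-blank line that does not start with '{{'. That heuristic holds for the output above but may well fail on other pages.

```python
def first_prose_line(stripped_text):
    """Return the first non-blank line that is not leftover template markup."""
    for line in stripped_text.splitlines():
        line = line.strip()
        if line and not line.startswith('{{'):
            return line
    return None


# Shortened, hand-written sample in the shape of the output above.
sample = (
    "{{dablink|For other uses, see Blah (disambiguation)}}\n"
    "\n"
    '"Bla Bla Bla" is the title of a song.\n'
    "Background and writing\n"
)
print(first_prose_line(sample))
```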