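I want to fetch a website's HTML and then get a specific element out of that HTML.
There are tools that can fetch HTML, such as ajax and jQuery, but I'm using Node and want the solution to be pure JavaScript. I also don't know how to pick a specific element out of the result.
I've already done this in Python, but I need to do it in JavaScript. For simplicity, let's take https://example.com as the website. This is the body of the site's HTML: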
<body>
<div>
#Some Stuff
</div>
</body>
I want to select the div by its class, so let's change <div> to <div class="test"> to make things easier.
Finally, I want to get the contents of <div class="test">, like this:
<div class="test">
#Some Stuff
</div>
Thanks in advance.
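Node.js has two native modules for making requests: http and https. If you want to scrape with a Node.js application, you should probably use https to get the page's HTML and then parse it with an HTML parser; I recommend cheerio. Here is an example: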
// native Node.js module
const https = require('https')
// don't forget to `npm install cheerio` to get the parser!
const cheerio = require('cheerio')

// minimal GET helper for Node.js: resolves with the response body as a string
const fetchHtml = url => new Promise((resolve, reject) => {
  https.get(url, res => {
    const chunks = []
    // collect the raw Buffers and decode once at the end, so multi-byte
    // characters split across two chunks are not corrupted
    res.on('data', chunk => chunks.push(chunk))
    res.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')))
    res.on('error', reject)
  }).on('error', reject)
})

const scrapeHtml = url =>
  fetchHtml(url).then(html => {
    const cheerioPage = cheerio.load(html)
    // cheerioPage is now a loaded html parser with a similar interface to jQuery
    // FOR EXAMPLE, to find a table with the id productData, you would do this:
    const productTable = cheerioPage('table#productData')
    // the result is already a cheerio object, so there is no need to load it
    // again; use .find() to perform more jQuery-like searches inside it:
    const productRows = productTable.find('tr')
    // now we have a reference to every row in the table. The object returned
    // from a cheerio search is array-like rather than a real array, so convert
    // it with .toArray() before using native JS functions such as .map:
    return productRows
      .toArray()
      .map(row => cheerioPage(row).text().trim())
  })

scrapeHtml(/*URL TO SCRAPE HERE*/)
  .then(data => {
    // expect the data returned to be an array of text from each
    // row in the table from the html we loaded. Now we can do whatever
    // else you want with the scraped data.
    console.log('data: ', data)
  })
  .catch(err => console.log('err: ', err))
Happy scraping!