This is my HTML source code:
<li>
<a href="/info/some1">Item 1<br>
<span class="deets">111</span>
</a>
</li>
<li>
<a href="/info/some2">Item 2<br>
<span class="deets">222</span>
</a>
</li>
<li>
<a href="/info/some3">Item 3<br>
<span class="deets">333</span>
</a>
</li>
This is my Java program, which fetches the page content and strips out the HTML tags:
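import android.widget.TextView;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Declared outside the try so the catch block can use it too
TextView tv = (TextView) findViewById(R.id.textView1);
try {
    // On Android 3.0+ this network call must run off the main thread, otherwise it throws NetworkOnMainThreadException
    URL myurl = new URL("http://www.somewebsite.com");
    HttpURLConnection con = (HttpURLConnection) myurl.openConnection();
    InputStream result = con.getInputStream();
    BufferedReader reader = new BufferedReader(new InputStreamReader(result));
    StringBuilder sb = new StringBuilder();
    for (String line; (line = reader.readLine()) != null; ) {
        // append all content & separate using line separator
        sb.append(line).append(System.getProperty("line.separator"));
    }
    reader.close();
    // Remove everything that looks like a <...> tag
    String final_result = sb.toString().replaceAll("\\<.*?\\>", "");
    tv.setText(final_result);
}
catch (Exception e) {
    e.printStackTrace();
    tv.setText("not working");
}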
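To show what the regex approach actually produces, here is a small stdlib-only sketch that runs the same replaceAll pattern as the code above on one list item from the question (with the href quote fixed) and on a tag whose attribute contains a '>'. The class and method names (RegexStripDemo, stripTags) are hypothetical.

```java
public class RegexStripDemo {
    // Same tag-stripping call as in the question's code
    static String stripTags(String html) {
        return html.replaceAll("\\<.*?\\>", "");
    }

    public static void main(String[] args) {
        // One list item from the question
        String item = "<li><a href=\"/info/some2\">Item 2<br><span class=\"deets\">222</span></a></li>";
        // The <br> that separated label and value is removed, so the two run together: "Item 2222"
        System.out.println(stripTags(item));

        // A '>' inside an attribute value ends the lazy match early
        String tricky = "<p title=\"a > b\">text</p>";
        // Part of the tag leaks into the output: ' b">text'
        System.out.println(stripTags(tricky));
    }
}
```

Both results are why a real HTML parser is usually preferable to a regex: the regex knows nothing about element boundaries or quoted attribute values.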
Is there an easier way to parse the HTML content in Java with Jsoup instead of a regex?
-
Is there a way to get only the content I need? Here, for example, I only want "Item 2 - 222":
<li>
<a href="/info/some2">Item 2<br>
<span class="deets">222</span>
</a>
</li>
Try jsoup; it makes the parsing easy:
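import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// To parse the html page (get() throws IOException)
Document doc = Jsoup.connect("http://www.website.com").get();
// Or parse HTML you already have as a string
Document doc1 = Jsoup.parse("<html><head><title>First parse</title></head>" + "<body> <p>Parsed HTML into a doc.</p></body></html>");
// All text in the body, with the tags removed
String content = doc.body().text();
// To get specific elements such as links (select() returns Elements, not Element)
Elements links = doc.select("a[href]");
for (Element e : links) {
    // "abs:href" resolves the link against the page URL
    System.out.println("link: " + e.attr("abs:href"));
}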
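The snippet above does not show how to pull out just "Item 2 - 222". A minimal sketch, assuming jsoup is on the classpath and parsing the question's markup from a string (href quotes fixed) rather than fetching a URL; ItemExtractor and findItem are hypothetical names:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ItemExtractor {
    // Sample markup from the question, with the missing closing quotes on href restored
    static final String HTML =
          "<li><a href=\"/info/some1\">Item 1<br><span class=\"deets\">111</span></a></li>"
        + "<li><a href=\"/info/some2\">Item 2<br><span class=\"deets\">222</span></a></li>"
        + "<li><a href=\"/info/some3\">Item 3<br><span class=\"deets\">333</span></a></li>";

    // Returns "label - deets" for the link whose own text equals wantedLabel, or null if none matches
    static String findItem(String html, String wantedLabel) {
        Document doc = Jsoup.parse(html);
        for (Element a : doc.select("li > a")) {
            // ownText() is the link's direct text ("Item 2"), excluding the nested <span>
            String label = a.ownText().trim();
            if (label.equals(wantedLabel)) {
                // Text of the <span class="deets"> child, e.g. "222"
                String deets = a.select("span.deets").text();
                return label + " - " + deets;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findItem(HTML, "Item 2"));
    }
}
```

Running main should print Item 2 - 222. Matching on the link text is one choice; selecting by the href value instead would work the same way.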
For more information, see the Jsoup documentation.