jsoup 1.11.3 released: the Java HTML parser

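jsoup is a Java HTML parser that can parse HTML directly from a URL or from HTML text. It provides a very convenient API for extracting and manipulating data using DOM methods, CSS selectors, and jQuery-like operations.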

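jsoup's main features:

  1. Parse HTML from a URL, a file, or a string;

  2. Find and extract data using DOM traversal or CSS selectors;

  3. Manipulate HTML elements, attributes, and text.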
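A minimal sketch of the three features above, assuming jsoup (org.jsoup:jsoup, 1.11.3 or later) is on the classpath. The class name JsoupQuickStart and the helper parseAndTag are hypothetical, chosen only for illustration. It parses a string with a base URI, selects links with a CSS selector, and rewrites an attribute; Jsoup.connect(url).get() and Jsoup.parse(File, charsetName) return the same Document type for the URL and file cases, but they are left out here to avoid network and file access.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupQuickStart {
    // Parse an HTML string (feature 1), select links with a CSS selector (feature 2),
    // and add an attribute to each of them (feature 3).
    static Document parseAndTag(String html) {
        // The base URI lets relative links be resolved with absUrl().
        Document doc = Jsoup.parse(html, "https://example.com/");
        for (Element link : doc.select("a[href]")) {
            link.attr("rel", "nofollow");
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = parseAndTag("<p>Read the <a href='/docs'>docs</a>.</p>");
        Elements links = doc.select("a[rel=nofollow]");
        System.out.println(links.first().absUrl("href")); // https://example.com/docs
        System.out.println(links.first().text());         // docs
    }
}
```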

jsoup is released under the MIT license, so it can safely be used in commercial projects.

Changes in this release:

Improvements

  • CDATA sections are now treated as whitespace-preserving (regardless of the containing element) and are round-tripped into the output HTML.
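A small sketch of the CDATA round-trip, assuming jsoup 1.11.3 or later on the classpath; CdataRoundTrip and divHtml are hypothetical names. It only checks that the CDATA markers survive into the serialized HTML, which is what the changelog item states; the exact surrounding whitespace in the output may depend on jsoup's pretty-printing settings.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class CdataRoundTrip {
    // Parse HTML containing a CDATA section and return the inner HTML of the first <div>.
    static String divHtml(String html) {
        Element div = Jsoup.parse(html).select("div").first();
        return div.html();
    }

    public static void main(String[] args) {
        // With 1.11.3+, the CDATA section should be written back out as <![CDATA[...]]>
        // instead of being converted to escaped text.
        System.out.println(divHtml("<div><![CDATA[  a  b  ]]></div>"));
    }
}
```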

  • Added support for Deflate encoding.
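jsoup handles this internally when a server responds with "Content-Encoding: deflate", so callers do not need to do anything. The sketch below is not jsoup's code; it only illustrates, with the JDK's java.util.zip, what decoding such a body involves. It assumes the zlib-wrapped DEFLATE format that HTTP's "deflate" coding nominally means (Deflater and InflaterInputStream both default to it). DeflateBodyDemo and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;

public class DeflateBodyDemo {
    // Compress bytes into zlib-wrapped DEFLATE (Deflater's default),
    // standing in for a server that sends "Content-Encoding: deflate".
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decode a deflate-encoded body by wrapping the raw bytes in an InflaterInputStream,
    // the kind of wrapping an HTTP client applies before handing bytes to a parser.
    static String inflateToString(byte[] body) {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(body))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = deflate("<html><body>Hi</body></html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(inflateToString(body)); // <html><body>Hi</body></html>
    }
}
```

Some servers send raw DEFLATE without the zlib header; decoding that would need an Inflater constructed with nowrap set to true, which this sketch does not cover.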

  • When parsing <pre> tags, skip the first newline if present.
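To see the <pre> change, the sketch below reads the raw text nodes of a <pre> element (Element.text() would normalize whitespace, so it reads TextNode.getWholeText() instead). It assumes jsoup 1.11.3 or later; PreNewline and preText are hypothetical names. Only the single newline directly after the opening tag is dropped, matching the HTML specification.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class PreNewline {
    // Return the unnormalized text directly inside the first <pre> element.
    static String preText(String html) {
        Element pre = Jsoup.parse(html).select("pre").first();
        StringBuilder sb = new StringBuilder();
        for (TextNode t : pre.textNodes()) {
            sb.append(t.getWholeText());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The leading newline after <pre> is skipped; the indentation and trailing newline remain.
        System.out.println("[" + preText("<pre>\n  indented\n</pre>") + "]");
    }
}
```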

  • Support nested quotes for attribute selection queries.
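An illustration of attribute selectors whose quoted value contains the other kind of quote, assuming jsoup 1.11.3 or later; NestedQuoteSelector and countMatches are hypothetical names. In each query, the outer quote delimits the value and the inner quote is treated as part of it.

```java
import org.jsoup.Jsoup;

public class NestedQuoteSelector {
    // Count the elements in html that match a CSS query.
    static int countMatches(String html, String query) {
        return Jsoup.parse(html).select(query).size();
    }

    public static void main(String[] args) {
        String html = "<a title=\"It's here\">one</a><a title='Say \"hi\"'>two</a>";
        // A single quote nested inside a double-quoted value.
        System.out.println(countMatches(html, "a[title=\"It's here\"]"));
        // Double quotes nested inside a single-quoted value.
        System.out.println(countMatches(html, "a[title='Say \"hi\"']"));
    }
}
```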

  • Character references from Windows-1252 that are not valid Unicode are mapped to the appropriate Unicode replacement.
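For example, the numeric references &#128; and &#150; point at C1 control code points in Unicode, but browsers treat them as the Windows-1252 characters "€" (U+20AC) and "–" (U+2013). A sketch checking this, assuming jsoup 1.11.3 or later; Win1252Refs and textOf are hypothetical names.

```java
import org.jsoup.Jsoup;

public class Win1252Refs {
    // Parse an HTML fragment and return the text of its <body>.
    static String textOf(String html) {
        return Jsoup.parse(html).body().text();
    }

    public static void main(String[] args) {
        // 0x80 maps to U+20AC (euro sign), 0x96 to U+2013 (en dash).
        System.out.println(textOf("<p>&#128; &#150;</p>"));
    }
}
```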

  • Accept a custom SSL socket factory in Jsoup.Connection.
    Note that Connection.validateTLSCertificates() will be removed in the next release; if you need to keep using a similar approach, Connection.sslSocketFactory(SSLSocketFactory sslSocketFactory) provides a path to implement a workaround.
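A sketch of passing a custom factory to Connection.sslSocketFactory(), assuming jsoup 1.11.3 or later. CustomSslFactory, buildFactory and configure are hypothetical names. Here the SSLContext is initialized with null trust managers, which means the JVM defaults; in a real workaround you would instead pass TrustManagers built from your own truststore (that part is an assumption and is not shown). The connection is only configured, not executed, so no network access happens.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class CustomSslFactory {
    // Build a socket factory from a TLS SSLContext. Null key/trust managers mean JVM defaults;
    // swap in TrustManagers from your own KeyStore to trust a private CA.
    static SSLSocketFactory buildFactory() {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, null, null);
            return ctx.getSocketFactory();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create SSLContext", e);
        }
    }

    // Configure (but do not execute) a jsoup connection that uses the given factory.
    static Connection configure(String url, SSLSocketFactory factory) {
        return Jsoup.connect(url).sslSocketFactory(factory);
    }

    public static void main(String[] args) {
        Connection conn = configure("https://example.com/", buildFactory());
        // conn.get() would perform the HTTPS request using the custom factory.
        System.out.println(conn.request().url());
    }
}
```

Building trust from an explicit truststore, rather than disabling certificate checks, keeps TLS validation in place while still allowing non-default certificates.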

Bug fixes

  • Bugfix: a "Mark has been invalidated" exception was thrown when parsing some URLs on Android <= 6.

  • Bugfix: Element.text() for <div>One</div>Two returned "OneTwo" instead of "One Two".
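The fixed behavior can be checked directly, assuming jsoup 1.11.3 or later; BlockTextSpacing and bodyText are hypothetical names. Text that follows a block element is now separated from it by a space.

```java
import org.jsoup.Jsoup;

public class BlockTextSpacing {
    // Return the normalized text of the document body.
    static String bodyText(String html) {
        return Jsoup.parse(html).body().text();
    }

    public static void main(String[] args) {
        System.out.println(bodyText("<div>One</div>Two")); // One Two
    }
}
```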

  • Bugfix: boolean attributes with empty string values were not collapsing in HTML output.
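For instance, an input written as <input disabled=""> should serialize back in HTML output mode as the collapsed form <input disabled>. A sketch assuming jsoup 1.11.3 or later; BooleanAttrs and inputHtml are hypothetical names.

```java
import org.jsoup.Jsoup;

public class BooleanAttrs {
    // Parse HTML and return the serialized form of the first <input> element.
    static String inputHtml(String html) {
        return Jsoup.parse(html).select("input").first().outerHtml();
    }

    public static void main(String[] args) {
        // disabled is a boolean attribute, so its empty value is dropped in HTML output.
        System.out.println(inputHtml("<input disabled=\"\">"));
    }
}
```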

  • Bugfix: when using the XML Parser set to lowercase normalize tags, uppercase closing tags were not correctly handled.
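The configuration this fix refers to can be built with Parser.xmlParser().settings(ParseSettings.htmlDefault), which keeps XML parsing rules but normalizes tag names to lower case. In the sketch below (jsoup 1.11.3 or later assumed; XmlLowercaseTags and parse are hypothetical names), the uppercase </DIV> should close the lowercase <div>, so the following <p> becomes a sibling rather than a child.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.ParseSettings;
import org.jsoup.parser.Parser;

public class XmlLowercaseTags {
    // Parse with the XML tree builder, but with tag names normalized to lower case.
    static Document parse(String xml) {
        Parser parser = Parser.xmlParser().settings(ParseSettings.htmlDefault);
        return Jsoup.parse(xml, "", parser);
    }

    public static void main(String[] args) {
        Document doc = parse("<div>test</DIV><p></p>");
        // true when </DIV> correctly closed <div>: <p> is not nested inside it.
        System.out.println(doc.select("div > p").isEmpty());
    }
}
```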

  • Bugfix: when parsing from a URL, an end tag could be read incorrectly if it started on a buffer boundary.

For the full details and downloads, see the release page.

Original article: https://www.oschina.net/news/95196/jsoup-1-11-3-released



When reposting, please credit the original source: Harries Blog™ » jsoup 1.11.3 released: the Java HTML parser
