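- Single quote and double quote are not defined in HTML 4.0
The single quote is not defined in HTML 4.0. The double quote has been defined as `&quot;` since HTML 2.0.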
- StringEscapeUtils is not able to escape these 2 characters into their respective entities
`escapeXml11` in `StringEscapeUtils` http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringEscapeUtils.html supports converting a single quote into `&apos;`.
For example:
    StringEscapeUtils.escapeXml11("'");   // Returns &apos;
    StringEscapeUtils.escapeHtml4("\"");  // Returns &quot;
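- Is there any other String-related tool that can do this?
HtmlUtils http://docs.spring.io/spring-framework/docs/3.2.3.RELEASE/javadoc-api/org/springframework/web/util/HtmlUtils.html from the Spring Framework takes care of both single and double quotes. It can also convert the values to decimal references (such as `&#39;` and `&#34;`).
The following example is taken from the answer to this question https://stackoverflow.com/questions/1265282/recommended-method-for-escaping-html-in-java: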
    import org.springframework.web.util.HtmlUtils;
    [...]
    HtmlUtils.htmlEscapeDecimal("&"); // gives &#38;
    HtmlUtils.htmlEscape("&");        // gives &amp;
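- What is the reason that single quote and double quote are not defined in HTML Entities 4.0?
As per Character entity references in HTML 4 http://www.w3.org/TR/html4/sgml/entities.html the single quote is not defined. The double quote is available from HTML 2.0, whereas the single quote is supported as `&apos;` only as part of XHTML 1.0 http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Entities_representing_special_characters_in_XHTML.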
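Since HTML 4 has no named entity for the single quote, one common workaround is to use a named entity where HTML 4 defines one and fall back to the numeric reference `&#39;` for the apostrophe. The following is only a minimal sketch using the standard library; the class `Html4QuoteEscaper` and its `escape` method are hypothetical names made up for illustration and are not part of Commons Lang or Spring.

```java
// Hypothetical helper for illustration only; not a library API.
class Html4QuoteEscaper {

    // Escapes the five markup-significant characters.
    // Named entities are used where HTML 4 defines them; the single quote
    // gets the numeric reference &#39; because HTML 4 has no &apos;.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: It&#39;s &quot;good&quot; &amp; &lt;safe&gt;
        System.out.println(escape("It's \"good\" & <safe>"));
    }
}
```

The numeric form is understood by both HTML and XHTML parsers, which is why it is often preferred over `&apos;` when the target document type is not known in advance.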
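- A tool or method to encode all Unicode characters into their respective entity codes
A very good and simple Java implementation is mentioned as part of an answer to this question https://stackoverflow.com/questions/1265282/recommended-method-for-escaping-html-in-java?answertab=oldest#tab-top.
Below is a sample program based on that answer: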
    import org.apache.commons.lang3.StringEscapeUtils;

    public class HTMLCharacterEscaper {
        public static void main(String[] args) {
            // With StringEscapeUtils
            System.out.println("Using SEU: " + StringEscapeUtils.escapeHtml4("\" ¶"));
            System.out.println("Using SEU: " + StringEscapeUtils.escapeXml11("'"));

            // Single quote & double quote
            System.out.println(escapeHTML("It's good"));
            System.out.println(escapeHTML("\" Grit \""));

            // Unicode characters
            System.out.println(escapeHTML("This is copyright symbol ©"));
            System.out.println(escapeHTML("Paragraph symbol ¶"));
            System.out.println(escapeHTML("This is pound £"));
        }

        // Replaces every non-ASCII char, plus " < > & and ', with a decimal
        // character reference such as &#169;. All other chars are copied as-is.
        public static String escapeHTML(String s) {
            StringBuilder out = new StringBuilder(Math.max(16, s.length()));
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c > 127 || c == '"' || c == '<' || c == '>' || c == '&' || c == '\'') {
                    out.append("&#");
                    out.append((int) c);
                    out.append(';');
                } else {
                    out.append(c);
                }
            }
            return out.toString();
        }
    }
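One caveat with a `char`-based loop like the one above: characters outside the Basic Multilingual Plane (for example U+1F600) are stored in a Java `String` as two surrogate `char`s, so each half would be emitted as its own reference (`&#55357;&#56832;`) rather than as a single `&#128512;`. The variant below is a hedged sketch that iterates over code points instead; the class name `CodePointEscaper` and method name `escapeHtmlCodePoints` are made up for this illustration.

```java
// Hypothetical variant for illustration; iterates code points, not chars.
class CodePointEscaper {

    static String escapeHtmlCodePoints(String s) {
        StringBuilder out = new StringBuilder(Math.max(16, s.length()));
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // full code point, joins surrogate pairs
            i += Character.charCount(cp);   // advance by 1 or 2 chars
            if (cp > 127 || cp == '"' || cp == '<' || cp == '>' || cp == '&' || cp == '\'') {
                // Decimal character reference, e.g. &#169; for the copyright sign
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);      // plain ASCII, copied unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtmlCodePoints("It's good"));     // It&#39;s good
        System.out.println(escapeHtmlCodePoints("\u00A9"));        // &#169;
        System.out.println(escapeHtmlCodePoints("\uD83D\uDE00"));  // &#128512;
    }
}
```

For input that only contains BMP characters, this sketch produces the same output as the `escapeHTML` method above.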
Here are some interesting links that I came across while looking for the answer:
- Common HTML entities used for typography http://www.w3.org/wiki/Common_HTML_entities_used_for_typography
- Why shouldn't `&apos;` be used to escape single quotes? https://stackoverflow.com/questions/2083754/why-shouldnt-apos-be-used-to-escape-single-quotes
- The named character reference `&apos;` http://www.w3.org/TR/xhtml1/#C_16
- HTML apostrophe https://stackoverflow.com/questions/419718/html-apostrophe