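I have an XML file and its corresponding XSD file. I validate the XML with a StAX parser and have attached a custom error handler. In a well-formed XML file I run into essentially two types of errors:
1) Wrong data type inside an element, e.g. an element contains a string where the XSD expects an integer.
2) Missing element: an element that the XSD requires is absent from the XML.
With the StAX parser and the custom error handler I can pin down the first type of error. For the second type, however, a CHARACTERS event is triggered, and the text value is that of the immediately following element. I don't know how to find out which element is missing. Also, why is a CHARACTERS event triggered while the missing element is ignored entirely?
Since a StAX parser is forward-only, is there a way to handle both errors using some other parser?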
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XMLValidation {

    public static void main(String[] args) {
        XMLValidation xmlValidation = new XMLValidation();
        System.out.println(xmlValidation.validateXMLSchema(
                "PHSHumanSubjectsAndClinicalTrialsInfo-V1.0.xsd", "FullPHSHuman.xml"));
    }

    public boolean validateXMLSchema(String xsdPath, String xmlPath) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new File(xsdPath));
            StreamSource xmlSource = new StreamSource(xmlPath);
            XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(xmlSource);
            Validator validator = schema.newValidator();
            // The handler keeps a reference to the reader so it can inspect the current event.
            validator.setErrorHandler(new MyErrorHandler(reader));
            validator.validate(new StAXSource(reader));
        } catch (IOException | SAXException | XMLStreamException e) {
            System.out.println("Exception: " + e.getMessage()
                    + " local message " + e.getLocalizedMessage()
                    + " cause " + e.getCause());
            return false;
        }
        return true;
    }
}

class MyErrorHandler implements ErrorHandler {

    private final XMLStreamReader reader;

    public MyErrorHandler(XMLStreamReader reader) {
        this.reader = reader;
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        System.out.println("error");
        warning(e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        System.out.println("fatal error");
        warning(e);
    }

    @Override
    public void warning(SAXParseException e) throws SAXException {
        int eventType = reader.getEventType();
        if (eventType == XMLStreamConstants.START_ELEMENT
                || eventType == XMLStreamConstants.END_ELEMENT) {
            // The first type of error is detected here.
            System.out.println(reader.getLocalName());
            System.out.println(reader.getNamespaceURI());
        }
        if (eventType == XMLStreamConstants.CHARACTERS) {
            // The second type of error (missing element) ends up here.
            int start = reader.getTextStart();
            int length = reader.getTextLength();
            System.out.println(new String(reader.getTextCharacters(), start, length));
        }
    }
}

Below is a fragment of the well-formed XML file:
<?xml version="1.0" encoding="UTF-8"?>
<PHSHumanSubjectsAndClinicalTrialsInfo:PHSHumanSubjectsAndClinicalTrialsInfo xmlns:PHSHumanSubjectsAndClinicalTrialsInfo="http://apply.grants.gov/forms/PHSHumanSubjectsAndClinicalTrialsInfo-V1.0" PHSHumanSubjectsAndClinicalTrialsInfo:FormVersion="1.0"
>
<!-- <PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator
>Y: </PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator
>-->
<PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator1
>Y: Yes</PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator1
>
<PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator2
>Y: Yes</PHSHumanSubjectsAndClinicalTrialsInfo:HumanSubjectsIndicator2
>
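Here the HumanSubjectsIndicator element is commented out to provoke the second case. In this case a CHARACTERS event is triggered in "MyErrorHandler". The value "Y: Yes" is obtained via reader.getTextCharacters(). That value belongs to the HumanSubjectsIndicator1 element (determined using the getLocation() method).
Is there a way to get the exact local name of the missing element? If not with StAX, then with some other parser?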
Thanks.
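
For reference, here is a minimal, self-contained sketch of the two error cases, reduced to a tiny schema. Everything in it is made up for illustration (the class name MissingElementDemo, the elements root/first/second, the inline XSD); it is not the real PHS schema. It validates from a plain StreamSource and only collects SAXParseException.getMessage(). With the JDK's built-in validator the message normally begins with a rule key such as cvc-complex-type.2.4.a (unexpected or missing child) or cvc-datatype-valid.1.2.1 (bad value), and in the missing-element case the message text names the element that was expected. The exact wording is implementation-specific, so treat parsing it as an assumption rather than a guaranteed API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class MissingElementDemo {

    // Hypothetical schema: <root> must contain <first> (string) then <second> (int).
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='root'><xs:complexType><xs:sequence>"
        + "<xs:element name='first' type='xs:string'/>"
        + "<xs:element name='second' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Case 2 from the question: the required <first> element is missing.
    static final String XML_MISSING_FIRST = "<root><second>1</second></root>";

    // Case 1 from the question: <second> holds a string instead of an integer.
    static final String XML_BAD_TYPE = "<root><first>x</first><second>abc</second></root>";

    // Validates xml against xsd and returns the messages of all reported errors.
    static List<String> collectErrors(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        final List<String> messages = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                messages.add(e.getMessage());
            }
            @Override
            public void error(SAXParseException e) {
                // Recoverable validation error: record it and let validation continue.
                messages.add(e.getMessage());
            }
            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // Not well-formed: nothing sensible to continue with.
                throw e;
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return messages;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Missing element:");
        for (String message : collectErrors(XSD, XML_MISSING_FIRST)) {
            System.out.println("  " + message);
        }
        System.out.println("Wrong data type:");
        for (String message : collectErrors(XSD, XML_BAD_TYPE)) {
            System.out.println("  " + message);
        }
    }
}
```

If the message really does carry the expected element name, then printing e.getMessage() (plus e.getLineNumber() and e.getColumnNumber()) inside the error handler might already be enough, without inspecting the reader's current event at all.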