Last active
September 26, 2021 18:50
HTML 5 validation in Java (based on the Nu HTML Checker)
<!-- Add this in your POM -->
<dependency>
    <groupId>nu.validator</groupId>
    <artifactId>validator</artifactId>
    <version>15.3.14</version>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>org.eclipse.jetty</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
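import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

import nu.validator.messages.MessageEmitter;
import nu.validator.messages.MessageEmitterAdapter;
import nu.validator.messages.TextMessageEmitter;
import nu.validator.servlet.imagereview.ImageCollector;
import nu.validator.source.SourceCode;
import nu.validator.validation.SimpleDocumentValidator;
import nu.validator.xml.SystemErrErrorHandler;

import org.xml.sax.InputSource;

/**
 * Verifies that HTML content is valid.
 * @param htmlContent the HTML content
 * @return true if it is valid, false otherwise
 * @throws Exception if the schema cannot be loaded or the content cannot be read
 */
public boolean validateHtml( String htmlContent ) throws Exception {

    InputStream in = new ByteArrayInputStream( htmlContent.getBytes( "UTF-8" ));
    ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Messages go through the adapter to the text emitter, which writes to "out"
    SourceCode sourceCode = new SourceCode();
    ImageCollector imageCollector = new ImageCollector(sourceCode);
    boolean showSource = false;
    MessageEmitter emitter = new TextMessageEmitter( out, false );
    MessageEmitterAdapter errorHandler = new MessageEmitterAdapter( sourceCode, showSource, imageCollector, 0, false, emitter );
    errorHandler.setErrorsOnly( true );

    // Problems with the schema itself are reported on System.err
    SimpleDocumentValidator validator = new SimpleDocumentValidator();
    validator.setUpMainSchema( "http://s.validator.nu/html5-rdfalite.rnc", new SystemErrErrorHandler());
    validator.setUpValidatorAndParsers( errorHandler, true, false );
    validator.checkHtmlInputSource( new InputSource( in ));

    return 0 == errorHandler.getErrors();
}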
The errors flow from `SimpleDocumentValidator` into the `MessageEmitterAdapter`, into the `TextMessageEmitter`, and finally into the `ByteArrayOutputStream`. To actually see them you'll have to call `errorHandler.end(...)` before reading `out`.

I agree a nicer way to 'programmatically' collect the errors would be great, but I haven't seen anything particularly nice yet.
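
One possible direction for collecting errors programmatically, offered as a rough sketch rather than a tested recipe: a small SAX `ErrorHandler` that stores every message in a list. `CollectingErrorHandler` and its `collect` helper are hypothetical names, not part of the Nu HTML Checker. The demo below runs the JDK's own XML parser so that it works without the validator on the classpath. Whether `setUpValidatorAndParsers` accepts a plain `org.xml.sax.ErrorHandler` in place of the `MessageEmitterAdapter` is an assumption I have not checked against version 15.3.14.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: keeps every reported problem in memory instead of writing text to a stream. */
class CollectingErrorHandler implements ErrorHandler {

    private final List<String> messages = new ArrayList<>();

    @Override
    public void warning( SAXParseException e ) {
        record( "warning", e );
    }

    @Override
    public void error( SAXParseException e ) {
        record( "error", e );
    }

    @Override
    public void fatalError( SAXParseException e ) {
        record( "fatal", e );
    }

    private void record( String level, SAXParseException e ) {
        this.messages.add( level + " at " + e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
    }

    /** @return a read-only view of the messages seen so far */
    public List<String> getMessages() {
        return Collections.unmodifiableList( this.messages );
    }

    /** Demo only: parses the input with the JDK XML parser and returns what the handler recorded. */
    static List<String> collect( String xml ) throws Exception {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler( handler );
        try {
            builder.parse( new InputSource( new StringReader( xml )));
        } catch( SAXException e ) {
            // The JDK parser still throws after a fatal error; the handler has already recorded it.
        }

        return handler.getMessages();
    }
}
```

Calling `CollectingErrorHandler.collect( "<a><b></a>" )` should return a non-empty list, while well-formed input such as `<a><b/></a>` should return an empty one. If the assumption about `setUpValidatorAndParsers` holds, the same handler could be passed there and `getMessages()` read once `checkHtmlInputSource` returns.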