I am using a third-party application and would like to change one of its files. The file is stored as XML but has an invalid doctype.
When I try to read it with a DocumentBuilder, it errors out because the doctype contains "file:///ReportWiz.dtd" (as shown, with quotes), and I get a file-not-found exception. Is there a way to tell the DocumentBuilder to ignore this? I have tried setValidating(false) and setNamespaceAware(false) on the DocumentBuilderFactory.
The only solutions I can think of are
- copying the file line by line into a new file, omitting the offending line, doing what I need to do, then copying into another new file and inserting the offending line back in, or
- doing mostly the same as above but working with a FileStream of some sort (though I am not clear on how I could do this... help?)
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setValidating(false);
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse(file);
-
My first thought was dealing with it as a stream. You could make a new adapter at some level and just copy input to output except for the offending text.
If the file is shortish (under half a gig or so) you could also read the entire thing into a byte array and make your modifications there, then create a new stream from the byte array into your builder.
That's the advantage of the amazingly bulky way Java handles streams: you actually have a lot of flexibility.
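As a rough illustration of the in-memory variant described above, here is a minimal sketch that removes the DOCTYPE declaration from the text before handing it to the parser. The class name DoctypeStripper, the regular expression, and the "report" root element are assumptions made for the example. The pattern only handles a simple DOCTYPE without an internal subset, and the content is parsed from a Reader, so the caller has to decode the file with the right charset first.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class DoctypeStripper {
    // Matches a simple DOCTYPE declaration that has no internal subset ([...]).
    private static final Pattern DOCTYPE = Pattern.compile("<!DOCTYPE[^>\\[]*>");

    // Returns the XML text with the first DOCTYPE declaration removed.
    static String stripDoctype(String xml) {
        return DOCTYPE.matcher(xml).replaceFirst("");
    }

    // Parses the XML text after dropping its DOCTYPE, so no DTD is looked up.
    static Document parseWithoutDoctype(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(stripDoctype(xml))));
    }

    public static void main(String[] args) throws Exception {
        // Sample content; the element names are made up for the demo.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE report SYSTEM \"file:///ReportWiz.dtd\">\n"
                + "<report><title>demo</title></report>";
        Document doc = parseWithoutDoctype(xml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

If the third-party application needs the DOCTYPE line when it reads the file back, it would have to be re-inserted when the modified document is written out.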
Adam Lerman : Could you maybe help me with some example code (or a link)? This sounds a lot like what I want to do.
Bill K : Looks like what you want to do is subclass FilterInputStream and override read(). When your read is called, call super.read() to get the data, scan and modify the data, and return it. I'll fool around with it if I get some time, but it shouldn't be too hard.
Bill K : Here is an example that has very simple filtering (it excludes unprintable characters from the stream, I believe). http://www.cafeaulait.org/slides/sd2000west/javaio/44.html Your case is harder because you need to recognize a multi-character pattern.
-
Another thing I was debating was storing the whole file in a String, then doing my manipulations and writing the String out to a file.
None of these seem clean or easy, but what is the best way to do this?
-
Handle resolution of the DTD manually, either by returning a copy of the DTD file (loaded from the classpath) or by returning an empty one. You can do this by setting an entity resolver on your document builder:
EntityResolver er = new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if ("file:///ReportWiz.dtd".equals(systemId)) {
            System.out.println(systemId);
            InputStream zeroData = new ByteArrayInputStream(new byte[0]);
            return new InputSource(zeroData);
        }
        return null;
    }
};
Adam Lerman : More complex than I needed. I didn't try this, but I was really only looking for a way to ignore it completely.
-
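To show how the entity-resolver answer could be wired into a DocumentBuilder, here is a minimal self-contained sketch. The class name EmptyDtdResolverExample and the sample "report" document are assumptions for illustration. The resolver hands back an empty source for the ReportWiz DTD and returns null for everything else, which leaves the default lookup in place.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
final class EmptyDtdResolverExample {
    // Parses the source, answering the ReportWiz.dtd lookup with an empty DTD.
    static Document parse(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            if (systemId != null && systemId.endsWith("ReportWiz.dtd")) {
                // Empty content stands in for the missing DTD file.
                return new InputSource(new StringReader(""));
            }
            return null; // fall back to the parser's default resolution
        });
        return builder.parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Sample content; the element names are made up for the demo.
        String xml = "<!DOCTYPE report SYSTEM \"file:///ReportWiz.dtd\">"
                + "<report><title>demo</title></report>";
        Document doc = parse(new InputSource(new StringReader(xml)));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Unlike stripping the declaration from the text, this keeps the DOCTYPE node in the parsed document, which may matter if the file is written back for the original application.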
Tell your DocumentBuilderFactory to ignore the DTD declaration like this:
docFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
See here for a list of available features.
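Putting the setFeature call together with the factory code from the question might look like the sketch below. The class name SkipExternalDtdExample and the sample document are assumptions for illustration. The feature URI is specific to Xerces-based parsers, which includes the parser bundled with the JDK as far as I know; other implementations may reject it with a ParserConfigurationException.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of any library.
final class SkipExternalDtdExample {
    // Xerces feature that controls loading of the external DTD when not validating.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Parses the source without trying to fetch the DTD named in the DOCTYPE.
    static Document parse(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        return factory.newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Sample content; the element names are made up for the demo.
        String xml = "<!DOCTYPE report SYSTEM \"file:///ReportWiz.dtd\">"
                + "<report><title>demo</title></report>";
        Document doc = parse(new InputSource(new StringReader(xml)));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

To parse a file instead of a string, pass new InputSource(file.toURI().toString()), or call the builder's parse(File) overload directly.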
You also might find JDOM a lot easier to work with than org.w3c.dom:
org.jdom.input.SAXBuilder builder = new SAXBuilder();
builder.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
org.jdom.Document doc = builder.build(file);
Adam Lerman : EXACTLY what I needed. THANKS!! Welcome to SO.