score:5
I do wonder if 0x12 is valid even in XML 1.1. See this summary on 1.0 versus 1.1 differences. In particular:
In addition, XML 1.1 allows you to have control characters in your documents through the use of character references. This concerns the control characters #x1 through #x1F, most of which are forbidden in XML 1.0. This means that your document can now include the bell character, like this: &#7;. However, you still cannot have these characters appear directly in your documents; this violates the definition of the MIME type used for XML (text/xml).
Xerces can parse XML 1.1 but seems to expect the character reference &#x12; instead of the true 0x12 character:
import scala.xml.XML

// A raw 0x12 character inside the document
val s = "<?xml version='1.1'?><root>\u0012</root>"
// causes "An invalid XML character (Unicode: 0x12)"
//XML.loadXML(xml.Source.fromString(s), XML.parser)

// The same character written as a character reference
val u = "<?xml version='1.1'?><root>&#x12;</root>"
val v = XML.loadXML(xml.Source.fromString(u), XML.parser)
println(v) // works
As suggested by lavinio, you may be able to filter out invalid characters. This does not take too many lines in Scala:
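import java.io.{FileInputStream, InputStream}
import scala.xml.XML

// Wraps the file stream and skips every 0x12 byte
val in = new InputStream {
  val in0 = new FileInputStream("invalid.xml")
  override def read(): Int = in0.read match {
    case 0x12 => read() // drop the byte and try the next one
    case x    => x
  }
}
val x = XML.load(in)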
score:3
0x12 is only valid in XML 1.1. If your XML file states that version, you might be able to turn on 1.1 processing support in your SAX parser.
Otherwise, the underlying parser is probably Xerces, which, as a conforming XML parser, is correctly complaining.
If you must handle these streams, I'd write a wrapper InputStream or Reader around my input file, filter out the characters with invalid Unicode values, and pass the rest on.
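What counts as "invalid" depends on the XML version. As a rough sketch, assuming the input is treated as XML 1.0, the Char production of that specification allows only #x9, #xA, #xD and the ranges #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. The names Xml10Chars, isValidXml10Char and stripInvalidXml10 are made up for this illustration and are not part of any library:

```scala
object Xml10Chars {
  // True if the code point is allowed by the XML 1.0 Char production.
  def isValidXml10Char(cp: Int): Boolean =
    cp == 0x9 || cp == 0xA || cp == 0xD ||
      (cp >= 0x20 && cp <= 0xD7FF) ||
      (cp >= 0xE000 && cp <= 0xFFFD) ||
      (cp >= 0x10000 && cp <= 0x10FFFF)

  // Drops every code point that XML 1.0 forbids. Iterating by code point
  // keeps valid surrogate pairs together and discards unpaired surrogates.
  def stripInvalidXml10(s: String): String = {
    val sb = new java.lang.StringBuilder(s.length)
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      if (isValidXml10Char(cp)) sb.appendCodePoint(cp)
      i += Character.charCount(cp)
    }
    sb.toString
  }
}
```

This works on an already decoded String, so it only suits input small enough to hold in memory; for larger input the same check could be applied inside a Reader wrapper.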
score:11
To expand on @huynhjl's answer: the InputStream filter is dangerous if you have multi-byte characters, for example in UTF-8 encoded text. Instead, use a character-oriented filter: FilterReader. Or if the file is small enough, load it into a String and replace the characters there.
scala> val origXml = "<?xml version='1.1'?><root>\u0012</root>"
origXml: java.lang.String = <?xml version='1.1'?><root></root>
scala> val cleanXml = origXml flatMap {
  case x if Character.isISOControl(x) => "&#x" + Integer.toHexString(x) + ";"
  case x => x.toString
}
cleanXml: String = <?xml version='1.1'?><root>&#x12;</root>
scala> scala.xml.XML.loadString(cleanXml)
res14: scala.xml.Elem = <root></root>
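The character-oriented filter mentioned above could look roughly like the following. This is only a sketch: DropCharReader and readAll are hypothetical names, and the class simply drops one given character instead of escaping it. Because it operates on decoded characters rather than bytes, it cannot split a multi-byte UTF-8 sequence.

```scala
import java.io.{FilterReader, Reader, StringReader}

// Hypothetical helper: a Reader that silently drops every occurrence of `bad`.
class DropCharReader(underlying: Reader, bad: Char) extends FilterReader(underlying) {

  // Single-character read: skip over unwanted characters.
  override def read(): Int = {
    var c = super.read()
    while (c == bad.toInt) c = super.read()
    c
  }

  // Bulk read: fill the buffer from the wrapped reader, then compact it in
  // place. Loops again if a chunk consisted only of dropped characters, so
  // that 0 is never returned for a non-empty request.
  override def read(cbuf: Array[Char], off: Int, len: Int): Int =
    if (len == 0) 0
    else {
      var result = 0
      while (result == 0) {
        val n = super.read(cbuf, off, len)
        if (n < 0) result = -1 // end of stream
        else {
          var kept = 0
          var i = off
          while (i < off + n) {
            if (cbuf(i) != bad) { cbuf(off + kept) = cbuf(i); kept += 1 }
            i += 1
          }
          result = kept
        }
      }
      result
    }
}

object DropCharReaderDemo {
  // Reads a Reader to the end, one character at a time.
  def readAll(r: Reader): String = {
    val sb = new StringBuilder
    var c = r.read()
    while (c != -1) { sb.append(c.toChar); c = r.read() }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    val in = new StringReader("<?xml version='1.1'?><root>a\u0012b</root>")
    println(readAll(new DropCharReader(in, '\u0012')))
  }
}
```

Instead of reading it into a String, the filtered reader could be handed to scala.xml.XML.load, which accepts a java.io.Reader. For a file, wrapping an InputStreamReader created with an explicit charset keeps the decoding step under your control.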
Source: stackoverflow.com