Answer (score: -1)
I tested the above code with the following case class structures:
case class Field3Array(
  key1: String,
  key2: List[String]
)
case class Input(
  field1: String,
  field2: String,
  field3Array: List[Field3Array]
)
case class Output(
  field1: String,
  field2: String,
  requiredKey: String,
  field3Array: List[Field3Array]
)
case class Root(
  Input: Input,
  Output: Output
)
The JSON string cannot be passed directly to the DataFrameReader as you have tried, since the json method expects a path to a file. I put the JSON string in a file, passed the file path to the DataFrameReader, and the results were as follows:
import org.apache.spark.sql.{Dataset, Encoder, Encoders}

case class Field3Array(
  key1: String,
  key2: List[String]
)
case class Input(
  field1: String,
  field2: String,
  field3Array: List[Field3Array]
)
case class Output(
  field1: String,
  field2: String,
  requiredKey: String,
  field3Array: List[Field3Array]
)
case class Root(
  Input: Input,
  Output: Output
)

val pathToJson: String = "file:////path/to/json/file/on/local/filesystem"

// The encoder must be implicit (or brought in via spark.implicits._) for .as[Root] to resolve
implicit val jsEncoder: Encoder[Root] = Encoders.product[Root]

val df: Dataset[Root] = spark.read.option("multiline", "true").json(pathToJson).as[Root]
The result of show is as follows:
df.show(false)
+--------------------------------------------+--------------------------------------------------------------+
|Input |Output |
+--------------------------------------------+--------------------------------------------------------------+
|[Test1, Test2, [[Key123, [keyxyz, keyAbc]]]]|[Test2, Test3, [[Key123, [keyxyz, keyAbc]]], RequiredKeyValue]|
+--------------------------------------------+--------------------------------------------------------------+
df.select("Input.field1").show()
+------+
|field1|
+------+
| Test1|
+------+
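As an alternative to writing the string to a file, here is a sketch assuming Spark 2.2+, where DataFrameReader.json also accepts a Dataset[String]; the JSON content below is reconstructed from the show output above and kept on a single line so each dataset element is one record:

```scala
import org.apache.spark.sql.SparkSession

case class Field3Array(key1: String, key2: List[String])
case class Input(field1: String, field2: String, field3Array: List[Field3Array])
case class Output(field1: String, field2: String, requiredKey: String, field3Array: List[Field3Array])
case class Root(Input: Input, Output: Output)

val spark = SparkSession.builder().master("local[*]").appName("json-from-string").getOrCreate()
import spark.implicits._

val jsonString =
  """{"Input":{"field1":"Test1","field2":"Test2","field3Array":[{"key1":"Key123","key2":["keyxyz","keyAbc"]}]},"Output":{"field1":"Test2","field2":"Test3","requiredKey":"RequiredKeyValue","field3Array":[{"key1":"Key123","key2":["keyxyz","keyAbc"]}]}}"""

// Wrap the string in a Dataset[String] and hand it to json() directly, no file needed
val root: Root = spark.read.json(Seq(jsonString).toDS()).as[Root].head()
```

This avoids the intermediate file entirely; spark.implicits._ supplies the product encoder for Root.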
Answer (score: 1)
The problem is that Spark uses struct types to map a class to a Row. Take this as an example:
case class MyRow(a: String, b: String, c: Option[String])
Can Spark create a DataFrame which sometimes has column c and sometimes not? Like this:
+-----+-----+-----+
| a | b | c |
+-----+-----+-----+
| a1 | b1 | c1 |
+-----+-----+-----+
| a2 | b2 | <-- note the non-existence here :)
+-----+-----+-----+
| a3 | b3 | c3 |
+-----+-----+-----+
Well, it cannot. Being nullable means the key has to exist, but the value can be null:
... other key values
"optionalKey": null,
...
This is considered valid and is convertible to your structs. I suggest you use a dedicated JSON library (as you know, there are many of them out there) and use UDFs or something similar to extract what you need from the JSON.
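To make the missing-key vs. null-value distinction concrete, here is a plain-Scala sketch (no Spark; the toRow helper is hypothetical, with a Map[String, Any] standing in for one parsed JSON object). A JSON library can fold both cases into None, which Spark's fixed struct schema cannot do:

```scala
case class MyRow(a: String, b: String, c: Option[String])

// Hypothetical minimal mapper: a Map[String, Any] stands in for one parsed JSON object
def toRow(obj: Map[String, Any]): MyRow = MyRow(
  a = obj("a").asInstanceOf[String],
  b = obj("b").asInstanceOf[String],
  // Both "key absent" and "key present with null value" collapse to None
  c = Option(obj.getOrElse("c", null)).map(_.asInstanceOf[String])
)

val withC    = toRow(Map("a" -> "a1", "b" -> "b1", "c" -> "c1")) // c = Some("c1")
val nullC    = toRow(Map("a" -> "a2", "b" -> "b2", "c" -> null)) // key present, value null => None
val missingC = toRow(Map("a" -> "a3", "b" -> "b3"))              // key absent             => None
```

A real JSON library (circe, play-json, etc.) gives you the same Option semantics when decoding into case classes, which is why the UDF-plus-library route handles optional keys that Spark's schema inference rejects.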
Source: stackoverflow.com