Accepted answer
I suppose this is what you were trying to do:
val dfHome: DataFrame = ???
val dsHome: Dataset[General] = dfHome.as[General]
val dsMary1: Dataset[Mary] = dsHome.flatMap { case General(id, name, addrs, _) =>
addrs.map { case AddressMary(street, house) => Mary(id, name, street, house) }
}
val dsJohn1: Dataset[John] = dsHome.flatMap { case General(id, name, _, addrs) =>
addrs.map { case AddressJohn(street, house) => John(id, name, street, house) }
}
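For the snippets above to compile, case classes along these lines must be in scope (the real definitions are in the question; the names and field types here are assumptions inferred from the patterns), and `import spark.implicits._` is needed so that `.as[General]` and the `flatMap` calls can find their `Encoder`s:

```scala
// Hypothetical case classes matching the shapes the answer's patterns assume.
// Field names and types (String street, Int house) are guesses, not the
// original definitions.
final case class AddressMary(street: String, house: Int)
final case class AddressJohn(street: String, house: Int)
final case class General(id: Int,
                         name: String,
                         addressesMary: Seq[AddressMary],
                         addressesJohn: Seq[AddressJohn])
final case class Mary(id: Int, name: String, street: String, house: Int)
final case class John(id: Int, name: String, street: String, house: Int)
```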
You can also rewrite it with a for-comprehension:
val dsMary2: Dataset[Mary] =
  for {
    General(id, name, addrs, _) <- dsHome
    AddressMary(street, house)  <- addrs
  } yield Mary(id, name, street, house)

val dsJohn2: Dataset[John] =
  for {
    General(id, name, _, addrs) <- dsHome
    AddressJohn(street, house)  <- addrs
  } yield John(id, name, street, house)
but you will need the compiler plugin better-monadic-for, because withFilter isn't implemented for Dataset (plain for-comprehensions desugar pattern-matching generators into withFilter calls, which the plugin removes).
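For reference, enabling the plugin in sbt would look something like this (the version shown is an assumption; check the latest release):

```scala
// build.sbt -- better-monadic-for changes for-comprehension desugaring so
// that destructuring generators no longer emit withFilter calls.
addCompilerPlugin("com.olegpy" %% "better-monadic-for" % "0.3.1")
```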
EDIT: The author asked for a way to get the Datasets of John and Mary in one go. We could zip the inner arrays, but this requires every element of one array to have a corresponding element at the same position in the other array. We could also nest the flatMaps, but that is equivalent to a cartesian join.
val dsMaryJohnZip1: Dataset[(Mary, John)] = dsHome.flatMap { case General(id, name, addrMs, addrJs) =>
  addrMs.zip(addrJs).map { case (AddressMary(sM, hM), AddressJohn(sJ, hJ)) =>
    (Mary(id, name, sM, hM), John(id, name, sJ, hJ))
  }
}

val dsMaryJohnZip2: Dataset[(Mary, John)] =
  for {
    General(id, name, addrMs, addrJs) <- dsHome
    (AddressMary(sM, hM), AddressJohn(sJ, hJ)) <- addrMs.zip(addrJs)
  } yield (Mary(id, name, sM, hM), John(id, name, sJ, hJ))

val dsMaryJohnCartesian1: Dataset[(Mary, John)] = dsHome.flatMap { case General(id, name, addrMs, addrJs) =>
  addrMs.flatMap { case AddressMary(sM, hM) =>
    addrJs.map { case AddressJohn(sJ, hJ) =>
      (Mary(id, name, sM, hM), John(id, name, sJ, hJ))
    }
  }
}
val dsMaryJohnCartesian2: Dataset[(Mary, John)] =
  for {
    General(id, name, addrMs, addrJs) <- dsHome
    AddressMary(sM, hM) <- addrMs
    AddressJohn(sJ, hJ) <- addrJs
  } yield (Mary(id, name, sM, hM), John(id, name, sJ, hJ))
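The difference between the two strategies is easiest to see on plain Scala collections, without Spark (the address data here is made up for illustration):

```scala
// Plain-collection analogue of the two strategies above.
// zip pairs elements positionally and silently drops the unmatched tail;
// nested generators produce every combination (a cartesian product).
val addrMs = Seq(("Main", 1), ("Oak", 2))
val addrJs = Seq(("Elm", 3), ("Pine", 4), ("Birch", 5))

// Positional pairing: 2 pairs, ("Birch", 5) is dropped.
val zipped = addrMs.zip(addrJs)

// Cross product: 2 * 3 = 6 pairs.
val cartesian = for { m <- addrMs; j <- addrJs } yield (m, j)
```

So zip only makes sense when the two arrays are guaranteed to be aligned, while the cartesian version can blow up quadratically in the address count per row.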
Source: stackoverflow.com