score:4
Ideally, you would read the entire DataFrame as binary (Array[Byte]) and then cast each value to its compatible data type. However, Spark does not allow a Double column to be read as Binary, so that approach does not work.
A workaround is to set the Spark property "spark.sql.files.ignoreCorruptFiles" to true and then read the files with the desired schema. Files that don't match the specified schema are ignored, so the resulting dataset contains only data from the files that do match. You can therefore read two DataFrames, one with the String schema and one with the Double schema, cast one of them to the other's type, and finally union them.
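import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// One single-column schema per type that "final_height" may have on disk
val stringSchema = StructType(StructField("final_height", StringType, false) :: Nil)
val doubleSchema = StructType(StructField("final_height", DoubleType, false) :: Nil)
// Skip files that cannot be read with the requested schema
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")
val stringDF = spark.read.schema(stringSchema).parquet("path/")
val doubleDF = spark.read.schema(doubleSchema).parquet("path/")
// Cast to a compatible type, then combine both reads
val doubleToStringDF = doubleDF.select(col("final_height").cast(StringType))
val finalDF = stringDF.union(doubleToStringDF)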
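If you would rather end up with a numeric column, the same idea can be applied in the other direction, casting the String read to Double. The following is only a sketch under that assumption: the object UnifyFinalHeight and the helper readAsDouble are hypothetical names used for illustration, and it relies on the same ignoreCorruptFiles behaviour described above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, DoubleType, StringType, StructField, StructType}

object UnifyFinalHeight {
  // Hypothetical helper: reads `column` once per candidate type and
  // returns a single DataFrame in which the column is a Double.
  def readAsDouble(spark: SparkSession, path: String, column: String): DataFrame = {
    // Same setting as in the answer: skip files that cannot be read
    // with the requested schema.
    spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

    // Single-column schema for the given type.
    def schemaOf(dataType: DataType): StructType =
      StructType(StructField(column, dataType, nullable = true) :: Nil)

    val stringDF = spark.read.schema(schemaOf(StringType)).parquet(path)
    val doubleDF = spark.read.schema(schemaOf(DoubleType)).parquet(path)

    // Cast the String read to Double and keep the column name,
    // then combine it with the Double read.
    val stringAsDoubleDF = stringDF.select(col(column).cast(DoubleType).as(column))
    stringAsDoubleDF.union(doubleDF)
  }
}
```

Usage would look like val finalDF = UnifyFinalHeight.readAsDouble(spark, "path/", "final_height"). Note that casting a non-numeric string to Double either yields null or raises an error, depending on spark.sql.ansi.enabled, so casting to String as in the answer is the safer choice when the string values are not guaranteed to be numeric.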
Source: stackoverflow.com