The withColumn() approach to adding a new column operates on the entire dataset, and chaining it for many columns only makes things worse. Alternatively, you can use Spark SQL and write a SQL-style query to add the new columns, but that requires SQL skills beyond plain Spark, and with 100 columns the query would likely be hard to maintain.
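For context, the withColumn() chaining referred to above looks roughly like this (the column names and expressions here are invented for illustration):
import org.apache.spark.sql.functions._
// Each withColumn() call adds a projection over the whole DataFrame,
// which is what becomes costly once you need dozens of new columns.
val withCols = mainDataFrame
  .withColumn("newCol1", col("a") + col("b"))
  .withColumn("newCol2", col("a") * lit(2))
// ...repeated once per additional column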
You can follow another approach instead.
Convert the DataFrame into an RDD of Rows (mainDataFrame.rdd), then use map on that RDD and process each row as you wish. Inside the map function:
a. Compute the new values from whatever calculations you need.
b. Append these new values to the row, as below:
val newColumns: Seq[Any] = Seq(newcol1, newcol2)
// .init drops the row's last value; to keep every existing column,
// use row.toSeq ++ newColumns (and drop .init from the schema in step d too)
Row.fromSeq(row.toSeq.init ++ newColumns)
Here row refers to the current Row inside the map function.
c. Create a schema for the new columns, as below (StructField is a case class, so new is unnecessary):
val newColumnsStructType = StructType(Seq(StructField("newColName1", IntegerType), StructField("newColName2", IntegerType)))
d. Append it to the old schema (schema.init mirrors row.toSeq.init from step b):
val newSchema = StructType(mainDataFrame.schema.init ++ newColumnsStructType)
e. Create a new DataFrame from the mapped RDD (newRDD, the result of step b) and the new schema; a complete sketch follows these steps:
val newDataFrame = sqlContext.createDataFrame(newRDD, newSchema)
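Putting the steps together, a minimal end-to-end sketch (the input columns "a" and "b", the derived values, and the output column names are assumptions for illustration; this version keeps all existing columns, so the .init calls are dropped):
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
// steps a and b: map over the rows, computing and appending new values
val newRDD = mainDataFrame.rdd.map { row =>
  val a = row.getAs[Int]("a")
  val b = row.getAs[Int]("b")
  val newColumns: Seq[Any] = Seq(a + b, a * b)
  Row.fromSeq(row.toSeq ++ newColumns)
}
// steps c and d: extend the old schema with the new fields
val newColumnsStructType = StructType(Seq(
  StructField("sum", IntegerType),
  StructField("product", IntegerType)))
val newSchema = StructType(mainDataFrame.schema ++ newColumnsStructType)
// step e: rebuild the DataFrame with the combined schema
val newDataFrame = sqlContext.createDataFrame(newRDD, newSchema)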
Source: stackoverflow.com
Related Queries
- Mapping simple values from a Map to a spark DataFrame error
- Spark DataFrame map error
- Encoder error while trying to map dataframe row to updated row
- Error when Spark 2.2.0 standalone mode write Dataframe to local single-node Kafka
- Converting multiple different columns to Map column with Spark Dataframe scala
- Spark joinWithCassandraTable() on map multiple partition key ERROR
- spark dataframe explode function error
- Spark Scala Dataframe convert a column of Array of Struct to a column of Map
- Spark Scala Serialization Error from RDD map
- Scala Spark - empty map on DataFrame column for map(String, Int)
- Spark - convert Map to a single-row DataFrame
- Spark Dataframe : Accessing next record in map function
- Spark SQL Map only one column of DataFrame
- spark dataframe map aggregation with alias?
- Spark 2.0 DataSourceRegister configuration error while saving DataFrame as cvs
- How to retain the column structure of a Spark Dataframe following a map operation on rows
- Map in a spark dataframe
- java.lang.String is not a valid external type for schema of int error in creating spark dataframe
- Scala - Spark Dataframe - Convert rows to Map variable
- Convert multiple columns into a column of map on Spark Dataframe using Scala
- Spark map dataframe using the dataframe's schema
- Create a map to call the POJO for each row of Spark Dataframe
- create map from dataframe in spark scala
- Access Spark Dataframe field with a Map Type
- Spark dataframe to nested map
- Spark - How to convert map function output (Row,Row) tuple to one Dataframe
- Aggregation on an array of structs in a map inside a Spark dataframe
- Scala Spark - Map function referencing another dataframe
- How to merge the value of several columns into a map in Spark Dataframe
- How to map over DataFrame in spark to extract RowData and make predictions using h2o mojo model
More Queries from the same tag
- What does CoGroupedRDD do?
- Play Scala JSON Writes & Inheritance
- Should I use a regex or a .startsWith() for my string operation?
- How to sort array of struct type in Spark DataFrame by particular field?
- What does Preserve sharing means in lazy Streams?
- Replacement for setMaximumPoolSize on ForkJoinPool
- Mill Build Tool: How to run the tests of all Modules at once?
- How to know if a List is homogeneous
- Sbt run test in dependency
- Adding a column in Spark from existing column
- IntelliJ Scala SBT scala.slick package not found
- replacing strings inside df using dictionary scala
- Union can only be performed on tables with the compatible column types Spark dataframe
- How to embed Play 2.6 using Akka HTTP Server?
- Can we create auxiliary constructor in Scala with List?
- spark textfile load file instead of lines
- How to check if a class file version 50.0 (Java 6) has been preverified?
- Unable to create dataframe from RDD of Row using case class
- Making a folder/files and writing to them in scala?
- Scala Custome Typed Class
- How to check constructor arguments and throw an exception or make an assertion in a default constructor in Scala?
- Scala: how to define an abstract copyable superclass for any case class?
- Is it possible to get the type of a type field after creating an object?
- SBT `session save` - [info] No session settings defined
- Is there a way to refactoring this match case function?
- How to find sum/avg of sparkVector element of a DataFrame in Spark/Scala?
- Spark Merge schema on write - getting error: "The column number of the existing table doesn't match the data
- Spark read job from gcs object stuck
- How to get x values as a pair on a dataframe in scala?
- Simple feeder in Gatling without using a csv file