score:1
Keep in mind that the withColumn method of DataFrame can cause performance issues when called in a loop:
- The Spark DAG differs between withColumn and select.
- The Scaladoc itself warns about this: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html#withColumn(colName:String,col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame
"this method introduces a projection internally. Therefore, calling it multiple times, for instance, via loops in order to add multiple columns can generate big plans which can cause performance issues and even StackOverflowException. To avoid this, use select with the multiple columns at once."
The safer way is to do it with select:
val monthsColumns = months.map { month: String =>
  col("sal").as(month)
}
val updatedDf = df.select(df.columns.map(col) ++ monthsColumns: _*)
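The map-and-concatenate step above is ordinary Scala collection code; here is a minimal runnable sketch of that pattern with plain strings standing in for Spark Column objects (the months list is an assumption, since the question doesn't define it):

```scala
// Plain-Scala sketch of building all new columns first, then combining
// them with the existing ones in one pass. Strings stand in for Spark
// Column expressions; "months" is a hypothetical input.
val existingCols = Seq("id", "name", "sal")
val months = Seq("Jan", "Feb", "Mar")

// One "sal AS <month>" expression per month, mirroring col("sal").as(month)
val monthCols = months.map(m => s"sal AS $m")

// A single combined projection, analogous to df.select(... : _*)
val allCols = existingCols ++ monthCols
println(allCols.mkString(", "))
// id, name, sal, sal AS Jan, sal AS Feb, sal AS Mar
```

Because everything is combined into one select, Spark sees a single projection instead of one projection per added column.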
score:4
Yes, you can do the same using foldLeft. foldLeft traverses the elements of a collection from left to right, threading an accumulated value through each step.
So you can store the desired columns in a List. For example:
val BazarDF = Seq(
  ("Veg", "tomato", 1.99),
  ("Veg", "potato", 0.45),
  ("Fruit", "apple", 0.99),
  ("Fruit", "pineapple", 2.59)
).toDF("Type", "Item", "Price")
Create a List with column names and values (a null value cast to string is used as an example; note that lit(null).cast("string") produces a typed null column, whereas lit("null") would produce the literal string "null"):
val ColNameWithDatatype = List(
  ("Jan", lit(null).cast("string")),
  ("Feb", lit(null).cast("string"))
)
val BazarWithColumnDF1 = ColNameWithDatatype.foldLeft(BazarDF) { (tempDF, colName) =>
  tempDF.withColumn(colName._1, colName._2)
}
score:6
You can use foldLeft. You'll need to create a List of the columns that you want.
df.show
+---+----+----+
| id|name| sal|
+---+----+----+
| 1| A|1100|
+---+----+----+
val list = List("Jan", "Feb", "Mar", "Apr") // ... you get the idea
list.foldLeft(df)((df, month) => df.withColumn(month, $"sal")).show
+---+----+----+----+----+----+----+
| id|name| sal| Jan| Feb| Mar| Apr|
+---+----+----+----+----+----+----+
| 1| A|1100|1100|1100|1100|1100|
+---+----+----+----+----+----+----+
So, basically, you fold over the list you created: starting with the original DataFrame as the initial value, you apply one transformation per element as you traverse the list, and each step's result becomes the input to the next.
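The fold itself is plain Scala; a minimal runnable sketch of the same left-to-right accumulation, with a Map standing in for the single-row DataFrame above (no Spark required):

```scala
// Each step receives the accumulator (the "DataFrame" so far) and one
// month, and returns a new accumulator with that month's column added,
// copying the "sal" value just like withColumn(month, $"sal") above.
val months = List("Jan", "Feb", "Mar", "Apr")
val row = Map("id" -> "1", "name" -> "A", "sal" -> "1100")
val result = months.foldLeft(row)((acc, month) => acc + (month -> acc("sal")))
println(result("Apr")) // 1100
```

The initial value (the original DataFrame, or here the original Map) is never mutated; each step returns a new value, which is what makes this pattern a natural fit for immutable DataFrames.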
Source: stackoverflow.com