score:4
Accepted answer
You don't even need a UDF for that:
val df3 = df1.as('a).join(df2.as('b), $"a.id" === $"b.id", "left").
  select(df1("id"), 'x, 'y, (coalesce('x, lit(0)) + coalesce('y, lit(0))).alias("x_plus_y")).na.fill(0)
df3.show
// df3: org.apache.spark.sql.DataFrame = [id: int, x: int, y: int, x_plus_y: int]
// +---+---+---+--------+
// | id| x| y|x_plus_y|
// +---+---+---+--------+
// | 1| 10| 0| 10|
// | 2| 20|200| 220|
// | 3| 30|300| 330|
// | 4| 40|400| 440|
// +---+---+---+--------+
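The coalesce is what makes this work: in Spark SQL, null plus anything is null, so an unmatched row from the left join (id 1 here) would get a null sum without it. A plain-Python sketch of that semantics (not the Spark API; the helper names are illustrative):

```python
def coalesce(*values):
    """Return the first non-None value, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)

def x_plus_y(x, y):
    # Plain addition would propagate None; default each side to 0 first,
    # mirroring coalesce('x, lit(0)) + coalesce('y, lit(0)) above.
    return coalesce(x, 0) + coalesce(y, 0)

# id 1 has no match in df2, so y is None after the left join
print(x_plus_y(10, None))  # -> 10
print(x_plus_y(20, 200))   # -> 220
```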
score:0
In Scala, I arrived at this solution:
val d = sqlContext.sql("""
  select df1.id, x, y from df1 left join df2 on df1.id = df2.id""").na.fill(0)
to join the frames and replace unavailable values with zeroes (this assumes df1 and df2 have been registered as temporary tables), and then define this UDF:
import org.apache.spark.sql.functions._

val plus: (Int, Int) => Int = (x: Int, y: Int) => x + y
val plus_udf = udf(plus)
d.withColumn("x_plus_y", plus_udf($"x", $"y")).show
+---+---+---+--------+
| id| x| y|x_plus_y|
+---+---+---+--------+
| 1| 10| 0| 10|
| 2| 20|200| 220|
| 3| 30|300| 330|
| 4| 40|400| 440|
+---+---+---+--------+
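For comparison, the whole pipeline (left join on id, fill missing values with 0, add the sum column) can be modeled in plain Python. This is only a sketch of the semantics under the sample data above, not Spark code, and `left_join_sum` is a name invented here:

```python
def left_join_sum(df1, df2):
    """Model of the pipeline above: left-join {id: x} with {id: y},
    fill missing y with 0, and compute x_plus_y. Not the Spark API."""
    return {
        i: {"x": x, "y": df2.get(i, 0), "x_plus_y": x + df2.get(i, 0)}
        for i, x in df1.items()
    }

df1 = {1: 10, 2: 20, 3: 30, 4: 40}
df2 = {2: 200, 3: 300, 4: 400}
result = left_join_sum(df1, df2)
print(result[1])  # -> {'x': 10, 'y': 0, 'x_plus_y': 10}
print(result[2])  # -> {'x': 20, 'y': 200, 'x_plus_y': 220}
```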
score:4
df3 = df1.join(df2, df1.id == df2.id, "left_outer").select(df1.id, df1.x, df2.y).fillna(0)
df3.select("id", (df3.x + df3.y).alias("x_plus_y")).show()
This works in Python (PySpark).
Source: stackoverflow.com