Accepted answer (score: 6)
Using a `leftanti` join would be a possible solution. A left anti join keeps only the rows of the left table that have no match in the right table on the given join keys:

table1.join(table2, Seq("user_id", "item_id"), "leftanti")
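A minimal self-contained sketch of this approach (the column names `user_id` / `item_id`, the sample rows, and the local SparkSession setup are illustrative assumptions, not from the original question):

```scala
import org.apache.spark.sql.SparkSession

object LeftAntiExcept {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("left-anti-except")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data for illustration.
    val table1 = Seq((1, 10, "a"), (2, 20, "b"), (3, 30, "c"))
      .toDF("user_id", "item_id", "extra")
    val table2 = Seq((1, 10), (3, 30))
      .toDF("user_id", "item_id")

    // Keep rows of table1 whose (user_id, item_id) pair does NOT
    // appear in table2 -- an "except" restricted to a subset of columns.
    val diff = table1.join(table2, Seq("user_id", "item_id"), "leftanti")
    diff.show()

    spark.stop()
  }
}
```

Unlike `DataFrame.except`, which compares entire rows, the anti join compares only the listed key columns while still returning all columns of the left table.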
Source: stackoverflow.com
Related Query
- How to implement `except` in Apache Spark based on subset of columns?
- How to return a subset of the DataFrame’s columns based on the column dtypes in Spark Scala
- How to subtract DataFrames using subset of columns in Apache Spark
- How to get Histogram of all columns in a large CSV / RDD[Array[double]] using Apache Spark Scala?
- How to sort the data on multiple columns in apache spark scala?
- In Apache Spark DataFrame, how to drop all columns where all non None values are identical?
- how to order spark RDD based on two columns
- Apache Spark - how to create difference columns for every column in dataframe?
- How to drop duplicate columns based on another schema in spark scala?
- How to partition a dataframe on multiple columns and write the output to xlsx in Apache Spark
- How to split data into series based on conditions in Apache Spark
- How to deploy TypeSafe Activator based application to an Apache Spark cluster?
- How to create new columns in dataframe using Spark Scala based on different string patterns
- Select columns from a dataframe into another dataframe based on column datatype in Apache Spark Scala
- How to get Running sum of based on two columns using Spark scala RDD
- Spark - how to get all relevant columns based on ambiguous names
- How to make new rows and columns based on array value of row Spark DataFrame
- How to select specific columns from Spark DataFrame based on the value of another column?
- How to dynamically add columns based on source columns in spark scala dataframe
- How to filter columns in one table based on the same columns in another table using Spark
- How to perform aggregation (sum) on different columns and group the result based on another column of a spark dataframe?
- Apache Spark / Scala: How to get all the elements of an array except the last one?
- How to compare a column with the columns in the same dataframe in apache spark
- Apache Spark RDD : How to get latest data based on Paired RDD key and value
- Optimal way to create a ml pipeline in Apache Spark for dataset with high number of columns
- How can I connect to a postgreSQL database into Apache Spark using scala?
- How to convert a dataframe to dataset in Apache Spark in Scala?
- How do I iterate RDD's in apache spark (scala)
- How to use s3 with Apache spark 2.2 in the Spark shell
- How to count number of columns in Spark Dataframe?
More Queries from the same tag
- Stackless Scala With Free Monads, complete example
- Formatting binary values in Scala
- How do I add new value to already existing JSON file?
- How can I see in what [Java/Scala?] code does Scala compiler rewrites original Scala-code
- Scala adding immutable list to mutable one
- Why does Functor[A => ?] not compile in Scala 2.11.7?
- Use Scala (or java) method in Gradle build script
- Apache Spark - Unable to understand scala example
- Idiomatic Scala: Map with value type Option
- Filter heterogeneous list for type
- How to convert IPv6 Addresses to and from BigInteger in Scala
- Are spark.streaming.backpressure.* properties applicable to Spark Structured Streaming?
- How can I convert a Java Iterable to a Scala Iterable?
- Find query result to List
- Emacs Ensime-sbt not responding to enter after I try to use RUN command
- In Spark Json to Csv converting?
- How can I use snowflake jar in Bitnami Spark Docker container?
- Create idea project using gen-idea
- Play Framework 2.5.x: Inject Environment in a Module
- Read files recursively in scala
- `++` - Operator on two Arrays returns ArraySeq when using type parameters
- Alternative to deprecated java.sql.Date for Spark DataFrame
- Using sealed trait as a key for a map
- scala save to array with format
- twitterStream not found
- Akka HTTP server receives file with other fields
- Switching on Strings
- Scala/Play: load template dynamically
- Tuple seen as Product, compiler rejects reference to element
- How to convert DataFrame of string to a Dataframe of defined schema