score:3
Accepted answer
You can use a window function with a filter on the count, as below:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.count
import spark.implicits._  // assumes an existing SparkSession named `spark`

val df = Seq(
  (1, 1, 2, "tom"),
  (1, 1, 2, "tim"),
  (1, 3, 2, "tom"),
  (2, 1, 2, "mary")
).toDF("id1", "id2", "id3", "value")

val window = Window.partitionBy("id1", "id2", "id3")

df.withColumn("count", count("value").over(window))
  .filter($"count" < 2)
  .drop("count")
  .show(false)
Output:
+---+---+---+-----+
|id1|id2|id3|value|
+---+---+---+-----+
|1 |3 |2 |tom |
|2 |1 |2 |mary |
+---+---+---+-----+
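To make the intent of the count-based filter easy to see, the same keep-only-singletons logic can be sketched with plain Scala collections (no SparkSession required); the row data here mirrors the example above:

```scala
// Plain-Scala sketch of the same idea: group rows by the three key
// columns and keep only rows whose group has fewer than 2 members.
val rows = Seq(
  (1, 1, 2, "tom"),
  (1, 1, 2, "tim"),
  (1, 3, 2, "tom"),
  (2, 1, 2, "mary")
)

val kept = rows
  .groupBy { case (id1, id2, id3, _) => (id1, id2, id3) } // partition by key
  .values
  .filter(_.size < 2)                                     // count < 2
  .flatten
  .toList
// kept contains only (1, 3, 2, "tom") and (2, 1, 2, "mary")
```

Unlike `dropDuplicates`, which keeps one representative per key, this approach discards every row whose key appears more than once, which matches the output above.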
Source: stackoverflow.com