score:0
The following methods generates the valid and invalid dataframes using the row_number function provided by spark-sql. I dont have access to cassandra so I am using a simple Dataframe here.
import sqlContext.implicits._
val df = sc.parallelize(Seq(("a" -> 1), ("b" -> 2), ("c" -> 3), ("d" -> 4), ("a" -> 5), ("a" -> 6), ("c" -> 7), ("c" -> 8))).toDF("c1", "c2")
df.registerTempTable("temp_table")
val masterdf = sqlContext.sql("SELECT c1, c2, ROW_NUMBER() OVER(PARTITION BY c1 ORDER BY c2) as row_num FROM temp_table")
masterdf.filter("row_num = 1").show()
+---+---+-------+
| c1| c2|row_num|
+---+---+-------+
| a| 1| 1|
| b| 2| 1|
| c| 3| 1|
| d| 4| 1|
+---+---+-------+
masterdf.filter("row_num > 1").show()
+---+---+-------+
| c1| c2|row_num|
+---+---+-------+
| a| 5| 2|
| a| 6| 3|
| c| 7| 2|
| c| 8| 3|
+---+---+-------+
Source: stackoverflow.com
Related Query
- Issue with dropDuplicates() and except() method in Spark using Scala
- Attach column names to elements with Spark and Scala using FlatMap
- Spark CSV issue with new line (LF) character in the field of file imported using scala
- Compatibility issue with Scala and Spark for compiled jars
- for loop into map method with Spark using Scala
- Compare rows of an array column with the headers of another data frame using Scala and Spark
- Log4j vulnerability while using Scala and Spark with sbt
- Split Map type column with huge values into multiple rows using Scala and Spark
- flattening of nested json using spark scala creating 2 column with same name and giving error of duplicate in Phoenix
- Predict and accuracy using neural network with Scala spark
- Loading and Parsing JSON with Spark using Scala Jackson library
- Issue in inserting data to Hive Table using Spark and Scala
- Using Scala and Spark Apply "like" operator with multiple string (Seq or Array etc). on a DataFrame
- Compare a list values with case class using Scala and Spark
- Read a CSV file with , as delim and numeric data also contain , separator to create RDD in Spark using Scala
- Converting a dataframe column with values to a list using spark and scala
- Difference between using App trait and main method in scala
- Using Scala 2.12 with Spark 2.x
- How to stub a method call with an implicit matcher in Mockito and Scala
- Why do Scala 2.11 and Spark with scallop lead to "java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror"?
- Configuring Apache Spark Logging with Scala and logback
- Issue with VectorUDT when using Spark ML
- Packaging and Running Scala Spark Project with Maven
- Problems using protobufs with java and scala
- Using AndroidAnnotations with Scala and Gradle
- Multiple constructors with the same number of parameters exception while transforming data in spark using scala
- Simplest method for text lemmatization in Scala and Spark
- How to develop and run spark scala from vscode using sbt or Metals
- Strange deserializing problems with generic types using Scala and Jackson and java.lang.Integer or scala.Int
- Using aws credentials profiles with spark scala app
More Query from same tag
- Meaning of super in stacked traits depends on call site?
- spark-core 1.6.1 & lift-json 2.6.3 java.lang.NoClassDefFoundError
- Why are so many new languages written for the Java VM?
- Macro compilation error on non-trivial lambda
- scala algebraic datatypes get member values
- Lettuce 6.0.1.RELEASE deprecated .withPassword
- Nil and List as case expressions in Scala
- Unnecessary recompilations by SBT
- Which one is better? To test by data or test by algorithm?
- Scala Play - Some unresolved dependencies have extra attributes
- How to open TCP connection with TLS in scala using akka
- Reading file from conf with Scala and Play! 2.5
- Scala broadcast join with "one to many" relationship
- How to use phantom in scala main object
- cannot find class manifest for element type T
- Scala: How to multiply a List of Lists by value
- Convert CSV to RDD and read with Spark/Scala
- What's the best way to pass field arguments (e.g. paging parameters) to a deferred `Fetcher`?
- Why does custom DefaultSource give java.io.NotSerializableException?
- Compiler complains about class that extends DelayedInit not defining delayedInit method
- How to get only "p" tags inside "small single" article?
- Jackson & Scala: How to get property value from a list of objects by property value?
- $and in reactive mongo with play framework 2.6
- Type mismatch when using map on a zipped list in Scala
- Cannot deploy compiled scala play application with sub projects to Heroku
- REmote LIft Actors
- Generate Gatling requests from dynamic data
- What advantages does Scala have over Java for concurrent programming?
- Why are some things defined twice in Scala's libraries?
- what is the meaning of the implicit def function