Score: 2
Accepted answer
Here is a solution:

val df = Seq("1->2->3" -> "2", "1->2->3" -> "4").toDF("path", "dst")
val dstNotInPath = df.filter(!array_contains(split($"path", "->"), $"dst"))
dstNotInPath.show

dstNotInPath: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [path: string, dst: string]
+-------+---+
| path|dst|
+-------+---+
|1->2->3| 4|
+-------+---+
Now, let's learn how to fish:

- Create a suitable DataFrame (one wasn't provided in your question)
- Define what you want to do, with typing:
  - split a string column into an array of strings
  - test whether this array contains the value of another column
  - invert the test (and filter)
- Look up each of these operations on the internet:
  - split a string into an array: I found the split function, and an example of how to use it
  - test whether an array contains a value: found array_contains, and an example
  - inverting the test was as you mentioned
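The split / contains / invert steps above can be checked in plain Scala, without a Spark session (a minimal sketch; the object and function names are illustrative, not from the answer):

```scala
// Plain-Scala version of: split the path on "->", test membership, invert.
object SplitContainsSketch {
  def dstNotInPath(path: String, dst: String): Boolean =
    !path.split("->").contains(dst)

  def main(args: Array[String]): Unit = {
    val rows = Seq("1->2->3" -> "2", "1->2->3" -> "4")
    // Keep only rows whose dst does not appear in path,
    // matching the filter in the accepted answer.
    println(rows.filter { case (path, dst) => dstNotInPath(path, dst) })
  }
}
```

Note that String.split takes a regex in Java/Scala; "->" contains no regex metacharacters, so a literal split is safe here.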
Note:

- Here I use the DataFrame API, where I declare symbolically which operations need to be performed on the DataFrame columns. This is recommended for such operations (filtering on DataFrame = Dataset[Row] objects).
- You used the RDD API, where you apply a Scala function to each Row entry of the DataFrame. This means the function is serialized, sent to each worker, and executed there on the Java/Scala Row instances.
Score: 1

You can write it like this:

df2.filter(v => !v.getAs[String]("path").split("->").contains(v.getAs[String]("dst")))

Or a solution with the DataFrame API:

df.withColumn("splitted_path", functions.split($"path", "->"))
  .withColumn("filter_c", array_contains($"splitted_path", $"dst"))
  .where(!$"filter_c")
  .drop("splitted_path", "filter_c")
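The shape of this chain (add helper columns, filter on one of them, then drop them) can be mirrored on a plain Scala collection to see the data flow without a Spark session (a sketch; names and types are illustrative):

```scala
// Collection-level analogue of the withColumn / where / drop chain.
object ChainAnalogue {
  type Row = (String, String) // (path, dst)

  def run(rows: Seq[Row]): Seq[Row] =
    rows
      // withColumn("filter_c", array_contains(split(path, "->"), dst))
      .map { case (path, dst) => (path, dst, path.split("->").contains(dst)) }
      // where(!$"filter_c")
      .filter { case (_, _, inPath) => !inPath }
      // drop the helper column
      .map { case (path, dst, _) => (path, dst) }

  def main(args: Array[String]): Unit =
    println(run(Seq("1->2->3" -> "2", "1->2->3" -> "4"))) // prints List((1->2->3,4))
}
```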
Source: stackoverflow.com