score:2

Accepted answer

Here is a solution:

    import org.apache.spark.sql.functions.{array_contains, split}
    import spark.implicits._ // for the $"col" syntax (already in scope in spark-shell)

    val df = Seq("1->2->3" -> "2", "1->2->3" -> "4").toDF("path", "dst")
    val dstNotInPath = df.filter(!array_contains(split($"path", "->"), $"dst"))

    dstNotInPath.show

    dstNotInPath: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [path: string, dst: string]
    +-------+---+
    |   path|dst|
    +-------+---+
    |1->2->3|  4|
    +-------+---+

Now, let's learn how to fish:

  1. Create a suitable DataFrame (one wasn't given in your question).
  2. Define what you want to do, with types:
  • split a string column into an array of strings
  • test whether this array contains the value of another column
  • invert the test (and filter on it)
  3. Look up each of these operations on the internet:
  • split a string into an array -> I found the split function, and an example of how to use it
  • test if an array contains a value -> I found array_contains, and an example
  • inverting the test was as you mentioned
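The three operations above can be checked on plain Scala strings before wiring them into Spark column expressions (the object and method names below are illustrative, not part of either answer):

```scala
object PathSteps {
  // Step 1: split a "->"-delimited path into its elements
  def splitPath(path: String): Array[String] = path.split("->")

  // Step 2: test whether the array contains the value of the other column
  def pathContains(path: String, dst: String): Boolean =
    splitPath(path).contains(dst)

  // Step 3: invert the test -- keep rows whose dst is NOT on the path
  def dstNotInPath(path: String, dst: String): Boolean =
    !pathContains(path, dst)
}
```

Spark's `split` and `array_contains` apply the same logic column-wise across the whole DataFrame.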

Note:

  • Here I use the "DataFrame API", where I declare symbolically which operations are to be performed on the DataFrame columns. This is recommended for such operations (filtering on a typed DataFrame = Dataset[Row] objects).
  • You use the row-wise API, where you apply a Scala function to each Row entry of the DataFrame. It means that the function is serialized, sent to each worker, and executed there on the Java/Scala Row instances.
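As a rough sketch of the difference: the row-wise approach boils down to an ordinary Scala predicate that Spark serializes and ships to each worker. That predicate can be exercised locally on plain tuples, with no Spark involved (names here are illustrative):

```scala
object RowPredicateDemo {
  // The per-row predicate Spark would serialize and run on each worker.
  val dstNotInPath: (String, String) => Boolean =
    (path, dst) => !path.split("->").contains(dst)

  def main(args: Array[String]): Unit = {
    // Exercised locally on plain (path, dst) tuples instead of Row objects.
    val rows = Seq("1->2->3" -> "2", "1->2->3" -> "4")
    println(rows.filter { case (p, d) => dstNotInPath(p, d) }) // List((1->2->3,4))
  }
}
```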

score:1

You can write it like this:

    df2.filter(v => !v.getAs[String]("path").split("->").contains(v.getAs[String]("dst")))

Or a solution with the DataFrame API:

    df.withColumn("splitted_path", functions.split($"path", "->"))
      .withColumn("filter_c", array_contains($"splitted_path", $"dst"))
      .where(!$"filter_c")
      .drop("splitted_path", "filter_c")

