Here's something that produces the pairs (and removes repeated ones). I couldn't work out how to use CompactBuffer, so it uses ArrayBuffer, since the source for CompactBuffer says it's a more efficient ArrayBuffer. You may need to convert your CompactBuffer in the flatMap to something that supports .combinations.
object SparkApp extends App {
  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._
  import org.apache.spark.SparkConf
  import org.apache.spark.rdd.RDD
  import scala.collection.mutable.ArrayBuffer

  val data = List(
    ArrayBuffer("person2", "person5"),
    ArrayBuffer("person2", "person5", "person7"),
    ArrayBuffer("person1", "person5", "person11"))

  val conf = new SparkConf().setAppName("spark-scratch").setMaster("local")
  val sc = new SparkContext(conf)

  val dataRDD = sc.makeRDD(data, 1)

  // For each group, emit every 2-element combination in both orders,
  // then drop duplicate pairs across groups.
  val pairs = dataRDD.flatMap(
    ab => ab.combinations(2)
            .flatMap { case ArrayBuffer(x, y) => List((x, y), (y, x)) }
  ).distinct

  pairs.foreach(println)
}
Output:
(person7,person2)
(person7,person5)
(person5,person2)
(person11,person1)
(person11,person5)
(person2,person7)
(person5,person7)
(person1,person11)
(person2,person5)
(person5,person11)
(person1,person5)
(person5,person1)
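On the CompactBuffer point above: since CompactBuffer is private to Spark but is an Iterable, one way to get at .combinations is to call .toSeq on it first. Here is a minimal Spark-free sketch of that idea, using a plain Iterable with hypothetical sample data to stand in for the CompactBuffer a groupByKey would produce:

```scala
// Hypothetical group of values, standing in for a CompactBuffer[String]
// (CompactBuffer is an Iterable, so the same call works on it).
val group: Iterable[String] = Iterable("person2", "person5", "person7")

// .toSeq exposes .combinations; emit each pair in both orders, as in the
// answer's flatMap.
val pairs = group.toSeq
  .combinations(2)
  .flatMap { case Seq(x, y) => List((x, y), (y, x)) }
  .toList

pairs.foreach(println)
```

Inside the Spark job this would replace `ab.combinations(2)` with `ab.toSeq.combinations(2)` (and the `ArrayBuffer(x, y)` pattern with `Seq(x, y)`).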
Source: stackoverflow.com