
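Here's something that produces the pairs (and removes repeated ones). I couldn't work out how to use CompactBuffer directly, so this uses ArrayBuffer instead; the CompactBuffer source describes it as an append-only buffer similar to ArrayBuffer but more memory-efficient for small buffers. In your own flatMap you may need to convert each CompactBuffer to something that supports .combinations: groupByKey exposes each group as an Iterable, and Iterable has no .combinations, so a Seq such as .toVector is needed.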
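A minimal sketch of that conversion, using plain Scala so it runs without Spark. It assumes each group arrives typed as Iterable[String], the way groupByKey hands it over; the object and method names (`CombinationsFromIterable`, `orderedPairs`) are hypothetical, chosen only for illustration.

```scala
object CombinationsFromIterable {
  // Hypothetical helper: a group typed as Iterable has no .combinations,
  // so copy it into a Vector (a Seq) before taking 2-element combinations.
  def orderedPairs(group: Iterable[String]): List[(String, String)] =
    group.toVector
      .combinations(2)
      // emit each combination in both orders, like the answer's (x, y), (y, x)
      .flatMap { c => List((c(0), c(1)), (c(1), c(0))) }
      .toList

  def main(args: Array[String]): Unit = {
    // Stand-in for one group's values as seen after groupByKey
    val group: Iterable[String] = List("person2", "person5", "person7")
    orderedPairs(group).foreach(println)
  }
}
```

Indexing with c(0) and c(1) avoids a pattern match on the concrete collection type, so the same line works whatever Seq you convert to.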

object SparkApp extends App {
  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._
  import org.apache.spark.SparkConf
  import org.apache.spark.rdd.RDD
  import scala.collection.mutable.ArrayBuffer

  // Each ArrayBuffer stands in for one group (a CompactBuffer in your code)
  val data = List(
    ArrayBuffer("person2", "person5"),
    ArrayBuffer("person2", "person5", "person7"),
    ArrayBuffer("person1", "person5", "person11"))

  val conf = new SparkConf().setAppName("spark-scratch").setMaster("local")
  val sc = new SparkContext(conf)

  // A single partition is enough for this small local example
  val dataRDD: RDD[ArrayBuffer[String]] = sc.makeRDD(data, 1)

  // For every 2-element combination emit both (x, y) and (y, x),
  // then drop pairs that were produced by more than one group
  val pairs = dataRDD.flatMap(
    ab => ab.combinations(2)
      .flatMap { case ArrayBuffer(x, y) => List((x, y), (y, x)) }
  ).distinct()

  pairs.foreach(println _)

  sc.stop()
}

Output:

(person7,person2)
(person7,person5)
(person5,person2)
(person11,person1)
(person11,person5)
(person2,person7)
(person5,person7)
(person1,person11)
(person2,person5)
(person5,person11)
(person1,person5)
(person5,person1)
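To check the pairing logic without starting a SparkContext, the same transformation can be run on a plain List. This is a sketch under that assumption; `LocalPairs` and `pairs` are hypothetical names, and List.distinct keeps first-occurrence order, so the printed order will differ from Spark's.

```scala
import scala.collection.mutable.ArrayBuffer

object LocalPairs {
  // Same steps as the RDD version: both orderings of every
  // 2-element combination, then duplicates removed.
  def pairs(data: List[ArrayBuffer[String]]): List[(String, String)] =
    data.flatMap { ab =>
      ab.combinations(2).flatMap { c => List((c(0), c(1)), (c(1), c(0))) }
    }.distinct

  def main(args: Array[String]): Unit = {
    val data = List(
      ArrayBuffer("person2", "person5"),
      ArrayBuffer("person2", "person5", "person7"),
      ArrayBuffer("person1", "person5", "person11"))
    val result = pairs(data)
    result.foreach(println)
    println(s"${result.size} distinct pairs")
  }
}
```

With the answer's data this should report 12 distinct pairs, the same set as the Spark output: the groups yield 2, 6 and 6 ordered pairs, and the 2 from the first group are repeated in the second.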
