If you want to find how many 1s and 0s there are, you can do:
val rdd = clusterAndLabel.map(x => (x,1)).reduceByKey(_+_)
This will give you an RDD[((Int,Int),Int)] containing exactly what you described, meaning: [((0,0),2), ((1,0),1), ((1,1),1), ((2,1),2), ((2,0),1)].
If you really want them gathered by their first key, you can add this line:
val rdd2 = rdd.map(x => (x._1._1, (x._1._2, x._2))).groupByKey()
This will yield an RDD[(Int, Iterable[(Int,Int)])] which will look like what you described, i.e.: [(0, [(0,2)]), (1, [(0,1),(1,1)]), (2, [(1,2),(0,1)])].
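To check the two shapes above without a Spark cluster, here is a minimal sketch that mirrors the same steps on plain Scala collections. The sample `clusterAndLabel` data is an assumption, picked only so that the counts match the output quoted above; `PairCountSketch`, `pairCounts` and `byCluster` are illustrative names, and `groupBy` on a local `Seq` stands in for `reduceByKey` / `groupByKey` on an RDD.

```scala
object PairCountSketch {
  // Hypothetical sample, chosen so the counts match the output quoted in the answer.
  val clusterAndLabel: Seq[(Int, Int)] =
    Seq((0, 0), (0, 0), (1, 0), (1, 1), (2, 1), (2, 1), (2, 0))

  // Local analogue of map(x => (x, 1)).reduceByKey(_ + _):
  // count how often each (cluster, label) pair occurs.
  val pairCounts: Map[(Int, Int), Int] =
    clusterAndLabel.groupBy(identity).map { case (pair, occurrences) => (pair, occurrences.size) }

  // Local analogue of the groupByKey step:
  // gather the (label, count) entries under each cluster, sorted by label.
  val byCluster: Map[Int, Seq[(Int, Int)]] =
    pairCounts.toSeq
      .map { case ((cluster, label), n) => (cluster, (label, n)) }
      .groupBy(_._1)
      .map { case (cluster, entries) => (cluster, entries.map(_._2).sortBy(_._1)) }
}
```

On a real RDD the order of the grouped values is not guaranteed, which is why the sketch sorts each group before comparing.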
If you need the number of instances, it looks like (at least in your example) clusterAndLabel.count() should do the work.
I don't really understand question 3. I can see two possible readings:
- You want to know how many keys have 3 occurrences. To do so, you can start from the object I called rdd (no need for the groupByKey line) and do:
val rdd3 = rdd.map(x => (x._2,1)).reduceByKey(_+_)
This will yield an RDD[(Int,Int)], which is a kind of frequency RDD: the key is the number of occurrences and the value is how many times this key is hit. Here it would look like: [(1,3),(2,2)]. So if you want to know how many pairs occur 3 times, you just do:
rdd3.filter(_._1==3).collect()
(which will be an array of size 0 here, but if it's not empty then it'll have one value and it will be your answer).
- You want to know how many times the first key 3 occurs (once again 0 in your example). Then you start from rdd2 and do:
val res3 = rdd2.map(x=>(x._1,x._2.size)).filter(_._1==3).collect()
Once again it will yield either an empty array or an array of size 1 containing how many distinct pairs have a 3 for their first key. Note that you can do it directly if you don't need to display rdd2:
val rdd4 = rdd.map(x => (x._1._1,1)).reduceByKey(_+_).filter(_._1==3).collect()
(for performance you might also want to do the filter before reduceByKey!)
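The two readings of question 3 can be compared side by side on local collections. This is only a sketch under assumptions: `FrequencySketch` and its members are hypothetical names, the input is the pair-count list quoted earlier in this answer, and plain `Seq` operations stand in for the RDD calls.

```scala
object FrequencySketch {
  // Pair counts as quoted in the answer (what rdd would contain once collected).
  val pairCounts: Seq[((Int, Int), Int)] =
    Seq(((0, 0), 2), ((1, 0), 1), ((1, 1), 1), ((2, 1), 2), ((2, 0), 1))

  // First reading: frequency table, key = number of occurrences,
  // value = how many distinct pairs occur that many times.
  val frequency: Map[Int, Int] =
    pairCounts.groupBy(_._2).map { case (n, pairs) => (n, pairs.size) }

  // How many pairs occur exactly n times (0 when no pair does).
  def pairsOccurring(n: Int): Int = frequency.getOrElse(n, 0)

  // Second reading: how many distinct pairs have k as their first key.
  // The predicate is applied before any counting, in the spirit of
  // filtering before reduceByKey.
  def pairsWithFirstKey(k: Int): Int = pairCounts.count(_._1._1 == k)
}
```

Returning 0 instead of an empty array is a design choice of this sketch; the RDD versions above signal "no match" with an empty result from collect().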
Source: stackoverflow.com