
Accepted answer

It looks like you are struggling with at least two distinct problems here. Let's assume you have a Dataset like this:

import org.apache.spark.mllib.linalg.Matrices
import spark.implicits._  // spark is your SparkSession; needed for toDS and $

val ds = Seq(
  ("foo",  Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))), 
  ("foo",  Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0)))
).toDS

Selecting TypedColumn:

  • using implicit conversions with $:

    ds.select($"_1".as[String])
    
  • using o.a.s.sql.functions.col:

    ds.select(col("_1").as[String])
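
Either form gives you a strongly typed Dataset[String] rather than a DataFrame, for example (the names value below is just an illustrative name):

    val names = ds.select($"_1".as[String])  // Dataset[String]
    names.collect                            // Array("foo", "foo")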
    

Adding matrices:

  • MLLib Matrix and MatrixUDT don't implement addition, which means you won't be able to use the sum function or reduce with +
  • you can use a third-party linear algebra library, but such types are not supported in Spark SQL / Spark Datasets

If you really want to do it with Datasets you can try something like this:

ds.groupByKey(_._1).mapGroups(
  (key, values) => {
    // element-wise sum over the flattened (column-major) matrix values
    val matrices = values.map(_._2.toArray)
    val first = matrices.next
    val sum = matrices.foldLeft(first)(
      (acc, m) => acc.zip(m).map { case (x, y) => x + y }
    )
    (key, sum)
  }
)

and map the arrays back to matrices, but personally I would just convert to an RDD and use Breeze.
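
A minimal, untested sketch of that RDD + Breeze route, assuming Breeze is on the classpath (the summed name is just illustrative):

    import breeze.linalg.{DenseMatrix => BDM}

    val summed = ds.rdd
      .mapValues(m => new BDM(m.numRows, m.numCols, m.toArray)) // same column-major layout as MLLib
      .reduceByKey(_ + _)                                       // Breeze matrices support +
      .mapValues(bm => Matrices.dense(bm.rows, bm.cols, bm.data))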

