Accepted answer

It looks you struggle with at least two distinct problems here. Lets assume you have Dataset like this:

val ds = Seq(
  ("foo",  Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))), 
  ("foo",  Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0)))

Selecting TypedColumn:

  • using implicit conversions with $:"_1").as[String])
  • using o.a.s.sql.functions.col:"_1").as[String])

Adding matrices:

  • MLLib Matrix and MatrixUDT don't implement addition. It means you won't be able to sum function or reduce with +
  • you can use third party linear algebra library but this is not supported in Spark SQL / Spark Dataset

If you really want to do it with Datsets you can try to do something like this:

  (key, values) => {
    val matrices =
    val first =
    val sum = matrices.foldLeft(first)(
      (acc, m) => { case (x, y) => x + y }
    (key, sum)

and map back to matrices but personally I would just convert to RDD and use breeze.

Related Query

More Query from same tag