
Accepted answer

Not a straightforward task, but one approach would be to:

  1. assemble each time column and its corresponding value into an array of time-value pairs (structs)
  2. flatten (explode) the array into a column of individual time-value pairs
  3. perform a groupBy-pivot-agg transformation, using the time as part of the groupBy key and the concatenated types as the pivot column, aggregating the corresponding value

Sample code below:

import org.apache.spark.sql.functions._
import spark.implicits._  // needed for toDF and the $-syntax when not in the spark-shell

val df = Seq(
  ("G", "12/3/2018", "Import", "Voltage", 3.5, 6.8),
  ("H", "13/3/2018", "Import", "Voltage", 7.5, 9.8),
  ("H", "13/3/2018", "Export", "Watt", 4.5, 8.9),
  ("H", "13/3/2018", "Export", "Voltage", 5.6, 9.1)
).toDF("ID", "Date", "Type1", "Type2", "0:30", "1:00")

df.
  // 1. collect each time column and its value into an array of (time, value) structs
  withColumn("TimeValMap", array(
    struct(lit("0:30").as("_1"), $"0:30".as("_2")),
    struct(lit("1:00").as("_1"), $"1:00".as("_2"))
  )).
  // 2. flatten the array so each row holds a single (time, value) pair
  withColumn("TimeVal", explode($"TimeValMap")).
  withColumn("Time", $"TimeVal._1").
  // combine Type1 and Type2 into a single pivot key, e.g. "Import-Voltage"
  withColumn("Types", concat_ws("-", array($"Type1", $"Type2"))).
  // 3. pivot on the combined type, taking the value for each (ID, Date, Time)
  groupBy("ID", "Date", "Time").pivot("Types").agg(first($"TimeVal._2")).
  orderBy("ID", "Date", "Time").
  na.fill(0.0).
  show
// +---+---------+----+--------------+-----------+--------------+
// | ID|     Date|Time|Export-Voltage|Export-Watt|Import-Voltage|
// +---+---------+----+--------------+-----------+--------------+
// |  G|12/3/2018|0:30|           0.0|        0.0|           3.5|
// |  G|12/3/2018|1:00|           0.0|        0.0|           6.8|
// |  H|13/3/2018|0:30|           5.6|        4.5|           7.5|
// |  H|13/3/2018|1:00|           9.1|        8.9|           9.8|
// +---+---------+----+--------------+-----------+--------------+
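If your real dataset has many more time columns than the two in the sample, the TimeValMap array can be built dynamically rather than listing each struct by hand. A minimal sketch of that idea, assuming the time columns are everything other than the ID/Date/Type columns (nonTimeCols is a hypothetical list; adjust it to your actual schema):

// Assumption: all columns not in this set are time-slot columns
val nonTimeCols = Set("ID", "Date", "Type1", "Type2")
val timeCols = df.columns.filterNot(nonTimeCols.contains)

val pivoted = df.
  // build the (time, value) structs from the column list instead of hard-coding them
  withColumn("TimeValMap", array(
    timeCols.map(c => struct(lit(c).as("_1"), col(c).as("_2"))): _*
  )).
  withColumn("TimeVal", explode($"TimeValMap")).
  withColumn("Time", $"TimeVal._1").
  withColumn("Types", concat_ws("-", array($"Type1", $"Type2"))).
  groupBy("ID", "Date", "Time").pivot("Types").agg(first($"TimeVal._2")).
  orderBy("ID", "Date", "Time").
  na.fill(0.0)

The rest of the pipeline is unchanged; only the construction of TimeValMap differs.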
