score:4

Accepted answer

You will have to flatten the person struct column into separate top-level columns and then use drop to remove the field you don't want:

new_df.select("car", "person.*").drop("name")

If you want age nested under person again, you can rebuild the struct from the remaining column:

import org.apache.spark.sql.functions._
new_df
  .select("car", "person.*")
  .drop("name")
  .withColumn("person", struct("age"))
  .drop("age")

root
 |-- car: string (nullable = true)
 |-- person: struct (nullable = false)
 |    |-- age: long (nullable = true)

As @raphaelroth pointed out in the comments below, you can do this in a single select:

new_df.select($"car",struct($"person.age").as("person"))

Or, even shorter:

new_df.withColumn("person", struct("person.age"))
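To try the short form end to end, here is a minimal sketch. The sample row, the `PersonOld` and `Record` case classes, the `KeepAgeOnly` object, and the local `SparkSession` are assumptions made only to build a DataFrame shaped like `new_df`; they are not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.struct

// Hypothetical sample types, used only to build a DataFrame shaped like new_df.
case class PersonOld(age: Long, name: String)
case class Record(car: String, person: PersonOld)

object KeepAgeOnly {
  // Replaces the person struct with a new struct that carries only the age field.
  def keepAgeOnly(df: DataFrame): DataFrame =
    df.withColumn("person", struct("person.age"))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (an assumption, not from the question).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("keep-age-only")
      .getOrCreate()
    import spark.implicits._

    val new_df = Seq(Record("honda", PersonOld(30L, "alice"))).toDF()

    // The person struct should now list age as its only field.
    keepAgeOnly(new_df).printSchema()

    spark.stop()
  }
}
```

Because withColumn reuses the existing column name, person is replaced in place and no extra drop is needed.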

UDF way

For your information, you can also do it with a UDF, although this is not recommended:

import org.apache.spark.sql.functions._
val removeStruct = udf((p: PersonOld) => Person(p.age))

new_df.withColumn("person", removeStruct(col("person")))

For that you need two case classes, one for the original struct and one for the trimmed struct:

case class PersonOld(age: Long, name: String)
case class Person(age: Long)
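On some Spark versions a struct value reaches a Scala UDF as a `Row` rather than as a case class, so a case-class parameter can fail with a cast error at runtime. If you hit that, a `Row`-typed parameter is a common alternative. The sketch below assumes a sample row, a hypothetical `Record` case class, and a local `SparkSession`; the names `removeStructFromRow` and `RemoveStructViaRow` are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

case class PersonOld(age: Long, name: String)
case class Person(age: Long)
// Hypothetical wrapper type, used only to build sample data.
case class Record(car: String, person: PersonOld)

object RemoveStructViaRow {
  // Reads age by field name from the incoming Row and returns the trimmed struct.
  // This sketch does not handle a null person struct.
  val removeStructFromRow = udf((p: Row) => Person(p.getAs[Long]("age")))

  def keepAgeOnly(df: DataFrame): DataFrame =
    df.withColumn("person", removeStructFromRow(col("person")))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (an assumption, not from the question).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("remove-struct-via-row")
      .getOrCreate()
    import spark.implicits._

    val new_df = Seq(Record("honda", PersonOld(30L, "alice"))).toDF()
    keepAgeOnly(new_df).printSchema()

    spark.stop()
  }
}
```

The built-in struct function is still the better choice where it applies, since a UDF is opaque to Spark's optimizer.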
