score:2
Accepted answer
Assuming your column metadata contains JSON strings, you can first convert it to MapType with the from_json function, then add the columns you want using map_concat, and finally convert it back to a JSON string using to_json:
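import org.apache.spark.sql.functions._

// Parse the JSON string into a map<string,string>
val df2 = df.withColumn(
  "metadata",
  from_json(col("metadata"), lit("map<string,string>"))
).withColumn(
  // Append the adjective column as a new map entry, then serialize back to JSON
  "metadata",
  to_json(map_concat(col("metadata"), map(lit("adjective"), col("adjective"))))
)

df2.show(false)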
//+-----+-------+---------+----------------------------------------------------+
//|noun |pronoun|adjective|metadata |
//+-----+-------+---------+----------------------------------------------------+
//|homer|simpson|engineer |{"age":"50","country":"usa","adjective":"engineer"} |
//|elon |musk |king |{"age":"45","country":"rsa","adjective":"king"} |
//|bart |lee |cricketer|{"age":"35","country":"aus","adjective":"cricketer"}|
//|lisa |jobs |daughter |{"age":"35","country":"ind","adjective":"daughter"} |
//|joe |root |player |{"age":"31","country":"eng","adjective":"player"} |
//+-----+-------+---------+----------------------------------------------------+
This can also be done by converting to a StructType instead of a MapType, but a map is more generic in this case because it does not require knowing the JSON keys in advance.
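For comparison, the StructType variant mentioned above might look roughly like the sketch below. It is not part of the original answer: the sample row, the schema (keys age and country, both read as strings), and the temporary column name parsed are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, struct, to_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Local session for the sketch; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("metadata-struct-sketch")
  .getOrCreate()
import spark.implicits._

// One sample row in the same shape as the answer's data (assumed input).
val df = Seq(
  ("homer", "simpson", "engineer", """{"age":"50","country":"usa"}""")
).toDF("noun", "pronoun", "adjective", "metadata")

// Assumption: the JSON keys are known up front, which a struct schema requires.
val metadataSchema = StructType(Seq(
  StructField("age", StringType),
  StructField("country", StringType)
))

val df3 = df
  // Parse the JSON string into a struct using the fixed schema.
  .withColumn("parsed", from_json(col("metadata"), metadataSchema))
  // Rebuild a struct from the parsed fields plus the extra column, then serialize it.
  .withColumn(
    "metadata",
    to_json(struct(
      col("parsed.age").as("age"),
      col("parsed.country").as("country"),
      col("adjective")
    ))
  )
  .drop("parsed")

val firstMetadata = df3.select("metadata").as[String].head()

df3.show(false)
```

The trade-off is that every key has to be listed in the schema, and any key missing from it is silently dropped when parsing, which is why the map version is the more generic choice here.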
Source: stackoverflow.com