score:0
Found a workaround; see if there is a better solution using one DataFrame and no UDF.
import org.apache.spark.sql.functions.{regexp_replace, to_date}

val df = spark.createDataFrame(Seq(
  (1, "9/11/2020"),
  (2, "10/11/2020"),
  (3, "1/1/2020"),
  (4, "12/7/2020"))).toDF("id", "x4")

// pad the day when the month has two digits and the day one
val newDf = df.withColumn("x4new", regexp_replace(df("x4"), "(?:(\\b\\d{2}))/(?:(\\d))/(?:(\\d{4})\\b)", "$1/0$2/$3"))
// pad the month when both month and day have one digit
val newDf1 = newDf.withColumn("x4new1", regexp_replace(newDf("x4new"), "(?:(\\b\\d{1}))/(?:(\\d))/(?:(\\d{4})\\b)", "0$1/$2/$3"))
// pad the month when the month has one digit and the day two
val newDf2 = newDf1.withColumn("x4new2", regexp_replace(newDf1("x4new1"), "(?:(\\b\\d{1}))/(?:(\\d{2}))/(?:(\\d{4})\\b)", "0$1/$2/$3"))
// pad the day for the remaining two-digit-month/one-digit-day values, then parse
// as a date (note MM/dd/yyyy: uppercase MM is month, lowercase mm is minutes)
val newDf3 = newDf2.withColumn("date", to_date(regexp_replace(newDf2("x4new2"), "(?:(\\b\\d{2}))/(?:(\\d{1}))/(?:(\\d{4})\\b)", "$1/0$2/$3"), "MM/dd/yyyy"))

val formattedDataDf = newDf3
  .drop("x4new")
  .drop("x4new1")
  .drop("x4new2")

formattedDataDf.printSchema
formattedDataDf.show
The output looks as follows:
root
|-- id: integer (nullable = false)
|-- x4: string (nullable = true)
|-- date: date (nullable = true)
+---+----------+----------+
| id| x4| date|
+---+----------+----------+
| 1| 9/11/2020|2020-09-11|
| 2|10/11/2020|2020-10-11|
| 3| 1/1/2020|2020-01-01|
| 4| 12/7/2020|2020-12-07|
+---+----------+----------+
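If the goal is to avoid the chain of intermediate columns, a single regexp_replace that pads every standalone single digit should handle both month and day in one pass. This is a minimal sketch under that assumption, reusing the df defined above; the val name singlePassDf is just illustrative:

import org.apache.spark.sql.functions.{col, regexp_replace, to_date}

// pad any lone digit with a leading zero ("9/1/2020" -> "09/01/2020"),
// then parse the padded string as a date
val singlePassDf = df.withColumn(
  "date",
  to_date(regexp_replace(col("x4"), "\\b(\\d)\\b", "0$1"), "MM/dd/yyyy"))

singlePassDf.show()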
score:2
Use the built-in functions from_unixtime with unix_timestamp, or date_format with to_timestamp, or to_date.
Example (in Spark 2.4):
import org.apache.spark.sql.functions._

// sample data
val df = spark.createDataFrame(Seq((1, "9/11/2020"), (2, "10/11/2020"), (3, "1/1/2020"), (4, "12/7/2020"))).toDF("id", "x4")

// using from_unixtime (MM/dd/yyyy: uppercase MM is month, lowercase mm is minutes)
df.withColumn("date", from_unixtime(unix_timestamp(col("x4"), "MM/dd/yyyy"), "MM/dd/yyyy")).show()

// using date_format
df.withColumn("date", date_format(to_timestamp(col("x4"), "MM/dd/yyyy"), "MM/dd/yyyy")).show()
df.withColumn("date", date_format(to_date(col("x4"), "MM/dd/yyyy"), "MM/dd/yyyy")).show()
//+---+----------+----------+
//| id| x4| date|
//+---+----------+----------+
//| 1| 9/11/2020|09/11/2020|
//| 2|10/11/2020|10/11/2020|
//| 3| 1/1/2020|01/01/2020|
//| 4| 12/7/2020|12/07/2020|
//+---+----------+----------+
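If an actual DateType column is wanted rather than a reformatted string (as in the first answer), to_date on its own is enough. A minimal sketch, assuming the same df and the lenient Spark 2.4 parser used in the example above:

// to_date returns a date column directly, so no date_format back to string is needed
df.withColumn("date", to_date(col("x4"), "MM/dd/yyyy")).printSchema()
// root
//  |-- id: integer (nullable = false)
//  |-- x4: string (nullable = true)
//  |-- date: date (nullable = true)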