Score: 0
tr might be the faster solution. Note that you can pipe any string into it, so in this case I'm cat-ing a file on disk, but this could also be a file stream from SFTP. One caveat: tr -d deletes each character in the set ($ and ~) individually rather than the literal string $~$, which works here because those characters appear only in the delimiters.
~/Desktop/test $ cat data.txt
$~$field1$~$|$~$field2$~$|$~$field3$~$
$~$data1$~$|$~$data2$~$|$~$data3$~$
$~$data4$~$|$~$data5$~$|$~$data6$~$
$~$data7$~$|$~$data8$~$|$~$data9$~$
# The '>' opens a new file for writing
~/Desktop/test $ cat data.txt | tr -d '$~$' > output.psv
# See the results here
~/Desktop/test $ cat output.psv
field1|field2|field3
data1|data2|data3
data4|data5|data6
data7|data8|data9
Examples: https://shapeshed.com/unix-tr/#what-is-the-tr-command-in-unix
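Once the markers are stripped, the cleaned file loads straight into Spark. A minimal sketch, assuming Spark 2.x or later and the output.psv produced above (treating the first row as a header, since the sample file has one):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("load-psv").getOrCreate()

// output.psv is plain pipe-separated text after the tr pass,
// so the stock csv reader handles it with a custom delimiter
val df = spark.read
  .option("delimiter", "|")
  .option("header", "true")
  .csv("output.psv")

df.show()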
Score: 0
Here is a pure Spark solution. There might be better-performing solutions.
import org.apache.spark.sql.functions.{col, lit, udf}

val df = spark.read.option("delimiter", "|").csv(filepath)

// UDF that replaces every occurrence of find with replacement
val replace = (value: String, find: String, replacement: String) => value.replace(find, replacement)
val replaceUdf = udf(replace)

// apply the UDF to every column, keeping the original column names
df.select(
  df.columns.map(c => replaceUdf(col(c), lit("$~$"), lit("")).alias(c)): _*)
  .show
Update: you cannot use $~$ as the quote option or $~$|$~$ as the delimiter in Spark 2.3.0, as those options accept only a single character.
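For completeness: Spark 3.0 reportedly lifted the single-character restriction on the delimiter (SPARK-24540), so on a newer version a sketch like the one below may work; the quote option still expects a single character, and I have not verified this against every 3.x release.

// Sketch, assuming Spark 3.0+ where the csv "sep" option accepts multiple characters.
// Splitting on "$~$|$~$" still leaves a leading "$~$" on the first column and a
// trailing "$~$" on the last, which the regexp_replace approach below cleans up.
val df = spark.read
  .option("sep", "$~$|$~$")
  .csv(filepath)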
Score: 0
Using regexp_replace and foldLeft to update all the columns. Check this out:
scala> val df = Seq(("$~$data1$~$","$~$data2$~$","$~$data3$~$"), ("$~$data4$~$","$~$data5$~$","$~$data6$~$"), ("$~$data7$~$","$~$data8$~$","$~$data9$~$"),("$~$data10$~$","$~$data11$~$","$~$data12$~$")).toDF("field1","field2","field3")
df: org.apache.spark.sql.DataFrame = [field1: string, field2: string ... 1 more field]
scala> df.show(false)
+------------+------------+------------+
|field1 |field2 |field3 |
+------------+------------+------------+
|$~$data1$~$ |$~$data2$~$ |$~$data3$~$ |
|$~$data4$~$ |$~$data5$~$ |$~$data6$~$ |
|$~$data7$~$ |$~$data8$~$ |$~$data9$~$ |
|$~$data10$~$|$~$data11$~$|$~$data12$~$|
+------------+------------+------------+
scala> val df2 = df.columns.foldLeft(df) { (acc, x) => acc.withColumn(x, regexp_replace(col(x), """^\$~\$|\$~\$$""", "")) }
df2: org.apache.spark.sql.DataFrame = [field1: string, field2: string ... 1 more field]
scala> df2.show(false)
+------+------+------+
|field1|field2|field3|
+------+------+------+
|data1 |data2 |data3 |
|data4 |data5 |data6 |
|data7 |data8 |data9 |
|data10|data11|data12|
+------+------+------+
scala>
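A note on the design choice above: foldLeft with withColumn adds one projection per column to the query plan, which can get expensive on wide DataFrames. The same cleanup can be written as a single select; a minimal equivalent sketch:

import org.apache.spark.sql.functions.{col, regexp_replace}

// one projection over all columns instead of one withColumn call per column
val df2 = df.select(
  df.columns.map(c => regexp_replace(col(c), """^\$~\$|\$~\$$""", "").alias(c)): _*)
df2.show(false)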
Source: stackoverflow.com