Summarizing what I found while researching this issue: Hadoop's TextInputFormat (and most input formats) deals with one record at a time, and it treats the configured record delimiter as a literal character or string, not as a regex.
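As a quick illustration (the path and the `|#|` delimiter below are made up), you can change the record delimiter through the Hadoop configuration, but whatever string you pass is matched literally:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("literal-delimiter").setMaster("local[*]"))

val hConf = new Configuration(sc.hadoopConfiguration)
// The delimiter is taken literally: passing a pattern such as
// "(?<!\\)\n" here would be searched for as those exact characters,
// not evaluated as a negative-lookbehind regex.
hConf.set("textinputformat.record.delimiter", "|#|")

val records = sc
  .newAPIHadoopFile("data/input.txt", // hypothetical path
    classOf[TextInputFormat], classOf[LongWritable], classOf[Text], hConf)
  .map { case (_, text) => text.toString }
```

Each element of `records` is then one chunk of the file between literal `|#|` occurrences.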
One way around this is to build a custom regex InputFormat. This blog describes how that can be done in more detail.
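If the files are small enough to hold each one in memory, a rough approximation of a regex InputFormat (not a replacement for the custom format the blog describes) is to read each file whole and split it with the regex yourself; the path and the lookbehind pattern here are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("regex-split-sketch").setMaster("local[*]"))

// wholeTextFiles yields (path, fullFileContent) pairs, so an arbitrary
// regex -- e.g. "a newline NOT preceded by a backslash" -- can delimit
// records, at the cost of loading each file fully into memory.
val records = sc.wholeTextFiles("data/dir") // hypothetical path
  .flatMap { case (_, content) => content.split("(?<!\\\\)\n") }
```

Note this shifts the regex work from the input format into the driver-defined `flatMap`, so it does not give you Hadoop's split handling for large files.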
Another approach, when only a few lines contain these escape sequences, is to filter those lines into a separate RDD, reduce that RDD to a single String, split the String on the true record boundary to generate a new RDD, and union it back with the rest. This works as a hack, not a real solution to the problem; better solutions are welcome.
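That hack can be sketched roughly as below; the `id=` prefix identifying intact records, the trailing backslash marking escaped boundaries, and the single partition (to keep `reduce` order deterministic) are all assumptions for the sake of the example:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("filter-union-hack").setMaster("local[*]"))

// Pretend input: "id=1 foo\" and "bar" are two halves of one record that
// was wrongly split on an escaped newline; "id=2 ok" is intact.
val lines = sc.parallelize(Seq("id=1 foo\\", "bar", "id=2 ok"), 1)

// 1. Filter the damaged lines into their own RDD (here: fragments either
//    end with the escape character or lack the id= record prefix).
val broken = lines.filter(l => l.endsWith("\\") || !l.startsWith("id="))

// 2. Reduce them back into a single String, then split it with the real
//    (regex) delimiter -- a newline NOT preceded by a backslash -- which
//    plain Scala supports even though the input format does not.
val mended = broken.reduce(_ + "\n" + _).split("(?<!\\\\)\n").toSeq

// 3. Union the repaired records back with the untouched ones.
val result = lines.subtract(broken).union(sc.parallelize(mended))
```

This collapses the affected lines through a single `reduce`, so it only scales while the damaged subset fits comfortably in driver/executor memory, which is why it is a hack rather than a real fix.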
- Using negative lookbehind regex as record delimiter to read Hadoop File in Spark