score:4
Accepted answer
Well... use substring to break each line into fields, then trim to remove whitespace, and then do whatever you want.
case class DataUnit(s1: Int, s2: String, s3: Boolean, s4: Double)

import spark.implicits._ // assuming a SparkSession named `spark`; required for .toDF

sc.textFile("your_file_path") // Scala string literals need double quotes
  .map(l => (l.substring(0, 3).trim, l.substring(3, 13).trim, l.substring(13, 18).trim, l.substring(18, 22).trim))
  .map { case (e1, e2, e3, e4) => DataUnit(e1.toInt, e2, e3.toBoolean, e4.toDouble) }
  .toDF
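For illustration, here is how a single record maps through those substring ranges; the sample line below is made up, padded to the widths assumed above:
// Hypothetical 22-character record:
// cols 0-3 -> Int, 3-13 -> String, 13-18 -> Boolean, 18-22 -> Double
val line = " 42Mr Foo    false 1.5"
val unit = DataUnit(
  line.substring(0, 3).trim.toInt,       // 42
  line.substring(3, 13).trim,            // "Mr Foo"
  line.substring(13, 18).trim.toBoolean, // false
  line.substring(18, 22).trim.toDouble   // 1.5
)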
score:5
The fixed-length format is very old and I could not find a good Scala library for it... so I created my own.
You can check it out here: https://github.com/atais/Fixed-Length
Usage with Spark is quite simple: you get a Dataset of your objects!
You first need to create a description of your objects, e.g.:
case class Employee(name: String, number: Option[Int], manager: Boolean)

object Employee {
  import com.github.atais.util.Read._
  import cats.implicits._
  import com.github.atais.util.Write._
  import Codec._

  implicit val employeeCodec: Codec[Employee] = {
    fixed[String](0, 10) <<:
      fixed[Option[Int]](10, 13, Alignment.Right) <<:
      fixed[Boolean](13, 18)
  }.as[Employee]
}
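To sanity-check the codec outside Spark, you can decode a single line directly. The sample line below is hypothetical, padded to the widths declared above, and I am assuming Parser lives in com.github.atais.fixedlength alongside Codec:
import com.github.atais.fixedlength.Parser

// name: cols 0-10, number: cols 10-13 (right-aligned), manager: cols 13-18
val line = "Mr Foo     42false"
Parser.decode[Employee](line) match {
  case Right(emp) => println(emp) // Employee(Mr Foo,Some(42),false)
  case Left(err)  => System.err.println(err)
}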
And later just use the parser:
val input = sql.sparkContext.textFile(file)
  .filter(_.trim.nonEmpty)
  .map(Parser.decode[Employee])
  .flatMap {
    case Right(x) => Some(x)
    case Left(e) =>
      System.err.println(s"Failed to process file $file, error: $e")
      None
  }
sql.createDataset(input)
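For completeness, a minimal sketch of the surrounding setup, assuming sql is a SparkSession (the app name and local master are illustrative):
import org.apache.spark.sql.{Dataset, SparkSession}

val sql = SparkSession.builder()
  .appName("fixed-length-example") // illustrative
  .master("local[*]")              // illustrative; point at your cluster instead
  .getOrCreate()
import sql.implicits._ // supplies the Encoder[Employee] that createDataset needs

val employees: Dataset[Employee] = sql.createDataset(input)
employees.show()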
Source: stackoverflow.com