Accepted answer

Well... use `substring` to break each line into fields, then `trim` to remove the padding whitespace, and then do whatever you want with the pieces.

case class DataUnit(s1: Int, s2: String, s3: Boolean, s4: Double)

sc.textFile(file) // your RDD[String] of fixed-width lines
  .map(l => (l.substring(0, 3).trim, l.substring(3, 13).trim, l.substring(13, 18).trim, l.substring(18, 22).trim))
  .map { case (e1, e2, e3, e4) => DataUnit(e1.toInt, e2, e3.toBoolean, e4.toDouble) }
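The same substring-and-trim parse can be tried without Spark; a minimal self-contained sketch, where the 22-character layout matches the offsets above but the sample record itself is made up:

```scala
object FixedWidthDemo {
  case class DataUnit(s1: Int, s2: String, s3: Boolean, s4: Double)

  // One fixed-width record: cols 0-3 Int, 3-13 String, 13-18 Boolean, 18-22 Double
  def parse(l: String): DataUnit = {
    val t = (l.substring(0, 3).trim, l.substring(3, 13).trim,
             l.substring(13, 18).trim, l.substring(18, 22).trim)
    DataUnit(t._1.toInt, t._2, t._3.toBoolean, t._4.toDouble)
  }

  def main(args: Array[String]): Unit = {
    // " 42" (3) + "Alice     " (10) + "true " (5) + "3.14" (4) -- a made-up line
    val line = " 42Alice     true 3.14"
    println(parse(line)) // DataUnit(42,Alice,true,3.14)
  }
}
```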


The fixed-length format is very old and I could not find a good Scala library for it... so I created my own.

You can check it out here:

Usage with Spark is quite simple: you get a Dataset of your objects!

You first need to create a description of your objects, e.g.:

case class Employee(name: String, number: Option[Int], manager: Boolean)

object Employee {

    import com.github.atais.util.Read._
    import cats.implicits._
    import com.github.atais.util.Write._
    import Codec._

    implicit val employeeCodec: Codec[Employee] = {
      fixed[String](0, 10) <<:
        fixed[Option[Int]](10, 13, Alignment.Right) <<:
        fixed[Boolean](13, 18)
    }.as[Employee]
}
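To see what that codec describes, here is a hand-rolled sketch of decoding the same layout (field offsets taken from the codec above; the sample line is made up):

```scala
object EmployeeDecodeSketch {
  case class Employee(name: String, number: Option[Int], manager: Boolean)

  // Hand-rolled equivalent of the codec: name in cols 0-10,
  // number in cols 10-13 (right-aligned, blank means None),
  // manager in cols 13-18.
  def decode(line: String): Employee = {
    val name = line.substring(0, 10).trim
    val num  = line.substring(10, 13).trim
    val number  = if (num.isEmpty) None else Some(num.toInt)
    val manager = line.substring(13, 18).trim.toBoolean
    Employee(name, number, manager)
  }

  def main(args: Array[String]): Unit = {
    // "Fooo      " (10) + "  1" (3) + "false" (5)
    println(decode("Fooo        1false")) // Employee(Fooo,Some(1),false)
  }
}
```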

And later just use the parser:

val input = sql.sparkContext.textFile(file)
               .map(Parser.decode[Employee]) // decode each line with the implicit codec
               .flatMap {
                  case Right(x) => Some(x)
                  case Left(e) =>
                    System.err.println(s"Failed to process file $file, error: $e")
                    None
               }

val ds = sql.createDataset(input)
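The Right/Left handling can be exercised without Spark; a minimal sketch of the same `flatMap` pattern over plain `Either`s (the decoded values and the error message are made up):

```scala
object EitherFlatMapSketch {
  def main(args: Array[String]): Unit = {
    // Simulated decode results: successes are Right, failures are Left
    val decoded: List[Either[String, Int]] =
      List(Right(1), Left("bad record"), Right(3))

    // Keep successes, log and drop failures -- same shape as the Spark flatMap
    val kept = decoded.flatMap {
      case Right(x) => Some(x)
      case Left(e) =>
        System.err.println(s"Failed to process record, error: $e")
        None
    }
    println(kept) // List(1, 3)
  }
}
```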
