score:0

So, it all points to this: once the file is packaged inside the JAR, it can only be accessed as an InputStream in order to read the data from within the compressed file.

I arrived at a solution. Even though it is not pretty, it does what I need: read a CSV file, take the first two columns, turn them into a DataFrame, and then load them into a key-value structure (in this case I created a case class to hold these pairs).

I am considering migrating these lookups to a HOCON file, which may make the loading process less convoluted (a rough sketch of that approach follows the code below).


import sparkSession.implicits._

// Resources packaged inside the JAR are only reachable as an InputStream, not as a file path.
val fileStream = scala.io.Source.getClass.getResourceAsStream("/lookup01.csv")

// Read every line into a local list, then parallelize it into a single-column DataFrame.
val input = sparkSession.sparkContext.makeRDD(scala.io.Source.fromInputStream(fileStream).getLines().toList).toDF()

// Split each CSV line and keep only the first two columns as a KeyValue pair.
val myRdd = input.map { line =>
  val col = utils.Utils.splitCSVString(line.getString(0))
  KeyValue(col(0), col(1))
}

// Collect the pairs back to the driver as a map, which serves as the lookup structure.
val myDF = myRdd.rdd.map(x => (x.key, x.value)).collectAsMap()

fileStream.close()
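
For reference, here is a rough sketch of what the HOCON-based lookup could look like, assuming the Typesafe Config library (com.typesafe.config) is on the classpath; the file name lookup01.conf and its contents are just placeholders:

import com.typesafe.config.ConfigFactory
import scala.jdk.CollectionConverters._  // use scala.collection.JavaConverters._ on Scala 2.12 and below

// Hypothetical lookup01.conf placed under src/main/resources:
// lookup01 {
//   "AAA" = "first value"
//   "BBB" = "second value"
// }

// parseResources reads the file from the classpath, so it also works when packaged inside the JAR.
val lookupConfig = ConfigFactory.parseResources("lookup01.conf").resolve()

// Flatten the lookup01 object into a plain Map[String, String] held on the driver.
val lookup01: Map[String, String] =
  lookupConfig.getConfig("lookup01")
    .entrySet().asScala
    .map(entry => entry.getKey -> entry.getValue.unwrapped().toString)
    .toMap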

score:2

You have to get the correct path from the classpath.

Considering that your file is under src/main/resources:

// getResource returns a java.net.URL; Spark's csv() needs a String path
val path = getClass.getResource("/lookup01.csv").getPath

val v_lookup = sparkSession.read.option("header", true).csv(path)
