score:6

Accepted answer

You can read the files as text and capture each file's path with input_file_name, as shown below.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.input_file_name

val spark = SparkSession
  .builder()
  .appName("Test App")
  .master("local[1]")
  .getOrCreate()
import spark.implicits._

// Read every file under the parent directory as plain text and
// attach the source file path to each row.
val data = spark.read.text("/parent_dir/*")
  .select(input_file_name().as("path"), $"value")

This gives you a DataFrame containing the file path alongside each line of data:

+--------------------------------+-------+
|path                            |value  |
+--------------------------------+-------+
|file:///parent_dir/subdir1/file1|abc|123|
|file:///parent_dir/subdir1/file1|def|456|
|file:///parent_dir/subdir3/file1|jkl|901|
|file:///parent_dir/subdir2/file1|ghi|789|
+--------------------------------+-------+

Now you can parse the path column and keep only the rows from the required directory, as in the sketch below.
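
For example, you could extract the subdirectory name with regexp_extract and filter on it. This is a minimal sketch: the regex and the "subdir1" value assume the /parent_dir/<subdir>/<file> layout shown above, so adjust them to your actual paths.

import org.apache.spark.sql.functions.regexp_extract

// Pull the directory name that sits between "parent_dir/" and the file name.
val withDir = data
  .withColumn("subdir", regexp_extract($"path", "parent_dir/([^/]+)/", 1))

// Keep only the rows that came from the directory you care about.
withDir.filter($"subdir" === "subdir1").show(false)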

Hope this helps!
