score:0

You have multiple files per partition because each Spark task (one per partition of the DataFrame) writes its output to its own file. That means the only way to get a single file per output partition is to repartition the data before writing. Note that this will be relatively inefficient, because the repartition forces a shuffle of your data.
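
For example, a minimal sketch of that repartition-then-write pattern (the DataFrame df and the partition column date are placeholder names for illustration, not from the question):

import org.apache.spark.sql.functions.col

// repartition on the output partition column so all rows for a given
// date value land in the same task, which then writes a single file
// into the matching date=... directory
df.repartition(col("date"))
  .write
  .partitionBy("date")
  .format("parquet")
  .save("/some/output-path")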

score:1

Having lots of files is the expected behavior: each partition of the DataFrame (whatever resulted from the computation you did before the write) writes its own files into the partition directories you requested.

If you wish to avoid that, you need to repartition before the write:

import org.apache.spark.sql.functions.col

spark.createDataFrame(asRow, struct)
  .repartition(col("foo"), col("bar")) // one shuffle partition per (foo, bar) key
  .write
  .partitionBy("foo", "bar")
  .format("text")
  .save("/some/output-path")
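
With the repartition in place, each output directory /some/output-path/foo=.../bar=... should end up containing a single part file, since all rows for a given (foo, bar) combination are moved into the same task before the write.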
