Try to avoid calling repartition() where you can, because it triggers a full shuffle and moves data between nodes unnecessarily.

According to Learning Spark:

Keep in mind that repartitioning your data is a fairly expensive operation. Spark also has an optimized version of repartition() called coalesce() that allows avoiding data movement, but only if you are decreasing the number of RDD partitions.

Put simply, coalesce() can only decrease the number of partitions. It does not do a full shuffle of the data; it just merges existing partitions together.
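To make the difference concrete, here is a minimal pure-Python sketch. It is only a toy model, not Spark's actual implementation: partitions are plain lists, and the helper names `toy_coalesce` and `toy_repartition` are made up for illustration. The idea it shows is that merging keeps whole partitions together, while a repartition redistributes every element.

```python
from typing import List

Partition = List[int]


def toy_coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model: merge whole partitions into n groups.

    Elements of one input partition always stay together. If n is not
    smaller than the current count, the partitions are returned unchanged.
    """
    if n >= len(partitions):
        return [list(p) for p in partitions]
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map neighbouring input partitions onto the same output partition.
        merged[i * n // len(partitions)].extend(part)
    return merged


def toy_repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy model of a full shuffle: every element is redistributed round-robin."""
    out: List[Partition] = [[] for _ in range(n)]
    flat = [x for part in partitions for x in part]
    for i, x in enumerate(flat):
        out[i % n].append(x)
    return out


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
coalesced = toy_coalesce(parts, 2)        # whole partitions merged
repartitioned = toy_repartition(parts, 2)  # every element moved individually
print(coalesced)
print(repartitioned)
```

In real PySpark the corresponding calls would be `rdd.coalesce(2)` and `rdd.repartition(2)`; the sketch above only mimics their effect on how elements are grouped.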
