Add an example of @Fountaine007's first bullet.

I ran into the same issue and it's because the allocated vcores is less than the application's expectation.

For my specific scenario, I increased the value of yarn.nodemanager.resource.cpu-vcores under $HADOOP_HOME/etc/hadoop/yarn-site.xml.

For memory related issue, you may also need to modify yarn.nodemanager.resource.memory-mb.


There are two known reasons for this:

  1. Your application requires more resources (cores, memory) than allocated. Increasing worker cores and memory should solve it. Most other answers focus on this.

  2. Where less known, the firewall is blocking the communication between master and workers. This could happen especially you are using cloud service. According to Spark Security, besides the standard 8080, 8081, 7077, 4040 ports, you also need to make sure the master and worker can communicate via the SPARK_WORKER_PORT, spark.driver.port and spark.blockManager.port; the latter three are used by submitting jobs and are randomly assigned by the program (if left unconfigured). You may try to open all ports to run a quick test.

Related Query

More Query from same tag