In which cases would we want to have more than one executor on each worker?
Whenever possible: if a job requires fewer resources per executor than a worker node has, then Spark should try to start additional executors on the same worker to use all of its available resources.
But that's Spark's role, not our call. When deploying Spark apps, it is up to Spark to decide how many executors (JVM processes) are started on each worker node (machine). That decision depends on the executor resources (cores and memory) required by the Spark jobs (the spark.executor.* configs). We often don't know what resources are available per worker, since a cluster is usually shared by multiple apps/people. So we configure the number of executors and the resources they require, and let Spark decide whether or not to run them on the same worker.
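As a minimal sketch of what that looks like in code (the app name and the numbers here are made-up assumptions for illustration; only the config keys come from Spark):

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch: we only declare per-executor resources and how many
// executors we want; the cluster manager decides which workers host them.
val spark = SparkSession.builder()
  .appName("executor-sizing-example")       // hypothetical app name
  .config("spark.executor.cores", "4")      // CPU cores per executor
  .config("spark.executor.memory", "8g")    // heap memory per executor
  .config("spark.executor.instances", "6")  // total executors (static allocation)
  .getOrCreate()
```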
Now, your question is maybe: "should we have fewer executors with lots of cores and memory, or spread the work across several small executors?"
Having fewer but bigger executors clearly reduces shuffling. But there are several reasons to also prefer distribution:
- It is easier to start small executors.
- Having a big executor means the cluster needs all the required resources free on one worker.
- Small executors are especially useful with dynamic allocation, which starts and kills executors depending on runtime usage (see the sketch after this list).
- Several small executors improve resilience: if our code is unstable and might sometimes kill an executor, then with one big executor everything is lost and restarted, while with many small ones only a fraction of the work is lost.
- I once met a case where the code used in the executors wasn't thread-safe. That's a bad thing, but it wasn't done on purpose. So until (or instead of :\ ) fixing it, we distributed the work across many 1-core executors.
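A hedged sketch of the dynamic allocation configs mentioned above (the bounds are illustrative, not recommendations; only the config keys themselves are Spark's):

```scala
import org.apache.spark.sql.SparkSession

// With dynamic allocation, Spark grows and shrinks the executor count
// between the bounds below based on the backlog of pending tasks.
val spark = SparkSession.builder()
  .appName("dynamic-allocation-example")                // hypothetical app name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")  // illustrative bound
  .config("spark.dynamicAllocation.maxExecutors", "20") // illustrative bound
  // Removed executors must leave their shuffle files readable; on most
  // cluster managers this means enabling the external shuffle service.
  .config("spark.shuffle.service.enabled", "true")
  .getOrCreate()
```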