score:53
Some tips:
Do not index your collection before inserting. Every insert has to update the index, which is pure overhead during a bulk load. Insert everything first, then create the indexes.
Instead of "save", use MongoDB's batch insert, which writes many documents in a single operation. Aim for around 5000 documents per batch and you will see a remarkable performance gain.
See method #2 of insert here; it takes an array of documents instead of a single document (a sketch follows below). Also see the discussion in this thread.
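For example, with the MongoDB Scala driver the batched version could look roughly like this (the database/collection names, batch size and document shape are placeholders, not values from the question):

import org.mongodb.scala._
import scala.concurrent.Await
import scala.concurrent.duration._

object BatchInsert {
  def main(args: Array[String]): Unit = {
    val client = MongoClient()                                   // localhost:27017 by default
    val coll   = client.getDatabase("bench").getCollection("records")

    // Build the documents up front, then send them in batches of ~5000
    val docs = (1 to 1000000).map(i => Document("n" -> i, "payload" -> s"value-$i"))

    docs.grouped(5000).foreach { batch =>
      // insertMany ships the whole batch to the server in one operation
      Await.result(coll.insertMany(batch).toFuture(), 30.seconds)
    }

    client.close()
  }
}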
And if you want to benchmark further: this is just a guess, but try using a capped collection of a predefined large size to store all your data. A capped collection without indexes has very good insertion performance.
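A minimal sketch of creating such a capped collection with the Scala driver (the names and the 10 GB size are assumptions; size it to your expected data volume):

import org.mongodb.scala._
import org.mongodb.scala.model.CreateCollectionOptions
import scala.concurrent.Await
import scala.concurrent.duration._

val db = MongoClient().getDatabase("bench")

// A capped collection is preallocated and written in natural order,
// which skips much of the bookkeeping a regular collection does on insert.
val options = new CreateCollectionOptions()
  .capped(true)
  .sizeInBytes(10L * 1024 * 1024 * 1024)   // 10 GB

Await.result(db.createCollection("records", options).toFuture(), 30.seconds)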
score:0
Another alternative is to try TokuMX. It uses Fractal Tree indexes, which means it does not slow down over time as the database gets bigger.
TokuMX is going to be included as a custom storage engine in an upcoming version of MongoDB.
The current version of TokuMX runs under Linux. I was up and running on Windows quite quickly using Vagrant.
score:4
What I did in my project was add a bit of multithreading (the project is in C#, but I hope the code is self-explanatory). After playing with the number of threads, it turned out that setting it to the number of CPU cores gives slightly better performance (10-20%), but I suppose this boost is hardware specific. Here is the code:
public virtual void SaveBatch(IEnumerable<object> entities)
{
    if (entities == null)
        throw new ArgumentNullException("entities");

    _repository.SaveBatch(entities);
}

public void ParallelSaveBatch(IEnumerable<IEnumerable<object>> batchPortions)
{
    if (batchPortions == null)
        throw new ArgumentNullException("batchPortions");

    var po = new ParallelOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount
    };
    Parallel.ForEach(batchPortions, po, SaveBatch);
}
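Since the question itself is about Scala, a roughly equivalent sketch using Futures (assuming the Scala driver; saveBatch and the batch layout are illustrative, not from the original answer):

import org.mongodb.scala.{Document, MongoCollection}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Inserts one batch and blocks until the server acknowledges it
def saveBatch(coll: MongoCollection[Document], batch: Seq[Document]): Unit =
  Await.result(coll.insertMany(batch).toFuture(), 60.seconds)

// Splits the batches across roughly one worker per CPU core
def parallelSaveBatch(coll: MongoCollection[Document], batches: Seq[Seq[Document]]): Unit = {
  val workers = Runtime.getRuntime.availableProcessors
  val slices  = batches.grouped(math.max(1, batches.size / workers)).toSeq
  val inserts = slices.map(slice => Future(slice.foreach(saveBatch(coll, _))))
  Await.result(Future.sequence(inserts), Duration.Inf)
}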
score:6
I've had the same thing. As far as I can tell, it comes down to the randomness of the index values. Whenever a new document is inserted, it obviously also needs to update all the underlying indexes. Because you're inserting random, as opposed to sequential, values into these indexes, you're constantly accessing the entire index to find where to place the new value.
This is all fine to begin with, while all the indexes sit happily in memory. But as soon as they grow too large, index inserts start hitting the disk; the disk starts thrashing and write performance dies.
As you're loading the data, try comparing db.collection.totalIndexSize() with the available memory, and you'll probably see this happen.
Your best bet is to create the indexes after you've loaded the data. However, that still doesn't help when it's the required _id index that contains random values (a GUID, hash, etc.); in that case your best approach might be to think about sharding or getting more RAM.
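In Scala-driver terms, the index-after-load approach and the size check might look roughly like this (collection and field names are placeholders; collStats is the server command behind totalIndexSize()):

import org.mongodb.scala._
import org.mongodb.scala.model.Indexes
import scala.concurrent.Await
import scala.concurrent.duration._

val db   = MongoClient().getDatabase("bench")
val coll = db.getCollection("records")

// ... bulk load the data first ...

// Only now build the secondary indexes
Await.result(coll.createIndex(Indexes.ascending("n")).toFuture(), Duration.Inf)

// collStats reports totalIndexSize; compare it with the RAM available to mongod
val stats = Await.result(db.runCommand(Document("collStats" -> "records")).toFuture(), 30.seconds)
println(s"totalIndexSize = ${stats.get("totalIndexSize")}")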
Source: stackoverflow.com