
Node Insert Large Data Using Mongoose

I am trying to insert large data sets into MongoDB with Mongoose. But before that I need to make sure my for loop is working correctly. // basic schema settings var mongoose = requ…

Solution 1:

The problem here is that the loop you are running does not wait for each operation to complete. So in fact you are just queuing up thousands of .save() requests and trying to run them all concurrently. You can't do that within reasonable limits, hence you get the error response.
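For illustration only, a minimal sketch of the problematic pattern described above (the loop bound is arbitrary): every .save() is dispatched immediately, and nothing throttles how many are in flight at once.

// Anti-pattern: queues 100,000 concurrent saves with no flow control.
// Every iteration dispatches another pending write before any completes.
for (var n = 0; n < 100000; n++) {
    new TempCol({ cid: Math.random(), loc: [0, 0] }).save(function(err) {
        // by the time any callback fires, all 100,000 are already queued
    });
}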

The async module has various methods for iterating while processing a callback for each iteration; the simplest and most direct one here is whilst. Mongoose also handles connection management for you, so there is no need to embed everything within the connection callback, as the models are connection aware:

var mongoose = require('mongoose'),
    async = require('async'),
    Schema = mongoose.Schema;

var tempColSchema = new Schema({
    cid: {
        type: Number,
        required: true
    },
    loc: []
});

var TempCol = mongoose.model( "TempCol", tempColSchema );

mongoose.connect( 'mongodb://localhost/mean-dev' );

var i = 0;
async.whilst(
    function() { return i < 10000000; },
    function(callback) {
        i++;
        var c = i;
        console.log(c);
        var lon = parseInt(c/100000);
        var lat = c%100000;
        // wait for each save to complete before the next iteration
        new TempCol({cid: Math.random(), loc: [lon, lat]}).save(function(err){
            callback(err);
        });
    },
    function(err) {
       // When the loop is complete or on error
    }
);

Not the most fantastic way to do it, and it is still one write at a time, though you could use other methods to "govern" the number of concurrent operations; but at least this will not blow up the call stack.
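As a hedged sketch of such "governing", async.eachLimit caps how many saves run at once. The limit of 10 and the item count here are assumptions for illustration, not tuned values:

// Build a modest batch of documents, then save them with at most
// 10 concurrent .save() calls in flight at any time.
var items = [];
for (var n = 0; n < 10000; n++) {
    items.push({ cid: Math.random(), loc: [n % 100, n % 50] });
}

async.eachLimit(items, 10, function(item, callback) {
    new TempCol(item).save(function(err) {
        callback(err);
    });
}, function(err) {
    // all items saved, or an error occurred
});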

From MongoDB 2.6 and greater you can make use of the Bulk Operations API in order to process more than one write at a time on the server. The process is similar, but this time you can send 1000 operations to the server in a single write with a single response, which is much faster:

var mongoose = require('mongoose'),
    async = require('async'),
    Schema = mongoose.Schema;

var tempColSchema = new Schema({
    cid: {
        type: Number,
        required: true
    },
    loc: []
});

var TempCol = mongoose.model( "TempCol", tempColSchema );

mongoose.connect( 'mongodb://localhost/mean-dev' );

mongoose.connection.on("open",function(err,conn) {

    var i = 0;
    var bulk = TempCol.collection.initializeOrderedBulkOp();

    async.whilst(
      function() { return i < 10000000; },
      function(callback) {
        i++;
        var c = i;
        console.log(c);
        var lon = parseInt(c/100000);
        var lat = c%100000;

        bulk.insert({ "cid": Math.random(), "loc": [ lon, lat ] });

        if ( i % 1000 == 0 ) {
            // send the batch of 1000 and start a fresh one
            bulk.execute(function(err,result) {
                bulk = TempCol.collection.initializeOrderedBulkOp();
                callback(err);
            });
        } else {
            process.nextTick(callback);
        }
      },
      function(err) {
        // When the loop is complete or on error.
        // If you had a number not plainly divisible by 1000,
        // flush the remaining queued operations:
        if ( i % 1000 != 0 ) {
            bulk.execute(function(err,result) {
                // possibly check for errors here
            });
        }
      }
    );

});

That is actually using the native driver methods, which are not yet exposed in Mongoose, so additional care is being taken to make sure the connection is available. This is one way to do it, but not the only way; the main point is that Mongoose's connection "magic" is not built in here, so you should be sure the connection is established before calling these methods.

The example above happens to process a round number of items, but where that is not the case you should also call bulk.execute() in the final block, as shown, since the modulo check may leave a remainder of operations still queued.

The main point is to not grow a stack of operations to an unreasonable size and to keep the processing limited. The flow control here allows operations to actually complete before moving on to the next iteration, so either batched writes or some additional parallel queuing is what you want for the best performance.
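As a hedged sketch of that "parallel queuing" idea, an async.queue lets a fixed number of batches be written concurrently while everything else waits its turn. The concurrency of 2 and the batch contents are assumptions for illustration:

// Worker writes one batch per task; at most 2 batches in flight at once.
var q = async.queue(function(docs, callback) {
    var bulk = TempCol.collection.initializeOrderedBulkOp();
    docs.forEach(function(doc) {
        bulk.insert(doc);
    });
    bulk.execute(function(err, result) {
        callback(err);
    });
}, 2);

q.drain = function() {
    // all queued batches have been written
};

// push batches of documents onto the queue as tasks
q.push([{ cid: Math.random(), loc: [0, 0] } /* ...more documents */ ]);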

There is also the .initializeUnorderedBulkOp() form for this if you don't want write errors to be fatal, but would rather handle them in a different way. Mostly, see the official documentation on the Bulk API and its responses for how to interpret the response given.
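By way of illustration, a minimal sketch of the unordered form, assuming the same TempCol model as above; the methods on the result object come from the driver's BulkWriteResult:

var bulk = TempCol.collection.initializeUnorderedBulkOp();

bulk.insert({ cid: 1, loc: [0, 0] });
bulk.insert({ cid: 2, loc: [1, 1] });

bulk.execute(function(err, result) {
    // Unordered operations continue past individual failures,
    // so inspect the result rather than relying on err alone.
    if (result.hasWriteErrors()) {
        result.getWriteErrors().forEach(function(writeError) {
            console.log(writeError.errmsg);
        });
    }
});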
