Import Large Data from MongoDB
This example shows how to import a large set of flight data from a MongoDB® collection into the MATLAB® workspace using the Database Toolbox™ interface for MongoDB. To avoid out-of-memory issues with the Java® heap when retrieving many documents, use a loop to import large data in batches.
To run this example, you must first install the Database Toolbox interface for MongoDB. For details, see Database Toolbox Interface for MongoDB Installation.
Connect to MongoDB
Create a MongoDB connection to the database mongotest. Here, the database server dbtb01 hosts this database using port number 27017.
server = "dbtb01";
port = 27017;
dbname = "mongotest";
conn = mongo(server,port,dbname)
conn = 
  mongo with properties:

           Database: 'mongotest'
           UserName: ''
             Server: {'dbtb01'}
               Port: 27017
    CollectionNames: {'airlinesmall', 'employee', 'largedata' ... and 3 more}
     TotalDocuments: 23485919
conn is the mongo object that contains the MongoDB connection. The object properties contain information about the connection and the database.
- The database name is mongotest.
- The user name is blank.
- The database server is dbtb01.
- The port number is 27017.
- This database contains six document collections. The first three collection names are airlinesmall, employee, and largedata.
- This database contains 23,485,919 documents.
Verify the MongoDB connection.
isopen(conn)
ans = logical
   1
The database connection is open because the isopen function returns 1. If isopen returns 0, the database connection is closed.
Determine Number of Documents to Import
Find the total number of documents totaldocs in the airlinesmall collection for the years 1997 through 2010. Use a MongoDB query to filter the flight data for the specified years.
collection = "airlinesmall";
mongoquery = '{"Year":{$gte:1997,$lte:2010}}';
totaldocs = count(conn,collection,'Query',mongoquery);
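If the year range needs to vary, one option is to build the query string from variables. This is a minimal sketch; the names startyear and endyear are hypothetical and do not appear in the original example.

```matlab
% Hypothetical variables holding the year range (not part of the original example)
startyear = 1997;
endyear = 2010;

% Build the same JSON-style query string used above from the two values
mongoquery = sprintf('{"Year":{$gte:%d,$lte:%d}}',startyear,endyear);
disp(mongoquery)
```

The resulting character vector can be passed to count or find through the 'Query' name-value argument in the same way as the literal query.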
Retrieve Large Data in Batches
Set the batch size to 15,000 documents. Initialize the MATLAB workspace variable that stores the retrieved data.
batchsize = 15000;
flightdata = [];
You can change the batch size depending on the performance and memory capacity of your system.
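To see how a batch size translates into round trips to the database, the arithmetic can be sketched as follows. The value assigned to totaldocs is an assumption taken from the size of the flightdata variable reported at the end of this example; in practice, the value comes from the count function.

```matlab
% Assumed document count for illustration (normally returned by count)
totaldocs = 75603;
batchsize = 15000;

% Number of find calls needed to cover all documents
numbatches = ceil(totaldocs/batchsize);

% Number of documents in the final, partial batch
lastbatch = totaldocs - (numbatches - 1)*batchsize;

fprintf('%d batches, last batch holds %d documents\n',numbatches,lastbatch)
```

Under these assumptions, a smaller batch size lowers the peak memory per call but increases the number of calls.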
Use a while loop to retrieve flight data from the collection. The variable flightdata accumulates each batch of retrieved data.
% Track number of documents read
index = 0;
while index < totaldocs
    % Retrieve documents in a batch
    localdata = find(conn,collection,'Query',mongoquery, ...
        'Skip',index,'Limit',batchsize);
    % Store retrieved documents locally
    flightdata = [flightdata; localdata];
    % Move to the next batch
    index = index + batchsize;
end
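Growing an array inside a loop copies the accumulated data on each pass. One possible alternative, sketched here, is to store each batch in a preallocated cell array and concatenate once at the end. To keep the sketch runnable without a database, the hypothetical function handle fetch stands in for the find call with the 'Skip' and 'Limit' arguments, and docs and alldata are made-up stand-ins for the collection and the result. It also assumes every batch is a structure array with the same fields.

```matlab
% Stand-in for the collection: 25 hypothetical documents with one field
docs = struct('Year',num2cell((1:25)'));

% Hypothetical fetcher that mimics find(...,'Skip',s,'Limit',n)
fetch = @(s,n) docs(s+1:min(s+n,numel(docs)));

totaldocs = numel(docs);
batchsize = 10;
numbatches = ceil(totaldocs/batchsize);

% Preallocate one cell per batch
batches = cell(numbatches,1);
for k = 1:numbatches
    skip = (k - 1)*batchsize;          % documents already read
    batches{k} = fetch(skip,batchsize);
end

% Concatenate all batches in a single step
alldata = vertcat(batches{:});
```

Whether this is faster than the while loop depends on the batch count and document size, so treat it as an option to measure rather than a recommendation.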
Display information about the flightdata variable. The retrieved data is a structure array that contains 75,603 structures. Each structure contains 30 fields of flight data.
whos flightdata
  Name                Size                Bytes  Class     Attributes

  flightdata      75603x1             285102752  struct
Close MongoDB Connection
close(conn)
See Also
mongo | isopen | close | count | find | while
Related Topics
- Import and Analyze Data from MongoDB
- Import Filtered Data from MongoDB
- Export MATLAB Data into MongoDB
- Database Toolbox Interface for MongoDB Error Messages