How To Store And Update Time Series Data In MongoDB
In a previous article, I tested a new feature of MongoDB 5.0: resharding. Today, I take a look at another new feature: the Time Series collections.
The Time Series collection is an interesting new feature available in MongoDB 5.0. Based on the initial tests I have done, Time Series support provides performance comparable to index usage on regular collections while saving a lot of disk and memory space. Aggregation pipelines, which are common queries you run on time series data, can benefit even more.

Let's start the tests.

What is a Time Series Database?

Generally speaking, a Time Series database is a specialized database designed for efficiently storing data generated from a continuous stream of values associated with a timestamp. The typical use case is storing data coming from sensors that transmit data points at fixed intervals, but Time Series databases are now used in support of a much wider range of applications.

Typical use cases are:
- IoT data
- Monitoring web services, applications, and infrastructure
- Sales forecasting
- Understanding financial trends
- Processing self-driving car data or data from other physical devices
A specialized Time Series database uses compression algorithms to minimize the space requirement and also provides access paths to dig more efficiently into the data. This improves the performance of retrieving data based on time range filters and of aggregating data. For this workload, these databases are more efficient than a common relational database.
Usually, the values of a Time Series shouldn't change once recorded; they are defined as INSERT only, also known as immutable data points. Once the data is stored, update operations are really uncommon.

Another characteristic of Time Series data is that every item should have a single value (a single temperature, a stock price, and so on).

Popular Time Series databases are InfluxDB, Prometheus, and Graphite. There are also many others. VictoriaMetrics in particular is a popular fork of Prometheus and is used in our Percona Monitoring and Management software.
The New Time Series Collections in MongoDB 5.0
MongoDB, as well as relational databases, has been widely used for years for storing temperature data from sensors, stock prices, and any other kind of unchanging data over time. MongoDB version 5.0 promises that this can be done more efficiently, so let's take a look at how it works.

A Time Series collection appears as a regular collection and the operations you can do are exactly the same: insert, update, find, delete, aggregate. The main difference is behind the curtain. MongoDB stores the data in an optimized storage format on insert. Compared to a normal collection, a Time Series collection is smaller and provides more query efficiency.

MongoDB treats Time Series collections as writable non-materialized views. The data is stored more efficiently, saving disk space, and an automatically created internal index orders the data by time. By default, the data is compressed using the zstd algorithm instead of snappy. The new compression provides a higher ratio and lower CPU requirements, and it is well suited for time series data where there are few variations from one document to the next. You can eventually change the compression algorithm, but it is not really recommended.
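As a rough illustration of why compression pays off so well on this kind of data, the following sketch uses zlib from the Python standard library as a stand-in for zstd (which ships as a separate package): documents that differ only slightly from one to the next compress far better than unstructured data of the same size.

```python
import json
import random
import zlib

random.seed(42)

# Documents that differ only slightly from one to the next,
# like consecutive time series readings.
ts = 1634083200000
docs = [
    {"timestamp": ts + i * 60_000, "stockName": "Apple", "price": round(100 + i * 0.01, 2)}
    for i in range(1000)
]
payload = json.dumps(docs).encode()

# Completely random bytes of the same length, for contrast.
noise = bytes(random.randrange(256) for _ in range(len(payload)))

ratio_docs = len(payload) / len(zlib.compress(payload, 9))
ratio_noise = len(noise) / len(zlib.compress(noise, 9))

# Near-identical documents compress far better than random data,
# which barely compresses at all.
assert ratio_docs > 3
assert ratio_noise < 1.1
print(f"docs: {ratio_docs:.1f}x, noise: {ratio_noise:.2f}x")
```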
Unlike regular collections, a Time Series collection is not implicitly created when you insert a document. You must create it explicitly.
Let's do some tests.
Create a Time Series Collection for Storing Stock Prices
We need to use the createCollection() method, providing some parameters.
[direct: mongos] timeseries> db.createCollection("stockPrice1week", {
  timeseries: {
    timeField: "timestamp",
    metaField: "metadata",
    granularity: "minutes"
  },
  expireAfterSeconds: 604800
})
{ ok: 1 }
The name of the collection is stockPrice1week and the only required parameter is timeField. The other parameters are optional.
timeField: the name of the field where the date is stored. This will be automatically indexed and used for retrieving data.

metaField: the field containing the metadata. It can be a simple scalar value or a more complex JSON object. It's optional. It cannot be the _id or the same as the timeField. For example, the metadata for a temperature sensor could be the code of the sensor, the type, the location, and so on.

granularity: possible values are seconds, minutes, and hours. If not set, it defaults to seconds. Specifying the closest match to the interval between two consecutive values helps MongoDB store data more efficiently and improves query performance.

expireAfterSeconds: you can automatically delete documents after the specified time, the same as a TTL index. If not specified, the documents will not expire.
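The 604800 passed to createCollection() above is simply a one-week retention window expressed in seconds:

```python
from datetime import timedelta

# expireAfterSeconds for a one-week retention window,
# matching the value passed to createCollection() above.
one_week = timedelta(weeks=1)
expire_after_seconds = int(one_week.total_seconds())

assert expire_after_seconds == 604800  # 7 * 24 * 60 * 60
```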
Let's insert some random data for three stocks: Apple, Orange, and Banana. Data is collected once per minute.
[direct: mongos] timeseries> var stockPriceDate = ISODate("2021-10-13T00:00:00.000Z")
[direct: mongos] timeseries> var priceApple = 100
[direct: mongos] timeseries> var priceOrange = 50
[direct: mongos] timeseries> var priceBanana = 80
[direct: mongos] timeseries> for (i = 1; i < 100000; i++) {
  priceApple = priceApple + Math.random();
  priceOrange = priceOrange + Math.random();
  priceBanana = priceBanana + Math.random();
  db.stockPrice1week.insert({ "timestamp": stockPriceDate, "metadata": { "stockName": "Apple", "currency": "Dollar" }, "stockPrice": priceApple });
  db.stockPrice1week.insert({ "timestamp": stockPriceDate, "metadata": { "stockName": "Orange", "currency": "Dollar" }, "stockPrice": priceOrange });
  db.stockPrice1week.insert({ "timestamp": stockPriceDate, "metadata": { "stockName": "Banana", "currency": "Euro" }, "stockPrice": priceBanana });
  stockPriceDate = new Date(stockPriceDate.getTime() + 1000 * 60);
}
We can query to check the inserted documents:
[direct: mongos] timeseries> db.stockPrice1week.find().limit(3)
[
  {
    _id: ObjectId("6166df318f32e5d3ed304fc5"),
    timestamp: ISODate("2021-10-13T00:00:00.000Z"),
    metadata: { stockName: 'Apple', currency: 'Dollar' },
    stockPrice: 100.6547271930824
  },
  {
    _id: ObjectId("6166df318f32e5d3ed304fc6"),
    timestamp: ISODate("2021-10-13T00:00:00.000Z"),
    metadata: { stockName: 'Orange', currency: 'Dollar' },
    stockPrice: 50.51709117468818
  },
  {
    _id: ObjectId("6166df318f32e5d3ed304fc7"),
    timestamp: ISODate("2021-10-13T00:00:00.000Z"),
    metadata: { stockName: 'Banana', currency: 'Euro' },
    stockPrice: 80.17611551979255
  }
]
Check the Collection Size
Now, let's create a regular collection containing the same exact data.
[direct: mongos] timeseries> db.stockPrice1week.find().forEach(function (doc) { db.stockPrice1week_regular.insertOne(doc); })
Let's check the total size of the two collections.
[direct: mongos] timeseries> db.stockPrice1week.stats().totalSize
5357568
[direct: mongos] timeseries> db.stockPrice1week_regular.stats().totalSize
21934080
As expected, the Time Series collection is four times smaller than the regular one. Also, consider that the regular collection doesn't have any secondary index at the moment.
Query the Collections
Let's run a simple query to find the stock values for a specific timestamp. We test the query on both collections.
[direct: mongos] timeseries> db.stockPrice1week.find({ "timestamp": ISODate("2021-10-23T12:00:00.000Z") })
[
  {
    _id: ObjectId("6166dfc68f32e5d3ed3100f5"),
    timestamp: ISODate("2021-10-23T12:00:00.000Z"),
    metadata: { stockName: 'Apple', currency: 'Dollar' },
    stockPrice: 7636.864548363888
  },
  {
    _id: ObjectId("6166dfc68f32e5d3ed3100f6"),
    timestamp: ISODate("2021-10-23T12:00:00.000Z"),
    metadata: { stockName: 'Orange', currency: 'Dollar' },
    stockPrice: 7607.03756525094
  },
  {
    _id: ObjectId("6166dfc68f32e5d3ed3100f7"),
    timestamp: ISODate("2021-10-23T12:00:00.000Z"),
    metadata: { stockName: 'Banana', currency: 'Euro' },
    stockPrice: 7614.360031277444
  }
]
[direct: mongos] timeseries> db.stockPrice1week_regular.find({ "timestamp": ISODate("2021-10-23T12:00:00.000Z") })
[
  {
    _id: ObjectId("6166dfc68f32e5d3ed3100f5"),
    timestamp: ISODate("2021-10-23T12:00:00.000Z"),
    metadata: { stockName: 'Apple', currency: 'Dollar' },
    stockPrice: 7636.864548363888
  },
  {
    _id: ObjectId("6166dfc68f32e5d3ed3100f6"),
    timestamp: ISODate("2021-10-23T12:00:00.000Z"),
    metadata: { stockName: 'Orange', currency: 'Dollar' },
    stockPrice: 7607.03756525094
  },
  {
    _id: ObjectId("6166dfc68f32e5d3ed3100f7"),
    timestamp: ISODate("2021-10-23T12:00:00.000Z"),
    metadata: { stockName: 'Banana', currency: 'Euro' },
    stockPrice: 7614.360031277444
  }
]
We've got the same result, but what is important here is looking at explain() to see the execution plan. Here is the explain() of the regular collection.
[direct: mongos] timeseries> db.stockPrice1week_regular.find({ "timestamp": ISODate("2021-10-23T12:00:00.000Z") }).explain("executionStats")
{
...
  winningPlan: {
    stage: 'COLLSCAN',
    filter: {
      timestamp: { '$eq': ISODate("2021-10-23T12:00:00.000Z") }
    },
    direction: 'forward'
...
...
  executionSuccess: true,
  nReturned: 3,
  executionTimeMillis: 200,
  totalKeysExamined: 0,
  totalDocsExamined: 299997,
...
...
We didn't create any secondary index, so the winning plan is a COLLSCAN; all documents must be examined. The query takes 200 milliseconds.

The following is the explain() of the Time Series collection instead.
[direct: mongos] timeseries> db.stockPrice1week.find({ "timestamp": ISODate("2021-10-23T12:00:00.000Z") }).explain("executionStats")
{
...
...
  executionStats: {
    executionSuccess: true,
    nReturned: 3,
    executionTimeMillis: 2,
    totalKeysExamined: 0,
    totalDocsExamined: 8,
    executionStages: {
      stage: 'COLLSCAN',
      filter: {
        '$and': [
          { _id: { '$lte': ObjectId("6173f940ffffffffffffffff") } },
          { _id: { '$gte': ObjectId("6172a7c00000000000000000") } },
          { 'control.max.timestamp': { '$_internalExprGte': ISODate("2021-10-23T12:00:00.000Z") } },
          { 'control.min.timestamp': { '$_internalExprLte': ISODate("2021-10-23T12:00:00.000Z") } }
        ]
      },
...
...
Surprisingly, it is a COLLSCAN, but with different numbers. The number of documents examined is now only eight, and the execution time is two milliseconds.

As already mentioned, the Time Series collection is a non-materialized view. It works as an abstraction layer. The actual data is stored in another system collection (system.buckets.stockPrice1week) where documents are saved in a slightly different format. It's not the goal of this article to dig into the internals; just keep in mind that the different storage format permits mongod to fetch only a few buckets of data instead of reading everything, even if the plan is flagged as a COLLSCAN. That's amazing.
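A toy sketch of the idea (the field names loosely mirror the control.min.timestamp/control.max.timestamp metadata visible in the explain() output above; the real bucket format is more involved): grouping measurements into buckets that record their own min and max timestamps lets a scan discard whole buckets whose range cannot possibly match the filter.

```python
from datetime import datetime, timedelta

# Build toy "buckets" of 100 one-minute readings each, recording the
# min/max timestamp of every bucket, similar in spirit to the
# control.min/control.max fields seen in the explain() output.
start = datetime(2021, 10, 13)
readings = [start + timedelta(minutes=i) for i in range(10_000)]

BUCKET_SIZE = 100
buckets = [
    {"min": chunk[0], "max": chunk[-1], "measurements": chunk}
    for chunk in (readings[i:i + BUCKET_SIZE]
                  for i in range(0, len(readings), BUCKET_SIZE))
]

def find_equal(buckets, ts):
    """Scan all buckets, but only open those whose [min, max] range can match."""
    opened = 0
    matches = []
    for b in buckets:
        if b["min"] <= ts <= b["max"]:
            opened += 1
            matches.extend(m for m in b["measurements"] if m == ts)
    return matches, opened

target = start + timedelta(minutes=5000)
matches, opened = find_equal(buckets, target)

assert matches == [target]
assert len(buckets) == 100
assert opened == 1  # only one bucket opened instead of 10,000 documents
```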
What Happens if I Create an Index on the Regular Collection?
Let's try.
[direct: mongos] timeseries> db.stockPrice1week_regular.createIndex({ "timestamp": 1 })
timestamp_1
[direct: mongos] timeseries> db.stockPrice1week_regular.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { timestamp: 1 }, name: 'timestamp_1' }
]
[direct: mongos] timeseries> db.stockPrice1week_regular.find({ "timestamp": ISODate("2021-10-23T12:00:00.000Z") }).explain("executionStats")
{
...
...
  winningPlan: {
    stage: 'FETCH',
    inputStage: {
      stage: 'IXSCAN',
      keyPattern: { timestamp: 1 },
      indexName: 'timestamp_1',
...
...
  executionStats: {
    nReturned: 3,
    executionTimeMillis: 2,
    totalKeysExamined: 3,
    totalDocsExamined: 3,
...
Now the winning plan is an IXSCAN; the new index is used. Only three keys examined, three docs examined, and three docs returned. The query takes two milliseconds.

So, it is as fast as the Time Series collection. There is not such a big difference; the order of magnitude is the same.

Also notice that the same performance comes at the cost of a larger collection in the end, because we have created a secondary index.
[direct: mongos] timeseries> db.stockPrice1week_regular.stats().totalSize
25251840
[direct: mongos] timeseries> db.stockPrice1week.stats().totalSize
5357568
To get a comparable execution time, the regular collection is now five times larger than the Time Series collection.
A Query with a Time Range Filter
Let's test a different query looking for a range of timestamps. The following are the explain() outputs.
[direct: mongos] timeseries> db.stockPrice1week_regular.find({ "timestamp": { $gte: ISODate("2021-10-20T00:00:00Z"), $lt: ISODate("2021-10-20T23:59:59Z") } }).explain("executionStats")
{
...
  winningPlan: {
    stage: 'FETCH',
    inputStage: {
      stage: 'IXSCAN',
      keyPattern: { timestamp: 1 },
...
  executionStats: {
    nReturned: 4320,
    executionTimeMillis: 7,
    totalKeysExamined: 4320,
    totalDocsExamined: 4320,
...
[direct: mongos] timeseries> db.stockPrice1week.find({ "timestamp": { $gte: ISODate("2021-10-20T00:00:00Z"), $lt: ISODate("2021-10-20T23:59:59Z") } }).explain("executionStats")
{
...
...
  winningPlan: {
    stage: 'COLLSCAN',
    filter: {
      '$and': [
        { _id: { '$lt': ObjectId("6170ad7f0000000000000000") } },
        { _id: { '$gte': ObjectId("616e0a800000000000000000") } },
        { 'control.max.timestamp': { '$_internalExprGte': ISODate("2021-10-20T00:00:00.000Z") } },
        { 'control.min.timestamp': { '$_internalExprLt': ISODate("2021-10-20T23:59:59.000Z") } }
      ]
    },
...
...
  executionStats: {
    executionSuccess: true,
    nReturned: 6,
    executionTimeMillis: 6,
    totalKeysExamined: 0,
    totalDocsExamined: 11,
...
The same as before. The execution time is basically the same for both queries. The main problem remains the size of the regular collection, which is significantly larger.

Only six documents are apparently returned by the Time Series collection, but that's not really the case. If you execute the query for real, you'll get 4320 documents. The six documents mentioned by explain() refer to the documents that must be returned by the real collection underneath the non-materialized view.
Aggregation Test
On our Time Series data, we would like to do some aggregation. This is a typical task: calculate averages over a period, find min and max values, and compute other kinds of statistics.

Let's suppose we need to calculate the average stock price on a daily basis. We can use the following aggregation pipeline, for example:
db.stockPrice1week.aggregate([
  {
    $project: {
      date: { $dateToParts: { date: "$timestamp" } },
      stockPrice: 1
    }
  },
  {
    $group: {
      _id: {
        date: {
          year: "$date.year",
          month: "$date.month",
          day: "$date.day"
        }
      },
      avgPrice: { $avg: "$stockPrice" }
    }
  }
])
[
  { _id: { date: { year: 2021, month: 12, day: 4 } }, avgPrice: 37939.782043249594 },
  { _id: { date: { year: 2021, month: 11, day: 22 } }, avgPrice: 29289.700949196136 },
  { _id: { date: { year: 2021, month: 10, day: 27 } }, avgPrice: 10531.347070537977 },
...
...
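For reference, the $project/$group logic above amounts to the following plain-Python computation (a sketch over synthetic data, not the server's implementation): extract the date parts of each timestamp, then average the prices per day.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Synthetic one-per-minute readings, mirroring the insert loop earlier.
start = datetime(2021, 10, 13)
docs = [
    {"timestamp": start + timedelta(minutes=i), "stockPrice": 100.0 + i}
    for i in range(3 * 1440)  # three full days of data
]

# $project: keep (year, month, day); $group: average stockPrice per day.
sums = defaultdict(lambda: [0.0, 0])
for doc in docs:
    t = doc["timestamp"]
    key = (t.year, t.month, t.day)
    sums[key][0] += doc["stockPrice"]
    sums[key][1] += 1

avg_price = {key: total / count for key, (total, count) in sums.items()}

# Day 1 averages the prices 100..1539 -> (100 + 1539) / 2 = 819.5
assert avg_price[(2021, 10, 13)] == 819.5
assert len(avg_price) == 3
```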
As usual, let's take a look at the explain() of the aggregation against the two collections, focusing on execution time and documents examined.
[direct: mongos] timeseries> db.stockPrice1week.explain("executionStats").aggregate([ { $project: { date: { $dateToParts: { date: "$timestamp" } }, stockPrice: 1 } }, { $group: { _id: { date: { year: "$date.year", month: "$date.month", day: "$date.day" } }, avgPrice: { $avg: "$stockPrice" } } } ])
{
...
  executionStats: {
    executionSuccess: true,
    nReturned: 300,
    executionTimeMillis: 615,
    totalKeysExamined: 0,
    totalDocsExamined: 300,
    executionStages: {
      stage: 'COLLSCAN',
...
[direct: mongos] timeseries> db.stockPrice1week_regular.explain("executionStats").aggregate([ { $project: { date: { $dateToParts: { date: "$timestamp" } }, stockPrice: 1 } }, { $group: { _id: { date: { year: "$date.year", month: "$date.month", day: "$date.day" } }, avgPrice: { $avg: "$stockPrice" } } } ])
{
...
  executionStats: {
    executionSuccess: true,
    nReturned: 299997,
    executionTimeMillis: 1022,
    totalKeysExamined: 0,
    totalDocsExamined: 299997,
    executionStages: {
      stage: 'PROJECTION_DEFAULT',
...
The aggregation pipeline runs 40 percent faster with the Time Series collection. This difference should be even more relevant the larger the collection is.
Conclusion
MongoDB 5.0 is an interesting new version of the most popular document-based database, and new features like Time Series collections and resharding are amazing.

Anyway, due to many core changes to WiredTiger and the core server introduced to facilitate new features, MongoDB 5.0.x is still unstable. We do not recommend using it for production environments.

Check the documentation of Percona Server for MongoDB 5.0.3-2 (Release Candidate).
Complete the 2021 Percona Open Source Data Management Software Survey

Have Your Say!
Source: https://www.percona.com/blog/mongodb-5-0-time-series-collections/