MongoDB CRUD Operations


Previous Article: Data Storage Internals

Create Documents

Assume you are in the test database in Mongo and want to see what it contains. Mongo has no tables, only collections, so we list the collections in the database.

> db
test
> show collections
> db.foo.save({_id: 1, x: 10})
> db.foo.find()
{ "_id" : 1, "x" : 10 }
> show collections
foo
system.indexes
> db.bar.save({_id: 1, x: 10})
> show collections
foo
bar
system.indexes

Since this is a fresh test database, it has no collections at first, which is why the first show collections command prints nothing.

Remember that a collection in Mongo defines the scope of interaction with documents.

You issue commands against a specific collection to store and retrieve data. Because Mongo is a flat database, you cannot issue relational queries across different collections. In the save command above, “db” means we are operating against the “test” database, “foo” is the name of the collection we are saving the document into, and “save” means we are saving a record. “find()” is the command that retrieves the record.

Notice that the second show collections command returns two results: foo and system.indexes. Remember the one rule that every document must have an ID (the _id field). To give fast access by ID, Mongo needs an index on that field. The commands that follow add a second collection, “bar”. Pretty intuitive, right?

Take a look at system.indexes:

[Screenshot: contents of system.indexes]

Mongo has created an index on the _id field of the test.foo collection and of the test.bar collection, test being the database and foo and bar being the collections. Before we continue any further, let’s talk about which data types the _id field supports.

[Screenshot: data types supported for the _id field]

As you can see, the _id field can be of any type as long as it is not an array (although it is possible to convert an array into binary data and save that as an ID).

The Object ID

[Screenshot: saving a document without an ID]

You can save a document in Mongo without specifying the ID. In this case Mongo generates an ObjectID. Observe how “bob” is saved without an ID, yet the output of find() shows the ID that Mongo generated.

ObjectIDs in Mongo are unique and carry an embedded timestamp, so you can retrieve the creation time from the ObjectID itself. This is particularly useful because we do not need to store a separate field for the document creation time. Another advantage:

ObjectIDs are generated, and documents inserted, in roughly ascending order. This helps sorting and indexing because the server can append the new entry to the end of the index rather than find space inside a data structure such as a B-tree, which would require rebalancing and shuffling memory around before the document could be inserted.

While ObjectIDs are great for faster insertions, you might need a different strategy for faster reads. This might mean creating your own IDs that keep related documents close together (or some similar criterion).
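To make the embedded timestamp concrete, here is a small sketch in plain JavaScript that decodes it by hand. It relies on the ObjectID layout, in which the first 4 bytes (8 hex characters) hold the creation time in seconds since the Unix epoch. The function name objectIdToDate and the sample ID are made up for illustration; in the shell itself, an ObjectId value offers getTimestamp() for the same purpose.

```javascript
// Decode the creation time embedded in an ObjectID's 24-character hex string.
// The first 8 hex characters are the seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Hypothetical ID, invented for this example.
const created = objectIdToDate("57e6f0a00000000000000000");
console.log(created.toISOString());
```

Because the timestamp leads the ID, sorting by _id also sorts roughly by creation time, which is the ascending-order property described above.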

Mongo Save vs Insert

[Screenshot: save() versus insert() with a duplicate ID]

If you save a document using an ID that already exists, Mongo will allow it, but it keeps only the latest version (see the example above). So if you trust your ID generation system, by all means use the save() command; if not, use the insert() command, as it will reject any insert with a duplicate ID.
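The difference can be sketched as follows, assuming the legacy mongo shell used throughout this article. The function name saveThenInsert and the values are made up, and db is passed in as a parameter only so the snippet stands on its own; in the shell you would run the three statements directly.

```javascript
// `db` is the shell's database handle, passed in to keep the sketch self-contained.
function saveThenInsert(db) {
  db.foo.save({ _id: 1, x: 10 });          // no document with _id 1 yet: it is created
  db.foo.save({ _id: 1, x: 11 });          // same _id: the stored document is replaced
  return db.foo.insert({ _id: 1, x: 12 }); // same _id: rejected as a duplicate key
}
```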

[Screenshot: inserting documents without an ID]

Inserting without an ID does not make sense in every case. For example, we have inserted two documents with the same value into the collection “aaa”. This produces a seeming duplicate: the generated IDs differ, but the values are the same.

Compare that to inserting a user with the email address as the ID. In this case, if the user accidentally presses the submit button twice, a duplicate record won’t be created.

Insert Complex Documents

[Screenshot: two customer documents with different structures]

Following the previous example, if we look up the inserted customer record, you will notice two fields with array-type data, “Address” and “Data”. If we do another insert with a seemingly different data format, Mongo will still accept it. (Notice that the address format differs between the two documents, and the new document has an extra array field called “Logins”.)

This is perfectly legal in Mongo and is in fact encouraged. We now have two documents in the same collection that are completely different from each other. This allows you to build a very flexible schema for complex business requirements rather than imposing strict schema requirements (such as rules for populating null values and empty fields).

Insert & Save with Update (Concurrency Management Issue)

[Screenshot: incrementing x by hand and saving the document]

Insert and save are great ways to get data into a collection, but several issues arise when it comes to updating a document. In the example above, we created a document with x=10, and we now need to increment the value by 1. So we write a small piece of JavaScript that retrieves the document, increments x by 1, and saves the document, overwriting the previous value. There are several problems with this. Let me elaborate.

  1. If somebody on another thread, in another location, reads the same document and increments x themselves, the value of x that I hold is now stale. My intent was to increment x by 1 from its current value in the database, but instead I overwrote the value that someone else had just written with one based on my stale copy. So now we have a concurrency management issue. There is no version management built in by default, and the alternative, holding the record under a lock, could tie it up for a long while.
  2. A related issue arises when a second client reads the document and adds an extra field (see the extra field y=3 above). If the first client then saves its incremented copy, it overwrites the second client’s update and we have another incorrect version of the document. (We just deleted some data.)
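Both problems can be reproduced without a database at all. The sketch below is plain JavaScript, not Mongo code: an in-memory object stands in for the server, and the read and save helpers are made up to mimic a whole-document read and a whole-document save.

```javascript
// In-memory stand-ins: read returns a private copy, save replaces the stored document outright.
function read(server) { return Object.assign({}, server.doc); }
function save(server, doc) { server.doc = Object.assign({}, doc); }

// Scenario 1: two clients increment x, both starting from the same read.
const server1 = { doc: { _id: 1, x: 10 } };
const a1 = read(server1);
const b1 = read(server1);
a1.x += 1; save(server1, a1);
b1.x += 1; save(server1, b1);   // overwrites A's work with a value based on a stale read
const scenario1 = server1.doc;  // x ends up 11, although two increments were intended

// Scenario 2: client B adds a field, then client A saves its older copy.
const server2 = { doc: { _id: 1, x: 10 } };
const a2 = read(server2);
const b2 = read(server2);
b2.y = 3;  save(server2, b2);
a2.x += 1; save(server2, a2);   // A's copy never had y, so y disappears
const scenario2 = server2.doc;

console.log(scenario1, scenario2);
```

In both cases the damage comes from sending the whole document back, which is why changing only the affected field on the server side avoids the problem.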

Mongo Update Command

Mongo’s update command resolves the issues described above. The update command is “atomic within a document”: no two clients may update the same document at the same time, and two update commands issued concurrently are executed one after the other. The syntax of the update command is shown below.

[Screenshot: update command syntax]

  1. First, specify which collection you wish to update.
  2. Next, specify which document(s) you are targeting. This is done with a Mongo query.
  3. Third, specify the change you wish to make (the update parameter).
  4. Last, specify options (this parameter is optional), such as:
    • whether to update only the first document matching the query
    • whether to update multiple documents, that is, every document matching the query
    • whether, if no matching document is found, to generate a new document on the fly and insert it
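The four parts listed above map onto a call like the one below. This is a sketch assuming the legacy mongo shell; the function name updateOrCreate, the collection foo, and the values are made up, and db is passed in only so the snippet stands on its own.

```javascript
// `db` is the shell's database handle, passed in to keep the sketch self-contained.
function updateOrCreate(db) {
  return db.foo.update(       // 1. the collection to update
    { _id: 2 },               // 2. query: which document(s) to target
    { $set: { x: 10 } },      // 3. update: the change to apply
    {                         // 4. options (the whole argument may be omitted)
      upsert: true,           //    insert a new document if nothing matches
      multi: false            //    false: change only the first match
    }
  );
}
```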

Let’s look at a real example that resolves the concurrency issues described earlier, starting with scenario 1 under concurrency management.

Increment a field

[Screenshot: incrementing a field with the update command]

We use the update command with Mongo’s increment operator ({$inc:{x:1}}), that is, a field name and the amount by which you want to increment it. If two concurrent clients both issue an increment, both increments are applied, one after the other, and each starts from whatever value is on the server at that moment, which is what we want.
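Written out, the increment looks roughly like this. It is a sketch assuming the legacy mongo shell and a document with _id 1 in the foo collection; the function name incrementX is made up, and db is passed in only so the snippet stands on its own.

```javascript
// `db` is the shell's database handle, passed in to keep the sketch self-contained.
function incrementX(db) {
  // Add 1 to x on the server; the client never reads or sends the current value.
  return db.foo.update({ _id: 1 }, { $inc: { x: 1 } });
}
```

Since the client never holds a copy of x, there is no stale value to write back.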

Now let’s look at scenario 2 under concurrency management.

Add a New Field

[Screenshot: adding a new field with the update command]

To recap, one client is trying to add a field to the document while another is trying to increment a field in the same document.

To add a new field, we use the update command with the $set operator (give a new field name and assign it a value, as seen in the example above).

If another client wants to increment x, it only needs to name the field to be incremented. As you can see in the example, neither client needs a full copy of the document; each just says “find me that document and change that field” without any knowledge of the other parts of the document.
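The two clients’ updates might look like the sketch below, again assuming the legacy mongo shell. The function name addFieldAndIncrement is made up, the field y=3 follows the earlier scenario, and db is passed in only so the snippet stands on its own.

```javascript
// `db` is the shell's database handle, passed in to keep the sketch self-contained.
function addFieldAndIncrement(db) {
  // Second client: add field y without touching x.
  db.foo.update({ _id: 1 }, { $set: { y: 3 } });
  // First client: increment x without touching y.
  db.foo.update({ _id: 1 }, { $inc: { x: 1 } });
}
```

Whichever update runs first, the other one still applies cleanly, because each names only the field it changes.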

Delete a Field

[Screenshot: deleting a field]

You can delete a field using the $unset operator. Whether you give it an empty string or 0 as the value (‘’ vs 0) does not make a difference.
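A minimal sketch of removing a field, assuming the legacy mongo shell; the function name removeField and the field y are made up, and db is passed in only so the snippet stands on its own.

```javascript
// `db` is the shell's database handle, passed in to keep the sketch self-contained.
function removeField(db) {
  // The value given to $unset is ignored; an empty string is just a placeholder.
  return db.foo.update({ _id: 1 }, { $unset: { y: "" } });
}
```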

Rename a Field (Self Explanatory)

[Screenshot: renaming a field]
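For completeness, a rename can be sketched with the $rename operator, assuming the legacy mongo shell. The function name renameField and the field names y and z are made up, and db is passed in only so the snippet stands on its own.

```javascript
// `db` is the shell's database handle, passed in to keep the sketch self-contained.
function renameField(db) {
  // Keys are the current field names, values are the new names.
  return db.foo.update({ _id: 1 }, { $rename: { y: "z" } });
}
```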

Array Operations

Push

[Screenshot: pushing items onto an array]

Let’s start with a document that has only an ID (line 3). We can use the update command with the $push operator to add an item to an array. Regardless of whether the array was already present in the document, it will be created if needed (see line 6). We can continue to push the items ‘two’ and ‘three’. We can even push a duplicate item (‘three’ in the example above); notice that ‘three’ now appears twice in the array. This can be detrimental depending on the situation. To prevent it we can use the $addToSet operator, whose purpose is to prevent duplicates from being inserted: if the element is already present, the array is left unchanged. (See the example above, where we try to insert ‘four’ twice.)
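The contrast between the two operators can be sketched like this, assuming the legacy mongo shell. The function name pushAndAddToSet, the collection foo, and the array field things are assumptions for the example, and db is passed in only so the snippet stands on its own.

```javascript
// `db` is the shell's database handle, passed in to keep the sketch self-contained.
function pushAndAddToSet(db) {
  db.foo.update({ _id: 1 }, { $push: { things: "three" } });
  db.foo.update({ _id: 1 }, { $push: { things: "three" } });     // appended again: duplicate
  db.foo.update({ _id: 1 }, { $addToSet: { things: "four" } });
  db.foo.update({ _id: 1 }, { $addToSet: { things: "four" } });  // already present: no change
}
```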

Pull Operator

[Screenshot: removing values from an array]

Continuing from the prior example, since ‘three’ was inserted twice, let’s get rid of it. We can do that using the $pull operator, which removes every occurrence of the given value.
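As a sketch, assuming the legacy mongo shell and the same made-up collection foo and array field things (db is passed in only so the snippet stands on its own):

```javascript
// `db` is the shell's database handle, passed in to keep the sketch self-contained.
function pullThree(db) {
  // Removes all elements equal to "three" from the things array.
  return db.foo.update({ _id: 1 }, { $pull: { things: "three" } });
}
```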

Pop Operator

[Screenshot: popping values off an array]

The $pop operator lets you remove an item from an array without knowing its value. Using $pop with 1 removes the last value in the array; using -1 removes the first. We can continue to pop even when the array is empty, and it won’t return an error.
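Both directions in one sketch, assuming the legacy mongo shell and the same made-up collection foo and array field things (db is passed in only so the snippet stands on its own):

```javascript
// `db` is the shell's database handle, passed in to keep the sketch self-contained.
function popBothEnds(db) {
  db.foo.update({ _id: 1 }, { $pop: { things: 1 } });   // removes the last element
  db.foo.update({ _id: 1 }, { $pop: { things: -1 } });  // removes the first element
}
```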

Strings

[Screenshot: applying array operators to a string field]

The $push, $pull and $pop operators work on arrays. If the field in your document held a string instead, they wouldn’t work. The document schema is completely flexible, but that does not mean values are untyped: the BSON type of the field is not an array, therefore these operators cannot be applied.

Multiple document updates

[Screenshot: updating multiple documents]

In the scenario above, we have multiple records in the database and want to update several of them. Let’s try to push a value onto several of them. We can do that using the update command with an empty query, which matches anything. As you can see, only one record was updated and had the number 4 appended. This is because the default behavior of an update is to affect only one record.

If we want to affect multiple records, we set the multi option to ‘true’. The first record now has the number 4 twice; that’s because we used $push instead of $addToSet.

If we only wanted to update the documents whose things array contains the value 2, we could change the query to match things with a value of 2 and append 42. As you can see from the results above, the arrays containing the number 2 have had 42 appended.
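The three updates described in this section could be written roughly as follows, assuming the legacy mongo shell. The function name multiUpdates and the collection foo are made up, and db is passed in only so the snippet stands on its own.

```javascript
// `db` is the shell's database handle, passed in to keep the sketch self-contained.
function multiUpdates(db) {
  // Empty query matches everything, but by default only the first match is changed.
  db.foo.update({}, { $push: { things: 4 } });
  // With multi: true, every matching document is changed.
  db.foo.update({}, { $push: { things: 4 } }, { multi: true });
  // { things: 2 } matches documents whose things array contains 2.
  db.foo.update({ things: 2 }, { $push: { things: 42 } }, { multi: true });
}
```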

Find And Modify

[Screenshot: findAndModify signature]

The update command works fine if you are interested in modifying single or multiple records, but there is a more concise command for updating a single record: findAndModify. The signature of the command is shown above.

Just like before, you need to know which collection you are working with (the foo collection in this case). The next parameter, the query, specifies which record to modify.

Your search criteria might match more than one document; the sort order lets you specify whether to update the first or the last document in the result.

Next we specify the change we want to make to the document (the update). If we also set upsert to true, a new record is created when no record matches the criteria; otherwise the matching record is updated.

The Boolean parameter “remove” can be used to delete the matched document instead of updating it.

The “new” parameter is interesting: if it is set to true, the command returns the document as it is after the change; otherwise it returns the document as it was before the change.

The “fields” parameter lets you specify a subset of the data within the document to return, so it can potentially reduce traffic (remember, a document can be up to 16MB).
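Putting the parameters together, a call might look like the sketch below. It assumes the legacy mongo shell; the function name touchLatest, the query on x, and the touched field are illustrative choices, and db is passed in only so the snippet stands on its own.

```javascript
// `db` is the shell's database handle, passed in to keep the sketch self-contained.
function touchLatest(db) {
  return db.foo.findAndModify({
    query: { x: 10 },                     // which documents qualify
    sort: { _id: -1 },                    // of those, pick the one with the highest _id
    update: { $set: { touched: true } },  // the change to apply (use remove: true to delete instead)
    upsert: false,                        // do not create a document if nothing matches
    new: true,                            // return the document as it is after the change
    fields: { touched: 1 }                // return only _id and touched
  });
}
```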

Practical Example

[Screenshot: findAndModify example, part 1]

We have created a couple of records, and to keep things simple we have scripted an object, mod, that holds the arguments. It specifies:

  1. a query that finds records that have i in the things array
  2. an update that sets the value of touched to true
  3. a sort by ID in descending order (-1)

Using mod, we run findAndModify and get back the matching record as it was before it was modified. (Remember that by default it only returns one record, hence the record with ID 1 was omitted.) Notice that the result does not have a touched field. This is because we didn’t include the “new” attribute; if it is omitted it defaults to false, and the command returns the document as it was before modification.

When we look again with db.a.find(), lo and behold, the record now has the touched field.

Let’s change the touched field to false, but this time return the record after it was modified.

[Screenshot: findAndModify example, part 2]

We add the field called “new”, set it to true, and change the update so that touched becomes false, as shown above.

Now the command returns the document after the modification (because we set “new” to true). Before the modification touched=true; after the modification touched=false.

[Screenshot: result with touched set to false]

Let’s change the sort to find the first record rather than the last (set it to 1 in mod: mod.sort._id=1), so the order is by ID ascending, not descending. The new search returns the record whose ID is 1, because it is the first one in the result.
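The whole walkthrough can be condensed into one sketch, assuming the legacy mongo shell and the collection a used above. The function name practicalExample and the value 1 searched for in things are assumptions for illustration, and db is passed in only so the snippet stands on its own.

```javascript
// `db` is the shell's database handle, passed in to keep the sketch self-contained.
function practicalExample(db) {
  const mod = {
    query: { things: 1 },                  // 1 is a made-up value to look for in things
    update: { $set: { touched: true } },
    sort: { _id: -1 }                      // highest _id first
  };
  const before = db.a.findAndModify(mod);  // "new" omitted: returns the pre-change document

  mod.new = true;                          // from now on, return the post-change document
  mod.update.$set.touched = false;
  const after = db.a.findAndModify(mod);

  mod.sort._id = 1;                        // lowest _id first
  const first = db.a.findAndModify(mod);
  return { before, after, first };
}
```

Keeping the arguments in one object makes it easy to tweak a single setting between calls, which is what the screenshots do step by step.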

For more information please visit www.mongoDB.org

Summary

Saving data in a flat database like MongoDB is very different from saving it in a relational database. So far in this series we have covered the following topics.

[Screenshot: topics covered so far]

Next up: More details on finding documents

