MongoDB Indexing: Part 1


The Basics

Let's say you have a statement like

db.foo.find({x:2})

What does the server do in order to find this document? It runs a plain for-each loop over the collection, checking document by document for the value (I am oversimplifying to make a point, and assuming the documents are stored contiguously with no other logic involved in storing them).

Obviously, what I have just described is a slow and inefficient way to search for a document. The solution is to create an index (as in all other modern databases). Each document has its own location on disk, and an index logically holds a mapping from field values to those locations.

In the example above, an index on field x for the collection foo has an entry for each distinct value of x, associated with a list of document locations. Each of those documents contains that value, or key. So if we have a bunch of documents with x=9 or x=10 and we want to find the documents with the value 10, the server looks up 10 in the index and maps directly back to the matching documents, which is a much faster and more efficient way to look up a value. Being able to jump directly to the disk location where a document is stored is great: we don’t have to load many documents into memory, I/O is reduced, and the work of deciding whether a document matches the query can be done against the index rather than against the document itself.
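To make the value-to-location mapping concrete, here is a minimal sketch in plain JavaScript (it runs under Node.js, no MongoDB needed). It is a toy model, not how the server is implemented: a "location" is just an array position, and the sample documents and function names are made up for illustration.

```javascript
// Toy model of an index on field x: value -> list of document locations.
// "Locations" are array positions here; real on-disk storage is more involved.
const collection = [
  { _id: 1, x: 9 },
  { _id: 2, x: 10 },
  { _id: 3, x: 9 },
  { _id: 4, x: 10 },
  { _id: 5, x: 2 },
];

// Full scan: examine every document and test it against the query.
function findByScan(docs, value) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined += 1;
    if (doc.x === value) matches.push(doc);
  }
  return { matches, examined };
}

// Build the index once: one entry per distinct value of x.
function buildIndex(docs) {
  const index = new Map();
  docs.forEach((doc, location) => {
    if (!index.has(doc.x)) index.set(doc.x, []);
    index.get(doc.x).push(location);
  });
  return index;
}

// Indexed lookup: jump straight to the stored locations.
function findByIndex(docs, index, value) {
  const locations = index.get(value) || [];
  return {
    matches: locations.map((location) => docs[location]),
    examined: locations.length,
  };
}

const xIndex = buildIndex(collection);
const scanResult = findByScan(collection, 10);
const indexResult = findByIndex(collection, xIndex, 10);
console.log(scanResult.examined, indexResult.examined); // 5 2
```

Both lookups return the same two documents, but the scan touches all five while the indexed lookup touches only the two that match.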

Sorting

Because index keys are kept in a defined order (ascending or descending), documents can be sorted with the assistance of an index. This is a great advantage: documents may not be physically stored in the sequence of the field being sorted on, but if that field is the key of an index, the server can read the documents in index order and sorting will be much faster.
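A rough sketch of the idea in plain JavaScript, under the same toy assumptions as before (array positions stand in for disk locations, and the documents are invented): once the index entries are kept in key order, a sorted result is just a walk over the index, forwards or backwards.

```javascript
// Documents in storage order, which is not the order of x.
const docs = [
  { _id: 1, x: 9 },
  { _id: 2, x: 2 },
  { _id: 3, x: 10 },
  { _id: 4, x: 5 },
];

// An ascending index on x: [key, location] pairs kept sorted by key.
// The sorting cost is paid when the index is built, not on every query.
const xIndex = docs
  .map((doc, location) => [doc.x, location])
  .sort((a, b) => a[0] - b[0]);

// Ascending sort: walk the index front to back.
const ascending = xIndex.map(([, location]) => docs[location]._id);

// Descending sort: walk the same index back to front.
const descending = [...xIndex].reverse().map(([, location]) => docs[location]._id);

console.log(ascending);  // [ 2, 4, 1, 3 ]
console.log(descending); // [ 3, 1, 4, 2 ]
```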

Indexing in Mongo

Index types in MongoDB: Regular (B-Tree), Geo, Text, Hashed, TTL

The regular index is an index on a single field or on multiple fields (a compound index). It can also index a field that holds multiple values, such as an array.

The Geo index is optimized for geographical queries, although the data does not have to be about geography. It supports proximity of points to a center, letting you query for things near something or sort results by nearness to a certain point. Example: find restaurants around this location.

The text index lets you search the way search engines do, parsing text queries and comparing them against text fields. It is great because you can use Mongo itself, instead of installing a separate search engine and indexing all your documents there as a separate operation.

The hashed index is mainly used in the context of sharding. It lets you index a field while having the key values distributed more evenly instead of clustered together. This lets sharding spread documents more evenly across the shards.

The TTL (Time To Live) index supports expiring documents. Using a TTL index, you can designate a date-time field on your document as an expiration date, and Mongo will automatically remove the document from your collection when it expires. This saves you the overhead of writing batch jobs to expire and remove documents yourself.
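The expiry rule can be sketched in plain JavaScript. This only models the logic and is not the server's implementation; the `createdAt` field, the one-hour lifetime, and the sample documents are assumptions made for the example. In MongoDB itself the removal is done by a background task that runs periodically, so expired documents can linger for a short while.

```javascript
// Toy model of TTL expiry. The field name and lifetime are hypothetical.
const expireAfterSeconds = 3600;

const sessions = [
  { _id: 1, createdAt: new Date("2024-01-01T10:00:00Z") },
  { _id: 2, createdAt: new Date("2024-01-01T11:30:00Z") },
  { _id: 3, note: "no createdAt field, so it never expires" },
];

// A document is expired once createdAt + expireAfterSeconds is in the past.
function isExpired(doc, now) {
  if (!(doc.createdAt instanceof Date)) return false;
  return doc.createdAt.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

// One cleanup pass: keep only the documents that have not expired.
function removeExpired(docs, now) {
  return docs.filter((doc) => !isExpired(doc, now));
}

const now = new Date("2024-01-01T12:00:00Z");
const remaining = removeExpired(sessions, now);
console.log(remaining.map((doc) => doc._id)); // [ 2, 3 ]
```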

Create Index

ensureIndex

The ensureIndex command takes 2 parameters. The first one says which fields in the document will be used as keys in the index, and in what order to build the index. In the case of a multi-field index, you can specify a sort order that matches your expected query sort order. It also specifies the intended use of the index, for example as a text search index or as a geo index.

The second parameter has the following options

  • A name for the index, if you do not like the default name generated by Mongo
  • Whether to build the index immediately (blocking every other operation) or in the background. This is very relevant to performance.
  • Whether the index is unique, which prevents insertion of other documents with the same key values
  • Whether the index is sparse (only documents that contain the indexed field get an entry)
  • Whether the index is a TTL index
  • For a text index, the spoken language of the text being searched
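The two parameters described above look roughly like this in practice. The calls are written in mongo shell style, but the `db` object here is a small stand-in that only records each call, so the snippet runs under plain Node.js without a server. Every collection name, field name, and the index name `tag_lookup` is hypothetical; the option names (`name`, `background`, `unique`, `sparse`, `expireAfterSeconds`, `default_language`) are the ones MongoDB documents for index creation.

```javascript
// Stand-in for the mongo shell's `db` object. It records calls and does
// not talk to a server; in a real shell, `db` is provided for you.
const recorded = [];
const db = new Proxy({}, {
  get: (_target, collection) => ({
    ensureIndex: (keys, options = {}) => {
      recorded.push({ collection, keys, options });
    },
  }),
});

// First parameter: which fields to index, in which order and direction.
db.animals.ensureIndex({ name: 1 });              // single field, ascending
db.animals.ensureIndex({ species: 1, age: -1 });  // multi-field, mixed order

// The first parameter also selects the special index kinds.
db.places.ensureIndex({ loc: "2dsphere" });       // geo
db.articles.ensureIndex({ body: "text" }, { default_language: "spanish" });
db.users.ensureIndex({ userId: "hashed" });       // hashed

// Second parameter: options.
db.animals.ensureIndex({ tag: 1 }, { name: "tag_lookup", background: true });
db.users.ensureIndex({ email: 1 }, { unique: true, sparse: true });
db.sessions.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 }); // TTL

console.log(recorded.length); // 8
```

Later MongoDB releases name this method createIndex; the key and option documents keep the same shape.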

Let's try an example.

> db
test
> db.animals.find({name:'cat'})
-- returns the document on cat, with "name":"cat"
-- but we are not sure which index Mongo used, or whether an index on name exists
> db.system.indexes.find({ns:'test.animals'}, {key:1})
{"key": {"_id":1}}
-- shows that there is one index, on the _id field
-- so there is no index on the name of the animal

This can be used to tell whether an index exists, but not whether an index is being used.

For this we can use the explain() method.

> db
test
> db.animals.find({name:'cat'}).explain()
"cursor": "BasicCursor", // indicates no index was used
"isMultiKey": false,
"n": 1,
"nscannedObjects": 6,
"nscanned": 6,
"nscannedObjectsAllPlans": 6,
"nscannedAllPlans": 6,
....
....
..... 

The explain command tells us how Mongo goes about finding the document. In the above query no index was used. Let's try to add an index using “ensureIndex”:

> db.animals.ensureIndex({name:1})

This means that the index is on the name field; “1” means that the index is ascending (-1 for a descending index).

If we run the system.indexes query again, we will now see that the index exists:

> db.system.indexes.find({ns:'test.animals'}, {key:1})
{"key": {"_id":1}}
{"key": {"name":1}}

Now if we run explain again:

> db.animals.find({name:'cat'}).explain()
"cursor": "BtreeCursor name_1",
"isMultiKey": false,
"n": 1,
"nscannedObjects": 1,
"nscanned": 1,
"nscannedObjectsAllPlans": 1,
"nscannedAllPlans": 1,
....
...
....

This time we can see that the query uses the B-Tree index named name_1. It also shows that the number of scanned objects is 1, where before it was 6. This shows us that Mongo used the index to find just the document we specified in the query rather than scanning all the documents.

This is Part 1 of indexing in Mongo. Part 2 will cover the remaining topics on indexing.

