I have been working with MongoDB for the past year or so and I feel as if I have earned the right to have an opinion on it now. This is going to be an unpopular opinion, but that's okay – I have plenty of those. I don't mind catching flak from people when I know I have the facts on my side. Being a fan-boy isn't an excuse for poor execution.
What is MongoDB good for
- Speed – it’s blazing fast for reads.
- It’s relatively easy to work with as you can basically plop anything into a MongoDB collection and call it a day. Of course, that comes at a cost – mostly complexity (I will explain below).
- It’s pretty great for SMALL databases in my opinion.
- It’s okay for large databases, but again, at the cost of complexity (I will explain below).
- It's excellent for FLAT structures. Meaning, you have NO intentions of performing joins between two collections. I am not saying that the documents themselves must be flat; they can have whatever structure you want them to have.
- When I am performing data analysis, I really enjoy being able to just throw freshly created objects into a collection that doesn't exist yet, without having to worry about defining a table schema (see the sketch after this list). That's actually pretty great.
- You can run MongoDB in a container which is very convenient.
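To illustrate the throw-it-in-a-collection point from the list above, here is a sketch in C# (the driver I use); the database, collection and field names are all made up. The collection does not need to exist beforehand – MongoDB creates it on first insert.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var db = client.GetDatabase("scratch");

// "observations" does not exist yet; no table schema, no migration.
var observations = db.GetCollection<BsonDocument>("observations");
observations.InsertOne(new BsonDocument
{
    { "Source", "sensor-7" },
    { "Value", 42.1 },
    { "CapturedUtc", DateTime.UtcNow }
});
```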
I have now used MongoDB for one large datastore and a few smaller ones. The experiences have been starkly different.
What is MongoDB not so good at
If you need a relational database, MongoDB is NOT for you. In other words, if you need to actually relate rows together by performing joins, you DO NOT want to use MongoDB. Use the right tool for the job. Again, it's good for FLAT structures, meaning the collection is self-contained and does not have dependencies on other collections. Your documents themselves can have nested structures; that doesn't matter.
I am going to explain each pain point as its own section of what MongoDB is not great at.
High coupling to code
MongoDB has far too much dependence on code, which to me feels like bad design. This is a loaded statement, so please allow me to explain before you rage-comment:
- Your chosen language will unfortunately dictate a lot of Mongo's behaviors, which means programmers in different languages have very different experiences. I will expand on a deeper example of this in a separate section about how MongoDB handles UUIDs. For this list item, the best example I can give is MongoDB having problems with its wait queue (connection pool) filling up too easily when working with asynchronous code. Specifically, with the C# driver MongoDB seems to completely fail in this regard, and they don't have a solution for it.
- https://www.mongodb.com/community/forums/t/mongowaitqueuefullexception-when-wait-queue-is-still-empty/201708
- https://jira.mongodb.org/browse/CSHARP-1180 – this ticket is closed, but I guarantee you it’s still not working.
- https://stackoverflow.com/questions/37322110/mongowaitqueuefullexception-the-wait-queue-for-acquiring-a-connection-to-server
- There are plenty of examples of the C# driver failing, and you can reproduce this problem easily by making a ton of asynchronous requests to MongoDB. The driver just fails to pool correctly. I don't know what else to say, and there is no workaround other than accessing MongoDB synchronously – which is exactly what I have done, and I don't have this stupid problem anymore.
- When you explain to some people what problem you are having with MongoDB and they learn you are using C#, they will say something flippant like, "Just use Python or Node.js" – well thanks, asshole, but that's not an option for me. I dunno, maybe the driver should just work?
- Your code's shape will change dramatically the moment your documents are no longer uniform. It will happen; it's just a matter of when.
- You have to configure your code to use certain modes for it to keep working correctly (see the sketch after this list). If you forget to do this in a new project, you will pay for it with code that seemingly just doesn't work, with little to no explanation.
- MongoDB itself ties many of its data types to functions, which is dangerous. Instead of your data simply being immutable, it is evaluated on the fly by a MongoDB function using your serialized data, which uses proprietary serializations. This is dangerous for one very good reason – who is to say those functions won't work differently in the future? Why isn't my data flat?
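As an illustration of the "modes" problem, here is a minimal sketch of the driver-level configuration a new C# project needs before Guids behave sanely. This reflects my understanding of the 2.x C# driver; treat it as an assumption to verify against your driver version.

```csharp
using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Serializers;

internal static class MongoSetup
{
    public static void Configure()
    {
        // Run once, before creating any MongoClient. Without these lines the
        // driver falls back to its legacy, language-specific Guid handling
        // (more on that in the UUID section below).
        BsonDefaults.GuidRepresentationMode = GuidRepresentationMode.V3;
        BsonSerializer.RegisterSerializer(new GuidSerializer(GuidRepresentation.Standard));
    }
}
```

Forget to run something like this in a new project and, as the list above says, things seemingly just don't work.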
MongoDB Compass is not great
MongoDB Compass is to MongoDB as SSMS is to SQL Server: it's the data navigator and query editor. It's not great. I strongly recommend you use Studio 3T instead. It is far more sophisticated, and it helps you learn how to query MongoDB. Compass is very plain; the UI is pretty, but the UX is lacking. Studio 3T is what Compass should be.
IDEs aside, coming from a strong SQL background myself (MySQL & SQL Server), I can say that querying MongoDB is joyless and frustrating. Learning new syntax isn't the problem; it's the inconveniences that come with MongoDB that annoy me, such as:
- Not being able to save a query to a file and give it to a colleague.
- Not being able to save an aggregation to a file and give it to a colleague.
- In general just not being able to work out a query in a regular editor window.
Eventually, when you become frustrated enough, you just stop using Compass or even Studio 3T for serious querying and move back over to code. I in particular use LinqPad to run meaningful queries against MongoDB, and I can save those queries to a file. Once again, MongoDB is tied strongly back to code.
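For reference, this is roughly what that looks like: a plain C# snippet, runnable in LinqPad or a console app, that can be saved to a file and handed to a colleague. The collection and field names are made up for illustration.

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var orders = client.GetDatabase("shop").GetCollection<BsonDocument>("orders");

// Find recent high-value orders. Unlike a Compass session, this whole
// file can live in source control next to the rest of the project.
var results = orders
    .Find(Builders<BsonDocument>.Filter.Gt("Total", 100))
    .Sort(Builders<BsonDocument>.Sort.Descending("CreatedUtc"))
    .Limit(20)
    .ToList();
```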
I understand the paradigm is different and I shouldn’t fight the pattern, but honestly this is just a big inconvenience. I have written a lot of horrible things in SQL, I should be able to do the same with MongoDB.
Indexing and searches
For the purposes of this section, assume we have a collection with fifty million documents. This is based on a real world scenario.
Searching without an index
MongoDB champions itself as "FAST READS!" and it's true – but only if you have an index for what you are searching. SQL Server, for all of its faults, handles unindexed searches reasonably well; even on tables with one hundred million rows I can usually perform some kind of analysis without sweating the performance. I cannot say this about MongoDB at all. On a collection of fifty million documents, a search without an index means a two-minute-plus wait. That's just horrific performance. This is a fact. It makes performing analysis a very irritating exercise, because YOU ARE NOT GOING TO INDEX EVERY QUERY PERMUTATION; nor should you. You don't even do that in SQL Server, because it's bad for performance and can potentially double the storage size of your table.
That being said – searching with an index is VERY FAST.
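For completeness, creating the index that makes such a search fast is only a few lines from the C# driver; the field names here are hypothetical.

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var events = client.GetDatabase("analytics").GetCollection<BsonDocument>("events");

// One compound index for the query shape you actually run often.
// Without it, the same query is a full scan of fifty million documents.
var keys = Builders<BsonDocument>.IndexKeys
    .Ascending("CustomerId")
    .Descending("CreatedUtc");
events.Indexes.CreateOne(new CreateIndexModel<BsonDocument>(keys));
```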
You will have an ObjectId whether you like it or not
This is just another fact: in every collection you create, you will have:
- A non-negotiable property in your document schema named `_id`, which is of the proprietary type `ObjectId` and looks like this: `ObjectId('63c5d9aa951bc9d97409a4a8')`.
  - Please notice the function ObjectId().
  - This is Mongo's magical globally unique identifier.
- A non-negotiable index as part of your collection named `_id_`. It uses a `Unique` constraint, which makes sense.
You can think of this as a primary key index on a non-nullable column with a default constraint of NEWID(). It will exist whether you want it to or not.
Personally, I do not recommend using this ID outside of MongoDB for any reason. You can use this ID for linking documents together inside of MongoDB only. I don't trust this identifier, as we cannot predict how Mongo is going to change the `ObjectId` function in the future. Seeing as how poorly they implemented UUID support, I don't think they have earned trust for their own interpretation of a GUID.
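You can see the non-negotiable part for yourself in a few lines of C#; the database and collection names are mine.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var things = client.GetDatabase("demo").GetCollection<BsonDocument>("things");

var doc = new BsonDocument("Name", "example"); // no _id in sight...
things.InsertOne(doc);

// ...and yet the driver has stamped one on before insert:
Console.WriteLine(doc["_id"]); // e.g. 63c5d9aa951bc9d97409a4a8
```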
CUD or everything else besides reads
Okay, so I have already said several times that MongoDB is great for reads – blazing fast! Well, guess what: that's only the "R" in "CRUD". MongoDB is FUCKING TERRIBLE at Creates, Updates and Deletes! Of course, this depends on how large your collection is. With small collections you aren't going to notice a difference. With large collections, oh boy, you will feel it, and how.
Keep assuming we have a collection with fifty million documents.
First things first – if you plan on doing mass inserts, DO NOT use Mongo's `InsertOne` method in a loop; it will be very slow. There is a bulk version, `InsertMany`, and it works far better. The same statement is mostly true for updates and deletes.
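A quick sketch of the difference, with made-up names:

```csharp
using System.Collections.Generic;
using System.Linq;
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var collection = client.GetDatabase("demo").GetCollection<BsonDocument>("bulk");

List<BsonDocument> docs = Enumerable.Range(0, 100_000)
    .Select(i => new BsonDocument("N", i))
    .ToList();

// Slow: one round trip (and one acknowledgement) per document.
// foreach (var doc in docs) { collection.InsertOne(doc); }

// Far better: one batched call; the driver splits it into
// appropriately sized batches under the hood.
collection.InsertMany(docs);
```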
Blanket updates
You provide filters (like a WHERE clause) to find the documents to work on. This doesn't work so well for updates if your intention is to perform a specific update per document; it only lends itself to setting all desired properties to one value. In other words, if you need to perform a blanket update then this is fine; otherwise you are doomed to one-by-one document updates whenever you need to perform more complex evaluations.
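In driver terms, the blanket case really is a one-liner; anything per-document is not. The field names below are hypothetical.

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var users = client.GetDatabase("demo").GetCollection<BsonDocument>("users");

// Blanket update: every matched document gets the same value. Fine.
users.UpdateMany(
    Builders<BsonDocument>.Filter.Eq("Plan", "trial"),
    Builders<BsonDocument>.Update.Set("Active", false));

// Anything that computes a new value from each document's own fields
// means fetching and rewriting documents one by one instead.
```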
Mass deletes
MongoDB's biggest weakness is CUD! It really is terrible at performing CUD operations. Taking the fifty-million-document collection: if you attempted to delete all of those documents, you would be waiting many hours for it to happen. You are better off dropping the collection and starting over. Dropping the collection takes seconds.
- There is no equivalent of `TRUNCATE TABLE` in MongoDB.
- Renaming collections is barely supported (a renameCollection command exists, but with enough restrictions that in practice you end up dropping and recreating). Poor design.
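The contrast in driver code, again with hypothetical names:

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var db = client.GetDatabase("demo");

// Hours on a fifty-million-document collection: every delete is logged work.
db.GetCollection<BsonDocument>("events")
  .DeleteMany(FilterDefinition<BsonDocument>.Empty);

// Seconds – but you lose the collection's indexes and must recreate them.
db.DropCollection("events");
```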
Schemaless schema
This is by far the biggest lie I have heard about MongoDB: that it is schemaless. That's very far from the truth. Logically, it's impossible. Every datastore has a schema; otherwise, how would anything be queried, indexed or optimized? Additionally, we all design objects to put into a collection. That object IS the schema! So calling it a schemaless database is disingenuous, a sham that many SQL haters got themselves caught up in. I remember when RavenDB, MongoDB and others were in their infancy and people were losing their minds over them because – and I quote – "I don't have to care how my data is being stored." I looked at them and laughed, because I knew they were wrong. If you want to report off of your data, you need to know how it's being stored – so this is a lie.
Furthermore, if you draw your eyes over to the word enclosed by the red rectangle in the screenshot of MongoDB Compass, what word do you see?
SCHEMA
So explain to me – how is it that they boast MongoDB is schemaless when it clearly isn’t?
Don't let dubious and unscrupulous marketing buzzwords rug-pull you.
What happens when your schema changes?
Over the life of your application, your schema will change; it's just the nature of software development. In a SQL context you would just update your schema, update your data and call it a day. It's not without problems, but it isn't impossible either. Then there's Mongo…
With Mongo you will be in for a rude awakening for the following reasons:
- You cannot perform mass updates. If you did attempt one, it probably wouldn't finish for days, assuming we are still talking about a large collection.
- You will more than likely have to update your repository layer to support your new object's shape, and to use `BsonDocument`s to account for when a field does or does not exist. This means changing a lot of code to account for two schemas existing simultaneously. How's that schemaless schema working out for you?
- Since you cannot perform a mass update, you must now perform what is called a slow rolling migration, which is another way of saying cluttering up your program to perform schema maintenance on the fly, live, for as long as it takes.
Slow rolling migrations are irritating
To perform a slow rolling migration, you now have to purposely put tech debt into your code so that you can support two schemas. The idea here is that your code will detect what version of a document you are dealing with. When it finds the old version it has to upgrade it to the latest version. This is as irritating as it sounds.
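A minimal sketch of the pattern, assuming a version marker field; every name here (SchemaVersion, FullName, the v1 Name field) is hypothetical.

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

// Read-time upgrading: the tech debt lives in your repository layer.
static BsonDocument ReadCustomer(IMongoCollection<BsonDocument> customers, ObjectId id)
{
    var doc = customers.Find(Builders<BsonDocument>.Filter.Eq("_id", id)).First();

    // Old shape detected: upgrade it in place before anyone else sees it.
    if (!doc.Contains("SchemaVersion") || doc["SchemaVersion"].AsInt32 < 2)
    {
        doc["FullName"] = doc.GetValue("Name", ""); // v1 field renamed in v2
        doc.Remove("Name");
        doc["SchemaVersion"] = 2;
        customers.ReplaceOne(Builders<BsonDocument>.Filter.Eq("_id", id), doc);
    }

    return doc;
}
```

Every read path needs this check, and it stays there until the last old-shape document is gone.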
Again, going off of a real world example of fifty million documents, I have a slow rolling migration going on right now and it’s only 22% converted. This has taken six months. I hate this.
Running an aggregation to find out what my conversion rate is takes about forty minutes. I only run this query once a month because it’s so goddamned slow.
If it were up to me, I would put in the effort to just mass update all documents as quickly as possible; that way you are not polluting your code base, or worse. The "worse" is that if other people, even on your own team, aren't aware that they must upgrade your documents, they can produce unintended results – and now your data has been damaged.
UUID and GUID handling
Mongo's handling of Guids is terrible, for several reasons:
- They do not enforce the version 3 UUID representation (`GuidRepresentationMode`), so it defaults to version 2, which is deprecated.
  a. https://bsonspec.org/spec.html – you can see here that `\x03` is marked as UUID (Old).
- The way UUIDs are serialized and encoded depends entirely on the driver handling the data. This means it is implementation specific, which means it is language specific. This is a large cause for concern because if, for example, a Java program accessed a C# program's stored GUID, Java's driver is going to interpret that UUID DIFFERENTLY. This is comically bad and dangerous.
- This is what a MongoDB UUID looks like: `BinData(3,'dThb3YvrIEeH3lYU3haIAA==')`
  - This is not human readable.
- Mongo does not provide a simple way inside Compass to convert this encoded binary data to a human-readable string. They provide a uuidhelpers.js script, which you have to pre-load into Compass to convert your Guids to something legible. Even this helper has dedicated functions per language.
Now, if you can read all of that and say, "Yeah, this is fine.", then I would be afraid for your data. It is for these EXACT reasons that I opted to use strings to store Guids, for the sake of human readability and ease of querying. It has a negligible impact on performance. In my particular case, the Guid is being issued to consumers outside of my datastore. Everyone knows what a Guid is; it is a well-known type and it's not proprietary.
You know what's not a well-known type and is proprietary? That's right – `ObjectId`. It's why I wouldn't dare hand this out to anyone needing a reference back to their data, especially since it relies on a function that could change at any moment.
I have absolutely no idea why any of the MongoDB designers would say to themselves:
- Let's make the UUID data type language dependent – this is a good idea.
- When people want to query based on a UUID, if they use our BSON version of it, they won't be able to query it unless they use a side-loaded script that is STILL language dependent!
I’m sorry, no disrespect, but this is just bad design.
Just use a string
When you use the string interpretation of a Guid:
- It’s NOT language dependent.
- It’s human readable data.
- I can QUERY IT without a special side-loaded script.
- I have confidence that the Guid I put into my database will come out looking exactly the same regardless of which language is accessing the data via its driver!
The only downside to using a string for Guid representation is that strings are case sensitive, so choose up front whether you are using lowercase or uppercase and stay consistent! I recommend uppercase if you plan on cross-querying SQL Server.
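In the C# driver you don't even have to convert by hand; a property attribute handles it. A minimal sketch – the class and property names are mine, and note that the driver writes the string in lowercase, so uppercase standardization would be on you.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class Order
{
    [BsonId]
    public ObjectId Id { get; set; } // stays internal to MongoDB

    // Stored in BSON as a plain string like "3f2504e0-4f89-41d3-9a0c-0305e82c3301":
    // language independent, human readable, queryable without helper scripts.
    [BsonRepresentation(BsonType.String)]
    public Guid PublicId { get; set; }
}
```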
Deploying MongoDB databases
- Deploying MongoDB databases is awkward because you can't script them. Instead, you have to write a program to do it for you. This continues the theme of MongoDB being too tightly coupled to code.
- Need to update an index? Yep, do it via code.
- Need to drop an index? Code. (See the sketch after this list.)
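So every deployment ends up with a companion program along these lines; the index and collection names are hypothetical.

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

// "Deploying" an index change means shipping and running a program, not a script.
var client = new MongoClient("mongodb://localhost:27017");
var events = client.GetDatabase("prod").GetCollection<BsonDocument>("events");

// Drop the old index by name, then create its replacement.
events.Indexes.DropOne("CustomerId_1");
events.Indexes.CreateOne(new CreateIndexModel<BsonDocument>(
    Builders<BsonDocument>.IndexKeys.Ascending("CustomerId").Ascending("Region"),
    new CreateIndexOptions { Name = "CustomerId_1_Region_1" }));
```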
Replication
I am still very unclear on how to replicate a MongoDB database. I know it’s possible, I am just not clear on how to do it properly or safely so I can’t write about it.
Conclusion
Overall, I am not impressed with Mongo. I feel like it produces more problems than it solves. However, it is fast, and I do like using it for smaller projects – it's nice not having to worry about data shape. For larger databases, I don't know that it really makes business sense unless it is being used only for reads. As per usual – use the right tool for the job. It's not bad, but it's not great either.
To be completely fair, SQL Server is a total slog when there is too much data, sometimes even with indices in place. In contrast, though, I can still find the data I am looking for relatively quickly, and I can script against it without needing a programming language.