(This is a guest post by Michael Washington)
We've entered the age of everything as a service: platforms, databases, APIs, and more. With these services, convenience is becoming the expectation, but at what cost? Specifically, let's talk about database as a service. I was recently tasked with a challenging decision: build out a database architecture that ranges from 1 GB to multiple petabytes per client, while keeping costs reasonable, whatever that means. So let's dive in!
The client I'm working with visualizes large amounts of data that has been analyzed by data scientists. My team transfers that data over to a dashboard for visualization.
The solutions I considered:
- Amazon Redshift
- Google BigQuery
- MongoDB as a service
- Just build the damn thing myself
Features all of these databases offer:
- Global availability
- High availability
- All costs listed are monthly (except Google BigQuery, which charges per query)
Amazon Redshift
I pretty much use AWS for everything: web scraping jobs, web applications, and data analytics. So Redshift was the natural place to look first. It takes care of backup, durability, availability, security, monitoring, and maintenance for you.
Costs (from the Amazon calculator)
I feel the pricing is a bit much for what I'm eyeing, but Amazon has consistently proven over the years to be a secure go-to solution.
Google BigQuery
Google BigQuery has treated me pretty well in the past for visualizing data, and it works well out of the box. You can throw pretty much whatever data you have in there, write complex queries against it, and it will auto-scale to your needs. The learning curve is low. Here are some of the key features that stood out to me:
- Auto-scales your data, even into the petabytes
- Complex queries, easy to write if you have a SQL background
- Easy to integrate with
Costs (From their website)
Google BigQuery costs are great between 1 GB and 1 TB, and possibly once you get into the petabyte range. The reason is that BigQuery doesn't index your data and charges per query. If you have a web application with millions of users, costs can climb to unpredictable levels; a good caching strategy will prevent surging costs. From my experience, BigQuery seems best for production use with a predictable number of users.
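To make the per-query billing concrete, here's a rough sketch (not official pricing math) of how scan-based costs add up, and how much a cache helps. It assumes on-demand billing at roughly $5 per TB scanned, so check current pricing before relying on these numbers.

```python
# Back-of-the-envelope BigQuery cost model. The $5/TB rate is an assumption
# about on-demand scan pricing -- verify against the current price list.

TB = 1024 ** 4  # bytes in a terabyte (binary)

def query_cost_usd(bytes_scanned, price_per_tb=5.0):
    """Cost of one query that scans `bytes_scanned` bytes."""
    return bytes_scanned / TB * price_per_tb

def monthly_cost_usd(queries_per_day, bytes_per_query, cache_hit_rate=0.0, days=30):
    """Monthly spend; a cache hit costs nothing because the query never runs."""
    billed_queries = queries_per_day * days * (1.0 - cache_hit_rate)
    return billed_queries * query_cost_usd(bytes_per_query)

# A dashboard firing 10,000 queries a day, each scanning 10 GB:
no_cache = monthly_cost_usd(10_000, 10 * 1024**3)                        # ~$14,648
with_cache = monthly_cost_usd(10_000, 10 * 1024**3, cache_hit_rate=0.9)  # ~$1,465
```

Same workload, ten times cheaper with a 90% cache hit rate: that's why unpredictable users make BigQuery bills unpredictable.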
MLab (mlab.com) – MongoDB as service
This was the ideal option for me, as MongoDB is where most of my experience lies in the NoSQL world. I haven't used this service, but it seems well suited to scaling data from 1 GB to 1 TB. You can even choose where you want your data hosted, which is pretty nice.
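For Mongo users the integration is the familiar one; a hypothetical sketch of wiring a dashboard to an mLab-hosted database might look like this. The host, port, and database names are made-up placeholders, not real mLab values, and the pymongo calls are left commented so the snippet stands alone.

```python
# Hypothetical mLab connection sketch -- placeholder host/port/db throughout.

def mongo_uri(user, password, host, port, db):
    """Build a standard MongoDB connection string."""
    return f"mongodb://{user}:{password}@{host}:{port}/{db}"

uri = mongo_uri("appuser", "s3cret", "ds012345.mlab.com", 11222, "dashboard")

# With real credentials you would hand the URI to pymongo:
# from pymongo import MongoClient
# client = MongoClient(uri)
# docs = client["dashboard"]["metrics"].find({"client_id": 42}).limit(10)
```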
Costs (From their website)
- 1 GB – $180
- 1 TB – $3,790
- 1 PB – ?
The costs for mlab.com seem pretty reasonable. My worry is how big the bill will get once I'm in the petabyte range.
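A quick bit of arithmetic on the list prices above shows why the petabyte tier is the open question: the per-GB cost drops sharply with scale, but there's no listed price to extrapolate from.

```python
# Per-GB monthly cost at each listed mLab tier (prices from their site).
tiers = {"1 GB": (180.0, 1), "1 TB": (3790.0, 1024)}
per_gb = {name: round(price / gigabytes, 2) for name, (price, gigabytes) in tiers.items()}
# per_gb: the 1 GB tier works out to $180.00/GB, the 1 TB tier to about $3.70/GB
```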
Build it your damn self
This might be the best-case scenario when scaling large amounts of data with creative, custom solutions. You will have to handle backup, durability, security, monitoring, and maintenance yourself, which means your team will have to invest in acquiring and maintaining those skills; that effort is not factored into these cost estimates.
- 1 GB – $19.04 (1 × m1.small)
- 1 TB – $1,248.06 (5 × r3.8xlarge)
- 1 PB – $46,622 to $69,934 (22 × d2.8xlarge)
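A hedged back-of-the-envelope for the self-hosted route: on-demand monthly cost is just hourly rate × instance count × hours in a month. The rate below is an illustrative placeholder, not a current AWS price; pull real rates from the AWS pricing page or calculator.

```python
# Rough self-hosting estimator. The hourly rate is a made-up example value.

HOURS_PER_MONTH = 730  # the ~730-hour month AWS pricing examples use

def cluster_monthly_cost(hourly_rate, instance_count):
    """Monthly on-demand cost for a cluster billed at `hourly_rate` $/hr per instance."""
    return hourly_rate * instance_count * HOURS_PER_MONTH

# e.g. five hypothetical $0.35/hr instances:
estimate = cluster_monthly_cost(0.35, 5)  # 1277.5
```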
Building it yourself comes out to workable costs across the board, at every scale.
After looking at these different solutions, and since I'm after reasonable costs, I'll go with "build it your damn self." I learned a lot along the way, though. If I were building databases for more enterprise clients, Redshift and BigQuery handle a lot of grunt work I don't want to be bothered with, and they would make the client feel more comfortable about security. mLab looks like a reliable solution for Mongo users and is flexible about where you host your data. Hopefully this guide saves you some time choosing a solution.
Note: All AWS-related prices were computed with on-demand instances; it's possible to lower costs considerably with reserved or spot instances.