Should you use a Database as a service or build it yourself?

(This is a guest post by Michael Washington)

We enter the age of everything being made as a service from platforms, databases, APIs, etc. With these services, convenience is becoming more of an expectation but at what cost? More specifically lets talk about database as a service. I have recently been tasked with a challenging decision to build out a database architecture that ranges from 1GB to many petabytes for individual clients while keeping costs REASONABLE, whatever that means lol. So let’s dive in!


Scenario

The client I’m working with visualizes large amounts of data that has been analyzed by data scientists. My team will transfer over the data to the dashboard to visualize the data.

Possible Solutions

  • Google BigQuery

  • Amazon Redshift

  • MongoDB as service

  • Just build the damn thing myself

Features all databases will have:

  • Global Availability
  • Highly Available
  • All costs are monthly except Google BigQuery

Amazon Redshift

Summary

I pretty much use AWS for everything web scraping jobs, web applications and data analytics. So Redshift was natural for me to look at first.

Key Features

    • They care of backup, durability, availability, security, monitoring, and maintenance for you.

Costs – Used Amazon Calculator

I feel pricing is a bit much for what I’m eyeing but amazon has consistently proven throughout the years to be a secure go-to solution. 

Monthly Costs

1.6 GigaByte

Redshift– $804.90

2 TeraBytes

Redshift2657.38

1 PetaByte

Redshift $313,588.80


Google BigQuery

Summary

Google BigQuery has treated me pretty well in past for visualizing data and works pretty well out the box. You can literally throw whatever data you have in there, write complex queries against and it will auto scale for you and your needs. The learning curve is pretty low. Here are some of the Key Features that stood out to me:

    • Auto Scale your data even into the petabytes!

    • Complex Queries and easy to write if you have SQL background

    • Easy to integrate with

Costs (From their website)

Google BigQuery costs are great when it’s between 1 GB to 1TB and possibly when you get in the petabyte range. Reason being is Big Query doesn’t index your data and they charge per query. If you have a web application with millions of users costs can go to unpredictable levels. If you have a great caching strategy this will prevent surging costs. From my experience BigQuery seems to best for production use with a predictable amount of users.


MLab (mlab.com) – MongoDB as service

Summary

This was the ideal situation for me as MongoDB is were most of my experience lie in the NoSQL world. I haven’t used this service but it seems like an ideal situation where you want to scale data from 1 GB to 1 TB. You can even choose where you want your data hosted which is pretty nice. 

Costs (From their website)


1 GB – $180

1 TB – $3790

1 PB – ?

The costs for mlab.com seem pretty reasonable. The worry I have is the when I get into the petabyte range and how big the cost will be.


Build it your damn self 

Summary

This might be the best case scenario when scaling large amounts of data for creative and custom solutions. You will have to handle backup, durability, security, monitoring, and maintenance. Which means your team will have to invest in acquiring and maintaining these skills which are not factored into these cost estimates.

Costs (AWS)


1 GB – $19.04

 1 – m1.small

1 TB– $1,248.06

5 – r3.8xlarge

1 PB – $46,622 – $69,934

22 – d2.8xlarge

The costs for building it yourself work across the board for most situations. 


Conclusion

After looking at these different solutions since I’m looking for reasonable costs I will go with “build it your damn self”. I have learned a lot though after looking at these various solutions. If I’m looking to build databases for more enterprising clients RedShift and BigQuery handles a lot of grunt work I don’t want to be bothered with and would make the client feel more comfortable in terms of security. MLab is looks like a reliable solution for mongo users and flexible for wherever you wish to host it. Hopefully this guide saves you time on which solution.

Note: All AWS related prices were computed with on-demand instances, it’s possible to make costs much lower with reserved or spot instances.

Reference:

https://cloud.google.com/bigquery/

https://aws.amazon.com/redshift/pricing

https://mlab.com/plans/pricing/

http://calculator.s3.amazonaws.com/index.html

Mapbox’s API to geocode data to get location information

 

When starting the National Geographic project, the dataset used had missing data. The dataset was missing street, city, state or zipcode to go along with the longitude and latitude for parks. Also, when doing a general search of street, city, state or zipcode, we wanted the capability to get the longitude and latitude of the search term to supply the nearest parks.  So I looked in to using the Mapbox Geocoding API to solve the issues. This API allows users to easily change location text into graphic coordinates (Maryland → -76.349,38.841) by forward geocoding. Reverse geocoding is the opposite wherein a pair of coordinates, latitude and longitude, are converted into a particular name of a place(-110.547,44.596 →Yellowstone National Park).

Forward Geocoding

In forward geocoding,  the query should be URL-coded, must not contain special characters such as semicolons that are often used to group or delimit requests. The response is usually made up of one or more results.

Here is what the URL looks like:

Here is an example use of the URL with the Query = Maryland:

Here is sample of the response:

{"type":"FeatureCollection","query":["maryland"],"features":[{"id":"region.9113309487783240","type":"Feature","text":"Maryland","place_name":"Maryland, United States","relevance":0.99,"properties":{},"bbox":[-79.487651,37.886605,-74.8966568003481,39.722883],"center":[-76.349402,38.841628],"geometry":{"type":"Point","coordinates":[-76.349402,38.841628]}

 

To get the rest of the data, you will actually have to make the http request in a browser or with a programming language. Also, with forward geocoding, country filtering and with proximity can be added. If you want more information about how to use the API to forward geocode, you can go here.


Reverse Geocoding

Reverse geocoding latitude and longitude coordinates as input. The response includes one result from each geocoding dataset.

Here is what the URL looks like:

https://api.mapbox.com/v4/geocode/mapbox.places/'+ xCoordinate+ ',' + yCoordinate + '.json?access_token=

Here is an example use of the URL with the x-coordinate/longitude = -110.547 and y coordinate/latitude = 44.596:

Here is sample of the response:

{"type":"FeatureCollection","query":[-110.547,44.596],"features":[{"id":"place.55914","type":"Feature","text":"Yellowstone National Park","place_name":"Yellowstone National Park, Wyoming, United States","relevance":1,"properties":{},"bbox":[-111.056889009999,44.1312909900081,-109.825486991003,45.0039050099999],"center":[-110.83,44.46],"geometry":{"type":"Point","coordinates":[-110.83,44.46]},"context":[{"id":"postcode.8237635196769770","text":"82190"},{"id":"region.10947505346724150","text":"Wyoming"},{"id":"country.5877825732302570","text":"United States","short_code":"us"}]}

To get the rest of the data, you will actually have to make the HTTP request in a browser or with a programming language. Limiting the input coordinates to at most 5 decimal places usually improves precision and performance of the reverse geocoding process. If you want more information about using the API to reverse geocode, you can go here.

Forward and reverse geocoding can be further enhanced by filtering to match more types. Features such as country, region, postcode, place, neighborhood, address and POI (places of interest) may be used. In the end, Mapbox’s API allowed us to geocode data to get location information.

If you are interested in discussing more about how to use Mapbox’s API in your software applications, leave a comment, contact me here or at adetola@adelabs.com.

PostGIS’s Simple Geospatial Features for PostgreSQL

PostGIS is a geospatial database extension of PostgreSQL. This technology can be used to find the geometry, longitude, latitude, distance and other measures. Some of its features, namely geometric types are specific to geospatial databases. These measures can be used for finding the location of an object and objects that are near to a point. Recently we have spent a lot of time talking about GIS. Several things have been examined, such as using geospatial technologies to build a custom web application here and using PostGIS for this custom National Geographic Projecthere.  Simple Features supported by PostGIS that will be discussed further are GeoJSON support and nearest neighbor method.


Geojson Support

The Postgis Extension has built-in GeoJSON format support using the ST_AsGeoJSON() method. This method can be used to addGeoJSON format can to be added in the sql query. The ST_AsGeoJSON() method formats a string into GeoJSON format in the sql query.

Here is an example PostgreSQL query with ST_AsGeoJSON method to format the result in GeoJSON:

SELECT ST_AsGeoJSON(geometry) as geometry
FROM points limit 1;

And here is the GeoJSON Query Result from a query with the ST_AsGeoJSON method:

{"type":"Point","coordinates":[-76.0301621537909,43.1440190508815]}


Nearest Neighbor Method

The basis for the PostGIS geographical type is a point. Given a geometry it aims to find the x number of nearest neighbors and n geometries of data. A  query with to find the nearest objects to a geometric value has the following clause, ORDER BY geometry_column <-> geometric value LIMIT k. <-> in the ORDER BY clause is a “distance” operator that when combined with ORDER BY clause and LIMIT k allows query to find the k nearest objects to a geometric value. We will focus on using a point, so geometric value = ST_MakePoint(double x , double y), where x is longitude and y is latitude.

Here is an example PostgreSQL query with ST_MakePoint method to get the k = 5 points nearest to the given point (-110.547348214819,44.5964236630063) in the database:

SELECT ST_AsGeoJSON(geometry) as geometry
FROM points
ORDER BY ST_MakePoint(longitude,latitude) <-> ST_MakePoint(-110.547348214819,44.5964236630063) limit 5;

And here is the Nearest Neighbor Query Result from the query above :

{"type":"Point","coordinates":[-110.547348214819,44.5964236630063]}
{"type":"Point","coordinates":[-110.709738503157,45.0298832732411]}
{"type":"Point","coordinates":[-110.663536323832,44.0947376621328]}
{"type":"Point","coordinates":[-110.728358647307,44.0885386251708]}
{"type":"Point","coordinates":[-111.100823317053,44.6630655719391]}

PostgreSQL with PostGIS is good  geospatial database to use for a lot of applications. Some use cases are in logisticsoil and gas, and internet of things. The use cases have a little similarity because the cases are based on location. Also, with PostGIS is good for any advanced geometric based functionality.

If you are interested in discussing more about how to use PostGIS in your software applications, leave a comment, contact me here or at adetola@adelabs.com.

 

How GIS can benefit the Oil and Gas Industry

Investing in research and development is essential in the Oil and Gas Industry when thinking long term. One area of interest for Oil and Gas Companies to invest in is GIS (Geospatial Information Systems). While big companies like Exxon have geomatic teams on their payroll, the small and medium sized companies are not able to afford this expense. Whether these companies can afford the expense for GIS, there are some uses they can benefit from below:


Vessel Tracking

Valuable assets can be tracked using GIS. This is especially so when tracking mobile vessels like boats and vehicles.  Knowledge of the exact location of a particular vessel means efficiency in service delivery and emergency response. Vessel tracking leads to efficiency gains and helps add value in complex operations.


Land management

Land Management is invaluable to the land men. It is invaluable when mapping lease data like lease expiry period, leaser’s identity, revenue interest, overriding royalty among other information related to land. Using GIS in land management is handy especially when it comes to generating mandatory reports as well.


Monitoring of Field Operations

GIS is also useful in monitoring of field operations by companies. Companies can manage change with this technology and avoid the commissioning of satellite data which is quite expensive. Through perhaps maybe a drone or other type of flying sensor, images can be collected from operation sites which help in monitoring progress.


Monitoring of Environmental Changes

Oil and Gas Companies can use GIS in monitoring environmental changes. GIS is helpful in detecting the sinking of land that may be the result of extracting shale. Maps can be used to visualize data against the base line cases. GIS also works in well planning to come up with the most accurate configurations for drilling given its distinct spatial analytics.

When it comes to mitigation measures due to cases of gas explosions and accidental oil spills, GIS is invaluable. Geospatial data available to Oil and Gas Companies, helps them to respond better during emergency situations. These are among the many uses of GIS that are helpful to players in the Oil and Gas industry.

Interested in discussing more how GIS can benefit Oil and Gas Companies? Leave a comment, contact me here or at adetola@adelabs.com.

Incorporating Modern Geospatial Technology in Logistics

Innovative technologies are slowly improving the efficiency of logistic services. Systems are now available for both logistics enterprises and their clients for location and routing of goods in transit. These systems efficiently reduce delivery time. New transportation systems and the use of on demand applications are using geospatial technologies to help people and businesses provide information on the location of the goods in transit.

Transportation Systems for Logistics

Transportation of goods and supplies to areas with zero accessibility has driven systems to be created to easily transport goods and supplies from one area to another. Innovators such as Matternet are transforming the methods of moving goods locally in key sites where traditional transportation infrastructures are inaccessible. One use case that Matternet has proven to be successful is using drones to bring much medical supplies to places devastated by disaster. The goal is to set up landing stations and routing systems not just for remote areas but urban or densely populated areas. Once in the area,  needed medical goods and supplies are delivered within minutes from one point to another. The concept is simple enough to incorporate the use of drones in transporting goods from one point or station to another in a systematic fashion. Integrating this type of transportation system can be done in the daily routine of businesses as well as in a more personal capacity of delivering goods. This sort of action has potential to be as common as the internet integrated into almost all aspects of our daily life. The routing systems created by Matternet puts these stations in the background. The stations can operate 24 hours  a day and 7 days a week without fail to move goods and some services without relying on any large infrastructure. With this drone based transportation system, Matternet seems to be on the cutting edge of improving the delivery of goods.

On Demand “Logistics” Applications

The thought of on demand applications usually brings Uber to mind. Uber has branched out into the delivery of goods aside from alternative transportation for people in urban areas. Using UberCargo can transport and deliver packages, goods and supplies from one location to another, or any on demand logistics needs. Uber has even had experiments with delivering donuts, flowers and other goods which them to start UberRush.  Also, there are young companies like InstacartPostmatesand DoorDash that seem to be at the forefront of this old but new way of delivering goods using the web or mobile application.

The Role of Geospatial Technologies in Logistics

The role of geospatial technologies in logistics is to map out routes to reach target locations on time, whether in transporting people or goods. This helps in efficiency and avoids bottlenecks in the delivering goods and transportation services. Geospatial Technologies involve maps, spatial databases, and other parts of geospatial information systems. They are a key component in the transportation systems and on demand applications as well.

If you are interested in discussing more about how geospatial technologies can be used into your logistics company, leave a comment, contact me here or at adetola@adelabs.com.

Creating a Geospatial Web Application for National Geographic

Celerity chose Ade Labs to help build a geospatial web application for National Geographic. The application would involve creating a web service with park location information so that the information could be plotted on a map. Ade Labs was in charge of building the Web Service/API.

Software Used

The initial technologies that the client wanted to use were the following Hapijs, PostgreSQL. I also chose to use Postgis, Knexjs, Bookshelfjs to incorporate geospatial capabilities and an ORM into the backend/web service.

Geospatial Web/Service API

The first step one was to load initial park data from National Geographic into the postgres database. Postgis was then used to get the longitude and latitude of park. Next, Postgis was used to convert geometric measures of the parks to longitude and latitude values.

Location information (ie. city, state, zip code) was missing from the parks data. Mapbox was used to reverse geocode the longitude and latitude of the parks data to get location information. After the location information was received they parks data was updated with that information in the postgres database.

Once the data is in the database, the web service methods can be created so that the data can be displayed on the map. Inside the methods, sql statements are created to get the parks data to be displayed on the map.

Some of the web service functionality built was:

Parks Search Method

The Park Search Method based off Mapbox’s web service API and a sql statement to get all the park information by a search query parameter. The park information is returned in geojson format.

Nearest Parks Method

The Nearest Parks Method to return a number of parks near a location given a point and other possible parameters. This functionality is based of a sql statement given the point and other possible parameters. The park information is returned in GeoJSON format.

If you are interested in discussing more on how to create a geospatial application, feel free to contact me here or email me at adetola@adelabs.com.