Machine learning has been around for decades. However, some industries, such as logistics, seem to have relied mostly on human decisions and actions until recently. How GIS can help the logistics industry was recently discussed here. With this in mind, we wanted to look into using machine learning within the logistics industry. Working with Wardell Samotshozo, a graduate student in Howard University’s Computer Science Department, we ran a small experiment that predicts the port of destination for shipment records.

After experimenting with the data and tools, we settled on Weka. Weka provides a collection of algorithms for machine learning and data mining tasks, along with tools for data pre-processing.

The Chosen Dataset

The data came from Enigma. It consisted of incoming shipments recorded in U.S. Customs and Border Protection’s Automated Manifest System (AMS) for 2015.

We used 1,427 instances (rows) of incoming shipments. The attribute we predict, port_of_destination, is the final port of destination if the cargo travels by ship beyond its initial port of unlading in the United States. The destination ports selected were Miami, Florida; Norfolk, Virginia; and Oakland, California.

Here is the list of attributes used:

identifier – Unique shipment identifier. Can be used as the key to link to more detailed Bill of Lading, Cargo Description, & Hazardous Materials tables.
trade_update_date – Date trade records were updated.
run_date – Run Date
vessel_name – Name of the ship carrying the cargo.
port_of_unlading – Location where the items first entered the United States.
estimated_arrival_date – Estimated date the cargo would arrive at its destination.
foreign_port_of_lading – Foreign port where the cargo embarked on its voyage to the United States by sea.
record_status_indicator – Whether the record is New, Updated, or has been Deleted. Any records marked deleted should not be counted in any summations or rankings.
place_of_receipt – Location where the shippers first took possession of the cargo.
port_of_destination – Final port of destination if the cargo travels by ship beyond its initial port of unlading in the United States.
actual_arrival_date – Actual date the shipment arrived at its destination.
consignee_name – The company or person receiving the items.
shipper_party_name – The company or person shipping the items.
container_number – Container Number
description_sequence_number – Description Sequence Number
piece_count – Number of items contained in the shipment.
harmonized_number – Harmonized Tariff Code
harmonized_value – Harmonized Value
harmonized_weight – Harmonized Weight
harmonized_weight_unit – Harmonized Weight Unit

Also, you can see the information about the dataset in the picture below.

[Screenshot: summary of the dataset]

Chosen Algorithm to Predict Port Locations

We chose the decision tree algorithm to build a model from the dataset. In particular, we used the J48 classifier, which is Weka's implementation of the C4.5 decision tree algorithm. The dataset was split so that 66% of it was used to train the model; this portion is called the training dataset. The remaining 34% formed the test dataset, which is used to check predictions on data that was not used to train the model. Running J48 on the training dataset produced a model, in this case a rule set. This model was then used to predict (classify) the target value, the port location, for the training set and the test set.
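To make the split concrete, here is a minimal Python sketch of a 66% percentage split. It is not Weka's code: the function name `percentage_split`, the seed value, and the use of row indices in place of real shipment records are all assumptions made for illustration. It also assumes the training size is rounded to the nearest whole row, which is consistent with the 485 test instances reported for this experiment.

```python
import random


def percentage_split(rows, train_percent=66, seed=1):
    """Shuffle rows and split them into (train, test) by percentage.

    Hypothetical helper for illustration; the seed is arbitrary.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Assumption: the training size is rounded to the nearest whole row.
    train_size = round(len(shuffled) * train_percent / 100)
    return shuffled[:train_size], shuffled[train_size:]


# Stand-in row indices; the real rows would be the 1,427 shipment records.
rows = range(1427)
train, test = percentage_split(rows)
print(len(train), len(test))  # 942 485
```

Because 66% of 1,427 is 941.82, the training set rounds up to 942 rows and the test set holds the other 485, which is about 34% of the data.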

The pictures below show this process.

Description of attributes and algorithm

Results for using all the attributes.

The image above shows the results of predicting the port location for the test dataset, which is 485 of the 1,427 instances, or about 34%. Out of these 485 instances, 474 (97.732%) were classified correctly. Only 11 (2.268%) were classified incorrectly.
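The percentages follow directly from the counts. As a quick check, this small Python sketch recomputes them; the function name `accuracy_summary` is just an illustrative choice and not part of Weka.

```python
def accuracy_summary(correct, total):
    """Return correct/incorrect percentages, rounded to three decimals."""
    incorrect = total - correct
    return {
        "correct_pct": round(100 * correct / total, 3),
        "incorrect": incorrect,
        "incorrect_pct": round(100 * incorrect / total, 3),
    }


# Counts reported for the test dataset in this experiment.
summary = accuracy_summary(correct=474, total=485)
print(summary)
# {'correct_pct': 97.732, 'incorrect': 11, 'incorrect_pct': 2.268}
```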

This simple experiment was meant to demonstrate how machine learning can be applied to logistics data. Weka is one of many tools for this, and it makes it easy to run experiments and view results on a small scale. Other options include R, scikit-learn, and MLlib with Apache Spark.

If you would like to discuss how to use machine learning in your software applications, leave a comment, or contact me here or at adetola@adelabs.com.