December 14, 2022

Moonshadow Mobile started developing MMZIP, its waypoint compression technology, this past summer. We recently had the opportunity to test MMZIP on a large-scale project that used Wejo data. The results were impressive: MMZIP reduced data preparation time and costs by over 90%, and our client can start using the data weeks earlier.

Moonshadow was asked to prepare one year of Wejo data for all trips in the state of Alabama for transportation analysis. Wejo generated over 90 billion waypoints and made these available as 10TB of GZIP files on AWS. Downloading the 10TB of data from AWS to our datacenter in Eugene took over 70 hours and cost $700 in AWS data transfer fees. We had expected MMZIP to reduce the size of the data by 50% compared to GZIP, but after we processed the data the MMZIP files were almost 90% smaller than the GZIP files. Had we run MMZIP on AWS before transmitting the data, the transfer time would have dropped to under 9 hours and the AWS data transfer cost to around $85. Compressing at AWS would have moved the project start up by two days and saved over $600. AWS charges 8 cents per GB per month for storage, so storing the Alabama data costs the customer $800 per month. MMZIP reduces this to under $100. The table below shows that MMZIP reduced the download time by 63 hours while reducing download and storage costs by over $1,000.
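The transfer and storage figures above can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check, not output from MMZIP: it assumes that transfer time and cost scale linearly with data size, and it uses an assumed 88% reduction as a stand-in for "almost 90%".

```python
# Back-of-the-envelope check of the transfer and storage figures.
# ASSUMED_REDUCTION is an assumption standing in for "almost 90%".

GZIP_SIZE_GB = 10_000          # 10TB of GZIP files, taking 1TB = 1,000GB
GZIP_TRANSFER_HOURS = 70       # download time quoted in the text
GZIP_TRANSFER_COST = 700.0     # AWS download cost in dollars, as quoted
STORAGE_PRICE_PER_GB = 0.08    # dollars per GB per month, as quoted
ASSUMED_REDUCTION = 0.88       # assumed size reduction relative to GZIP


def scaled(value, reduction):
    """Scale a size-proportional quantity by the fraction of data remaining."""
    return value * (1.0 - reduction)


mmzip_size_gb = scaled(GZIP_SIZE_GB, ASSUMED_REDUCTION)
mmzip_transfer_hours = scaled(GZIP_TRANSFER_HOURS, ASSUMED_REDUCTION)
mmzip_transfer_cost = scaled(GZIP_TRANSFER_COST, ASSUMED_REDUCTION)

gzip_storage_cost = GZIP_SIZE_GB * STORAGE_PRICE_PER_GB
mmzip_storage_cost = mmzip_size_gb * STORAGE_PRICE_PER_GB

print(f"Transfer time:   {GZIP_TRANSFER_HOURS} h -> {mmzip_transfer_hours:.1f} h")
print(f"Transfer cost:   ${GZIP_TRANSFER_COST:.0f} -> ${mmzip_transfer_cost:.0f}")
print(f"Monthly storage: ${gzip_storage_cost:.0f} -> ${mmzip_storage_cost:.0f}")
```

With an 88% reduction, the results land on the figures quoted above: under 9 hours of transfer, roughly $85 in transfer cost, and under $100 per month in storage.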

While the savings on downloading the data are impressive, this was just the beginning. The savings on processing the data were much more significant. MMZIP doesn’t just decompress the data; it can enrich and filter the data during decompression without first storing it on disk (which would take extra time and cost money). Before waypoint data can be used to analyze traffic on roads, every single waypoint needs to be matched to a road segment. Assigning a waypoint to a road segment is a quick enrichment step that takes just a few microseconds with traditional technology. But do this 90 billion times and it adds up to serious time. Before we developed MMZIP, Moonshadow used PostgreSQL and PostGIS to perform road segment matching. Assigning 90 billion waypoints to road segments that way would have taken over 38 days of server time. MMZIP performs this task in just over a day, which means that the transportation analysis project could start weeks earlier.

MMZIP uses a streaming format: it reads in one million records at a time and streams out the results. The 90 billion waypoints never need to be in RAM at the same time. This means that a much smaller, and less expensive, server can be used with MMZIP. With the PostgreSQL solution we needed a much more powerful server with more RAM and larger disks.

AWS charges 8 cents per GB per month for storage. That may look like a small charge until you look at the size of this data. At 10TB, the Wejo GZIP data costs 10,000 times 8 cents, which is $800 per month. What’s more, with the PostgreSQL workflow we need that AWS storage for five weeks, whereas with MMZIP we need the much smaller storage for only two days. The table below shows the savings in time and money for road matching one year of Wejo Alabama data on AWS with MMZIP.
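The streaming idea described above, reading a fixed-size batch, enriching and filtering it, and emitting results before the next batch is read, can be illustrated with a short Python sketch. This is not MMZIP's implementation or API. The names `batches`, `match_to_segment`, and `stream_matched` are hypothetical, and the matcher is a toy that picks the nearest segment midpoint, which is far simpler than real road matching.

```python
from itertools import islice

BATCH_SIZE = 1_000_000  # batch size mentioned in the text


def batches(records, size=BATCH_SIZE):
    """Yield lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def match_to_segment(waypoint, segments):
    """Hypothetical toy matcher: return the id of the nearest segment midpoint."""
    lat, lon = waypoint
    nearest = min(
        segments,
        key=lambda seg: (seg[1] - lat) ** 2 + (seg[2] - lon) ** 2,
    )
    return nearest[0]


def stream_matched(records, segments, keep=lambda wp, seg_id: True,
                   size=BATCH_SIZE):
    """Enrich and filter waypoints one batch at a time, yielding as it goes."""
    for batch in batches(records, size):
        for waypoint in batch:
            seg_id = match_to_segment(waypoint, segments)
            if keep(waypoint, seg_id):
                yield waypoint, seg_id


# Illustrative data only: (segment id, midpoint latitude, midpoint longitude).
demo_segments = [("seg-A", 32.38, -86.30), ("seg-B", 33.52, -86.80)]
demo_waypoints = [(32.40, -86.31), (33.50, -86.79), (32.37, -86.29)]

for waypoint, seg_id in stream_matched(iter(demo_waypoints), demo_segments, size=2):
    print(waypoint, seg_id)
```

Because only one batch is held in memory at a time, peak memory depends on the batch size rather than on the total number of waypoints. That is the property that lets a smaller server handle the full data set.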

The table shows that MMZIP saved 37 days of server time. When a process runs for 37 days, IT staff need to monitor it regularly to make sure it is still running well, which easily takes 30-45 minutes per day. So apart from sparing us the 37-day wait, MMZIP also saves the data engineers 20-30 hours of work.
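For readers who want to sanity-check the time figures, the sketch below works out the average processing time per waypoint and the monitoring effort they imply. It rests on two assumptions: that "server time" means wall-clock time for a single run, and that "just over a day" can be treated as exactly one day.

```python
# Rough cross-check of the processing and monitoring figures quoted above.

WAYPOINTS = 90_000_000_000     # waypoints in one year of Alabama data
SECONDS_PER_DAY = 86_400


def microseconds_per_waypoint(days):
    """Average time per waypoint if the whole run takes `days` days."""
    return days * SECONDS_PER_DAY / WAYPOINTS * 1_000_000


postgis_us = microseconds_per_waypoint(38)   # PostgreSQL/PostGIS estimate
mmzip_us = microseconds_per_waypoint(1)      # assumes exactly one day

# Monitoring effort over the 37 days saved, at 30 and 45 minutes per day.
monitoring_hours = [37 * minutes / 60 for minutes in (30, 45)]

print(f"PostgreSQL/PostGIS: {postgis_us:.1f} microseconds per waypoint")
print(f"MMZIP:              {mmzip_us:.2f} microseconds per waypoint")
print(f"Monitoring saved:   {monitoring_hours[0]} to {monitoring_hours[1]} hours")
```

Under these assumptions the PostgreSQL estimate averages a few tens of microseconds per waypoint, MMZIP averages about one microsecond, and the monitoring effort is roughly in line with the 20-30 hours quoted above.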

There are a number of other processes that take up time and create expenses that we have not quantified here. The PostgreSQL solution likely needs to generate indices first in order to run efficiently. Indexing large data sets can take considerable time and further increases the size of the data. Depending on how it is engineered, the PostgreSQL solution may also need to store intermediate files, further increasing the storage costs. Apart from the road matching, we perform area matching to assign a county, city, ZIP code, and Census Block Group to every waypoint. We did not include numbers for these processes in this comparison.

When we started developing MMZIP this summer, we were hoping to achieve 50-70% savings over GZIP-based solutions. Now that we are running the first large data projects with MMZIP, we are finding that projects can start weeks earlier while we realize cost savings of over 95%.

MMZIP is available as a command-line tool and as a C library that engineers can fully integrate into their own waypoint data processing pipelines on their own servers. Please email info@moonshadow.com or call us at 541.343.4281 for more information about MMZIP.