Moonshadow’s flagship mobility analytics platform DB4IoT, the DataBase For the Internet of Things, gets its unsurpassed speed from a combination of data compression and an internal data structure that is optimized for waypoint data. Moonshadow is now making this technology available in a stand-alone compression tool: MMZIP.
Connected Vehicles (CV) and Location Based Services (LBS) mobile apps send location updates to the cloud as they move, generating a series of waypoints or ‘breadcrumbs’. Waypoint data has become an important source of information for many industries, from transportation analytics, planning and public transit to marketing and store site analysis. This data, however, is only valuable in volume.
Data from millions of vehicles or mobile phones contains hundreds of billions of records. Downloading and storing this data in the cloud can be very expensive. Just downloading a year of data from a single data vendor can cost upwards of $50,000 and take weeks, depending on the connection speed. Storing this data in the cloud and creating backups can easily run another $5,000 per month, depending on the amount of data.
Moonshadow’s MMZIP waypoint compression reduces the size of the data by over 90% without loss of information. Compared to existing compression technologies that are widely used for transmitting and storing data, such as GZIP -9, MMZIP is 50-80% more efficient. As a result, MMZIP files reduce transmission and storage costs by 50-80%, and data moves from one server to another 2-5x faster. Data that took a day to download is now available in 5-12 hours. MMZIP provides further benefits when processing the data: if reading the data from disk into memory took an hour, it now takes 5-12 minutes, saving IT staff time every time it is done.
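The relationship between the size reduction and the transfer speedup quoted above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes that "50-80% more efficient" means files that are 50-80% smaller than their GZIP -9 equivalents, and that transfer time scales linearly with file size at a fixed bandwidth. The function names are hypothetical and are not part of MMZIP.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Speedup factor when a file shrinks by `reduction` (0.5 means 50% smaller).

    Assumes transfer time is proportional to file size.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


def transfer_hours(baseline_hours: float, reduction: float) -> float:
    """Transfer time after shrinking the file by `reduction`."""
    return baseline_hours * (1.0 - reduction)


# A download that took one day (24 hours) with the baseline compression.
for reduction in (0.5, 0.8):
    print(
        f"{reduction:.0%} smaller: "
        f"{speedup_from_reduction(reduction):.1f}x faster, "
        f"{transfer_hours(24.0, reduction):.1f} hours instead of 24"
    )
```

Under these assumptions a 50% reduction gives a 2x speedup (12 hours) and an 80% reduction gives a 5x speedup (about 4.8 hours), which is in line with the 2-5x and 5-12 hour figures above.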
Waypoint data needs to be filtered and enriched before it can be used for projects. A dataset with months of data for a county or a city contains billions of records and is too large to use efficiently in any analytics software. Projects, however, only need a fraction of the data. When analyzing traffic congestion on a bridge, for instance, transportation engineers only need to look at the waypoints in the trips that used the bridge during, say, midweek peak hours. This is a small fraction of the citywide data. Filtering the ten million relevant waypoints out of a dataset with 100 billion records is challenging and can take days, or even weeks, of server time. Generic compression tools do not have a mechanism to do this. With MMZIP, users can provide filters to decompress only the records that are needed for a project. Different filters can be combined: users can provide a polygon, specify days and times, and combine these with any other fields that may be in the data, such as vehicle type or speed. MMZIP will then create a file with only the selected data. MMZIP can even filter on fields that are not in the original dataset, as we will describe below.
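To make the idea of combined filters concrete, here is a minimal sketch of selecting waypoints by polygon, weekday, hour of day and speed. It is not MMZIP's interface or implementation; the `Waypoint` record, the `filter_waypoints` function and the sample coordinates are all hypothetical, and the point-in-polygon test is a standard ray-casting check on plain longitude/latitude pairs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Sequence, Set, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


@dataclass(frozen=True)
class Waypoint:
    """Hypothetical waypoint record used only for this illustration."""
    trip_id: str
    lon: float
    lat: float
    time: datetime
    speed_kmh: float


def point_in_polygon(lon: float, lat: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray going east crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def filter_waypoints(
    waypoints: Iterable[Waypoint],
    polygon: Sequence[Point],
    weekdays: Set[int],          # Monday = 0 ... Sunday = 6
    start_hour: int,
    end_hour: int,
    min_speed_kmh: float = 0.0,
) -> Iterator[Waypoint]:
    """Yield only waypoints that pass every filter (cheap checks first)."""
    for wp in waypoints:
        if wp.time.weekday() not in weekdays:
            continue
        if not (start_hour <= wp.time.hour < end_hour):
            continue
        if wp.speed_kmh < min_speed_kmh:
            continue
        if point_in_polygon(wp.lon, wp.lat, polygon):
            yield wp


# Made-up bridge polygon and sample waypoints.
bridge = [(-122.68, 45.51), (-122.66, 45.51), (-122.66, 45.53), (-122.68, 45.53)]
sample = [
    Waypoint("a", -122.67, 45.52, datetime(2023, 3, 7, 8, 15), 35.0),   # Tuesday, peak, on bridge
    Waypoint("b", -122.67, 45.52, datetime(2023, 3, 11, 8, 15), 40.0),  # Saturday
    Waypoint("c", -122.60, 45.52, datetime(2023, 3, 7, 8, 30), 50.0),   # outside polygon
    Waypoint("d", -122.67, 45.52, datetime(2023, 3, 7, 14, 0), 45.0),   # off-peak
]
selected = list(filter_waypoints(sample, bridge, {1, 2, 3}, 7, 9))
print([wp.trip_id for wp in selected])
```

This sketch selects individual waypoints. Selecting every waypoint of each trip that used the bridge, as in the example above, would take a second pass that keeps all records whose `trip_id` appears in the first selection.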
Traffic engineers think in roads whereas planners often think in areas. Waypoint data by itself is not useful to them; they need to derive information about roads or areas from the waypoint data. Before the data is useful to traffic engineers, waypoints need to be matched to roads, whereas planners need waypoints, origins and destinations to be matched to counties, cities, ZIP codes, census tracts or other geographical areas. MMZIP can match origins, waypoints and destinations to road networks and geographical area meshes in the decompression process. A DOT engineer can provide MMZIP with a shapefile of the DOT's road network and MMZIP will match each waypoint to a road segment using the waypoint's location and heading. When shapefiles with area meshes are provided, MMZIP will match every waypoint, origin and destination to each geographical area type. The matched road segment IDs and areas can be saved as new fields in the decompressed data. These new fields can also be used as filters, and in this way MMZIP can filter the data on fields that were not available in the original waypoint data. Matching billions of waypoints to road segments and areas is a very time-consuming process that can take weeks of server time. MMZIP matches road segments or areas to waypoints at a speed of 2-3 million waypoints per second, depending on the server speed. This is easily 20-50x faster than traditional technologies. A road-matching process that took three weeks is completed in under a day with MMZIP.
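As an illustration of why heading matters when matching a waypoint to a road segment, the sketch below picks the nearest segment whose direction roughly agrees with the direction of travel. This is a simple brute-force approach written for clarity, not a description of how MMZIP does its matching. All names, thresholds and coordinates are hypothetical, and coordinates are assumed to be in metres in a projected coordinate system.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Segment:
    """Hypothetical one-way road segment in projected coordinates (metres)."""
    segment_id: str
    x1: float
    y1: float
    x2: float
    y2: float

    def heading_deg(self) -> float:
        """Compass bearing of the segment: 0 = north (+y), 90 = east (+x)."""
        return math.degrees(math.atan2(self.x2 - self.x1, self.y2 - self.y1)) % 360.0


def distance_to_segment(px: float, py: float, seg: Segment) -> float:
    """Shortest distance from point (px, py) to the line segment."""
    dx, dy = seg.x2 - seg.x1, seg.y2 - seg.y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - seg.x1, py - seg.y1)
    # Project the point onto the segment and clamp to its endpoints.
    t = ((px - seg.x1) * dx + (py - seg.y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (seg.x1 + t * dx), py - (seg.y1 + t * dy))


def heading_difference(a: float, b: float) -> float:
    """Smallest angle between two compass bearings, in degrees (0-180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def match_waypoint(
    px: float,
    py: float,
    heading_deg: float,
    segments: Sequence[Segment],
    max_distance_m: float = 30.0,
    max_heading_diff_deg: float = 45.0,
) -> Optional[str]:
    """Return the ID of the nearest direction-compatible segment, or None."""
    best_id: Optional[str] = None
    best_dist = max_distance_m
    for seg in segments:
        if heading_difference(heading_deg, seg.heading_deg()) > max_heading_diff_deg:
            continue  # travelling in a different direction than this segment
        dist = distance_to_segment(px, py, seg)
        if dist <= best_dist:
            best_id, best_dist = seg.segment_id, dist
    return best_id


# Two segments crossing at an intersection; the waypoint lies close to both.
segments = [
    Segment("main_st_eastbound", 0.0, 0.0, 100.0, 0.0),
    Segment("cross_st_northbound", 50.0, -50.0, 50.0, 50.0),
]
print(match_waypoint(52.0, 3.0, 88.0, segments))  # heading east
print(match_waypoint(52.0, 3.0, 2.0, segments))   # heading north
```

Near the intersection the waypoint is geometrically closer to the northbound segment, so location alone would give the wrong answer for an eastbound vehicle; the heading check resolves the ambiguity. A production matcher would also use a spatial index rather than scanning every segment for every waypoint.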
Since the data enrichment is so fast, it can be more efficient to do the enrichments after the data has been delivered, as this reduces transmission time and costs. At Moonshadow we have processed over three trillion waypoints from connected vehicles, fleet software and mobile devices and provided access to our DB4IoT Mobility Analytics Platform to hundreds of users. We used to run four physical (not virtual!) servers in parallel, continuously, to enrich the incoming data with road and area snapping and load it into DB4IoT before we created project datasets for customers. MMZIP has completely changed our process. We now keep our files in the MMZIP format and perform the enrichments only on data that needs to be delivered to a customer. We use MMZIP as our data repository to save significantly on storage and processing costs. By releasing MMZIP as a stand-alone tool we are making this available to other organizations.
MMZIP is available as a Linux command-line tool and as a C library, so data engineers can embed it fully in their data processing and data distribution pipelines. Moonshadow can customize MMZIP to support specific customer file formats to further increase compression ratios and decompression speeds.
Please contact wander@moonshadow.com for more information.