One week of connected vehicle data for Maricopa contains 100 million records

The Maricopa Association of Governments (MAG) is a Council of Governments (COG) that serves as the regional planning agency for the metropolitan Phoenix area. MAG uses connected vehicle data for transportation planning and transportation system analysis. The data consists of millions of anonymized trips on regional roads, delivered as large daily files that add up to more than 25 terabytes over a period of a few years. Managing and using this data is a challenge.

Before the data can be used, the waypoints must be snapped to a road network. MAG uses both OSM and ESRI map layers, and it maintains its own road network and travel demand models. As a result, MAG needs to assign waypoints to different map layers depending on the project. Assigning billions of datapoints to multiple road networks was a computational challenge that MAG needed to solve.

The other challenge for MAG was data selection. MAG participates in the FHWA Office of Operations’ pilot study “Emerging Data and Applications for Work Zone Safety”, a collaborative effort led by Iowa State University. The study explores the use of connected vehicle data for tasks such as queue analysis, lane closure identification, and the development of efficient traffic management plans to enhance work zone safety. The challenge here is to select only the datapoints that are relevant to the specific road segments being analyzed. The datapoints needed for a specific analysis may be ‘only’ a few hundred million, which is workable in many software products. When MAG first set up these selections, it found that selecting the relevant datapoints from one month of connected vehicle data would take five days of processing. That would not work for a project with many road segments and years of data, so MAG looked for technology to speed up the process.
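To make the road-snapping step concrete, the following is a minimal sketch of assigning a waypoint to the nearest road segment. It is not MMZIP's method, which the source does not describe. It assumes waypoints and segment endpoints are already in a projected, planar coordinate system, and the names `snap_to_segment`, `snap_to_network`, and the segment IDs are hypothetical. It scans every segment, which would be far too slow for billions of datapoints; a production tool would use a spatial index.

```python
from math import hypot


def snap_to_segment(p, a, b):
    """Project point p onto segment a-b; return (snapped_point, distance)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Fraction along the segment of the perpendicular foot,
        # clamped so the snapped point stays on the segment.
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    sx, sy = ax + t * dx, ay + t * dy
    return (sx, sy), hypot(px - sx, py - sy)


def snap_to_network(p, segments):
    """Return (segment_id, snapped_point, distance) for the closest segment.

    `segments` maps a hypothetical segment ID to an (a, b) endpoint pair.
    Brute-force scan over all segments; returns None if `segments` is empty.
    """
    best = None
    for seg_id, (a, b) in segments.items():
        snapped, dist = snap_to_segment(p, a, b)
        if best is None or dist < best[2]:
            best = (seg_id, snapped, dist)
    return best


# Hypothetical two-segment network in planar coordinates.
network = {"main": ((0, 0), (10, 0)), "side": ((0, 5), (0, 15))}
result = snap_to_network((3, 1), network)
```

Because the network is passed in as a plain mapping, the same function can be run against different map layers, which mirrors MAG's need to assign the same waypoints to more than one road network.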

MAG decided to use Moonshadow’s MMZIP software to filter the connected vehicle data and perform road snapping. MAG installed MMZIP on its own servers in Phoenix and developed scripts that drive MMZIP through its command line interface. MMZIP compresses the data to less than five percent of its original size, and it can perform road assignment and data filtering during decompression at speeds of over one million waypoints per second per server. MMZIP can also perform these enrichment and filtering operations at high speed on uncompressed input files, and this is how MAG decided to use it.

For data selection, MAG chose the polygon filter function in MMZIP. MAG staff defined polygons for all the road segments relevant to the FHWA pilot study and wrote scripts that automatically load data into MMZIP, apply all polygon filters, and output only the selected data. Using MMZIP reduced the time to filter a month of data from five days to just ten hours, a speedup of more than ten times. MAG further reports a 90 percent decrease in compressed file size and a 95 percent savings in network matching runtime. “It was technically challenging to look at a full year of data for a particular use case before using MMZIP, now we can,” said Dr. Wang Zhang, MAG’s Transportation Data Program Manager. MMZIP enables MAG to use connected vehicle data to its fullest extent for many different projects.
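The idea behind a polygon filter can be illustrated with a short, generic sketch. This is not MMZIP's implementation or command line syntax, neither of which is given in the source. It uses the standard ray-casting point-in-polygon test, and the record layout (`id`, `x`, `y`), the polygon name `wz1`, and the function names are assumptions made for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices in order; the last vertex is
    implicitly joined back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_waypoints(waypoints, polygons):
    """Yield (polygon_name, waypoint) for waypoints inside any polygon.

    Each waypoint is reported once, for the first polygon that contains it.
    """
    for wp in waypoints:
        for name, poly in polygons.items():
            if point_in_polygon(wp["x"], wp["y"], poly):
                yield name, wp
                break


# Hypothetical polygon around one work-zone road segment.
polygons = {"wz1": [(0, 0), (4, 0), (4, 2), (0, 2)]}
waypoints = [
    {"id": 1, "x": 1, "y": 1},
    {"id": 2, "x": 5, "y": 1},
    {"id": 3, "x": 3, "y": 1.5},
]
selected = [wp["id"] for _, wp in filter_waypoints(waypoints, polygons)]
```

Writing the filter as a generator means records can be streamed from a large daily file and discarded as soon as they fail the test, so only the selected datapoints need to be kept.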

MAG plans to expand its use of MMZIP. MAG also maintains extensive models for travel demand and microsimulation, and it plans to use MMZIP to attach MAG network data to the connected vehicle data records.