Scaling Maps To Zettabytes and Beyond
The following Great Innovative Idea is from Mohamed Sarwat, Assistant Professor in Computer Science and Engineering at Arizona State University. Sarwat presented his poster, GeoExpo-Interactive and Scalable Exploration of Big GeoSpatial Data, at the CCC Symposium on Computing Research, May 9-10, 2016.
The Idea
GIS software packages are useful tools for making sense of spatial data, which includes, but is not limited to, weather maps, vegetation indices, and geological maps. In addition, hundreds of millions of users now routinely use GPS-enabled devices to access their healthcare information and bank accounts, interact with friends, buy items online, search for interesting places to visit on the go, ask for driving directions, and more. As a consequence, everything we do leaves a trail of digital breadcrumbs on the map. State-of-the-art GIS tools face the following data infrastructure challenges:
(1) Heterogeneity: Spatial data comes from different sources and holds a variety of attributes. Many modern cities install sensors (e.g., air quality, temperature, road traffic) in buildings and at traffic intersections to monitor the environment. Such sensors produce spatial location attributes, sensor reading attributes (e.g., carbon monoxide level, temperature, barometric pressure), and temporal attributes. Furthermore, citizens use their mobile devices to post geo-tagged social media content that combines spatial data (e.g., the location of the user who posted a tweet), graph data (i.e., social relationships among social networking users living in a city), and textual data such as tweets and Facebook posts.
(2) Scalability and Interactivity: The volume of available geospatial data has recently increased tremendously, and because data continuously streams into the map (e.g., check-ins, Uber trips), it is challenging to store, index, query, maintain, and visualize such a large and evolving body of spatial data. Consider also a data scientist analyzing the geospatial autocorrelation between city traffic and air pollution. Such an analytics task requires a large amount of interactive computation over a large number of traffic and carbon monoxide sensor readings.
Hence, it is necessary to design and develop data management systems that can digest the massive amount of map data, store it effectively, and allow users to retrieve and analyze it with interactive performance.
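To make the "store, index, query" requirement concrete, here is a minimal sketch of one common building block: a uniform grid index that answers rectangular range queries over sensor readings. It is an illustration only and not the design of GeoExpo or any particular system. The names `Reading` and `GridIndex`, the 0.01-degree cell size, and the sample coordinates are all assumptions made for this example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One hypothetical sensor reading: where, when, and what was measured."""
    lon: float
    lat: float
    timestamp: int   # seconds since the epoch
    co_ppm: float    # carbon monoxide level


class GridIndex:
    """Uniform lon/lat grid; each cell stores the readings that fall inside it."""

    def __init__(self, cell_size_deg: float = 0.01):
        self.cell_size = cell_size_deg
        self.cells = defaultdict(list)

    def _cell(self, lon: float, lat: float) -> tuple:
        # Map a coordinate to integer cell coordinates.
        return (math.floor(lon / self.cell_size),
                math.floor(lat / self.cell_size))

    def insert(self, reading: Reading) -> None:
        self.cells[self._cell(reading.lon, reading.lat)].append(reading)

    def range_query(self, min_lon, min_lat, max_lon, max_lat) -> list:
        """Return all readings inside the (inclusive) bounding box."""
        cx0, cy0 = self._cell(min_lon, min_lat)
        cx1, cy1 = self._cell(max_lon, max_lat)
        hits = []
        # Visit only the cells that overlap the box, then filter exactly.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for r in self.cells.get((cx, cy), ()):
                    if min_lon <= r.lon <= max_lon and min_lat <= r.lat <= max_lat:
                        hits.append(r)
        return hits


# Usage with made-up readings (coordinates are approximate and illustrative).
index = GridIndex(cell_size_deg=0.01)
index.insert(Reading(lon=-111.94, lat=33.42, timestamp=0, co_ppm=1.2))
index.insert(Reading(lon=-112.07, lat=33.45, timestamp=60, co_ppm=2.5))
nearby = index.range_query(-112.0, 33.3, -111.9, 33.5)
```

A flat grid keeps inserts cheap, which suits continuously streaming data, but it handles skewed data poorly; hierarchical structures such as R-trees or quadtrees are the usual alternatives when density varies widely across the map.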
Impact
Making sense of geospatial map data benefits a wide range of applications that may transform science and society. For example: (1) Space Science: allowing astronomers to study, and possibly discover, new features of both the Earth and outer space, (2) Socio-Economic Analysis: including, for example, climate change analysis and the study of deforestation, population migration, and variation in sea levels, (3) Smart City and Urban Planning: assisting governments in city and regional planning, road network design, and transportation and traffic engineering, (4) Disaster Response: helping to assess the impact of natural disasters (e.g., hurricanes) and man-made disasters, (5) Commerce and Advertisement: including, for instance, point-of-interest (POI) recommendation services that analyze users' spatial behavior and recommend POIs accordingly.
Researcher’s Background
Mohamed Sarwat is an Assistant Professor of Computer Science and the director of the Data Systems lab at Arizona State University (DataSys@ASU). Before joining ASU in August 2014, Mohamed obtained his MSc and PhD degrees in computer science from the University of Minnesota in 2011 and 2014, respectively. His research interest lies at the intersection of the broad areas of spatial computing and data management systems. Mohamed is a recipient of the University of Minnesota Doctoral Dissertation Fellowship. His research work has been recognized by the Best Research Paper Award at the IEEE 16th International Conference on Mobile Data Management (MDM 2015), the Best Research Paper Award at the 12th International Symposium on Spatial and Temporal Databases (SSTD 2011), and a Best of Conference citation at the IEEE 28th International Conference on Data Engineering (ICDE 2012).