Wide-Area Data Analytics

Workshop Report

October 3-4, 2019

CRA/CCC
Sofitel Washington DC Lafayette Square, 15th Street Northwest, Washington, DC, USA

Overview

Modern datasets are often distributed across many locations. In some cases, datasets are naturally distributed because they are collected from multiple locations, such as sensors spread throughout a geographic region. In other cases, datasets are distributed across different data centers to improve scalability or reliability, or to reduce cost; these distributed locations could be a mix of public clouds, private data centers, and edge computing sites. How should we analyze data collected or stored at multiple far-flung locations? The simplest solution would backhaul all data to a single location for analysis, but this approach may introduce excessive overhead and/or delay. Yet analyzing data in a fully distributed fashion may be expensive, too, especially when the analysis task needs to combine data from different locations, or the distributed sites have limited computation, storage, or energy. Deciding where and how to analyze the data becomes even more challenging when the available resources (such as network bandwidth) vary over time, and when the system needs to strike a trade-off between the overhead of answering a query and the accuracy of the results.

Several parts of the computer science research community are exploring how to perform wide-area data analysis, including researchers and practitioners in the database, networking, distributed systems, and storage fields. These communities often focus on different aspects of the problem, consider different applications and use cases, and design and evaluate their solutions differently. We believe that bringing these researchers together at a single workshop will create opportunities for interdisciplinary collaboration and help bridge the gaps between related work in different areas. In addition, the workshop can help create a stronger foundation for a broader view of “systems” as a common core research area in computer science, rather than a set of separate research communities. Increasingly, practical systems problems span several areas of systems research rather than falling squarely within one. We believe that research and education in these fields will benefit broadly from efforts to work across traditional boundaries.

Example topics for the workshop include:

Use cases for wide-area data analytics, including video analytics, Internet of Things, distributed network monitoring, AR/VR, etc.
Various degrees of distributed wide-area data analytics, including public and private clouds, inter-datacenter, edge and datacenter, end-user and datacenter, etc.
Network infrastructure, including in-network data collection and analysis, inter-datacenter network design, etc.
Systems infrastructure, including computation models and algorithms, geo-distributed storage, consistency and fault tolerance mechanisms, resource identification and allocation, query languages, and bandwidth/accuracy/latency tradeoffs.
Security and privacy, including secure query execution mechanisms, privacy-preserving data analytics, and policy-aware systems and network infrastructure design.

The workshop will begin with breakfast the morning of October 3rd and conclude around midafternoon on October 4th.

Agenda

October 3, 2019 (Thursday)

07:30 AM	Breakfast Available | Concorde Room
09:00 AM	Welcome, Overview, Introduction | Madeleine Room | Rachit Agarwal: Wide-Area Data Analytics Workshop Introduction
10:00 AM	Use Cases (2-3 short talks) | Madeleine Room | Victor Bahl: “Live Video Analytics – Extracting Actionable Insights from Cameras in the Wild”; Lili Qiu: “Data Analytics for Wireless Communication and Sensing”; Minlan Yu: “Data Analytics for Network Telemetry”
11:00 AM	AM Break | Madeleine Pre-Function
11:30 AM	Platforms (2-3 short talks) | Madeleine Room | Ion Stoica: “To edge, or not to edge, that’s the question”; Mike Freedman: “Building an open-source time-series database for wide-area data”
12:30 PM	Lunch (social and discuss talks) | Opaline Bar
01:30 PM	Breakout #1: Use Cases | Madeleine Room, Montmartre Room, Bastille
02:30 PM	Readout | Madeleine Room
03:00 PM	PM Break | Madeleine Pre-Function
03:30 PM	Breakout #2: By Disciplines | Madeleine Room, Montmartre Room, Bastille
04:30 PM	Readout | Madeleine Room
05:00 PM	Break/Slippage/Propose 3rd breakout | Madeleine Room
06:00 PM	Adjourn Day 1
06:30 PM	Dinner

October 4, 2019 (Friday)

07:30 AM	Breakfast Available | Concorde
08:30 AM	Discuss Breakout #3 | Madeleine Room | Mark Hill: Accelerator-level Parallelism
09:00 AM	Breakout #3 | Madeleine Room, Montmartre Room, Bastille
10:00 AM	Readout | Madeleine Room
10:30 AM	AM Break | Madeleine Pre-Function
11:00 AM	Report Outlining/Drafting | Madeleine Room, Montmartre Room, Bastille
12:30 PM	Working Lunch | Opaline Bar
01:30 PM	Final Discussions | Madeleine Room
02:30 PM	End Workshop

Organizers

Organizing Committee:

Rachit Agarwal, Cornell University

Jen Rexford, Princeton University

With Support From the CCC System and Architecture Task Force:

Tom Conte, Georgia Tech
Ian Foster, University of Chicago
Mark Hill, University of Wisconsin, Madison

Logistics

The Computing Community Consortium (CCC) will cover travel expenses for all participants who request it. Participants are asked to make their own travel arrangements to get to the workshop, including purchasing airline tickets. Following the workshop, CCC will circulate a reimbursement form that participants will need to complete and submit, along with copies of receipts for amounts exceeding $75.

In general, standard Federal travel policies apply: CCC will reimburse non-refundable economy airfare on U.S. flag carriers, and alcohol will not be covered.

For more information, please see the Guidelines for Participant Reimbursements from CCC.

Additional questions about the reimbursement policy should be directed to Khari Douglas (kdouglas [at] cra.org).