Wide-Area Data Analytics
October 3-4, 2019
Sofitel Washington DC Lafayette Square, 15th Street Northwest, Washington, DC, USA
Modern datasets are often distributed across many locations. In some cases, datasets are naturally distributed because they are collected from multiple locations, such as sensors spread throughout a geographic region. In other cases, datasets are distributed across different data centers to improve scalability or reliability, or to reduce cost; these distributed locations could be a mix of public clouds, private data centers, and edge computing sites. How should we analyze data collected or stored at multiple far-flung locations? The simplest solution would backhaul all data to a single location for analysis, but this approach may introduce excessive overhead and/or delay. Yet analyzing data in a fully distributed fashion may be expensive, too, especially when the analysis task needs to combine data from different locations, or the distributed sites have limited computation, storage, or energy. Deciding where and how to analyze the data becomes even more challenging when the available resources (such as network bandwidth) vary over time, and when the system needs to strike a trade-off between the overhead of answering a query and the accuracy of the results.
Several parts of the computer science research community are exploring how to perform wide-area data analysis, including researchers and practitioners in the database, networking, distributed systems, and storage fields. These communities often focus on different aspects of the problem, consider different applications and use cases, and design and evaluate their solutions differently. We believe that bringing these researchers together at a single workshop will create opportunities for interdisciplinary collaboration and help bridge the gaps between related work in different areas. In addition, the workshop can help create a stronger foundation for a broader view of “systems” as a common core research area in computer science, rather than a set of separate research communities. Increasingly, practical systems problems span several areas of systems research rather than falling squarely within one. We believe that research and education in these fields will benefit broadly from efforts to work across traditional boundaries.
Example topics for the workshop include:
- Use cases for wide-area data analytics, including video analytics, Internet of Things, distributed network monitoring, AR/VR, etc.
- Various degrees of distributed wide-area data analytics, including public and private clouds, inter-datacenter, edge and datacenter, end-user and datacenter, etc.
- Network infrastructure, including in-network data collection and analysis, inter-datacenter network design, etc.
- Systems infrastructure, including computation models and algorithms, geo-distributed storage, consistency and fault tolerance mechanisms, resource identification and allocation, query languages, and bandwidth/accuracy/latency tradeoffs.
- Security and privacy, including secure query execution mechanisms, privacy-preserving data analytics, and policy-aware systems and network infrastructure design.
The Computing Community Consortium (CCC) will cover travel expenses for all participants who request it. Participants are asked to make their own travel arrangements to get to the workshop, including purchasing airline tickets. Following the workshop, CCC will circulate a reimbursement form that participants will need to complete and submit, along with copies of receipts for amounts exceeding $75.
In general, standard Federal travel policies apply: CCC will reimburse non-refundable economy airfare on U.S. flag carriers, and alcohol will not be covered.
For more information, please see the Guidelines for Participant Reimbursements from CCC.
Additional questions about the reimbursement policy should be directed to Khari Douglas (kdouglas [at] cra.org).