Computing Research News

March 2015     Vol. 27/No. 3

March Announcements

By CRA Staff


CRA Announces New Executive Officers

CRA has elected new Board Officers to serve two-year terms beginning July 1, 2015. Susan Davidson has been elected Chair; she will be the first woman to hold this position in CRA history. In addition, Susanne Hambrusch will become Vice Chair, and Greg Morrisett will become Secretary. The Board re-elected Ron Brachman as Treasurer. Current Board Chair J Moore and Vice Chair Laura Haas will end their terms on June 30, 2015. CRA thanks them both for their outstanding service on the Board.


Congratulations to both the current and incoming CRA Board Chairs

Susan Davidson and J Moore have both recently been recognized as Fellows of the Royal Society of Edinburgh, Scotland's National Academy.

Gift to CRA Board Chair

At the February Board Meeting, CRA Board Chair J Moore was presented with a one-of-a-kind, handmade gift for his term of service as CRA Board Chair: a custom-built steam engine, which has unique significance for J. For the gift idea, CRA Executive Director Andy Bernat drew upon J's history with steam engines; in high school, J had a job servicing them in an oil refinery. The engine was custom-built for J by Andy Bernat and Philip Wiborg.


CRA Executive Director Andrew Bernat (left) presents CRA Board Chair J Moore with a gift for his term of service as CRA Board Chair (July 1, 2013 - June 30, 2015).


 CRA Executive Director Andrew Bernat (right) presents CRA Board Chair J Moore (left) with a gift for his term of service as CRA Board Chair (July 1, 2013 - June 30, 2015).  


Current and former NSF CISE ADs at the CRA Board Meeting

On the left, Farnam Jahanian (immediate past CISE AD and recipient of the 2015 CRA Distinguished Service Award); on the right, Jim Kurose (current CISE AD and former CRA Board member).


2015 CRA Distinguished Service and A. Nico Habermann Awardees Announced

By CRA Staff

2015 CRA Distinguished Service Award

Farnam Jahanian, Vice President for Research, Carnegie Mellon University

Farnam served as NSF Assistant Director for CISE from 2011 to 2014, the highest-profile government position for computer science research. During his tenure he fought hard for computer science and launched three presidential initiatives: the National Robotics Initiative, the Big Data Research and Development Initiative, and US Ignite. Farnam led twenty-five new solicitations, including several cross-directorate efforts such as Secure and Trustworthy Cyberspace, Cyberlearning and Future Learning Technologies, and Big Data. He reintegrated the Office of Cyberinfrastructure (OCI) into CISE. Farnam served as co-chair of the NITRD Subcommittee of the National Science and Technology Council Committee on Technology, providing overall coordination for the R&D activities of 17 government agencies. He testified frequently before Congress and gave about 100 presentations at universities and conferences.

Former NSF Director and current CMU President Subra Suresh writes, "Farnam is both a visionary and the pragmatist, and this combination of qualities has allowed him to be effective in whatever he undertakes."

Tom Kalil, OSTP Deputy Director for Technology and Innovation, states: "During my more than thirteen years of service at the White House for two Presidents, I have had the opportunity to work with many individuals from the computer science research community who have been willing to serve in leadership positions at federal agencies such as NSF, DARPA, and the Department of Energy. Farnam has been second to none as measured by the breadth and depth of his impact on the direction of the field, and his ability to partner effectively with the research community, and his peers at NSF and other agencies, and the White House. His leadership and hard work has resulted in increased federal investment in critical areas such as Big Data, robotics, cyberphysical systems, cybersecurity, cyber-learning, next-generation networking, and CS education."

2015 A. Nico Habermann Award

Ann Quiroz Gates, Chair of the Department of Computer Science at the University of Texas at El Paso (UTEP)

For over two decades, Gates has been a leader in initiatives that support Hispanics and members of other underrepresented groups in computing. She is perhaps best known for leading the Computing Alliance of Hispanic-Serving Institutions (CAHSI), an alliance of 13 institutions whose work has had a large and sustained positive impact on the recruitment, retention, and advancement of Hispanics in computing. Mentoring is a key component of CAHSI's approach, which builds support networks that address both academic and cultural issues for students at all stages of their college and postgraduate education and beyond, into leadership positions. Gates helped establish the Affinity Research Group (ARG) model for research mentoring and peer support; the evaluation of its effectiveness and the dissemination of the findings have led to its adoption at institutions outside CAHSI. Through an NSF ADVANCE program, Gates has also promoted the recruitment, retention, and advancement of female faculty at her home institution, UTEP. She has enabled the success of many students through her personal mentoring of over 150 Hispanic students and her research supervision of over 70 students. Gates' influence extends to other initiatives and communities, including the Society for Advancement of Hispanics/Chicanos and Native Americans in Science (SACNAS), CMD-IT, and the AccessComputing Alliance. The scale and impact of Gates' contributions are truly exceptional, particularly in support of Hispanics, who account for 25% of the U.S. population but less than 7% of bachelor's degrees in computing and less than 2% of PhDs.

NSF and the National Big Data Initiative

By Chaitan Baru, Senior Advisor for Data Science, CISE Directorate, National Science Foundation

Three years have passed since the White House Office of Science and Technology Policy (OSTP) launched the National Big Data Research and Development Initiative in March 2012. The breathtaking pace of activity in big data has continued unabated in the intervening years. In August 2014, Gartner declared that “big data” had passed the peak of its so-called “Hype Cycle.” This simply means that the community can now roll up its collective sleeves and get to work on the real issues, rather than worrying about the hype!

Dealing with data is not a new phenomenon, whether in science, business, or government. Yet there is recognition in every field and discipline that the easy availability of vast amounts of data, continuous data streams, and a heterogeneous range of datasets, together with the use of all of these data for action and insight (including in “real time”), has indeed created something new, which the community is beginning to embrace as the new discipline of Data Science, or Data Science and Engineering.

Every national priority area and initiative, whether cybersecurity, Precision Medicine, BRAIN, smart energy, Materials Genome, or Advanced Manufacturing, will generate more data, and new kinds of data, and will run into the challenges of Big Data. Scientific breakthroughs and new business, government, and societal applications will come only from effective use of all these data. The ability to convert data into action and insight will be the gating factor in creating efficient and effective solutions in all of these priority areas.

Since the Big Data Initiative announcement three years ago, the White House has taken action in a number of ways. In May 2013, the Administration released the Open Data Policy so that information generated and stored by the Federal Government is made more open and accessible to innovators and the public to fuel entrepreneurship and economic growth while increasing government transparency and efficiency. Given the interest in gaining maximum value from their data assets, government agencies are hiring Chief Data Officers and Data Scientists. On February 18, 2015, the White House announced the appointment of Dr. DJ Patil as the first U.S. Chief Data Scientist. In a keynote talk the next day at the Strata+Hadoop World Conference in San Jose, Dr. Patil noted that he had already seen a number of innovative uses of data in government, sometimes even surpassing industry’s use of data.

Coordination among federal agencies for the Big Data Initiative is enabled through the Networking and Information Technology Research and Development (NITRD) Big Data Senior Steering Group (BDSSG), co-chaired by NSF and NIH, with members from DARPA, DOD OSD, DHS, DOE-Science, HHS, NARA, NASA, NIST, NOAA, NSA, and USGS. Last fall, the BDSSG issued a Request for Input to inform the development of a framework, a set of priorities, and ultimately a strategic plan for the National Big Data Initiative[1]. Last month, NSF sponsored a workshop at Georgetown University to obtain additional input from academia, industry, and the community at large. A second related workshop was hosted by the Homeland Security Advanced Research Projects Agency (HSARPA) on February 23, 2015 in Washington, DC.

One of the cornerstones of the original Big Data Initiative announcement was the creation of a research program in Core Technologies and Techniques for Advancing Big Data Science & Engineering, or BigData. In the first year, this was a joint initiative between NSF and NIH. By the second year (2014), NIH had launched its own BD2K program, and NSF continued with the BigData program. The third and most recent NSF BigData solicitation, released on February 19, 2015, includes participation from all NSF Directorates, as well as from the Office of Financial Research, Department of the Treasury. In subsequent years, we hope to collaborate with additional agencies.

In addition to leading research efforts to advance Big Data science and engineering, NSF is also providing leadership to accelerate the Big Data innovation ecosystem. Building on the momentum of the White House Data to Knowledge to Action event in November 2013, which announced new Big Data partnerships, NSF last fall issued a Request for Input on the formation of Big Data Regional Innovation Hubs. We plan soon to announce a series of regional workshops, to be held later this year, to explore this concept further.

While the Big Data program creates enormous opportunities for new knowledge from large-scale data across all disciplines, it also raises new challenges, including sustainability (identifying which of the vast ocean of datasets need to be retained for the long term, and the business models to support that); reproducibility (ensuring that results from data experiments can be reproduced at a later time, especially by others); data to action (reaching decisions and taking confident action based on data, for example in business and government applications); and data to insight (gaining an understanding of the underlying phenomena from data, for example in medical and scientific applications).

Furthermore, even as organizations grapple with managing and exploiting their ever-increasing data resources, we may be only at the beginning of this data deluge. With the impending arrival of the so-called Internet of Things (IoT), one can expect even larger volumes of data from a vast array of sensors spanning spatial scales from the individual (e.g., wearables and personal monitoring devices) to the home or factory (Smart Homes, the Industrial Internet) and urban environments (Smart Cities). NSF is planning to organize a series of workshops in 2015 on Big Data and the Internet of Things.

The Big Data phenomenon has led to the recognition of data science and engineering as a new disciplinary area, not just at the PhD and Master's levels but also at the undergraduate level. A number of universities are actively developing full undergraduate curricula in Data Science, or Data Science and Engineering. Indeed, the emerging picture is that the scale of data science is much larger than we had originally anticipated. For example, in his keynote at the Big Data Strategic Initiative workshop at Georgetown earlier this year, Dr. Andrew Moore noted that Google currently employs about 10,000 people who help curate Web data to assist in Google Search[2]. Similarly, the CISE-supported Expeditions in Computing project AMPLab at UC Berkeley is about Algorithms, Machines, and People. The “human in the loop” will be a key factor in our ability to fully exploit data resources in the future.

As the Google example illustrates, much of this work is not necessarily at the graduate level. There will likely be an ecosystem of data science-related employment at the graduate level, undergraduate level and, possibly, at the community college level. NSF plans to organize workshops to explore educational opportunities to serve all aspects of the data science industry.

These are indeed exciting times for Big Data and Data Science, and we invite you to take part in this new opportunity, not only for the CISE community but for a host of related disciplines!


[1] A summary of the response to the RFI is available at:

[2] The full video of the workshop is available at:

Privacy by Design Workshop: Concepts and Connections

By CCC Blog

The following guest blog post is contributed by Ph.D. students Nick Doty and Richmond Wong working with Deirdre Mulligan from the University of California Berkeley School of Information.

For years, lawmakers, advocates and engineers have touted the potential benefits of Privacy by Design: integrating privacy throughout the technical design process rather than adding it after the fact. Nonetheless, we still struggle with how to practice Privacy by Design, whether it's how to conceptualize privacy, how to build privacy into the engineering process, how to present those privacy designs to users, or how to incentivize practice of and compliance with Privacy by Design.

In order to identify a shared research vision to support these different facets of the practice of Privacy by Design, the Computing Community Consortium (CCC) is sponsoring a series of four workshops this year. We kicked off the series this past week with the first workshop, held in stormy Berkeley, California.

A group of over 40 collaborators represented various parts of industry, academia, government and civil society: from health care to social networking to telecommunications, from philosophy to law to computer science, from national intelligence services to state pension authorities to consumer advocates.

Working from a series of case studies of privacy complaints arising in different sectors, groups analyzed the applicability of existing privacy frameworks such as the Fair Information Practice Principles, taxonomies of privacy harms and justifications, and new concepts or “properties” of privacy. The group struggled with the “essentially contested” concept of privacy and with how different concepts or analytical tools can nonetheless help us identify and address privacy concerns.

The workshop also heard “reports from the field” from those who have implemented (or are struggling to improve) privacy programs in the wild: at large tech companies, Internet standard-setting bodies, and government agencies. Disciplinary differences were often highlighted, both in the different ways that academics (lawyers versus computer scientists, say) approach the concept of privacy and its practice, and in the effective organization of multi-functional teams within companies. We heard frequently that attendees had met new people and been challenged by new ideas. We hope those connections will contribute to productive workshops to come.

Reflecting over the two days and looking forward, participants discussed how to engage with the complexity of conceptualizing privacy, and how to bring in expertise from other relevant perspectives such as economics, sociology, and science & technology studies. We identified a desire to bridge the technical and social research cultures, and to bridge the research work creating new privacy tools and the adoption of those tools in the practice of Privacy by Design.

Organization is in progress for a workshop in May, hosted at Georgia Tech, to discuss privacy from the perspective of design. In the fall, we will gather software engineers at Carnegie Mellon to discuss their development practices in depth. Finally, to wrap up the series, an East Coast event will give policymakers and regulators a forum to discuss how to catalyze Privacy by Design.

Presentations, introductions of the participants, reports from our brainstorming sessions and collected scholarly references are all available on the workshop series homepage. We will invite some participants to blog about their individual experiences here and a more detailed workshop report will follow.

G/rep{sec} = underrepresented groups in security research

By Terry Benzel, Susan Landau, and Hilarie Orman

Three years ago in May 2012, as Terry Benzel, Deputy Director, Computer Networks Division, Information Sciences Institute at USC, Hilarie Orman, The Purple Streak (a software security firm), and I, Susan, then a visiting scholar at Harvard, sat at the IEEE Symposium on Security and Privacy, we had trouble seeing any other women. As women researchers in security and privacy of a certain age, we were accustomed to that. But we were not accustomed to the initial proposal for the following year's program committee: forty men, two women. We looked at each other. There was not “world enough and time” to wait for the situation to change; we needed to take action now.

Having served on the CRA Committee on the Status of Women in Computing Research (CRA-W), I knew about its funding for discipline-specific workshops. The application deadline was a month away. Jeremy Epstein, the NSF program officer for Secure and Trustworthy Cyberspace, was at the symposium; he mentioned that funding deadlines were imminent and suggested we get an application in quickly.

Within six weeks we had a workshop location, a draft program, and two proposals completed; by August, we had secured funding. Hilarie came up with a workshop name: "GREPSEC," G/rep{sec} = underrepresented groups in security research. We took it.

The CRA-W funding came with a twist. The funding was joint with the Coalition to Diversify Computing; the requirement was that we also include members of underrepresented groups. If women are rare in computer security and privacy, members of underrepresented groups are even more so. We took on the challenge.

All three of us had experience with participating in such mentoring meetings. We opted for a day-and-a-half long meeting, scheduled just before the 2013 IEEE Symposium. Finding and arranging for speakers was the most complex part: we sought to cover security and privacy broadly; we wanted women and minority speakers, and we sought a balance between academic and industry speakers (government too, where we could find them).

Following the model of the successful 2008 MIT meeting Women in Mathematics: A Celebration, which Susan co-chaired, we decided on a schedule that focused on technical sessions while leaving lots of free time for informal mentoring at coffee breaks and meals. Doing things that way, rather than presenting explicit mentoring panels, sends the clear message that women and members of underrepresented groups are scientists and researchers. That was the most important message to convey.

Attracting speakers was fun, and largely easy. We had a wonderful list: Terry Benzel, Dan Boneh, Claudia Diaz, Rachel Greenstadt, Cynthia Irvine, Anthony Joseph, Carl Landwehr, Teresa Lunt, Deirdre Mulligan, Aleatha Parker-Wood, Ron Perez, Diana Smetters, Zach Tudor, Helen Wang, Jeannette Wing. We had representation from academia (Stanford, KU Leuven, Drexel, Naval Postgraduate School, George Washington University, USC) and from industry (PARC, AMD, Google, Microsoft). (Some of the speakers also had previous experience working in or with government.)

Attracting students was more complicated. We could find women graduate students through advertising on the Systers mailing list, through the CRA-W lists and postings, and through targeted mailings to security and privacy researchers. Reaching members of underrepresented groups was more challenging. Many of the students are not at the top research universities, and thus not in the loop just described. We arranged for posting on the Latinas mailing list, while Russ Joseph, co-chair of the CRA-W/CDC committee on discipline specific workshops, helped get the word out to faculty at minority institutions. We also arranged for flyers at the Tapia Conference. We made sure to advertise heavily.

We received a total of ninety-four applications from fifty-seven institutions. Ten applications were from undergraduates, of whom six were members of underrepresented groups. Because we had sufficiently many applications from graduate students, we opted not to host undergraduates.

The combined funds from the three grants — we had also received a small grant from Microsoft — provided funding for twenty-seven domestic and four international students from twenty-three institutions. Grants were awarded to twenty-eight women, of whom twenty-five attended the workshop, and ten members of underrepresented groups, of whom seven attended the workshop. The attendees included six men. With additional funding from Microsoft, we were able to fund some non-US students from non-US institutions; given the strength of the University of Waterloo’s cryptography program, for example, this was a great plus.

The students ranged from first-year graduate students with burgeoning interests in security and privacy to students close to completing their PhDs in the field.

The ratio of speakers to students was deliberately high, since we wanted to encourage one-on-one mentoring. Half the speakers were present for both days, half for only one. Having focused the talks and panels on technical content, we encouraged the students to mingle with the speakers during coffee breaks, meals, and the Saturday evening reception, and left ample time for it. Except at the initial breakfast on Saturday morning, where the speakers mostly sat with each other, we saw students and speakers talking throughout the informal time, sometimes quite intensively. I, Susan, knew it was working when I saw two male minority students corner Ron Perez, senior fellow at AMD. This is exactly what we had hoped for.

Despite the fact that some students were coming from top-ranked institutions while others had a much less strong background, we felt it was important that the students feel that they were on a level playing field. So instead of doing an evening poster session with haves and have-nots, we opted for everyone participating in a one-minute introduction just before the Saturday evening reception, with each student telling who they were and where they were from, and a problem they were working on, or interested in. That was terrific. We had them go in alphabetical order — no shy ones last — and the process worked wonderfully. It broke the ice even for the very shy ones. The evening reception, and the following day, showed lots of lively discussion between the students and the speakers—exactly what we had intended.

We covered a variety of technical topics: high-assurance systems (software and hardware), security research problems in industry, mixing AI and security, developing trust in cyberspace (pulling together people, laws, and technology). We had keynotes on a theory of trust in networks and people, on where cryptography and authentication are heading, and on developing a building code for software.

We had different impacts on different students. For some of the students, including those at institutions with a lower research profile, this was the first time that they were exposed to cutting-edge research. For students at universities with a higher research profile, the workshop gave them wider exposure to the broad set of research questions in security and privacy; such a perspective is extremely useful but is often not part of graduate education. In addition, several of the students went on to attend the IEEE Symposium on Security and Privacy, a leading conference on security and privacy; this attendance contributes to creating a new generation of researchers, educators and developers in the discipline. We even had an impact on researchers; one of them, Carl Landwehr, presented a keynote that later evolved into a research paper.

We learned various things from our workshop. Some students, those from “Research I” and “Research II” institutions, knew about buying plane tickets and getting reimbursed. Those from institutions that did not strongly support research floundered, and we lost several attendees (lost in the sense that they did not attend; we hope none of them were unable to navigate the BART trains and are still wandering about in the system). So for GREPSEC II (yes, we are running the workshop again), we have arranged to pre-purchase tickets and pay hotels directly. The work to present a level playing field for the students really helped, but students from less research-focused institutions tended to be quiet. So we'll keep the one-minute intros in this year's program, but move them earlier. We'll also work to have the students mix more.

The need for GREPSEC is clear. The funding is less so. We have again received funding from NSF and CRA-W/CDC (thank you) and from Google, Microsoft, and the Information Sciences Institute. But given the national needs in cybersecurity and privacy, and the continuing paucity of women and members of underrepresented groups in the field, a long-term grant to fund several more workshops over the next six to ten years is probably what is needed. Terry, Hilarie, and I hope to obtain that, become the GREPSEC steering committee, and let the next set of women and members of underrepresented groups become the committee that runs GREPSEC III and beyond.

Big Data Science at Johns Hopkins

By Alex Szalay, Johns Hopkins University

Last year Johns Hopkins University (JHU) started the Institute for Data Intensive Engineering and Science (IDIES, pronounced as “Ideas”), promoting the use of large data sets for scientific discovery across the whole university. IDIES spans the Schools of Arts and Sciences, Engineering, Public Health, and Medicine. Hopkins president Ron Daniels and several deans have dedicated 10 new faculty positions to IDIES, all encouraging interdisciplinary research related to Big Data in science. Currently, IDIES has more than 80 faculty associates.

A prominent data-intensive science project at JHU is the public archive of the Sloan Digital Sky Survey (SDSS), one of the world's largest astronomy databases. Beyond the professional astronomy community, the data have been accessed from more than 4 million distinct IP addresses since the archive's creation in 2001. The SDSS data have resulted in more than 5,000 refereed publications and over 200,000 citations. The system has a collaborative server-side workspace, CasJobs, which enables users to save and share their results close to the main database. This system, aimed at the professional astronomy community, is used by more than 6,000 people worldwide. The database can be navigated via an interactive visual interface or accessed programmatically through a web-services API (Figure 1). The database has now grown to 15 terabytes, and all the archived SDSS data exceed 200 terabytes.

Figure 1: The interactive Navigation tool of the SDSS SkyServer database. The users can access all information in the database through visual navigation.
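The idea behind a server-side workspace such as CasJobs is that query results are written into a personal database that lives next to the archive, instead of being shipped to the user. The sketch below illustrates that pattern with Python's built-in `sqlite3`, attaching a second in-memory database as a stand-in for a user's workspace. It is not the CasJobs or SkyServer API: the `photoobj` table, its columns, the sample rows, and the helper names are all hypothetical, chosen only to keep the example self-contained.

```python
import sqlite3


def open_archive_with_workspace() -> sqlite3.Connection:
    """Create a toy 'main' archive and attach an empty per-user workspace."""
    con = sqlite3.connect(":memory:")
    # Hypothetical workspace database, attached alongside the archive
    con.execute("ATTACH DATABASE ':memory:' AS mydb")
    # Hypothetical catalog table; columns and rows are illustrative only
    con.execute("CREATE TABLE main.photoobj "
                "(objid INTEGER PRIMARY KEY, ra REAL, dec REAL, r_mag REAL)")
    con.executemany(
        "INSERT INTO main.photoobj VALUES (?, ?, ?, ?)",
        [(1, 180.1, 2.3, 17.2), (2, 180.4, 2.1, 21.5),
         (3, 181.0, 2.9, 16.8), (4, 179.7, 1.8, 19.9)],
    )
    con.commit()
    return con


def save_query(con: sqlite3.Connection, table: str, select_sql: str) -> int:
    """Run a SELECT inside the engine and store its rows in the workspace.

    `table` is interpolated into the SQL, so it must be a trusted identifier.
    Returns the number of rows saved.
    """
    con.execute(f"CREATE TABLE mydb.{table} AS {select_sql}")
    con.commit()
    return con.execute(f"SELECT COUNT(*) FROM mydb.{table}").fetchone()[0]


con = open_archive_with_workspace()
n = save_query(
    con, "bright_stars",
    "SELECT objid, ra, dec, r_mag FROM main.photoobj "
    "WHERE r_mag < 18 AND ra BETWEEN 180 AND 181",
)
print(n, "rows saved to mydb.bright_stars")
```

Because `CREATE TABLE ... AS SELECT` runs inside the database engine, the selected rows never leave the server process and can be queried again or shared later; as we understand it, CasJobs offers a comparable workflow through `SELECT ... INTO mydb.<table>` queries.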

Beyond astronomy, IDIES has successfully expanded its activities into many other areas of science. For example, members of IDIES have built the Turbulence database (JHTDB), which contains simulations of turbulent flows spanning more than 200 TB. To explore these data, JHU scientists have created a novel interface: instead of downloading the prohibitively large simulation files, scientists can launch “virtual sensors” from their laptops, which can either stay fixed or move with the flow, and report back various physical quantities, such as velocity, vorticity, pressure, and dissipation rate. Over the last 5 years, more than 50 refereed papers have used the data, and users have launched more than 12 trillion (!) sensors (Figure 2). The ease of access to such large-scale simulation data is helping democratize computational turbulence research. Scientists and engineers who previously could not experiment with such datasets because of their size can now easily ask relevant questions. Users range from mathematicians asking questions about near singularities in the partial differential equations that govern fluid flow, to experimentalists who wish to test measurement techniques in ideally controlled flow conditions.
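The “virtual sensor” idea can be pictured as follows: a fixed sensor reports flow quantities at one point, while a sensor that moves with the flow is carried along by the local velocity and reports along its trajectory. The Python sketch below is a minimal illustration under loudly stated assumptions: the stored simulation is replaced by an analytic, steady 2D Taylor-Green vortex, advection uses a simple midpoint (RK2) step, and the class and function names are hypothetical. It does not use or imitate the actual JHTDB service, which works on stored simulation data.

```python
import math
from dataclasses import dataclass


# Hypothetical stand-in for a stored simulation: the steady 2D Taylor-Green
# vortex u = sin(x)cos(y), v = -cos(x)sin(y). A real turbulence database
# would interpolate stored grid data here instead of evaluating a formula.
def velocity(x: float, y: float) -> tuple:
    return math.sin(x) * math.cos(y), -math.cos(x) * math.sin(y)


def vorticity(x: float, y: float) -> float:
    # omega = dv/dx - du/dy, which equals 2 sin(x) sin(y) for this field
    return 2.0 * math.sin(x) * math.sin(y)


@dataclass
class VirtualSensor:
    x: float
    y: float
    moves_with_flow: bool = False  # False: fixed point; True: carried by the flow

    def read(self) -> dict:
        """Report the flow quantities at the sensor's current position."""
        u, v = velocity(self.x, self.y)
        return {"x": self.x, "y": self.y, "u": u, "v": v,
                "vorticity": vorticity(self.x, self.y)}

    def step(self, dt: float) -> None:
        """Advance a moving sensor with one midpoint (RK2) step; fixed sensors stay put."""
        if not self.moves_with_flow:
            return
        u1, v1 = velocity(self.x, self.y)
        u2, v2 = velocity(self.x + 0.5 * dt * u1, self.y + 0.5 * dt * v1)
        self.x += dt * u2
        self.y += dt * v2


def run(sensors: list, dt: float, n_steps: int) -> list:
    """Record one reading per sensor per step, then advance every sensor."""
    log = []
    for _ in range(n_steps):
        log.append([s.read() for s in sensors])
        for s in sensors:
            s.step(dt)
    return log


fixed = VirtualSensor(1.0, 0.5)
drifter = VirtualSensor(1.0, 0.5, moves_with_flow=True)
log = run([fixed, drifter], dt=0.01, n_steps=100)
```

For this particular steady field, vorticity is constant along particle paths, so a sensor moving with the flow should report nearly the same vorticity at every step; that makes a convenient sanity check on the advection scheme.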

In general, supercomputers are generating ever-larger simulations, but data sets in the hundreds of terabytes often sit unused because they are simply too large to manipulate. Learning from the turbulence experience, IDIES is now turning such simulations into public, open numerical laboratories in other areas of science from cosmological N-body simulations to ocean circulation models. JHU scientists have helped to build the world’s most-used database for cosmological simulations, the Millennium, hosted at the Max Planck Institute in Garching, Germany.

Another fast-growing area of data science research is neuroscience, which emerged as an area of national interest when the White House announced the $100M BRAIN Initiative in April 2013. IDIES' Open Connectome Project (OCP) is a leader in data management and analytics for big data neuroscience. The site provides public access to more than 200 TB of neuroscience imaging data from multiple imaging modalities that capture the structure (e.g., electron microscopy, CLARITY, and array tomography) and function (e.g., two-photon calcium imaging and fMRI) of the brain. This includes the largest public brain dataset: 20 teravoxels of the mouse visual cortex, imaged at a resolution that resolves single synapses. OCP connects datasets to computer vision pipelines on supercomputers that reconstruct the structure and connectivity of the brain and store the results in co-registered databases (Figure 3). Neuroscience is poised at the edge of a revolution of discovery based on recent advances in high-throughput imaging. Data-intensive science will help us understand the mechanisms for computation in the human brain and provide the foundation for research into the neurological basis of complex disorders such as autism and ADHD.

Genome sequence data is growing much faster than many other areas in science, thanks to breakthroughs in sequencing technology that make it possible to sequence a human genome in just two days. IDIES scientists today hold more than 2,000 full genomes in house, which have been collected to explore the genetic causes of common diseases including asthma and cancer. Genomics researchers can be found in each of the participating schools of IDIES, and JHU scientists are engaged not only in generating and using sequence data, but also in developing new computational and statistical methods for sequence analysis. For example, one of the most widely used experimental paradigms is RNA sequencing (“RNA-seq”), a technique that allows scientists to study the complex patterns of gene expression in different cells and tissues, and to discover genes whose activity is linked to disease. Hopkins faculty and their students have developed software systems for RNA-seq analysis that have become the standard for the field, used daily in thousands of laboratories around the world.

Johns Hopkins is investing heavily in personalized medicine, exploring new uses of digital information to create individual treatments and studying how clinical data can be used more efficiently in translational medicine. For example, OncoSpace, a project integrating a variety of radiation oncology data, was built from elements of the SDSS database. Now in clinical use at several universities, it is well on its way to demonstrating how interactive databases can help personalize complex treatments.

Figure 2: Color contours representing velocity on a 2D slice through the 4D turbulent dataset available at JHTDB.

Figure 3: Dense reconstruction of the wiring (dendrites and axons) of a mouse visual cortex.

Figure 4: The JHU-based Data-Scope, with its 10PB of storage and 0.5Tbytes/sec I/O performance.

The JHU Sheridan Libraries run the Data Conservancy, a project originally funded by an NSF grant that focuses on the long-term curation of digital collections. The Data Conservancy community and software incorporate lessons learned from over a decade of experience archiving the SDSS data. The Data Conservancy infrastructure design and architecture are based on the Open Archival Information System (OAIS) reference model, developed initially by the space sciences community. This architecture accounts for the potential use of high-performance computing over data within an archive. Through a comprehensive information and library science research agenda, Data Conservancy now incorporates requirements from a range of disciplines, including the “long tail” of scientific data. In addition to supporting the functionality of other data repositories and archives, Data Conservancy software includes a packaging tool based on the community-standard BagIt format, as well as the ability to generate and preserve Open Archives Initiative Object Reuse and Exchange (OAI-ORE) information graphs that connect data and publications with their associated provenance chains. This comprehensive approach has produced a data archive platform well suited to a range of scientific data formats and types.
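Data Conservancy's own packaging tool is not reproduced here. As a rough illustration of what the BagIt format (standardized as RFC 8493) involves, the sketch below writes a minimal bag: a `bagit.txt` declaration, the payload under `data/`, and a SHA-256 payload manifest that lets an archive detect corrupted or altered files. The function names `make_bag` and `verify_bag` are hypothetical, and a complete bag would often also carry optional tag files such as `bag-info.txt`.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, files: dict[str, bytes]) -> Path:
    """Write a minimal BagIt bag (hypothetical helper, not Data Conservancy's tool)."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    manifest_lines = []
    for name, content in sorted(files.items()):
        target = data_dir / name
        target.parent.mkdir(parents=True, exist_ok=True)  # allow nested payload paths
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest paths are relative to the bag root and use forward slashes.
        manifest_lines.append(f"{digest}  data/{name}")

    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute every payload checksum and compare it with the manifest entry."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(Path(tmp) / "example-bag", {"readme.txt": b"hello"})
        print(verify_bag(bag))  # prints True
```

Keeping checksums in a plain-text manifest alongside the payload is what makes a bag self-verifying: any later copy of the directory can be checked without access to the system that created it.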

The NSF has awarded several grants to JHU to improve data-intensive capabilities on campus. First, JHU built the Data-Scope, a 10PB data supercomputer with 100 GPU cards and 500 GB/s of I/O bandwidth. In collaboration with the Mid-Atlantic Crossroads (MAX), JHU was among the first universities in the world to have 100G data connectivity. Recently, IDIES was awarded a $9.5M grant by the NSF Data Infrastructure Building Blocks (DIBBs) program to build more generally usable building blocks from elements of the SDSS archive. This new project, SciServer, is well on its way to integrating the management of diverse datasets, from astronomy to turbulence, connectomics, and genomics, using economies of scale to serve large public collections of scientific data. In collaboration with the University of Maryland, College Park, JHU IDIES is close to completing a shared computational facility hosted at the Bayview campus of JHU. The system, the Mid-Atlantic Research Computing Consortium (MARCC), will have more than 16,000 cores, about 100 Kepler K80 units, and 20 petabytes of disk space. It will be connected to the two campuses and to Internet2 through a 100G connection.

IDIES recently awarded nine seed grants to jumpstart efforts in new areas, such as materials science, urban planning, and combining machine learning with molecular dynamics simulations.

Several new data science classes have been designed, and a new data science concentration will soon be offered as part of the standard curriculum in the Whiting School of Engineering. Faculty in Biostatistics have created an immensely popular Coursera class in data science, with more than a million registered students.

The emergence of Big Data is transforming research. Data-driven discovery is becoming the new “Fourth Paradigm” of science, and universities are well on their way to responding to these new challenges. IDIES scientists are working hard to build multidisciplinary collaborations and create new, innovative projects at the frontier of the science of Big Data.






March 2015 CERP Infographic

By Ama Nyame-Mensah, CERP Research Associate

Black and Hispanic Students at Minority-Serving Institutions Are More Interested in a Computing Research Career than Their Counterparts at Non-Minority-Serving Institutions

Note: Interest in a computing research career was measured by asking students, “How interested are you in having a career as a computing researcher in industry or a government lab after you finish your highest degree?” Responses used a five-point Likert scale ranging from (1) Very Disinterested to (5) Very Interested. Black: includes Black or African American. Hispanic: includes Hispanic or Latina/o. Minority-Serving Institutions are postsecondary institutions in which minorities (e.g., Blacks or African Americans, Hispanics or Latina/os) exceed 50% of total enrollment.

These data are brought to you by the CRA’s Center for Evaluating the Research Pipeline (CERP). CERP provides social science research and comparative evaluation for the computing community. To learn more about CERP, visit our website at

Disseminating CERP Research Findings to Promote Diversity in Computing and Other STEM Fields

By Jane Stout, CERP Director

Two years after its inception, CRA’s Center for Evaluating the Research Pipeline (CERP) has proven to be a valuable resource for the computing community. CERP’s benchmark survey research mechanism, the Data Buddies Project, reliably generates large and diverse datasets on computing students’ experiences in their degree programs. CERP’s data were originally slated primarily for “comparative evaluation”: students’ experiences, gleaned from survey data, are compared according to whether or not they participated in a given professional development program. Since August 2014, CERP’s data have been harnessed for a second purpose: conducting basic social science research on issues of diversity in computing. This new focus is supported by a new grant awarded to CRA, NSF DUE-1431112, Promoting a Diverse Computing Workforce: Using National Survey Data to Understand Persistence Across Undergraduate Student Groups, which was written and is overseen by CERP Director Jane Stout.

CERP staff have been hard at work generating and disseminating preliminary research findings on underrepresented students’ experiences in the computing community. The findings are “preliminary” because CERP initiated a longitudinal research design in fall 2014 and has so far collected only the first of many planned rounds of data from a sample of computing students. The ultimate goals of this research are to track students’ progress over time, measure antecedents to students’ successes, and generate best practices that departments can use to ensure underrepresented groups thrive in computing.

One line of inquiry focuses on women’s experiences in computing, with a special focus on how women’s cultural background factors into those experiences. For instance, Asian American women report that they are less confident in their computing abilities than other women in computing (see Figure 1), even though Asian American women are among the top performers in their female peer group (see Figure 2). Research of this nature shines a spotlight on a specific group of talented students in computing who are relatively lacking in confidence; longitudinal data have the potential to assess whether and how these students overcome low confidence and succeed in computing. Stout has shared these and similar preliminary findings in invited presentations at prominent research institutions such as the Massachusetts Institute of Technology (MIT) and the Colorado School of Mines; at smaller institutions interested in fostering diversity, such as Augustana College and Millikin University; and at departments outside computing that also suffer from low diversity (e.g., the Physics Department at the University of Colorado Boulder).

Figure 1. Women's confidence that they can succeed in computing as a function of their ethnic background. Scale ranged from (1) Low confidence to (5) High confidence.

Figure 2. Women's GPA in their computing major as a function of their ethnic background.

CERP staff are also sharing research findings with professional societies at conferences. For instance, CERP Research Associate Ama Nyame-Mensah will present research at the 2015 SIGCSE meeting (ACM Special Interest Group on Computer Science Education) suggesting that Research Experiences for Undergraduates (REUs) are more beneficial for students from underrepresented groups than for well-represented students in computing. Heather Wright, CERP Research Associate, will present findings at the 2015 meeting of the Association for Psychological Science highlighting the value of creating a welcoming computing department environment, particularly for first-generation college students.

CERP’s audiences for its research findings are intentionally diverse; they include educators and administrators within computing and in other STEM fields lacking diversity. CERP’s research also appeals to social scientists who do not have access to the large and diverse data pools made possible by CRA’s unique Data Buddies Project infrastructure. To learn more about the Data Buddies Project and become involved, visit:

2014 Taulbee Report Sneak Preview

By Stu Zweben and Betsy Bizot

The 2014 Taulbee Report will be published in the May 2015 issue of CRN. As we have done for the past few years, we’re providing a preview of the degree and enrollment numbers for bachelor’s and doctoral level programs in the departments responding to the survey.

The total number of Ph.D.s awarded dropped slightly (by 2.6 percent) from last year’s all-time high. The departments that responded this year reported 1,940 graduates in 2013-14; last year’s respondents reported 1,991 graduates.
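The headline 2.6 percent figure follows directly from the two reported totals, computed as the change relative to last year's count. A minimal sketch, assuming that is how the percentage is derived (the helper name `percent_change` is ours, not the report's). Note that these totals come from each year's full set of respondents, which differ somewhat between years.

```python
def percent_change(previous: float, current: float) -> float:
    """Change from previous to current, expressed as a percentage of previous."""
    return (current - previous) / previous * 100


# Ph.D. graduates reported by each year's respondents.
phd_2012_13 = 1991
phd_2013_14 = 1940

change = percent_change(phd_2012_13, phd_2013_14)
print(f"{change:.1f}%")  # prints -2.6%
```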

The set of departments reporting varies somewhat from one year to the next, so trends are best assessed using only the departments that reported in both years. The accompanying table shows the one-year comparison of some key bachelor’s and doctoral data for these departments.

Bachelor’s program enrollment, and indeed enrollment growth, showed little sign of abating. There was an 18.6 percent increase in enrollment from 2012-13 to 2013-14 in U.S. CS departments that reported both years. The corresponding increase for all departments reporting both years was 17.4 percent. Last year’s respective increases were 22.0 percent (U.S. CS) and 21.1 percent (overall). The number of new bachelor’s students in fall 2014 is up 17.0 percent over the fall 2013 figure in U.S. CS departments reporting new majors for both years (compared to 13.7 percent last year), and is up 18.0 percent among all departments reporting both years (13.8 percent last year). The number of bachelor’s graduates increased 13.6 percent among U.S. CS departments and 12.0 percent among all departments reporting both years.

At the doctoral level, overall Ph.D. production for 2013-14 among U.S. CS departments reporting both years fell 3.7 percent, and fell 4.1 percent among all departments reporting both years. However, total doctoral enrollment increased 3.9 percent among U.S. CS departments and 4.4 percent among all departments reporting both years. The number of new doctoral students for fall 2014 rose 4.7 percent among U.S. CS departments and 3.6 percent among all departments, when compared with the fall 2013 figures.

Watch the May 2015 CRN for a more complete analysis of the Taulbee data.


1828 L STREET, NW SUITE 800, WASHINGTON, DC 20036 | P: 202-234-2111 | F: 202-667-1066