This article is published in the May 2017 issue.

Research Highlight: CRA Board Member Margo Seltzer


“What are computer users doing that is wasting their time?” This question guides my research. I construe computer systems research quite broadly. If I can build it, it’s a systems problem. This breadth has let me pursue questions in visualization as well as operating systems; machine learning and computer architecture; file systems, performance analysis, graph processing, databases, and numerous other areas. Some people might say I have a short attention span; I like to claim that I have broad interests!

There are many examples where my or my students’ observations about the world and how people’s time and energy were being wasted led to interesting systems research problems.

For example, my group engaged in about a decade of research on data provenance because we kept hearing from scientists who wanted to be able to document their experiments, make them reproducible, and manage their unwieldy collections of code and data.

Data provenance is a formal description of the way data comes to be in its current form and location. It contains answers to questions such as “Where did this data come from? Who has seen it? How has it been transformed?”

While we started working in this area to solve the problems of computational scientists, we now hear from colleagues in industry who want to use data provenance to help them enforce their companies’ data management policies or to comply with federal regulations such as HIPAA or Sarbanes-Oxley.

Provenance data is most frequently represented as a graph, so the study of data provenance led us to the investigation of graph databases and graph processing systems. We thought we were starting out to build a graph database, and we developed some important pieces of technology: native graph storage structures that support mutability and multi-version querying, and high-performance graph partitioning algorithms. Then we got distracted and started studying the structure of graphs and what metrics best described the semantic information encoded in them. Recently, one of my students completed a study that suggests that the community has not been doing a great job of evaluating graph processing systems.
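As a rough illustration of why a graph is a natural fit, the question “Where did this data come from?” becomes a backward walk over edges. The following is a toy sketch in Python, not the storage structures or systems described above; the class `ProvenanceGraph`, its methods, and the file names are hypothetical and exist only for this example.

```python
class ProvenanceGraph:
    """Toy provenance graph: each item points back to the items it was derived from."""

    def __init__(self):
        self.inputs = {}  # item -> set of items it was derived from
        self.how = {}     # item -> name of the step that produced it

    def record(self, output, step, sources):
        """Record that `step` produced `output` from `sources`."""
        self.how[output] = step
        self.inputs.setdefault(output, set()).update(sources)

    def ancestors(self, item):
        """Return everything `item` was derived from, directly or indirectly."""
        seen = set()
        stack = [item]
        while stack:
            current = stack.pop()
            for parent in self.inputs.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical two-step analysis: clean the raw data, then plot it.
g = ProvenanceGraph()
g.record("clean.csv", "filter_outliers", ["raw.csv"])
g.record("plot.png", "make_plot", ["clean.csv", "plot.py"])
print(sorted(g.ancestors("plot.png")))  # ['clean.csv', 'plot.py', 'raw.csv']
```

A real provenance system would also record who accessed each item and when, and would have to handle versions of the same file; this sketch covers only the derivation question.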

Another area of research that has kept me busy for quite a while is our Automatically Scalable Computation (ASC) project, which is a collaboration with Boston University. It grew out of watching our colleagues in the domain sciences grow frustrated having to rewrite their software every time they got a new parallel machine from which they wanted to extract maximum performance. Our dream is “push-button parallelization”; that is, you write your program just as you would for a single processor, and if you detect that computational resources are available, you can take advantage of them to make your program run more quickly. Our current implementation of this vision transforms the problem into a machine learning and speculative execution problem. Our system monitors a program’s execution, builds a model of it, and then, when cores are available, launches threads to speculatively execute computations it thinks might be useful in the future.
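The monitor, predict, and speculate loop can be illustrated with a deliberately tiny Python sketch. This is not the ASC implementation: the function `run_with_speculation` is hypothetical, the “model” is an assumed stride predictor (after seeing 2 then 3, guess 4), and it assumes `f` is a pure function so that a wrong guess can simply be discarded.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_speculation(f, inputs, workers=2):
    """Apply f to each input in order, speculatively computing a predicted next input.

    Assumptions for this sketch: f has no side effects, and the next input
    is predicted by continuing the last observed stride.
    """
    speculated = {}  # predicted input -> Future holding f(predicted input)
    results = []
    hits = 0
    prev = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for x in inputs:
            if x in speculated:
                # The prediction was right: reuse the speculative result.
                results.append(speculated.pop(x).result())
                hits += 1
            else:
                # No usable speculation: compute on the main path.
                results.append(f(x))
            if prev is not None:
                guess = x + (x - prev)  # continue the observed stride
                if guess not in speculated:
                    speculated[guess] = pool.submit(f, guess)
            prev = x
    return results, hits


results, hits = run_with_speculation(lambda x: x * x, [1, 2, 3, 4, 10, 16])
print(results, hits)  # [1, 4, 9, 16, 100, 256] 3
```

Wrong guesses cost only spare cycles, which is the bargain speculation makes. Note that in CPython, threads do not speed up CPU-bound work, so this shows the control structure rather than a real speedup.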

Those are just a few examples of the kinds of projects my research group addresses. My other passion (besides women’s soccer) is teaching or, as I like to call it, facilitating my students’ learning.

Approximately five years ago, I began revising all my undergraduate courses so that I could teach them in a flipped style. That is a teaching approach where students complete reading or video assignments and answer a few questions on them prior to attending class. Then we spend class time engaged in small-group problem solving, where I and my teaching staff wander around the classroom, interacting with the different groups and trying to spend our time with those students who can benefit most from our assistance.

My first experience flipping a course was with my insanely time-consuming operating systems course. Students report spending 30 hours per week completing the long but rewarding problem sets: they start with a simple operating system kernel and build user-level processes, a virtual memory system, and a journaling file system.

I blogged about my first experience flipping the course here: http://mis-misinformation.blogspot.com/2013/08/an-index-to-my-flipping-blog-postings.html

I distilled the experience into these 10 bullet points:

1. It’s good for an old dog to learn new tricks.

This is really about making sure your teaching doesn’t get stale. It’s way too easy to keep teaching the same thing over and over again. Whether you use new pedagogy, new technological breakthroughs, or just good self-discipline, it’s important to keep classes fresh.

2. Flipping lets me spend time with students for whom the material is challenging.

This is so obvious in retrospect, but so exhilarating in practice. I have always run a relatively interactive class, but for the most part, the students who ask and answer questions in class are not the ones who need you the most. They are typically the most confident students and are not struggling to understand the material. The silent ones are the ones who are frequently struggling, and the time spent helping these students in small groups during class time is incredibly useful.

3. Learning takes place by doing, not by listening to me.

There are a lot of different styles of hands-on learning, but I think this point cannot be emphasized enough. Learning is not just the process of transferring information from me to students. Learning is about gaining new information and knowing how to use it. That latter part requires practice.

4. Teaching fellows’ engagement is critical.

We call our teaching assistants “teaching fellows” or TFs for short. Flipping effectively requires a staff who are comfortable engaging with students, walking them through problems, and posing the right questions. I am extraordinarily fortunate to have truly amazing and dedicated teaching staff.

5. It takes a lot of effort to come up with effective in-class work.

It’s important that the in-class exercises or problems relate to both the concepts the students are learning and the homework or problem sets they will be doing. Designing these exercises so they can be completed in the time allotted and add real value to the course is demanding.

6. Pre-class web forms are AWESOME.

They let me engage with students in an entirely different way and gather lots of interesting data. This is perhaps the best surprise of all! I used Google Forms to have students submit answers to the pre-class questions. This created a mechanism I could use to obtain all sorts of useful information, including how things were going in partnerships, how much time people were spending on various parts of the assignment, what was working for students, what wasn’t working, etc. Once you have students regularly filling out forms, they will answer anything you put there, and you can use that information to make the class better. Score!

7. CS161 is even more time-intensive than I thought.

I had been saying 20 hours per week for decades; when the going got rough, students were regularly reporting 30-hour weeks. Oops.

8. It would be useful to help students learn what it really means to design something.

Software design is hard! We spend a lot of time in class doing small group design exercises. I could imagine developing an entire course around this idea.

9. Flipping is a great equalizer when students enter with different experience levels or exposure to different topics.

It’s relatively easy to provide supplementary material as pre-class work so that students who have gaps in their background can catch up.

10. Fully integrated and coordinated materials take real effort but pay off tremendously.

This should be a no-brainer, but thinking deeply about the relationship between the videos I prepared, the exercises we completed in class, and the problem sets was time well spent.

About the Author

Margo I. Seltzer is the Herchel Smith Professor of Computer Science and the faculty director of the Center for Research on Computation and Society (CRCS) in Harvard’s John A. Paulson School of Engineering and Applied Sciences. Her research interests are in systems, construed quite broadly: systems for capturing and accessing data provenance, file systems, databases, transaction processing systems, storage and analysis of graph-structured data, new architectures for parallelizing execution, and systems that apply technology to problems in healthcare.

She is the author of several widely used software packages including database and transaction libraries and the 4.4BSD log-structured file system. Seltzer was a founder and CTO of Sleepycat Software, the makers of Berkeley DB, and is now an architect for Oracle Corporation. She is the USENIX representative to the Computing Research Association board of directors, a member of the Computer Science and Telecommunications Board of the National Academies, and a past president of the USENIX Association. She is a Sloan Foundation Fellow in computer science, an ACM Fellow, and a Bunting Fellow. She was the recipient of the 1996 Radcliffe Junior Faculty Fellowship and the University of California Microelectronics Scholarship. She is recognized as an outstanding teacher and won the Phi Beta Kappa teaching award in 1996 and the Abramson Teaching Award in 1999.

Professor Seltzer received an A.B. degree in applied mathematics from Harvard/Radcliffe College and a Ph.D. in computer science from the University of California, Berkeley.