This article is published in the November 2010 issue.

Ratings Redux


On behalf of the CRA Board

In the May 2010 issue of Computing Research News, we provided a perspective on our interactions with the National Research Council group tasked with evaluating and ranking doctoral programs. We outlined concerns with the pending ranking system, especially with regard to its plans to evaluate faculty publications and citations using a method we believe to be flawed. As reported in the CRN article, the NRC’s compromise was to remove the citation analysis and to augment the data used in the report with a list of conferences provided by the CRA, together with CVs submitted by faculty to the NRC.

On September 28, the NRC finally released its report, and the resulting debate in the academic community has been remarkable. Already a number of articles and blogs have raised questions about the study’s validity. Some cite concerns from highly respected statisticians about the overall methodology (see http://www.chroniclecareers.com/article/A-Critic-Sees-Deep-Problems-in/124725/). In particular, Stephen Stigler of the University of Chicago raises major concerns about the methodology and about the ranking system’s ability to distinguish between programs, concluding that “little credence should be given” to the NRC’s ranges of rankings (http://news.uchicago.edu/btn/nrc.summary.php). Others have raised concerns about the relevance of stale data (from 2002-2006).

Since the release, many departments have examined in detail the data and the associated ranges of rank produced by the NRC system. Based on their observations, there appear to be additional concerns about the NRC rankings, some unique to our discipline and some perhaps shared by other disciplines. We highlight some of those concerns below.

We understand that the NRC is in a difficult situation. The report is very late, and it attempts a much larger study, with a much more ambitious statistical analysis of data on doctoral programs, than previous ranking efforts. Hence it is exploring new ground, and there will always be institutions unhappy with their ranking. However, we think it is important for the reputation and integrity of the NRC that it release a report and associated data set that are accurate and consistent; and it is especially important to the field of Computer Science that the field be accurately reflected. Accuracy implies not only that the data are correct, but also that the data measure what the NRC intended (hence our earlier dialogue about computer science publications, where the data may have been an accurate count of what was measured, but the choice of data set did not accurately reflect the publication rate of the field, which is what the NRC wanted to measure). Consistency implies that all the data are measuring the same thing. This matters for categories where individual institutions provided data, since some institutions misinterpreted what was requested and supplied data that skew the statistics.

We urged the NRC to delay the public release of the report and data set until institutions could respond to the study and provide corrected data. We realize that delaying may not have been possible, given the pressure on the NRC to finally publish the study. Thus, as an alternative, we urged the NRC to correct errors of fact submitted to it within some reasonable time period. We understand that the NRC will accept corrections for errors it made and will re-release the study in a month’s time; however, it will not correct errors for which it believes it is not at fault, but will simply list these on its web site.

What are possible types of flaws in the data?

  • Actual errors in measuring data—one assumes that there are few such errors (we note that the original release of one category of data in the Computer Science discipline was wrong due to a programming error, and was corrected by the NRC before the full release).
  • Instances in which the NRC has chosen to use incomplete or flawed data sources—one assumes the NRC will claim these are not errors, but one wonders whether errors of judgment were made in selecting these sources.
  • Instances in which ambiguity in the requirements led some institutions to select data in a manner significantly different from others—one assumes the NRC will claim these are not their errors. We understand that the misunderstanding may have occurred at the institution, but we believe that the NRC should not be “punishing” an institution for such a mistake; the goal should be to provide a picture of the field that is as accurate and consistent as possible.

Here are some of the issues colleagues have found that point to inaccurate or inconsistent data. Examples of flawed data sources:

  • Percent of graduates destined for academic positions:  The NRC originally asked institutions to provide data on the current employment of all graduates over the specified period, and many institutions went to great effort to gather these data. However, rather than using these data, the NRC decided to use the 2005 NSF Doctorate Records File. Unfortunately, that survey is voluntary, under-sampled, and administered at the time of graduation, when students may not yet know their job plans. For example, the NRC reports a 25% rate for a school whose internal data show 50%; it reports 0% for another school whose actual rate is 40%.
  • Percentage of faculty with grants:  This was acquired from surveys sent to a subset of faculty, rather than requested directly from institutions. Some of the reported data simply don’t make sense, and several institutions report values that are wrong by significant margins (e.g., a reported rate of 80% when the actual rate is greater than 90%, a difference of one standard deviation).
  • Partial allocation of faculty:  Departments were originally asked to classify faculty as primary or secondary in Computer Science, in Electrical and Computer Engineering, or in Computer Engineering. NRC ultimately decided not to rate CE departments; however, we understand that departments were not allowed to reallocate CE faculty to other areas. Thus such faculty members are only partially contributing to data—their publications, citations, and other factors are distributed between programs, or may not be counted at all.
  • Awards:  We are very puzzled by the list of awards that the NRC used for this category. Apparently being a Fellow of the ACM or the IEEE is not considered an honor worthy of consideration, and many other awards seem to be missing from the list.
  • Publications:  We hope that NRC was able to include relevant conference publications. We remain concerned about the actual numbers being reported, which several institutions have questioned. We don’t know whether the NRC was able to match acronyms of conferences as listed in CVs against the list the CRA provided. Furthermore, we don’t know how many faculty actually submitted CVs, since institutions were not allowed to collect and submit the CVs; instead, the NRC required individual faculty members to provide them.

Examples of inconsistent data reporting:

  • Reporting of faculty:  Some schools misinterpreted this criterion and reported all faculty and research staff involved in doctoral thesis committees or supervision. Many others interpreted it more narrowly and reported only full-time faculty in the department. This has a significant impact on the data, since it dramatically changes the denominator in any ratio-based statistic (for example, reporting 60 rather than 40 faculty members cuts every per-faculty average by a third) and leads to inconsistent data among schools. More than one school suffered from this misinterpretation. While we have heard suggestions that this is “too bad” for those schools, presumably we want rankings that reflect an accurate and consistent assessment of the field, not rankings that “punish” institutions for misinterpreting what was requested.
  • Reporting of associated faculty:  Some schools misinterpreted this category, which the NRC is using to measure interdisciplinary faculty. A number of schools well known for highly interdisciplinary research (as measured by funding sources or research collaborations) are reported in the NRC data with 0% interdisciplinary faculty. Unfortunately, the NRC changed its plans for assessing interdisciplinary research late in the process. Originally this was to be assessed as part of the faculty surveys; after the NRC decided to measure it through associated faculty instead, it declined to allow institutions to update their designations of faculty.

So where does this leave the community? With questions about the quality of the data in many categories, either inconsistent across institutions or simply not an accurate reflection of the category, one has to feel that Prof. Stigler’s conclusion holds all the more strongly. After all, as computer scientists, we certainly understand: “Garbage in, garbage out.”


Eric Grimson is the Bernard Gordon Professor of Medical Engineering and head of the Electrical Engineering and Computer Science Department at MIT.