Expanding the Pipeline: Gender and Ethnic Differences in PhD Specialty Areas
By Betsy Bizot, CRA Director of Statistics and Evaluation
This article examines gender and residency/ethnicity differences in PhD specialty areas as reported to the CRA Taulbee Survey from 2012-2018. The Taulbee Survey is conducted each fall and, among other questions, asks doctoral departments of Computer Science, Computer Engineering, and Information for data about each PhD they awarded in the previous academic year. The data on each new PhD include gender, residency/race/ethnicity, and PhD specialty area. A total of 12,968 PhDs were awarded by Taulbee respondents during the 7-year period from 2012-2018. Of those, the specialty area was listed as Other or Unknown for 3,328. Those individuals are omitted from the analyses described here. Individuals whose gender was not provided are excluded from the analysis by gender, and individuals whose residency/ethnicity was listed as Unknown are excluded from the analysis by residency/ethnicity.
Table 1 shows the list of specialty areas by gender. Each row lists the number and percent of women in that specialty area (that is, of all women PhD recipients, what percentage specialized in that area), the number and percent of men in the specialty area, and the total number and percent in that area. Differences in proportions were tested for significance by z-test (p<.01); the table flags the areas in which women are more likely to specialize and those in which men are.
Women are significantly more likely to specialize in Databases/Information Retrieval, Human-Computer Interaction, Information Science, and Social Computing/Computer Supported Cooperative Work. Note that this does not necessarily mean that there are high numbers of women in these areas, just that the proportion is relatively high compared to women’s overall representation. The highest number of women is in Artificial Intelligence/Machine Learning, but the percentage of women who specialize in that area is not significantly different from the percentage of men who do.
Men are significantly more likely to specialize in Graphics/Visualization, Hardware/Architecture, Operating Systems, Programming Languages/Compilers, and Robotics/Vision.
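The distinction between a high count of women and a high proportion of women can be illustrated with a few counts copied from Table 1. The following is only a sketch; the helper names `pct_of_women_choosing` and `pct_women_within` are hypothetical and are not part of the Taulbee analysis.

```python
# Counts copied from Table 1: (women, all PhDs) for three specialty areas.
AREA_COUNTS = {
    "Artificial Intelligence / Machine Learning": (265, 1476),
    "Human-Computer Interaction": (173, 485),
    "Information Science": (165, 328),
}
WOMEN_TOTAL = 1975   # all women in Table 1
ALL_TOTAL = 10274    # all PhDs in Table 1


def pct_of_women_choosing(area: str) -> float:
    """Percent of all women PhD recipients who specialized in `area`."""
    women, _ = AREA_COUNTS[area]
    return 100 * women / WOMEN_TOTAL


def pct_women_within(area: str) -> float:
    """Percent of the PhDs awarded in `area` that went to women."""
    women, total = AREA_COUNTS[area]
    return 100 * women / total


overall = 100 * WOMEN_TOTAL / ALL_TOTAL
print(f"Women among all PhDs in Table 1: {overall:.1f}%")
for area in AREA_COUNTS:
    print(
        f"{area}: {pct_of_women_choosing(area):.1f}% of women chose it; "
        f"women are {pct_women_within(area):.1f}% of the area"
    )
```

With these counts, Artificial Intelligence/Machine Learning has the largest number of women (265), yet women make up a slightly smaller share of that area than of the PhD population as a whole, while women are about half of the Information Science PhDs.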
Table 1. Gender Differences in PhD Specialty Areas from Taulbee Survey 2012-2018.
Specialty Area | Female N | Female % | Male N | Male % | Total N | Total % | |
Artificial Intelligence / Machine Learning | 265 | 13.4 | 1211 | 14.6 | 1476 | 14.4 | |
Computing Education | 14 | 0.7 | 35 | 0.4 | 49 | 0.5 | |
Databases / Information Retrieval | 185 | 9.4* | 620 | 7.5 | 805 | 7.8 | |
Graphics / Visualization | 96 | 4.9 | 567 | 6.8* | 663 | 6.5 | |
Hardware / Architecture | 64 | 3.2 | 468 | 5.6* | 532 | 5.2 | |
High Performance Computing | 50 | 2.5 | 293 | 3.5 | 343 | 3.3 | |
Human-Computer Interaction | 173 | 8.8* | 312 | 3.8 | 485 | 4.7 | |
Informatics: Biomedical/Other Science | 109 | 5.5 | 374 | 4.5 | 483 | 4.7 | |
Information Science | 165 | 8.4* | 163 | 2.0 | 328 | 3.2 | |
Information Systems | 49 | 2.5 | 183 | 2.2 | 232 | 2.3 | |
Networks | 145 | 7.3 | 748 | 9.0 | 893 | 8.7 | |
Operating Systems | 52 | 2.6 | 330 | 4.0* | 382 | 3.7 | |
Programming Languages / Compilers | 42 | 2.1 | 375 | 4.5* | 417 | 4.1 | |
Robotics / Vision | 79 | 4.0 | 462 | 5.6* | 541 | 5.3 | |
Scientific / Numerical Computing | 27 | 1.4 | 131 | 1.6 | 158 | 1.5 | |
Security / Information Assurance | 99 | 5.0 | 537 | 6.5 | 636 | 6.2 | |
Social Computing / CSCW | 72 | 3.6* | 149 | 1.8 | 221 | 2.2 | |
Software Engineering | 183 | 9.3 | 790 | 9.5 | 973 | 9.5 | |
Theory and Algorithms | 106 | 5.4 | 551 | 6.6 | 657 | 6.4 | |
Total | 1975 | 19.2 | 8299 | 80.8 | 10274 | ||
* Proportion of this gender is significantly higher by z-test, p<.01
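The flags in Table 1 can be spot-checked with a two-proportion z-test. The article does not say which variant of the test was used, so the sketch below assumes a pooled standard error and a two-sided p-value; the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z-test for the difference between two proportions.

    Assumption (not stated in the article): the standard error uses the
    pooled proportion. Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


WOMEN_TOTAL, MEN_TOTAL = 1975, 8299  # column totals from Table 1

# Human-Computer Interaction: 173 women, 312 men (flagged in Table 1).
z_hci, p_hci = two_proportion_z_test(173, WOMEN_TOTAL, 312, MEN_TOTAL)

# Artificial Intelligence / Machine Learning: 265 women, 1211 men (not flagged).
z_ai, p_ai = two_proportion_z_test(265, WOMEN_TOTAL, 1211, MEN_TOTAL)

for label, z, p in [("HCI", z_hci, p_hci), ("AI/ML", z_ai, p_ai)]:
    verdict = "significant" if p < 0.01 else "not significant"
    print(f"{label}: z = {z:.2f}, p = {p:.3g} ({verdict} at p<.01)")
```

Under these assumptions, the Human-Computer Interaction difference is significant at p<.01 and the Artificial Intelligence/Machine Learning difference is not, which agrees with the flags in the table.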
For Table 2, which shows specialty areas by residency/ethnicity, new PhDs are divided into three categories: International students (those on temporary visas), Domestic Underrepresented Minority students (URM; includes citizens and permanent residents of race/ethnicity Native American or Alaskan Native, Black/African American, Hispanic, and Native Hawaiian or Pacific Islander), and Domestic Majority students (citizens and permanent residents who are Asian or white).
The significance of difference in proportions was tested pairwise three ways: Domestic URM vs. International, Domestic URM vs. Domestic Majority, and International vs. Domestic Majority. Entries in the table are flagged for significant differences. Domestic URMs are:
- Less likely than Domestic Majority to specialize in Artificial Intelligence/Machine Learning
- More likely than either International or Domestic Majority to specialize in Human-Computer Interaction
- More likely than either International or Domestic Majority to specialize in Information Science
- Less likely than International to specialize in Networks
- Less likely than either International or Domestic Majority to specialize in Theory and Algorithms
Compared to International PhD recipients, Domestic Majority students were more likely to specialize in Artificial Intelligence/Machine Learning, Computing Education, Human-Computer Interaction, Information Science, and Programming Languages/Compilers; they were less likely to specialize in Databases/Information Retrieval, High Performance Computing, Networks, and Operating Systems.
The highest numbers of Domestic URM students were in the specialty areas of Human-Computer Interaction and Software Engineering. The highest numbers of Domestic Majority students were in Artificial Intelligence/Machine Learning and Software Engineering.
Table 2. Residency/Ethnic Differences in PhD Specialty Areas, from Taulbee Survey 2012-2018.
Specialty Area | International N | International % | Domestic URM N | Domestic URM % | Domestic Majority N | Domestic Majority % | Total N | Total % | |
Artificial Intelligence / Machine Learning | 754 | 13.3 | 27 | 8.4+ | 574 | 15.9** | 1355 | 14.1 | |
Computing Education | 18 | 0.3 | 3 | 0.9 | 26 | 0.7** | 47 | 0.5 | |
Databases / Information Retrieval | 491 | 8.7 | 24 | 7.5 | 215 | 5.9** | 730 | 7.6 | |
Graphics / Visualization | 378 | 6.7 | 12 | 3.8 | 226 | 6.2 | 616 | 6.4 | |
Hardware / Architecture | 305 | 5.4 | 18 | 5.6 | 185 | 5.1 | 508 | 5.3 | |
High Performance Computing | 219 | 3.9 | 11 | 3.4 | 103 | 2.8** | 333 | 3.5 | |
Human-Computer Interaction | 183 | 3.2 | 40 | 12.5*+ | 225 | 6.2** | 448 | 4.7 | |
Informatics: Biomedical/Other Science | 272 | 4.8 | 19 | 5.9 | 171 | 4.7 | 462 | 4.8 | |
Information Science | 109 | 1.9 | 30 | 9.4*+ | 172 | 4.8** | 311 | 3.2 | |
Information Systems | 135 | 2.4 | 7 | 2.2 | 69 | 1.9 | 211 | 2.2 | |
Networks | 617 | 10.9 | 12 | 3.8* | 209 | 5.8** | 838 | 8.7 | |
Operating Systems | 251 | 4.4 | 10 | 3.1 | 111 | 3.1** | 372 | 3.9 | |
Programming Languages / Compilers | 202 | 3.6 | 9 | 2.8 | 173 | 4.8** | 384 | 4.0 | |
Robotics / Vision | 298 | 5.3 | 11 | 3.4 | 210 | 5.8 | 519 | 5.4 | |
Scientific / Numerical Computing | 68 | 1.2 | 6 | 1.9 | 65 | 1.8 | 139 | 1.4 | |
Security / Information Assurance | 334 | 5.9 | 24 | 7.5 | 246 | 6.8 | 604 | 6.3 | |
Social Computing / CSCW | 108 | 1.9 | 10 | 3.1 | 93 | 2.6 | 211 | 2.2 | |
Software Engineering | 527 | 9.3 | 40 | 12.5 | 325 | 9.0 | 892 | 9.3 | |
Theory and Algorithms | 387 | 6.8 | 7 | 2.2*+ | 221 | 6.1 | 615 | 6.4 | |
Total | 5656 | 58.9 | 320 | 3.3 | 3619 | 37.7 | 9595 | ||
Significance tests, all by pairwise z-test, p<.01
* Domestic URM significantly different from International
+ Domestic URM significantly different from Domestic Majority
** Domestic Majority significantly different from International
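The three pairwise comparisons behind each row of Table 2 can be sketched in the same way. The example below uses the Networks row and again assumes a pooled, two-sided z-test, since the article does not specify the variant; `two_proportion_z_test` and `pairwise_tests` are hypothetical names.

```python
import math
from itertools import combinations


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-sided z-test for two proportions; returns (z, p_value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


def pairwise_tests(groups: dict) -> dict:
    """Run the z-test on every pair of groups.

    `groups` maps a group name to (count in the area, group total).
    """
    results = {}
    for (name_a, (xa, na)), (name_b, (xb, nb)) in combinations(groups.items(), 2):
        results[(name_a, name_b)] = two_proportion_z_test(xa, na, xb, nb)
    return results


# Networks row of Table 2: (N in Networks, column total).
networks = {
    "International": (617, 5656),
    "Domestic URM": (12, 320),
    "Domestic Majority": (209, 3619),
}

results = pairwise_tests(networks)
for (a, b), (z, p) in results.items():
    verdict = "significant" if p < 0.01 else "not significant"
    print(f"{a} vs. {b}: z = {z:.2f}, p = {p:.3g} ({verdict} at p<.01)")
```

Under these assumptions, both comparisons involving International students come out significant at p<.01, while Domestic URM vs. Domestic Majority does not, matching the flags on the Networks row.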