CRA Bulletin



CRA Board Member Highlight: James Allan


By James Allan


My current work focuses on support for critical literacy and efforts to foster new paths for equity in the sciences.

I currently serve as chair of the faculty in the College of Information and Computer Sciences at the University of Massachusetts Amherst. This means that, like most chairs, I divide my time among administrative work supporting our college’s faculty, efforts to advance the college, and my own research as co-director of the Center for Intelligent Information Retrieval.

One large effort that I am supporting as chair is an NSF-funded ADVANCE interdisciplinary project at UMass Amherst, which is charged with developing an innovative professional advancement model for underrepresented faculty in STEM fields. With colleagues from the UMass Amherst colleges of engineering, natural sciences, and social and behavioral sciences, and with the guidance of the associate chancellor for equity and inclusion, I focus on using collaboration as a tool for fostering equity for underrepresented faculty in science and engineering fields. The ADVANCE project centers its research and programming on three essential elements: encouraging research collaboration, creating an inclusive community through mentoring, and promoting shared decision-making and governance at the department level. We call this the R3 model because it strategically uses resources, relationships, and recognition to encourage faculty collaboration and equity.

Outside of my work as chair, I am continuing my almost 30-year research program in Information Retrieval (IR), the science underlying search engines and related technology. One focus of my current work is how IR can support critical literacy by, for example, helping people recognize the agenda of a text’s author and a text’s broader context.

One exciting area of this work is controversy detection research, in which we develop algorithms to address the question “Is this a contentious topic?” when a person is looking at a web page or other document. To answer this question, we need to recognize disagreement in different modalities, identify and describe the different stances on a topic, determine what experts and trusted sources say, and establish how an item fits into those stances. To analyze a document, we also need to find communities, stances, and styles of authors within or across topics. The analysis involves determining who or what type of person wrote the text and what their agenda is, and deciphering whether there is a “style” for contentious text, disinformation, and fake news.

For example, you can find information on the Internet about the Issels treatment, which is described as a “comprehensive immunotherapy for cancer.” It is based on the idea that the body’s own immune system can be supercharged to get rid of cancer cells. A professionally created web site is quite convincing in its presentation of evidence and patient success stories, and many people reading the site would be willing to give the technique a try. There’s just one complication: little evidence suggests that the approach works. The American Cancer Society considers it ineffective and Quackwatch, a well-known debunker of health-care fraud and myths, lists it as a “dubious treatment.” Nonetheless, an unsuspecting reader is likely to be misled into thinking that there is no disagreement about the treatment’s effectiveness.

The aim of our newest project, Mirador, is to provide users with tools that illuminate the broader context of the topic of a single web page. It explores fundamental questions about how controversy can be modeled computationally so that it can be recognized “in the wild.” The model also allows an algorithm to extract an explanation of the nature of the controversy. In addition, the work will apply and extend text analysis and comparison techniques.

We are not working specifically to recognize “fake news,” although much of the “fake” material is probably contentious and may indeed be flagged by the algorithms. Instead, our goal is to assist people in their critical evaluation of the material and to help them understand why a page is educative or why it is not. Our hope is that they will recognize that a larger discussion is often involved and learn to critically evaluate information in the larger context, both online and offline.

Another project I am working on is a new search-based approach to the NLP challenge of information extraction. SearchIE is intended to allow personalized, situational identification of types (usually entities) in text: types of limited or ephemeral interest, for which resources are unlikely to be available for massive annotation and learning efforts. The key advantage of SearchIE is that it defers decisions about which types should be identified until those decisions are needed. That allows new types to be searched for and extracted at any point, and it allows types specific to a narrow or short-term information need to be identified and extracted even if they are not of use to all users of the system. Until now, it has been effectively impossible to identify entities automatically unless a substantial effort went into annotating training data. The SearchIE approach makes it possible for someone to build personalized extractors contextualized by their topical interest. Since online information gathering almost always starts with search and frequently involves identifying items of interest in the found text, bringing these two together has the potential to change both activities substantially.

My roles as chair and researcher overlap when I face the problem of how to broaden participation in IR. While a little more than one third of Ph.D. students in the Center for Intelligent Information Retrieval are women, much work remains to attract and retain women and underrepresented minorities in IR research, as well as in information and computer sciences more broadly. My hope is that the faculty advancement model I’m helping to develop through UMass Amherst’s NSF ADVANCE project will help to bring a more diverse set of perspectives to our field.

About the Author
James Allan is a professor and chair of the faculty in the University of Massachusetts Amherst College of Information and Computer Sciences (CICS), which he joined in 1994 after receiving a Ph.D. in computer science from Cornell University. He is also the co-director of the Center for Intelligent Information Retrieval within CICS.

Allan has served on the organizing and program committees for major conferences, including the ACM Special Interest Group on Information Retrieval (SIGIR), Conference on Information and Knowledge Management (CIKM), and Web Search and Data Mining (WSDM). He currently serves on the editorial board of Foundations and Trends in Information Retrieval, and is a past associate editor for ACM’s Transactions on Information Systems and Elsevier’s Information Processing and Management journals. With his students, he received Best Paper awards from SIGIR as well as a Best Student Paper award at the Conference on Human Information Interaction and Retrieval (CHIIR). He also received a SIGIR Test of Time Award in 2016 for a paper on event detection and tracking.