Unveiling Patterns: Undergraduate Ventures Into Statistically Significant Pattern Mining
By Yasra Chandio (CRA-E Fellow, University of Massachusetts Amherst) and Alejandro Velasco Dimate (CRA-E Fellow, College of William & Mary)
This Q&A highlight features Stefan Walzer-Goldfeld, a Finalist in the 2023 CRA Outstanding Undergraduate Researchers award program. Stefan finished his undergraduate degree at Amherst College and is now pursuing a Master’s in Environmental Modelling at University College London.
How did you find your first research opportunity?
In the summer of 2020, after my freshman year at Amherst College and at the height of the pandemic, I reached out to every computer science professor at the college to ask about research opportunities. Although I had completed only introductory computer science courses, Prof. Matteo Riondato, who became my advisor, took me under his wing. At the outset, he gave me a personalized course in probability and computing, a crucial foundation for working on his research projects.
How did you identify your first research project?
After a semester under Prof. Riondato’s guidance, I collaborated with another student, Alexander Lee, on my first research project: developing a parallel algorithm for balanced sampling. A parallel algorithm for balanced sampling is like having many helpers working together, at the same time, to ensure that every group in a dataset is fairly represented, preventing unfair bias. The project aimed to extend an existing sequential algorithm for balanced sampling. Drawing inspiration from other parallel algorithms, we introduced parallelism and achieved near-perfect speedups in generating samples while keeping them balanced.
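The "many helpers" picture can be made concrete with a small sketch. This is not the algorithm from the project: it swaps in plain stratified sampling, a much simpler technique, only to show how per-group work can be handed to separate workers. The function names, the `fraction` parameter, and the toy records are all hypothetical, and Python threads are used for structure rather than for real speedup.

```python
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def sample_group(members, fraction, seed):
    """Draw a fixed fraction of one group, without replacement."""
    rng = random.Random(seed)  # one generator per group, so workers share no state
    k = max(1, round(fraction * len(members)))
    return rng.sample(members, k)


def parallel_stratified_sample(records, group_of, fraction, seed=0, workers=4):
    """Sample every group at the same rate, handling the groups concurrently."""
    groups = defaultdict(list)
    for record in records:
        groups[group_of(record)].append(record)
    keys = sorted(groups)  # fixed order keeps the result reproducible
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda item: sample_group(groups[item[1]], fraction, seed + item[0]),
            enumerate(keys),
        )
        return [record for part in parts for record in part]


# Hypothetical toy data: 80 records in group "a", 20 in group "b".
records = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
sample = parallel_stratified_sample(records, lambda r: r[0], fraction=0.1)
```

Because each group is sampled at the same rate, the sample keeps the 4-to-1 ratio of the two groups in the toy data.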
How did you navigate the transfer of knowledge between different projects?
After completing my first research project with Prof. Riondato, I joined fellow student Steedman Jenkins in a summer research program on campus, again working under Prof. Riondato. This second project was separate from the first and focused on mining statistically significant frequent sequential patterns in transactional datasets. My first research experience had been highly enriching, so I eagerly took on the new project even though its topic was unrelated to the previous one.
Can you tell us about your project?
Our second project, initially proposed by Prof. Riondato, evolved as we aimed to develop efficient algorithms for mining statistically significant frequent patterns (SFPs) under various assumptions about how the data is generated (null models). These assumptions encompassed scenarios like random uniform distribution, where events occur with equal probability; temporal dependency, reflecting patterns influenced by time; and spatial distribution, indicating patterns influenced by spatial proximity. The existing methods were slow and statistically inexact, prompting us to improve them, define novel null models, and build efficient algorithms. We then examined applications of existing algorithms, scrutinizing the assumptions about the data generation process in each instance. This exploration, in turn, guided us in developing novel null models. We restructured the problem, applied new statistical techniques, and developed two methods for each null model. For the existing null model described in the literature, our research yielded a statistically exact method for mining SFPs that is significantly faster than the current state of the art. The work was published in the journal Data Mining and Knowledge Discovery, and we presented it at ECML PKDD ’22.
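To illustrate what testing a pattern against a null model means, here is a minimal sketch, not the method developed in the project. It assumes a simplified setting in which each transaction is a sequence of single items and the null model shuffles the order of items within each transaction independently. It then estimates an empirical p-value for one pattern by Monte Carlo sampling. All function names and the toy dataset are hypothetical.

```python
import random


def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` as a subsequence (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def support(dataset, pattern):
    """Number of sequences in the dataset that contain the pattern."""
    return sum(contains(seq, pattern) for seq in dataset)


def shuffled_copy(dataset, rng):
    """One sample from the assumed null model: shuffle each sequence independently."""
    sampled = []
    for seq in dataset:
        seq = list(seq)
        rng.shuffle(seq)
        sampled.append(seq)
    return sampled


def empirical_p_value(dataset, pattern, num_samples=1000, seed=0):
    """Estimated probability, under the null model, of a support at least as
    large as the one observed in the real dataset."""
    rng = random.Random(seed)
    observed = support(dataset, pattern)
    at_least = sum(
        support(shuffled_copy(dataset, rng), pattern) >= observed
        for _ in range(num_samples)
    )
    # The +1 terms count the observed dataset itself, so the estimate is never 0.
    return (1 + at_least) / (1 + num_samples)


# Hypothetical toy data: "a" comes before "b" in every one of 8 sequences.
dataset = [["a", "b", "c", "d"] for _ in range(8)]
p_ordered = empirical_p_value(dataset, ["a", "b"])
p_single = empirical_p_value(dataset, ["a"])
```

In this toy data, the pattern `["a", "b"]` gets a small p-value because random orderings rarely place "a" before "b" in all eight sequences, while the single-item pattern `["a"]` is unaffected by shuffling and gets a p-value of 1. A real mining task tests many patterns at once and therefore also needs a correction for multiple hypothesis testing, which this sketch omits.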
What did you learn about teamwork and collaboration from this research experience?
For me, the most enjoyable aspect of the research was its collaborative nature. Much of the work took place during the summer alongside a peer who was also one of my closest friends. The mutual exchange of ideas and joint problem-solving, even when faced with challenges, was incredibly enriching. Additionally, collaborating closely with our advisor, Prof. Riondato, offered valuable insights into how a seasoned computer scientist tackles problems.
Did you find any of your outside interests had any interplay with your research experience?
While working on my research, I held the roles of captain and president of the Men’s Club Soccer team at Amherst College, in addition to being a climbing club member. This schedule meant engaging in soccer or climbing almost every day of the week. Balancing these extracurricular activities with my research and coursework posed a definite challenge. However, after participating in these physical activities, I discovered that I was significantly more productive, both in terms of focus and the quality of my work. Therefore, I considered these extracurricular pursuits essential for my academic success.
Do you have any advice for other students looking to get into research?
I highly recommend getting involved in research if the opportunity arises! This experience proved incredibly rewarding, providing me with a distinct set of computer science skills that surpassed what I gained in my classes. It also offered valuable insights into the broader realm of academia. It is equally imperative to seek an advisor with whom you not only share a close working relationship but also find collaboration enjoyable. Additionally, selecting a research topic that genuinely engages your interest is crucial for a fulfilling research experience.