Learning from Diverse and Small Data
Ramya Korlakai Vinayak, Assistant Professor, University of Wisconsin–Madison
Machine learning (ML) algorithms are becoming ubiquitous in application domains such as public health, genomics, psychology, and the social sciences. In these domains, data is often obtained from diverse populations, e.g., with varying demographics, phenotypes, preferences, etc. Many ML algorithms focus on learning model parameters that work well on average over the population but do not capture this diversity. On the other hand, such datasets usually have few observations per individual, which limits our ability to learn about each individual separately. The question of interest in these scenarios is: how can we reliably capture the diversity in the data in small-data settings?
In this talk, we will address this question in the following settings:
(i) In many applications, we observe count data that can be modeled as Binomial (e.g., polling, surveys, epidemiology) or Poisson (e.g., single-cell RNA data). Because a single parameter, or a finite set of parameters, does not capture the diversity of the population in such datasets, they are often modeled as nonparametric mixtures. In this setting, we will address the question, “how well can we learn the distribution of parameters over the population without learning the individual parameters?” and show that nonparametric maximum likelihood estimators are in fact minimax optimal.
(ii) Learning preferences from human judgements using comparison queries plays a crucial role in cognitive and behavioral psychology, crowdsourcing democracy, surveys in social science applications, and recommendation systems. Models in the literature often focus on learning the average preference over the population because of the limited amount of data available per individual. We will discuss some recent results on how we can reliably capture diversity in preferences while pooling together data from individuals.
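As a rough illustration of setting (i), the sketch below approximates a nonparametric maximum likelihood estimate of the mixing distribution for Binomial count data. It makes several assumptions that are not part of the talk: every individual contributes the same number of trials `t`, the mixing distribution is restricted to a fixed grid on [0, 1], and the weights are fitted with plain EM updates. The function names `binom_pmf` and `npmle_grid`, the grid size, and the iteration count are hypothetical choices made for this example, which is not necessarily the estimator or algorithm analyzed in the talk.

```python
import math


def binom_pmf(x, t, p):
    """Probability of x successes in t trials with success probability p."""
    return math.comb(t, x) * p ** x * (1 - p) ** (t - x)


def npmle_grid(counts, t, grid_size=51, iters=200):
    """Grid-based EM approximation to the NPMLE of a Binomial mixing distribution.

    counts: one observed success count per individual, each out of t trials.
    Returns (grid, weights), where weights[j] is the estimated population
    mass placed on success probability grid[j].
    """
    grid = [j / (grid_size - 1) for j in range(grid_size)]

    # Only the histogram of counts matters, since all individuals share t.
    hist = [0] * (t + 1)
    for x in counts:
        hist[x] += 1
    n = len(counts)

    # lik[x][j] = P(X = x | p = grid[j])
    lik = [[binom_pmf(x, t, p) for p in grid] for x in range(t + 1)]

    # Start from uniform weights over the grid.
    w = [1.0 / grid_size] * grid_size
    for _ in range(iters):
        new_w = [0.0] * grid_size
        for x in range(t + 1):
            if hist[x] == 0:
                continue
            # Marginal probability of observing x under the current weights.
            mix = sum(w[j] * lik[x][j] for j in range(grid_size))
            # E-step: posterior mass of each grid point given x,
            # accumulated over all individuals who observed x.
            for j in range(grid_size):
                new_w[j] += hist[x] * w[j] * lik[x][j] / mix
        # M-step: new weights are the average posterior masses.
        w = [v / n for v in new_w]
    return grid, w
```

Calling `npmle_grid(counts, t)` on a list of per-individual success counts returns the grid and the fitted weights. Note that the procedure estimates the population-level distribution of success probabilities directly and never produces a separate estimate for each individual, which is the point of the question posed in (i).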
Ramya Korlakai Vinayak is an assistant professor in the Dept. of ECE and affiliated faculty in the Dept. of Computer Science and the Dept. of Statistics at the University of Wisconsin–Madison. Her research interests span the areas of machine learning, statistical inference, and crowdsourcing. Her work focuses on addressing theoretical and practical challenges that arise when learning from societal data. Prior to joining UW–Madison, Ramya was a postdoctoral researcher in the Paul G. Allen School of Computer Science and Engineering at the University of Washington. She received her Ph.D. in Electrical Engineering from Caltech. She obtained her Master's from Caltech and her Bachelor's from IIT Madras. She is a recipient of the Schlumberger Foundation Faculty for the Future fellowship (2013-15) and an NSF CAREER Award (2023-2028), and was an invited participant at the Rising Stars in EECS workshop in 2019.