Research

Kotek, Dockum & Sun, Gender bias and stereotypes in Large Language Models (The ACM Collective Intelligence Conference Series 2023)

Large Language Models (LLMs) have made substantial progress in the past several months, shattering state-of-the-art benchmarks in many domains. This paper investigates LLMs’ behavior with respect to gender stereotypes, a known issue for prior models. We use a simple paradigm to test the presence of gender bias, building on but differing from WinoBias, a commonly used gender bias dataset, which is likely to be included in the training data of current LLMs. We test four recently published LLMs and demonstrate that they express biased assumptions about men and women’s occupations. Our contributions in this paper are as follows: (a) LLMs are 3-6 times more likely to choose an occupation that stereotypically aligns with a person’s gender; (b) these choices align with people’s perceptions better than with the ground truth as reflected in official job statistics; (c) LLMs in fact amplify the bias beyond what is reflected in perceptions or the ground truth; (d) LLMs ignore crucial ambiguities in sentence structure 95% of the time in our study items, but when explicitly prompted, they recognize the ambiguity; (e) LLMs provide explanations for their choices that are factually inaccurate and likely obscure the true reason behind their predictions. That is, they provide rationalizations of their biased behavior. This highlights a key property of these models: LLMs are trained on imbalanced datasets; as such, even with the recent successes of reinforcement learning with human feedback, they tend to reflect those imbalances back at us. As with other types of societal biases, we suggest that LLMs must be carefully tested to ensure that they treat minoritized individuals and communities equitably.

Xiu, Cheng, Sun et al., Feedback Effect in User Interaction with Intelligent Assistants: Delayed Engagement, Adaption and Drop-out (The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2023)

With the growing popularity of intelligent assistants (IAs), evaluating IA quality becomes an increasingly active field of research. This paper identifies and quantifies the feedback effect, a novel component in IA-user interactions: how the capabilities and limitations of the IA influence user behavior over time. First, we demonstrate that unhelpful responses from the IA cause users to delay or reduce subsequent interactions in the short term via an observational study. Next, we expand the time horizon to examine behavior changes and show that as users discover the limitations of the IA’s understanding and functional capabilities, they learn to adjust the scope and wording of their requests to increase the likelihood of receiving a helpful response from the IA. Our findings highlight the impact of the feedback effect at both the micro and meso levels. We further discuss its macro-level consequences: unsatisfactory interactions continuously reduce the likelihood and diversity of future user engagements in a feedback loop.

Sun et al., Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution (The 28th International Conference on Computational Linguistics (COLING) 2020)

This paper develops and implements a scalable methodology for (a) estimating the noisiness of labels produced by a typical crowdsourcing semantic annotation task, and (b) reducing the resulting error of the labeling process by as much as 20-30% in comparison to other common labeling strategies. Importantly, this new approach to the labeling process, which we name Dynamic Automatic Conflict Resolution (DACR), does not require a ground truth dataset and is instead based on inter-project annotation inconsistencies. This makes DACR not only more accurate but also available to a broad range of labeling tasks. In what follows we present results from a text classification task performed at scale for a commercial personal assistant, and evaluate the inherent ambiguity uncovered by this annotation strategy as compared to other common labeling strategies.

Sun et al., Interpreting Spoken Requests (US Patent Published 2021)

Sun et al., Paraphrase-Invariant Intent Classification in Voice Assistant Systems

Some voice assistants allow users to invoke a series of system commands with a single pre-set utterance. To achieve this task, the assistant must be able to successfully map a given utterance to a user-defined invocation phrase (UDIP). Due to privacy and data collection constraints, it is often not possible to exhaustively train a personalized classifier for every UDIP. Thus we present a method to efficiently extract relevant user data and a recurrent siamese stack-ensemble classifier that can generalize well to new classes (UDIPs) from unknown distributions. We show state-of-the-art results on the given task based on metrics derived from empirical considerations in context.

Understanding Coalition Dynamics in Multiparty Conflicts: An Agent-based Approach with Multi-Objective Framework (Doctoral Dissertation)

B.G. Silverman, David Q. Sun et al., Book Chapter 17 in Modeling Sociocultural Influences on Decision Making: Understanding Conflict, Enabling Stability, CRC Press (2016)

Q. Sun, Predicting Mean-Field Theory Accuracy for SIS Model on Real-World Networks (Master's Thesis, 2014)

Mean-field (MF) theories are the most common form of analytical approximation applied in studies of dynamics on complex networks. However, their accuracy varies across networks with different topological characteristics. We developed a new metric that uses all available data points from numerical simulations while keeping the evaluation of MF accuracy computationally feasible, and used simple statistical analysis to narrow the search down to a few non-trivial topological features. Rather than relying on statistical analysis alone, we returned to the basics of mean-field theories and inspected which of their assumptions might cause MF accuracy to vary across networks. We argue that, of the three MF assumptions, the omission of dynamical correlations is the main cause of these discrepancies, and we therefore developed a proxy parameter to capture the structural property that makes this omission costly. We provided a counter-example (with an explanation) that exposes the limitations of a predictor proposed by Gleeson et al. (2012), and revisited the concepts of expander graphs and the spectral gap in search of our “reliable predictor”. We established the connection between eigenvalue gap, goodness of expansion, and neighborhood topology, and proved that the spectral gap serves as a lower bound for the goodness of the MF approximation. In brief, we have shown that the heterogeneous mean-field (HMF) approximation yields less accurate results on networks with a small spectral gap (and poorer expansion).
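For readers unfamiliar with the spectral gap, the sketch below computes one common definition of it: the difference between the two largest eigenvalues of the adjacency matrix of an undirected network. This definition, the toy graphs and the helper names are assumptions for illustration only; the thesis may use a different normalization, and the sketch does not reproduce the HMF accuracy analysis itself.

```python
import numpy as np

def spectral_gap(adjacency: np.ndarray) -> float:
    """Assumed definition: lambda_1 - lambda_2 of a symmetric (undirected) adjacency matrix."""
    # eigvalsh handles symmetric matrices and returns eigenvalues in ascending order
    eigenvalues = np.linalg.eigvalsh(adjacency)
    return float(eigenvalues[-1] - eigenvalues[-2])

def ring_graph(n: int) -> np.ndarray:
    """Hypothetical toy network: a cycle on n nodes, a poor expander."""
    a = np.zeros((n, n))
    for i in range(n):
        a[i, (i + 1) % n] = a[(i + 1) % n, i] = 1.0
    return a

def complete_graph(n: int) -> np.ndarray:
    """Hypothetical toy network: every pair of distinct nodes connected, a good expander."""
    return np.ones((n, n)) - np.eye(n)

# The complete graph K_20 has eigenvalues 19 and -1, so its gap is 20;
# the 20-cycle has gap 2 - 2cos(2*pi/20), roughly 0.098.
print(spectral_gap(complete_graph(20)), spectral_gap(ring_graph(20)))
```

Under the thesis's argument, one would expect the HMF approximation to be less reliable on networks like the ring (small gap) than on well-connected networks like the complete graph.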

Q. Sun & A. Wu, Innovative Investing through Networks: A Non-Bayesian Learning Model of Venture Capital (2013)

Investors and managers learn about new technologies through inter-organizational networks to stay competitive in rapidly evolving markets. We develop a novel non-Bayesian social learning model where agents apply a learning rule to incorporate the knowledge of neighbors: agents update their knowledge based on a convex combination of the Bayesian posterior belief conditioned on private knowledge and the observed opinions of their neighbors, with heterogeneous weighting of neighborhood knowledge and a budget constraint. We evaluate and estimate our model in the context of syndication networks in the life sciences venture capital industry. Using the universe of venture capital investments in life sciences startups from 1985 to 2010, we find robust evidence that venture capitalists are learning about new technologies through their syndication networks. Venture capitalists connected through stronger ties are more likely to be influenced by others, and those with greater connectivity and size are more likely to influence their peers.
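The learning rule described above can be sketched as a single update round over a finite set of states: each agent forms a Bayesian posterior from its own private signal, then mixes it with its neighbors' current beliefs using row-stochastic weights. This is a simplified, hypothetical illustration in the spirit of standard non-Bayesian social learning models, not the paper's estimated model; the weight matrix, likelihoods and function names are assumptions, and the heterogeneous weighting scheme and budget constraint are not modeled.

```python
import numpy as np

def bayes_posterior(prior, likelihood):
    """Bayes' rule over a finite set of states, given P(private signal | state)."""
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()

def non_bayesian_step(beliefs, weights, likelihoods):
    """One round of a hypothetical non-Bayesian social learning update.

    beliefs:     (n_agents, n_states), each row a probability vector
    weights:     (n_agents, n_agents), row-stochastic; weights[i, i] is agent i's self-weight
    likelihoods: (n_agents, n_states), probability of agent i's observed signal under each state
    """
    updated = np.empty_like(beliefs)
    for i in range(beliefs.shape[0]):
        # Bayesian posterior conditioned on the agent's private signal
        own = bayes_posterior(beliefs[i], likelihoods[i])
        # Weighted sum of the neighbors' current beliefs (agent i's own term removed)
        neighbors = weights[i] @ beliefs - weights[i, i] * beliefs[i]
        # Convex combination: each row of `weights` sums to 1, so the result is a distribution
        updated[i] = weights[i, i] * own + neighbors
    return updated

# Toy example: two agents, two states; only agent 0 receives an informative signal.
beliefs = np.array([[0.5, 0.5], [0.5, 0.5]])
weights = np.array([[0.6, 0.4], [0.3, 0.7]])
likelihoods = np.array([[0.8, 0.2], [0.5, 0.5]])
updated = non_bayesian_step(beliefs, weights, likelihoods)
print(updated)  # agent 0 moves to [0.68, 0.32]; agent 1 stays at [0.5, 0.5] this round
```

Agent 1 only starts to shift in the following round, once agent 0's updated belief enters its neighborhood average; this delayed, network-mediated diffusion is the kind of influence the syndication-network analysis looks for.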

V.M. Preciado, M. Zargham, Q. Sun, A Convex Framework to Control Spreading Processes in Directed Networks (2014)

We propose a convex optimization framework to compute the optimal distribution of protection resources in order to control a spreading process propagating throughout a network of contacts. The spreading process under consideration is an extension of the popular SIS model of viral infection in a network with non-identical nodes and directed edges. We assume we have a limited budget available to invest in three types of network protection resources: (i) edge control resources, (ii) preventative resources, and (iii) corrective resources. Edge control resources are employed to impose restrictions on the contact rates across directed edges in the contact network. Preventative resources are allocated to nodes in order to reduce the probability of infection at that node (e.g. vaccines), and corrective resources are allocated to nodes to increase the recovery rate at that node (e.g. antidotes). We assume these resources have monetary costs associated with them, from which we formalize an optimal budget allocation problem that maximizes containment of the infection. We present a polynomial time solution to the optimal budget allocation problem using Geometric Programming (GP) for an arbitrary weighted and directed contact network and a large class of resource cost functions. We illustrate our approach with numerical simulations in a real-world air transportation network.
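To make the three resource types concrete, the sketch below checks a commonly used containment condition for the mean-field SIS model on a directed network: with node-specific infection rates beta_i, recovery rates delta_i and adjacency matrix A, the linearized dynamics decay to the infection-free state when every eigenvalue of diag(beta) A - diag(delta) has negative real part. The matrix convention, the toy network, the rates and the helper name are assumptions for illustration; the Geometric Programming step that decides how to spend the budget is not reproduced.

```python
import numpy as np

def spectral_abscissa(beta, delta, adjacency):
    """Largest real part of the eigenvalues of diag(beta) @ A - diag(delta).

    Assumed convention: adjacency[i, j] > 0 when node j can infect node i.
    A negative value means the linearized SIS dynamics decay to the infection-free state.
    """
    m = np.diag(beta) @ adjacency - np.diag(delta)
    return float(np.max(np.linalg.eigvals(m).real))

# Hypothetical directed contact network: a 3-node cycle 0 -> 1 -> 2 -> 0.
A = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
beta = np.array([0.5, 0.5, 0.5])   # infection rates (lowered by preventative resources)
delta = np.array([0.4, 0.4, 0.4])  # recovery rates (raised by corrective resources)

baseline = spectral_abscissa(beta, delta, A)               # 0.1: infection-free state unstable
corrective = spectral_abscissa(beta, delta + 0.2, A)       # -0.1: faster recovery contains it
edge_controlled = spectral_abscissa(beta, delta, 0.7 * A)  # -0.05: weaker contact rates contain it
print(baseline, corrective, edge_controlled)
```

Each resource type moves the same quantity in the same direction (lower beta, higher delta, or smaller edge weights), which is why a single budget can be traded off among them.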

V.M. Preciado, M. Zargham, Q. Sun, Traffic Control for Network Protection Against Spreading Processes (2013)

Epidemic outbreaks in human populations are facilitated by the underlying transportation network. We consider strategies for containing a viral spreading process by optimally allocating a limited budget to three types of protection resources: (i) traffic control resources, (ii) preventative resources, and (iii) corrective resources. Traffic control resources are employed to impose restrictions on the traffic flowing across directed edges in the transportation network. Preventative resources are allocated to nodes to reduce the probability of infection at that node (e.g. vaccines), and corrective resources are allocated to nodes to increase the recovery rate at that node (e.g. antidotes). We assume these resources have monetary costs associated with them, from which we formalize an optimal budget allocation problem that maximizes containment of the infection. We present a polynomial time solution to the optimal budget allocation problem using Geometric Programming (GP) for an arbitrary weighted and directed contact network and a large class of resource cost functions. We illustrate our approach by designing optimal traffic control strategies to contain an epidemic outbreak that propagates through a real-world air transportation network.