This paper develops and implements a scalable methodology for (a) estimating the noisiness of labels produced by a typical crowdsourcing semantic annotation task, and (b) reducing the resulting error of the labeling process by as much as 20-30% in comparison to other common labeling strategies. Importantly, this new approach to the labeling process, which we name Dynamic Automatic Conflict Resolution (DACR), does not require a ground truth dataset and is instead based on inter-project annotation inconsistencies. This makes DACR not only more accurate but also available to a broad range of labeling tasks. In what follows we present results from a text classification task performed at scale for a commercial personal assistant, and evaluate the inherent ambiguity uncovered by this annotation strategy as compared to other common labeling strategies.
Sun et al., Paraphrase-Invariant Intent Classification in Voice Assistant Systems
Some voice assistants allow users to invoke a series of system commands with a single pre-set utterance. To achieve this task, the assistant must be able to successfully map a given utterance to a user-defined invocation phrase (UDIP). Due to privacy and data collection constraints, it is often not possible to exhaustively train a personalized classifier for every UDIP. Thus we present a method to efficiently extract relevant user data and a recurrent siamese stack-ensemble classifier that can generalize well to new classes (UDIPs) from unknown distributions. We show state-of-the-art results on the given task based on metrics derived from empirical considerations in context.
Understanding Coalition Dynamics in Multiparty Conflicts: An Agent-based Approach with Multi-Objective Framework (Doctoral Dissertation)
B.G. Silverman, David Q. Sun et al., Book Chapter 17 in Modeling Sociocultural Influences on Decision Making: Understanding Conflict, Enabling Stability, CRC Press (2016)
Q. Sun, Predicting Mean-Field Theory Accuracy for SIS Model on Real-World Networks (Master's Thesis, 2014)
Mean-field theories are the most common form of analytical approximation applied in studies of dynamics on complex networks. However, the accuracy of the theory varies across networks with different topological characteristics. We developed a new metric that uses all available data points from numerical simulations while keeping the evaluation of MF accuracy computationally feasible, and narrowed the search down to a few non-trivial topological features with some simple statistical analysis. Instead of relying on the statistical analysis alone, we returned to the basics of mean-field theories and inspected the assumptions that may have caused the variation in MF accuracy across different networks. We argue that, among the three MF assumptions, the omission of dynamical correlations is the main cause of such discrepancies, and we therefore developed a proxy parameter to capture the particular structural property that makes this omission costly in our estimation. We provided a counter-example (with explanation) showing the limitations of a predictor proposed by Gleeson et al. (2012), and revisited the concepts of expander graphs and the spectral gap in search of a "reliable predictor". We established the connection between eigenvalue gap, goodness of expansion, and neighborhood topology, and proved that the spectral gap serves as a lower bound for the goodness of the MF approximation. In brief, we have proven that the HMF approximation method yields less accurate results in networks with a small spectral gap (and poor expansion).
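To make the spectral gap concrete, here is a minimal sketch of how it could be computed for a small undirected network. It assumes the gap is defined as the difference between the two largest eigenvalues of the adjacency matrix; the thesis may use a different matrix or normalization. The function names and the power-iteration approach are illustrative choices, not taken from the thesis.

```python
import math


def _matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def _top_eigenpair(m, iters=2000):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix.

    The start vector is deterministic and non-uniform; in a degenerate case
    it could be orthogonal to the target eigenvector, which this sketch
    does not guard against.
    """
    n = len(m)
    v = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = _matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    return sum(a * b for a, b in zip(v, _matvec(m, v))), v


def spectral_gap(adj):
    """Return lambda_1 - lambda_2 of a symmetric adjacency matrix (assumed definition)."""
    n = len(adj)
    # Shift by the maximum degree so every eigenvalue becomes non-negative.
    shift = float(max(sum(row) for row in adj))
    shifted = [[adj[i][j] + (shift if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    top, v = _top_eigenpair(shifted)
    # Deflate the dominant eigenpair, then find the next largest eigenvalue.
    deflated = [[shifted[i][j] - top * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    second, _ = _top_eigenpair(deflated)
    return top - second  # the shift cancels in the difference


complete_4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
path_3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(spectral_gap(complete_4), spectral_gap(path_3))
```

The complete graph on four nodes, a good expander, has a larger gap than the three-node path, which is consistent with the abstract's claim that small-gap networks are where the mean-field approximation is expected to be less accurate.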
Q. Sun & A. Wu, Innovative Investing through Networks: A Non-Bayesian Learning Model of Venture Capital (2013)
Investors and managers learn about new technologies through inter-organizational networks to stay competitive in rapidly evolving markets. We develop a novel non-Bayesian social learning model where agents apply a learning rule to incorporate the knowledge of neighbors: each agent updates its knowledge based on a convex combination of the Bayesian posterior belief conditioned on private knowledge and the observed opinions of its neighbors, with heterogeneous weighting of neighborhood knowledge and a budget constraint. We evaluate and estimate our model in the context of syndication networks in the life sciences venture capital industry. Using the universe of venture capital investments in life sciences startups from 1985 to 2010, we find robust evidence that venture capitalists are learning about new technologies through their syndication networks. Venture capitalists connected through stronger ties are more likely to be influenced by others, and those with greater connectivity and size are more likely to influence their peers.
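The learning rule described above can be sketched as a single update step. This is a simplified illustration under stated assumptions: beliefs are scalars in [0, 1], the neighbor opinions are averaged with tie-strength weights, and the budget constraint is omitted. The function name, argument names, and example values are hypothetical and do not come from the paper.

```python
def update_belief(posterior, neighbor_opinions, self_weight):
    """Convex combination of a private Bayesian posterior and neighbor opinions.

    posterior         -- the agent's Bayesian posterior from private knowledge
    neighbor_opinions -- list of (opinion, tie_strength) pairs, tie_strength > 0
    self_weight       -- weight in [0, 1] placed on the agent's own posterior
    """
    if not 0.0 <= self_weight <= 1.0:
        raise ValueError("self_weight must lie in [0, 1]")
    if not neighbor_opinions:
        # An isolated agent falls back to its own posterior.
        return posterior
    total_strength = sum(strength for _, strength in neighbor_opinions)
    # Heterogeneous weighting: stronger ties count more in the social signal.
    social_signal = sum(opinion * strength
                        for opinion, strength in neighbor_opinions) / total_strength
    return self_weight * posterior + (1.0 - self_weight) * social_signal


# One agent with posterior 0.8 and two neighbors with tie strengths 1 and 3.
new_belief = update_belief(0.8, [(0.2, 1.0), (0.6, 3.0)], self_weight=0.5)
print(new_belief)
```

Because the weights are non-negative and sum to one, the updated belief always lies between the smallest and largest of the inputs, which is what makes the rule a convex combination.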
V.M. Preciado, M. Zargham, Q. Sun, A Convex Framework to Control Spreading Processes in Directed Networks (2014)
We propose a convex optimization framework to compute the optimal distribution of protection resources in order to control a spreading process propagating throughout a network of contacts. The spreading process under consideration is an extension of the popular SIS model of viral infection in a network with non-identical nodes and directed edges. We assume we have a limited budget available to invest in three types of network protection resources: (i) edge control resources, (ii) preventative resources, and (iii) corrective resources. Edge control resources are employed to impose restrictions on the contact rates across directed edges in the contact network. Preventative resources are allocated to nodes in order to reduce the probability of infection at that node (e.g. vaccines), and corrective resources are allocated to nodes to increase the recovery rate at that node (e.g. antidotes). We assume these resources have monetary costs associated with them, from which we formalize an optimal budget allocation problem which maximizes containment of the infection. We present a polynomial time solution to the optimal budget allocation problem using Geometric Programming (GP) for an arbitrary weighted and directed contact network and a large class of resource cost functions. We illustrate our approach with numerical simulations in a real-world air transportation network.
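As a rough illustration of the dynamics being controlled, the sketch below integrates a mean-field SIS model with node-specific infection and recovery rates on a directed network. It assumes the commonly used form dp_i/dt = (1 - p_i) * beta_i * sum_j a_ij * p_j - delta_i * p_i, which may differ in detail from the model in the paper. It does not implement the geometric program; it only shows how raising recovery rates (the role of corrective resources) moves the system from an endemic state to containment. All names and parameter values are hypothetical.

```python
def simulate_sis(adj, beta, delta, p0, dt=0.01, steps=5000):
    """Forward-Euler integration of a heterogeneous mean-field SIS model.

    adj[i][j] -- weight of the directed edge from node j into node i
    beta[i]   -- infection rate of node i (lowered by preventative resources)
    delta[i]  -- recovery rate of node i (raised by corrective resources)
    p0[i]     -- initial infection probability of node i
    """
    n = len(p0)
    p = list(p0)
    for _ in range(steps):
        updated = []
        for i in range(n):
            # Infection pressure arriving over incoming edges.
            pressure = sum(adj[i][j] * p[j] for j in range(n))
            dp = (1.0 - p[i]) * beta[i] * pressure - delta[i] * p[i]
            # Clamp to keep the value a valid probability.
            updated.append(min(1.0, max(0.0, p[i] + dt * dp)))
        p = updated
    return p


# Directed 3-cycle: node 0 is infected by node 2, node 1 by node 0, node 2 by node 1.
cycle = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
start = [0.1, 0.1, 0.1]

endemic = simulate_sis(cycle, beta=[0.5] * 3, delta=[0.2] * 3, p0=start)
contained = simulate_sis(cycle, beta=[0.5] * 3, delta=[0.8] * 3, p0=start)
print(endemic, contained)
```

With the low recovery rate the infection settles at a non-zero level on every node, while with the higher recovery rate it dies out. The budget allocation problem in the abstract is about achieving that second outcome at minimum cost.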
V.M. Preciado, M. Zargham, Q. Sun, Traffic Control for Network Protection Against Spreading Processes (2013)
Epidemic outbreaks in human populations are facilitated by the underlying transportation network. We consider strategies for containing a viral spreading process by optimally allocating a limited budget to three types of protection resources: (i) traffic control resources, (ii) preventative resources, and (iii) corrective resources. Traffic control resources are employed to impose restrictions on the traffic flowing across directed edges in the transportation network. Preventative resources are allocated to nodes to reduce the probability of infection at that node (e.g. vaccines), and corrective resources are allocated to nodes to increase the recovery rate at that node (e.g. antidotes). We assume these resources have monetary costs associated with them, from which we formalize an optimal budget allocation problem which maximizes containment of the infection. We present a polynomial time solution to the optimal budget allocation problem using Geometric Programming (GP) for an arbitrary weighted and directed contact network and a large class of resource cost functions. We illustrate our approach by designing optimal traffic control strategies to contain an epidemic outbreak that propagates through a real-world air transportation network.