Andreas Stuhlmüller

I am a Ph.D. student in the Department of Brain and Cognitive Sciences at MIT. I work in Josh Tenenbaum's Computational Cognitive Science group and in Noah Goodman's Computation & Cognition lab at Stanford.

The question that drives my research is: How can we better solve our problems? Machines are potentially much better at answering hard questions if such questions can be posed formally. For example, given a formal model that describes how diseases cause symptoms and a formalized prior belief about disease occurrences, machines can reason precisely about what certain observed symptoms imply.

In its various aspects, this question touches on mathematics, computer science, psychology, and philosophy:

Read more about each of these questions in the next section or skip to projects, publications, or links to my doings elsewhere on the web.

Interests

Probabilistic Programming Languages

Probabilistic programs express generative models, i.e., formal descriptions of processes that generate data. To the extent that we can mirror mechanisms “out there in the world” in a formal language, we can pose queries within this language and expect the answer to line up with what happens in the real world. For example, if we can accurately describe how genes give rise to proteins, this helps us in predicting which genetic changes lead to desirable results and which lead to harm.

Traditionally, Bayesian networks have been used to formalize and reason about such processes. However, compared to programming languages in general, this formalism is limited in its expressive power and makes reasoning about generative processes harder by not exposing structure in explicit form. Probabilistic programming languages like Church provide a potentially more useful substrate for computational modeling.

Get started with the interactive Church tutorial.

Universal Inference Algorithms

A useful system for probabilistic programming needs to provide more than the ability to formally describe mechanisms: It must answer questions about such mechanisms, i.e., infer the effects of interventions. In Bayesian terminology, a probabilistic program specifies a prior distribution over return values. Given an intervention on some variable within the program, the goal of inference is to compute the posterior distribution. For example, given a program expressing a prior over diseases and symptoms, we may fix the values of observed symptoms and compute the posterior distribution on diseases.

The problem of efficiently computing the posterior distribution provides a rich set of research directions: How do existing Bayes net inference algorithms like Metropolis-Hastings, importance sampling, and belief propagation generalize to the setting of probabilistic programs? How can we improve upon existing algorithms by making use of additional structure exposed by probabilistic programs? How can compilation techniques like flow analysis and abstract interpretation help? How can the task of inference in a particular program be formalized itself and treated as a (meta) inference problem?

Read my AIStats 2011 paper with David Wingate and Noah Goodman and learn how to turn any programming language into a probabilistic programming language with a universal MCMC inference engine.

Structure of Concepts

What is the structure of thought? The language of thought hypothesis proposes that thoughts are expressions in a mental language, and that thinking consists of syntactic manipulation of these expressions. The probabilistic language of thought hypothesis suggests that these representations have probabilistic meaning. If we treat concepts as stochastic programs and thinking as approximate Bayesian inference, can we predict human behavior in experimental settings better than using other approaches, e.g., prototype theory? If we learn more about the primitives and composition mechanisms used in the brain to represent abstract concepts, can this inform how we construct tools for reasoning?

Read my CogSci 2010 paper with Josh Tenenbaum and Noah Goodman on concept learning experiments and modeling.

Decision Theory

When we attempt to solve any particular problem, we do so because we believe that, overall, this is what we should be doing; we believe that, taking everything we know into account, we prefer to solve this problem over doing anything else. This judgement can be mistaken: We can make errors in our reasoning, fail at introspection, and forget to consider consequences. We attempt to solve a particular decision problem—the problem that defines the correct answer to the question what should be done?—and our solution is only approximately correct.

In the long run, we want to make sure that we find a good solution to this decision problem. Autonomous intelligent systems, if ever built, need to solve this decision problem if the outcome is to be of moral worth. This requires that we understand what exactly this decision problem is, and that our understanding is sufficiently precise that it can guide implementation. The most tractable approach to this seemingly impossible problem comes out of the field of decision theory, understanding notions such as choice and control under determinism to build foundations for the harder part of the problem.

Read my essay on inference and preference for a more detailed motivation of this research direction.

Projects

I wrote Cosh, a Church implementation that is based on exact inference with aggressive sharing of subcomputations.

I run Forest, a community-edited repository for generative models.

I contribute to MIT-Church, the original implementation of the Church probabilistic programming language, and to Bher, a lightweight compiler for Church.

Publications

Inducing Probabilistic Programs by Bayesian Program Merging abstract bibtex
Irvin Hwang, Andreas Stuhlmüller, and Noah D. Goodman (2011)
Technical Report. arXiv:1110.5667 [cs.AI]

This report outlines an approach to learning generative models from data. We express models as probabilistic programs, which allows us to capture abstract patterns within the examples. By choosing our language for programs to be an extension of the algebraic data type of the examples, we can begin with a program that generates all and only the examples. We then introduce greater abstraction, and hence generalization, incrementally to the extent that it improves the posterior probability of the examples given the program. Motivated by previous approaches to model merging and program induction, we search for such explanatory abstractions using program transformations. We consider two types of transformation: Abstraction merges common subexpressions within a program into new functions (a form of anti-unification). Deargumentation simplifies functions by reducing the number of arguments. We demonstrate that this approach finds key patterns in the domain of nested lists, including parameterized sub-functions and stochastic recursion.



      

Nonstandard Interpretations of Probabilistic Programs for Efficient Inference abstract bibtex
David Wingate, Noah D. Goodman, Andreas Stuhlmüller, and Jeffrey M. Siskind (2011)
Advances in Neural Information Processing Systems (NIPS) 24

Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how \emph{nonstandard interpretations} of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals.



      

Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation abstract bibtex
David Wingate, Andreas Stuhlmüller, and Noah D. Goodman (2011)
Proceedings of the 14th international conference on Artificial Intelligence and Statistics

We describe a general method of transforming arbitrary programming languages into probabilistic programming languages with straightforward MCMC inference engines. Random choices in the program are "named" with information about their position in an execution trace; these names are used in conjunction with a database holding values of random variables to implement MCMC inference in the space of execution traces. We encode naming information using lightweight source-to-source compilers. Our method enables us to reuse existing infrastructure (compilers, profilers, etc.) with minimal additional code, implying fast models with low development overhead. We illustrate the technique on two languages, one functional and one imperative: Bher, a compiled version of the Church language which eliminates interpretive overhead of the original MIT-Church implementation, and Stochastic Matlab, a new open-source language.



      

Learning Structured Generative Concepts abstract bibtex
Andreas Stuhlmüller, Joshua B. Tenenbaum, and Noah D. Goodman (2010)
Proceedings of the 32nd Annual Conference of the Cognitive Science Society

Many real world concepts, such as "car", "house", and "tree", are more than simply a collection of features. These objects are richly structured, defined in terms of systems of relations, subparts, and recursive embeddings. We describe an approach to concept representation and learning that attempts to capture such structured objects. This approach builds on recent probabilistic approaches, viewing concepts as generative processes, and on recent rule-based approaches, constructing concepts inductively from a language of thought. Concepts are modeled as probabilistic programs that describe generative processes; these programs are described in a compositional language. In an exploratory concept learning experiment, we investigate human learning from sets of tree-like objects generated by processes that vary in their abstract structure, from simple prototypes to complex recursions. We compare human categorization judgements to predictions of the true generative process as well as a variety of exemplar-based heuristics.



      

Statistical Learning of Compositional Concepts abstract
Noah D. Goodman, Frank Jäkel, Andreas Stuhlmüller, Virginia Savova, and Joshua B. Tenenbaum (2009)
Poster at the 50th Annual Meeting of the Psychonomic Society

Classic laboratory experiments on concept learning used simple stimuli varying on a small number of obvious dimensions, such as shapes varying in size, color, and form. Also, the category structures that participants learned were defined in isolation from other concepts. Here, we explore learning of compositional concepts, defined in terms of generative rules and subpart concepts rather than in terms of fixed perceptual features. We study how compositional concepts can be learned from a small number of examples. Since related concepts can reuse the same subpart concepts, learning one new concept may help in learning another. We model learning in a Bayesian framework, making some assumptions about how concepts can be composed. Learning then amounts to inferring from examples the generative processes and parts underlying a set of novel concepts.

Learning Compositional Representations abstract bibtex
Andreas Stuhlmüller (2009)
Bachelor's thesis at the University of Osnabrück

How could an agent learn and reason in a complexly structured, stochastic world? This problem is at the center of both artificial intelligence and psychology. One candidate answer to this question is that both learning and reasoning can be explained as probabilistic inference over a language-like hypothesis space for generative models. The goal of this thesis is to describe what makes this approach plausible and to demonstrate the learning of generative models from structured observations in a simple world consisting of stochastic, tree-generating programs. In order to make program inference feasible, I derive a new class of adaptive importance sampling algorithms for probabilistic programs that let us compute the likelihood of structured observations under a given generative model. Using this algorithm in combination with Markov Chain Monte Carlo methods, I quantitatively show that we can reliably infer generative models from observations in our demonstration world.



      

Elsewhere

facebook.com/stuhlmueller
Shared photos and location updates.

github.com/stuhlmueller
Open source projects, mostly written in Scheme.

pinboard.in/u:stuhlmueller
Public bookmarks.

aiplayground.org
Infrequently updated blog.

ast@mit.edu

MIT Building 46-4053
77 Massachusetts Ave.
Cambridge, MA 02139