# CMI06 Abstracts

### Prof. Charles Holt

*Learning in Economics Experiments*

Laboratory experiments with financially motivated human subjects may generate dynamic patterns of data that may or may not converge to predictions based on standard equilibrium theories, e.g. the Nash equilibrium. The task of descriptive learning models is to explain these data patterns, both when they converge to equilibrium and when they do not. Since economic markets and games are interactive, the best decisions depend on what players expect others to do, and learning models can be used to track the relationship between observed histories and expectations about the future. Another task is to explain initial decisions when it is not possible to learn from past experience, and learning must be based instead on introspection about how others might behave. This talk provides an introduction to the models of learning and introspection that have proved to be useful in the explanation of laboratory data.

### Prof. William H. Sandholm

*Population Games and Evolutionary Dynamics*

We offer an overview of the theory of population games and evolutionary dynamics, and present a few recent results.

A population game is defined by a finite set of actions and a payoff function for each; payoffs depend continuously on the distribution of agents’ action choices. To avoid the assumption of equilibrium play in this large population context, it is assumed that each agent occasionally receives an opportunity to revise his choice of strategy. At such moments, the agent’s selection of a new strategy is described using a revision protocol, which specifies the conditional rates of switches between each pair of strategies. Over finite time spans, aggregate behavior is well approximated by a deterministic differential equation, the mean dynamic, that is defined by the expected changes in the population’s behavior.

The recent results we consider are these: First, we describe a class of simple, informationally undemanding revision protocols whose mean dynamics have a desirable property: the rest points of these dynamics are precisely the Nash equilibria of the underlying game. Second, we demonstrate that under most evolutionary dynamics, there are simple games under which strictly dominated strategies survive. Thus, evolutionary game theory provides surprisingly little support for a fundamental rationality postulate.

Link to presentation

### Prof. Jeff S. Shamma

*Control theoretic methods for competition and evolution*

Recent work has shown how the introduction of control theoretic methods in multiagent learning can enable qualitative changes in the limiting behaviors of various learning models, specifically with regards to stability of Nash equilibria. This talk presents an overview of these results and discusses work in progress that explores the role of such methods for other types of models, in particular, for exchange markets and single population evolutionary dynamics.

### Prof. Tamer Basar

Multi-Agent Decision Making with Limited Information Exchange

The interaction between information/communication and control (with "control" interpreted in a broader context, including teams and games) has been a dominating research topic for several decades. This interaction is in general a complex one, because considered as separate decision units, each one could help the other to achieve better performance: more information generally leads to better control performance, and a judicious use of control could improve the information content of transmitted messages. These dual roles are not always aligned, however, making sometimes the derivations of optimal solutions to team problems much more challenging than obtaining for example saddle-point solutions to similarly structured games. Regardless of these difficulties, which are inherent to stochastic decision problems with nonclassical information, the common element in all these problems has been to find a satisfactory answer to the question of "what to send", or equivalently "how to shape the information/sensor and control signals" so as to collectively meet a targeted objective.

With the emergence of remote control applications, where the plant-control and control-plant communications are conducted over a heterogeneous network, or applications that involve distributed agents over large networks, some nontraditional constraints have been imposed on designs, prompted by constraints on power usage and limits on available resources. The questions that are now being asked are not only "what to send", but also "when to send", given some constraints on the number of transmissions (which could include sensor signals, control signals, or communication between agents).

After a brief review of the classical paradigm of "what to send", this talk will introduce a mathematical framework wherein also the question of "when to send" can be given a precise meaning, and addressed along with the former. Solutions to these problems involve online dynamic scheduling and offline computation. This is a rich paradigm with relevance not only to remote control but also to multi-agent teams and games.

### Prof. Joao Hespanha

*Network Routing Games*

The basic principle behind the design of the Internet was to utilize massive redundancy to achieve fault-tolerance, but it is now widely recognized that this does not necessarily result in security against malicious attacks. We show how game theory can be used to construct routing algorithms that improve the robustness of computer networks with respect to attacks against switches and links. The routing algorithms obtained have additional benefits in the context of sensor networks and networked control systems. In particular, they provide load balancing and can therefore prolong the useful life of energy-constrained networks. They can also improve the stability of control systems for which the feedback loop is closed through a faulty communication network.

### Prof. Michael Safonov

Adaptation and Learning without Assumptions

The problem of modeling uncertainties that may not conform to assumed prior bounds is considered from an adaptive control perspective, but without the standard assumptions of adaptive control. A supervisory control architecture is employed, based on the data-driven logic of unfalsification. The supervisory controller modifies or replaces controllers when sensor data falsifies the hypothesis that the currently active controller is insufficiently robust to satisfy performance goals. Unlike other adaptive systems that are subject to model-mismatch induced instability, unfalsified adaptive control systems respond rapidly and precisely with guaranteed convergence. No assumptions about the plant are required beyond the availability of at least one candidate controller capable of robustly meeting performance goals. Applications include aircraft stability augmentation systems, highly maneuverable aircraft design, missile guidance systems, and precision pointing and tracking systems. Such designs can reliably compensate for equipment failures and changing circumstances.

### Prof. John Tsitsiklis

Motivated by recent interest in resource allocation mechanisms in communication networks and power systems, we develop VCG-like mechanisms for environments where the players have concave utility functions, and the resource constraints can be represented through convex inequality constraints; multicommodity flow problems are a prime example. Unlike VCG mechanisms that require each player to communicate an entire utility function, our mechanisms only require each player to communicate a single scalar quantity. Despite the limited communication, we establish the existence of an efficient Nash equilibrium. Under some further assumptions, we also establish that all Nash equilibria are efficient. This is joint work with Ramesh Johari. Link to additional references

### Prof. Vincent Blondel

Consider the following simple multi-agents system: A finite number of agents move in the plane with unit speed and with possibly different headings. At every iteration step the agents observe those agents that are within their observation range and update their heading by forming the vector sum of the headings they observe. Simulations reported more than a decade ago by Vicsek et al. show that for this model the headings of the agents all converge to the same limit when the initial agent density is sufficiently large.

Convergence properties of long products of stochastic matrices have been used to explain this observed phenomenon. When the sequence of matrices satisfy properties that correspond to the constraint that agents communicate with each other sufficiently often, headings consensus is obtained. This argument does however not take into account the explicit dependence of the matrix products on the actual positions of the agents in the plane and there is in fact, more than a decade after its introduction and despite intense interest, no formal proof of consensus for Vicsek's model.

In this talk, we will briefly review the stochastic matrix argument given above and we will then introduce a simpler but related multi-agent model that may be amenable to a full convergence analysis. The one dimensional version of the model is equivalent to an opinion dynamics model used in social science. It has intriguing properties that we will illustrate by a few numerical simulations.

### Prof. Roy Radner

*Learning with Costly and Bounded Rationality*

This talk will review some of the problems that arise when learning takes place in the presence of costly and bounded rationality. As Marschak and Simon pointed out, the activities of decision-making are usually costly, and one should take account of such costs in developing rational decision procedures. In many cases, the resulting problem can still be formulated as one of optimization, possibly subject to constraints, and thus remains within the Savage Paradigm, suitably extended. Recent examples include the work of Van Zandt, Radner, and others, on the organization of information-processing and decision-making in teams, taking account of the direct costs of information-processing and the indirect costs of delayed, and hence partially obsolete, decisions. In more severe cases of bounded rationality, the Savage Paradigm is no longer fully applicable, e.g., in the context of vagueness faced in formulating the model required by the Savage Paradigm. I shall discuss an example of this in a preliminary “Bayesian” analysis of model revision. Finally, I shall discuss the open question of whether there is a useful concept of “rationality” in the presence of the “failure of logical omniscience,” i.e., our inability to know all the logical consequences of what we know.

Link to background material.

### Prof. Rajiv Maheswaran

To create a system of intelligent agents that perform effectively in the "real world", one must consider bounded rationality, typically manifested in limits on processing and memory. These limitations have significant repercussions in large-scale or uncertain domains. In addressing uncertainty, formalisms such as Partially Observable Markov Decision Processes yield methods that are often functionally intractable or have arbitrary solution quality. We discuss how one can use structure and approximation to reduce the time to obtain a solution, while maintaining a desired level of quality.

### Rahul Jain

The empirical process theory in Statistics has its origins in the classical Glivenko-Cantelli theorem. This was generalized by Vapnik and Chervonenkis to obtain uniform laws of large numbers (ULLN) for measurable boolean functions and i.i.d. processes. Further extensions to bounded real-valued functions and dependent processes were provided by Pollard, Haussler, Vidyasagar and others. Such results essentially depend on the epsilon-entropy of a function class, introduced by Kolmogorov and Tihomirov. At the same time, recent concentration of measure inequalities obtained by Talagrand and others have provided an impetus in developing a non-asymptotic theory of statistics. These have thus yielded tighter bounds on non-asymptotic rates of convergence for uniform laws of large numbers as well.

We develop similar results for Markov decision processes and games. Specifically, we find sufficient conditions for uniform convergence of estimates of expected value functions from computer simulations. We also obtain non-asymptotic rates of convergence. We provide results for the partially observed case as well, and for both discounted and average-reward criterion. Surprisingly, we find that how sample trajectories of a Markov process are obtained from simulation matters for uniform convergence.

Such results have implications for learning and control in many ways. System identification algorithms for controlled Markov processes can be developed based on such ULLN results. Such a framework also offers the potential of developing simulation-based optimal control algorithms when dynamic programming becomes computationally infeasible. Thus, it is particularly relevant for Multi-agent problems modelled as partially-observed Markov games where no general solution methods are known.

This is joint work with Pravin Varaiya. Paper can be found at http://www.eecs.berkeley.edu/~rjain/papers

### Dr. Yu-Han Chang

Multi-agent learning research in the AI community has tended to focus on either cooperative models or learning equilibrium strategies in competitive settings. Given selfish agents whose goal is simply to maximize personal rewards, possibly at the expense of less adept agents, equilibrium strategies may no longer be the most desirable solution. In order to design agents that function in these systems, we need methods that allow our agents to autonomously and continuously learn and adapt to the changing environments they find themselves in. We also need to be clear about the metrics that we use to evaluate our agents' performance. I will discuss hedged learning, which combines regret-minimizing algorithms with learning techniques. This allows an agent to learn about and exploit new environments, while remaining safe from being exploited in return. I will also touch upon the use of reward filtering, which may be useful in cooperative situations where agent communication is limited and yet the agents seek to optimize some global reward signal, and provide some results from applying this work to a mobile ad-hoc networking domain.