# About My Research

## Dynamic Programming and Markov Decision Processes

When I was a graduate student I was asked to teach a course in operations research. This was intended as training for the Society of Actuaries exam on this subject. As a result I became fascinated (even obsessed) with dynamic programming, and I ended up writing my thesis on the subject. It was based on a simple problem, referred to as The Fitzwilliam Street Crossing Problem. This was posed and solved by E.M. Wilkinson, in a chapter from the wonderful volume Stochastic Geometry (1974), edited by E. F. Harding and D. G. Kendall. Basically, you start at one street corner, then walk down the street about one block to the street corner on the opposite side. The shortest path would be a straight line, except that if a car appears, you must proceed immediately to the sidewalk. Assume the car appears after an exponential waiting time. What is the optimal path, that is, the one which minimizes the total expected walking distance? As you can guess, the optimal path is not a straight line, but one which veers towards the sidewalk on the other side, in anticipation of a car.

My job was to generalize this problem to a multistage decision process (ie, one which allows multiple perturbations from a planned trajectory). To make a long story short, it was a wonderful experience.

Of course, one wants to publish something from a thesis. By the time I set myself to this task, my thinking on the problem evolved to the point at which the problem became just another exercise in dynamic programming. You have some goal that is met by a suitable trajectory (shortest path, perhaps). You have in your toolkit a large variety of model railroad track sections. You select one, and attempt to follow the track. Either you reach the end of the track, or you are randomly forced off the trajectory in some way to a new location. In either case, you select a new track section and continue. That is pretty much a dynamic programming model. The purpose of Almudevar (2001) is to prove that this idea works. The paper seems to have had some influence, and I count 42 citations (not including my own). I think that's a lot for a paper of this type.

My interest in dynamic programming took a new turn a few years later. I became interested in stochastic programming (the Robbins-Monro algorithm (RMA)). Without going into too much detail, I somehow figured out that the RMA was a stochastic contraction process. I mean, it is an iterative fixed-point algorithm with two interesting features: (a) the contraction rate approaches one; (b) after a conventional fixed point iteration, noise is added to the output. But it still converges to the fixed point because (a) the contraction rate approaches one slowly enough; and (b) the variance of the noise approaches zero quickly enough. In fact, these two conditions are directly related to the two regularity conditions originally given by Robbin-Monro in 1952. (Actually, one of these conditions can be relaxed, in that the variance of the noise merely needs to vanish. See Almudevar (2014).)

So, if the RMA converges, then a "stochastic contraction process" for which the contraction rate remains bounded below one should also converge. It has been known for some time that if the noise vanishes then such a fixed-point algorithm will converge to the true fixed point (Ostrowski 1964).

However, much more can be said. Suppose we have a fixed-point algorithm X[n+1] = T X[n]. If T is a contractive operator, the algorithm converges to fixed point X* = TX*. Suppose we define an approximate fixed point algorithm X[n+1] = T[n]X[n], where T[n] is an approximation of T. We can always write this X[n+1] = T X[n] + U[n+1], where U[n] represents noise. In Almudevar (2008), it was shown under quite general conditions that X[n] converges to X* at the same rate that U[n] converges to zero, unless U[n] converges to zero faster than the exact version of the algorithm would normally converge, in which case that convergence rate still holds.

My experience is that this result surprises people when explained, but I don't think it is widely applied (this paper has far fewer citations than Almudevar (2001), but I think it is a much more interesting paper). First, it is able to reproduce earlier results. For example it is able not only to verify rather directly convergence of the RMA, but also to give rates of convergence without too much more trouble. It also reproduces some of the basic theory of non-expansive fixed-point algorithms.

So, I eventually found something new it can do. Suppose we would like to implement exact fixed-point algorithm X[n+1] = T X[n], but we can only evaluate T approximately. Then consider a coarse-to-fine fixed-point algorithm X[n+1] = T[n] X[n], where T[n] defines a sequence of approximations of T. Suppose T[n] must be evaluated using some high-dimensional integral, which is approximated numerically on a grid. If we refine the grid, our calculation is more accurate, but also more time consuming. So it makes sense to start the algorithm with a coarse approximation (that is, using a coarse grid) then to refine the approximation as the iterations proceed.

In this case, we need to measure the convergence rate not with respect to iteration count, but with respect to total cumulative computational effort, the rate of which increases with iteration count.

Here is where the main result of Almudevar (2008) comes in handy. We can write the fixed-point algorithm as X[n+1] = T X[n] + U[n+1]. We control the rate at which U[n] approaches zero by the rate at which we refine our approximation of T. For example, we can relate the magnitude of U[n] directly to any approximating grid size. Then, from Almudevar (2008), we know that the approximation error of X[n] is of the same order of magnitude as U[n], unless U[n] converges to zero faster than the exact algorithm. Therefore, it will be strictly suboptimal to adopt a refinement schedule that would force U[n] to converge to zero at faster rate than that of the exact algorithm, that is, the contraction rate of T. The tricky part is then to show that it would be suboptimal to allow U[n] to converge to zero more slowly than the contraction rate of T, but it can be done. That settles the matter. It is optimal to select a refinement schedule that forces U[n] to approach zero at the same rate as the exact algorithm (or something near). This is reported in Almudevar & Arruda (2012). To be honest, this is probably the most difficult paper, mathematically speaking, I ever wrote.

This simple result is able to reproduce a well-known analysis of a specific coarse-to-fine Markov decision process model (Chow & Tsitsiklis 1991), but it is much simpler to implement, more transparent in its properties, and is of much wider applicability. Almudevar & Arruda (2012) has 20 citations at this point, but I have yet to see any direct statement of its main result, that is, that an optimal rate of refinement is known, and easily implemented.

## A selection of my publications in this field

Almudevar A (2001) A dynamic programming algorithm for the optimal control of piecewise deterministic Markov processes. SIAM Journal on Control and Optimization. 40:525-539.

Almudevar A (2008) Approximate fixed point iteration with an application to infinite-horizon Markov decision processes. SIAM Journal on Control and Optimization. 47:2303-2347.

Almudevar A, Arruda EF (2012) Optimal approximation schedules for a class of iterative algorithms, with an application to multigrid value iteration. IEEE Transactions on Automatic Control. 57(12): 3132-3146.

Arruda EF, Ourique F, LaCombe J, Almudevar A (2013) Accelerating the convergence of value iteration by using partial transition functions. European Journal of Operational Research. 229(1):190-198.

Arruda E, Ourique F, Almudevar A, Silva R (2013) On cost based algorithm selection for problem solving. American Journal of Operations Research, 3(5):431-438.