``Introduction to the Numerical Solution of Markov Chains''

William J. Stewart

Billy's Home Page


Preface,   Acknowledgements
Table of Contents,   Ordering Information

Organization

The book is organized into 10 chapters, as follows. In Chapter 1 the basic definitions, properties, and theorems of discrete-time and continuous-time Markov chains are presented. These are used throughout the remainder of the book and are collected together in one place for ease of reference. Additionally, this chapter provides sufficient background on queueing networks to permit readers whose interests lie mainly in numerical solution procedures to understand how and why queueing network models give rise to large-scale Markov chains.

Chapters 2 through 4 deal respectively with the three major classes of solution procedures, namely, direct methods, iterative methods, and projection methods. Direct methods are seen to be variants of the ubiquous Gaussian elimination algorithm. When these are applied to Markov chains, provision must be made to handle the zero pivot that results from the singularity of the system, and different approaches are examined. Implementation considerations, with special emphasis on compact storage schemes and the GTH variant, are discussed. Chapter 2 closes with a discussion of stability, conditioning, and error analysis as they pertain to the application of direct methods to Markov chain problems.

Chapter 3 examines the major single-vector iterative methods: the power method, Jacobi, Gauss--Seidel, and successive overrelaxation (SOR). Block variants as well as preconditioning procedures are given. Like the previous chapter, Chapter 3 also provides implementation details. It terminates with a discussion of known convergence results.

The application of projection methods to the solution of Markov chains is still relatively new. However, these methods have been used with much success in other domains and have the potential of being very well suited to solving Markov chain problems. They are the subject of Chapter 4. Given that this material is likely to be new to many readers, the basic projection processes are examined in some detail before consideration is given to specific algorithms. These algorithms include simultaneous iteration and the methods of Arnoldi and GMRES. Due to the large measure of success that this latter algorithm has achieved, a detailed description of its implementation is given. Chapter 5 closes with a discussion of the methods of Lanczos and Conjugate Gradients and includes implementation considerations.

The three chapters that follow deal with Markov chains whose underlying transition matrices possess a special structure. When this matrix is block upper Hessenberg (or even better, block tridiagonal), it may be possible to use block recursive methods. This provides the substance of Chapter 5. Great care must be taken with recursive methods, because they have been known to give very unusual answers when used inappropriately. The stability of recursive algorithms is considered in detail. We also look at the matrix geometric approach made popular by Neuts, provide the most recent procedures for computing his famous $R$ matrix, and consider instances in which this matrix may be computed explicitly.

Stochastic matrices that are nearly completely decomposable appear relatively frequently in Markov chain modelling and in general create enormous difficulties in computing their stationary probability distributions. They are considered in Chapter 6, where the problem is posed, the theoretical concepts of stochastic complementation introduced, and the only type of method found to bring any satisfaction --- iterative aggregation/disaggregation --- is analyzed in detail. The major methods and several variants are considered, and some results relating to convergence properties and behavior are provided.

In Chapter 7, we consider the case of Markov chains whose transition matrices possess a periodic or cyclic structure. We begin by considering the saving that this affords to the standard direct and iterative methods; then we proceed to consider methods specifically designed for matrices having this periodic property. In particular, we describe reduced schemes that work on only one portion of the matrix, but that may be used to compute the complete solution very efficiently. Also, stochastic matrices that are cyclic possess certain properties that allow us to develop block SOR--type methods for which close to optimal relaxation factors can be computed rather easily.

The methods in Chapters 2 through 7 all concern the computation of stationary probabilities. The computation of transient solutions is considered in Chapter 8. The method most often used in this context is that of uniformization, and it is with this method that the chapter begins. This is followed by an examination of methods that are applicable when the state space is small. A substantial part of Chapter 8 is devoted to the numerical solution of systems of linear ordinary differential equations (ODEs), for it is well known that the probability distribution of a Markov chain at any time $t$ may be defined by such a system. This approach is treated in considerable detail, because the numerical solution of ODEs is perhaps not as well exploited in the Markov chain community as are numerical linear algebraic techniques for stationary distributions. Finally, we conclude this chapter with a novel approach for obtaining transient solutions, one that shows enormous promise, a method based on Krylov subspaces that exploits the methods previously given for small state spaces.

Stochastic automata networks are the subject of Chapter 9. Among all the modelling paradigms that give rise to Markov chains, this is perhaps the most general, and it has the potential of becoming the basis for future developments in software geared to the automatic generation and solution of Markov chain models. After discussing the underlying concepts of stochastic automata networks, the effects of both synchronizing events and functional transitions on the complexity of computing solutions are examined. It is shown how the stationary probability vector can be obtained from a compact tensor description of the network. However, the constraints imposed by this very compact description are severe and restrict the choice of possible solution method.

The final chapter of this book provides information on some of the software that is currently available for developing, manipulating, and solving Markov chain models, as well as software that may be useful in developing specific methods, such as that found in libraries of numerical linear algebraic routines.

It has not been possible to include everything that I would have wished to include into a single, reasonably sized book. For example, I have omitted material on bounding methods and techniques for state space reduction. These are not the only topics omitted. Although I would be the first to admit that the content is biased by my personal experiences and preferences, these and other topics have been omitted, not because I think they are unimportant, but rather because of limitations of space and a desire to bring the book to market in a timely fashion. If I were to continue to extend the book by incorporating all the new and important results that become available almost on a daily basis, this book would never appear.

Billy's Home Page