It is also straightforward to show that

    Σ_{j∈S} Σ_{k∈S} n(k) R(k) a^{H(j,k)} = (1+a)^L Σ_{k∈S} n(k) R(k).

Thus, P2(i|n) can be expressed as

    P2(i|n) = [Σ_{j∈S} n(j) R(j) a^{H(i,j)}] / [Σ_{j∈S} Σ_{k∈S} n(k) R(k) a^{H(j,k)}]
            = [Σ_{j∈S} n(j) R(j) a^{H(i,j)}] / [(1+a)^L Σ_{k∈S} n(k) R(k)],          Eq. 4.14

and P2(m|n) is multinomially distributed as follows:

    ∀m,n ∈ S': P2(m|n) = [M! / Π_{i∈S} m(i)!] Π_{i∈S} P2(i|n)^{m(i)}
                       = C(M;m) Π_{i∈S} P2(i|n)^{m(i)}                                Eq. 4.15
                       = C(M;m) Π_{i∈S} [Σ_{j∈S} n(j) R(j) a^{H(i,j)}]^{m(i)} / [(1+a)^{ML} (Σ_{k∈S} n(k) R(k))^M].

The transition probability matrix of the Markov chain representing the two-operator algorithm is composed of the array of conditional probabilities defined by Eq. 4.15, i.e.

    P = [P2(m|n)].                                                                    Eq. 4.16

Since the elements of P depend on a (and hence by Eq. 4.12 on p_m(k)), the two-operator Markov chain is generally not time-homogeneous. It is time-homogeneous if the mutation probability is fixed. Eq. 4.14-4.16 for the two-operator simple genetic algorithm are analogous to Eq. 4.2-4.6 for the one-operator variant except that P2(i|n) is strictly greater than zero for all n ∈ S'. Thus, the two-operator analog of Eq. 4.5 is not required. Also,

    lim_{a→0+} P2(i|n) = P1(i|n)   and   lim_{a→0+} P2(m|n) = P1(m|n).                Eq. 4.17

The rows of the state transition matrix corresponding to the one-operator absorbing states have an especially simple form. Let i_A ∈ S be the solution represented in the absorbing state n_A ∈ S_A'. Then, from Eq. 4.14,

    P2(i|n_A) = M R(i_A) a^{H(i,i_A)} / [(1+a)^L M R(i_A)] = a^{H(i,i_A)} / (1+a)^L.  Eq. 4.18

Thus, from Eq. 4.15,

    P2(m|n_A) = C(M;m) a^{Σ_{i∈S} m(i) H(i,i_A)} / (1+a)^{ML}.

Since the reward function, R, is strictly positive by hypothesis, and since ∀i,j ∈ S: 0 ≤ H(i,j) ≤ L, it follows that for a in the range 0 < a ≤ 1,

    a^L Σ_{j∈S} n(j) R(j) ≤ Σ_{j∈S} n(j) R(j) a^{H(i,j)} ≤ Σ_{j∈S} n(j) R(j),        Eq. 4.19

and consequently from Eq. 4.14 that

    ∀i ∈ S, ∀n ∈ S': [a/(1+a)]^L ≤ P2(i|n) ≤ [1/(1+a)]^L.                            Eq. 4.20

Using Eq. 4.20 in Eq. 4.15 yields

    ∀m,n ∈ S': C(M;m) [a/(1+a)]^{ML} ≤ P2(m|n) ≤ C(M;m) [1/(1+a)]^{ML}.              Eq. 4.21
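Eq. 4.14 and the bounds of Eq. 4.20 can be checked numerically. The sketch below is illustrative only: the reward values and the population are arbitrary choices, not taken from the text.

```python
from itertools import product

def hamming(i, j):
    # H(i,j): number of bit positions on which strings i and j differ
    return sum(x != y for x, y in zip(i, j))

def p2(i, n, R, a, L):
    # Eq. 4.14: P2(i|n) = sum_j n(j)R(j)a^H(i,j) / ((1+a)^L * sum_k n(k)R(k))
    S = list(product((0, 1), repeat=L))
    num = sum(n[j] * R[j] * a ** hamming(i, j) for j in S)
    return num / ((1 + a) ** L * sum(n[k] * R[k] for k in S))

L = 2
S = list(product((0, 1), repeat=L))
R = {s: 1.0 + s[0] + 2 * s[1] for s in S}          # arbitrary positive rewards
n = {(0, 0): 2, (0, 1): 0, (1, 0): 1, (1, 1): 1}   # a population with M = 4
a = 0.3
probs = [p2(i, n, R, a, L) for i in S]
# probs sums to 1, and every entry obeys the Eq. 4.20 bounds:
lo, hi = (a / (1 + a)) ** L, (1 / (1 + a)) ** L
```

Normalization follows from the identity Σ_j a^{H(i,j)} = (1+a)^L used above, and the bounds hold because 0 < a ≤ 1.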

    (1) q_a > 0,   (2) q_a·1 = 1,   (3) q_a·P = q_a.

Since the stationary distribution is by definition a left eigenvector of the state transition matrix (Definition A10), it follows from Eq. 4.15 and 4.16 that the asymptotic state probability distribution of the time-homogeneous two-operator algorithm is completely determined by the objective function and the algorithm parameters. It is independent of the starting state, m_0.

4.3.3 A Three-Operator Algorithm (Reproduction, Mutation and Crossover)

The three-operator simple genetic algorithm corresponds to the case ∀k: Q_k = (p_m(k), p_c(k)) with both p_m(k) and p_c(k) nonzero. Results analogous to Eq. 4.14-4.21 for the two-operator case are obtainable by defining a new function which is similar in character to the Hamming distance function employed in Section 4.3.2 for the two-operator case. This subsection completes that generalization. The result reflects the crossover operation only implicitly; however, it permits some very significant conclusions concerning bounding values of the three-operator conditional probabilities. The new function, I(i,j,k,s), is defined over an ordered quadruple (i,j,k,s) where i,j,k ∈ S and where s ∈ {0,1,…,L} is a bit-string location. The states i,j ∈ S represent respectively the first and second parent strings selected at a particular crossover opportunity, and k ∈ S represents a possible descendent string. The bit-string location s is the location randomly selected by the crossover operator, and normally it is uniformly distributed over its range. Thus, I is defined on S×S×S×{0,1,2,…,L} and it takes on values selected from {0,1} depending upon whether the indicated crossover operation is or is not consistent. That is, I assumes the value one if the bit-string k is produced by crossing the bit-strings i and j at the site s, and zero otherwise.
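The indicator I can be transcribed directly for single-point crossover. The surviving-offspring convention used below (the descendent takes its first s bits from the first parent and the remainder from the second) is an assumption of this sketch, not taken from the text.

```python
from itertools import product

def crossover_indicator(i, j, k, s):
    # I(i,j,k,s) = 1 iff descendent k results from crossing parents i and j
    # at site s. Assumed convention: k takes bits 1..s from i and bits
    # s+1..L from j (one common single-point crossover convention).
    return 1 if k == i[:s] + j[s:] else 0

i, j = (0, 1, 1, 0), (1, 1, 0, 1)
L = len(i)
# For each site s, exactly one descendent string is consistent:
counts = [sum(crossover_indicator(i, j, k, s)
              for k in product((0, 1), repeat=L))
          for s in range(L + 1)]
```

With this convention, s = 0 reproduces the second parent and s = L the first, and Σ_k I(i,j,k,s) = 1 for every fixed (i,j,s).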
In terms of this crossover operator function, the conditional probability of producing, via reproduction and crossover, a solution k ∈ S given a current population described by n ∈ S' is

    P2'(k|n) = p_c Σ_{i∈S} Σ_{j∈S} P1(i|n) P1(j|n) (1/L) Σ_{s=1}^{L} I(i,j,k,s) + (1−p_c) P1(k|n)
             = p_c (1/L) Σ_{s=1}^{L} Σ_{i∈S} Σ_{j∈S} P1(i|n) P1(j|n) I(i,j,k,s) + (1−p_c) P1(k|n),   Eq. 4.22

where P1(i|n) is as defined in Eq. 4.2 and where P2'(·|n) refers to the two-operator algorithm consisting of reproduction and crossover without mutation. This result assumes uniformly distributed crossover site selection. The array of conditional probabilities [P2'(i|n)] plays a role in the three-operator simple genetic algorithm very analogous to the role played by the array [P1(i|n)] in the two-operator variant. In fact, the [P2'(i|n)] array can be used as a counterpart of Eq. 4.2 to develop results exactly analogous to Eq. 4.3 and Eq. 4.6. Further, for n_A ∈ S_A', Eq. 4.22 reduces to

    P2'(k|n_A) = P1(k|n_A),                                                           Eq. 4.23

and consequently this (fictitious) two-operator algorithm (reproduction and crossover) demonstrates the same sort of absorbing state behavior as the one-operator algorithm. From Eq. 4.22, the three-operator conditional probabilities and state transition matrix are expressible as

    P3(i|n) = [1/(1+a)^L] Σ_{j∈S} a^{H(i,j)} P2'(j|n),                               Eq. 4.24
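Eq. 4.22-4.24 can be transcribed as follows. The single-point crossover convention (first s bits from the first parent) and the reward and population values are assumptions of this sketch only.

```python
from itertools import product

L = 2
S = list(product((0, 1), repeat=L))

def p1(i, n, R):
    # Eq. 4.2: proportional (reproduction) selection
    tot = sum(n[k] * R[k] for k in S)
    return n[i] * R[i] / tot

def p2_prime(k, n, R, pc):
    # Eq. 4.22, under the assumed single-point crossover convention.
    cross = 0.0
    for i in S:
        for j in S:
            hits = sum(1 for s in range(1, L + 1) if k == i[:s] + j[s:])
            cross += p1(i, n, R) * p1(j, n, R) * hits / L
    return pc * cross + (1 - pc) * p1(k, n, R)

def p3(i, n, R, pc, a):
    # Eq. 4.24: mutation smoothing of the reproduction/crossover probabilities
    ham = lambda x, y: sum(u != v for u, v in zip(x, y))
    return sum(a ** ham(i, j) * p2_prime(j, n, R, pc) for j in S) / (1 + a) ** L

R = {s: 1.0 + 2 * s[0] + s[1] for s in S}          # arbitrary positive rewards
n = {(0, 0): 2, (0, 1): 1, (1, 0): 1, (1, 1): 0}   # a population with M = 4
total2 = sum(p2_prime(k, n, R, 0.7) for k in S)    # normalizes to 1
total3 = sum(p3(k, n, R, 0.7, 0.2) for k in S)     # normalizes to 1
```

For a uniform population the sketch also exhibits the Eq. 4.23 reduction: both selected parents are the represented string, so crossover is inert.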
    P3(m|n) = [M! / Π_{i∈S} m(i)!] Π_{i∈S} P3(i|n)^{m(i)} = C(M;m) Π_{i∈S} P3(i|n)^{m(i)},   Eq. 4.25

and

    P = [P3(m|n)].                                                                    Eq. 4.26

These results are developed in a fashion analogous to Eq. 4.14-4.16. From them, it follows that the three-operator Markov chain is time-homogeneous if both the mutation and crossover probabilities are fixed. In general it is not time-homogeneous. From Eq. 4.22, 4.24 and 4.25, it follows that

    lim_{a→0+} P3(i|n) = P2'(i|n)   and   lim_{a→0+} P3(m|n) = P2'(m|n).

Also, from Eq. 4.23-4.25, the three-operator analogs of Eq. 4.18-19 apply:

    P3(i|n_A) = a^{H(i,i_A)} / (1+a)^L,
    P3(m|n_A) = C(M;m) a^{Σ_{i∈S} m(i) H(i,i_A)} / (1+a)^{ML}.

Additionally, since

    a^L Σ_{j∈S} P2'(j|n) ≤ Σ_{j∈S} P2'(j|n) a^{H(i,j)} ≤ Σ_{j∈S} P2'(j|n),

the three-operator analogs of Eq. 4.20-21 follow from Eq. 4.24-25, i.e.

    ∀i ∈ S, ∀n ∈ S': [a/(1+a)]^L ≤ P3(i|n) ≤ [1/(1+a)]^L.

The a → 0+ limit of P̄_{m_A}' is obtained by substituting the one-operator results in Eq. 4.2-5 into Eq. 7.7. If the three-operator case is under consideration, then Eq. 4.22 and Eq. 4.24-25 are employed. In the following, the two-operator notation is employed. Let Q' be generated from the a → 0+ limit of P̄_{m_A}' by replacing row m_A with the row whose elements are given by

    ∀m ∈ S' − S_A' + {m_A}: Q'(m|m_A) = 1/(N' − N + 1) > 0.                           Eq. 7.12

Thus, the row sum of row m_A in Q' is 1. Since all remaining rows of Q' are identical to those of the a → 0+ limit of P̄_{m_A}', and consequently have row sum 1 by Eq. 7.7, Q' is a stochastic matrix.

Since the diagonal element Q'(m_A|m_A) is strictly positive (Eq. 7.12), the fictitious Markov chain is both aperiodic (Definition A9, Theorem A2) and primitive (Definition B1, Theorem B1) provided that it is irreducible (Definitions A7 and A8, Theorem A1). Thus, primitivity is established by demonstrating that every state m ∈ S' − S_A' + {m_A} is accessible in some finite number of transitions from every state n ∈ S' − S_A' + {m_A}. Since all states in S' − S_A' + {m_A} are accessible in one transition from m_A (Eq. 7.12), it is
sufficient to demonstrate that m_A is accessible in some finite number of transitions from every state n ∈ S' − S_A'. Let i_A ∈ S be the bit-string represented in m_A, let n ∈ S' − S_A', and let i_1 ∈ S be selected such that n(i_1) > 0 and H(i_1, i_A) ≤ H(i, i_A) for all i represented in n. Then, two cases must be examined. In case (1), i_1 = i_A, while i_1 ≠ i_A for case (2). If i_1 = i_A, it follows from Eq. 7.7 and the construction of Q' that

    Q'(m_A|n) = lim_{a→0+} P2(m_A|n) = [lim_{a→0+} P2(i_A|n)]^M = [P1(i_A|n)]^M > 0,

and consequently m_A is accessible from n in one transition. Otherwise,

    ∃ i_2 ∈ S(i_1) ∋ H(i_2, i_A) = H(i_1, i_A) − 1,

and further, if n_1 ∈ S_A' is the one-operator absorbing state defined by the condition n_1(i_1) = M while n_12 ∈ S(n_1)' is the adjacent nonabsorbing state defined by n_12(i_1) = M − 1, n_12(i_2) = 1, then from Eq. 7.7 and the construction of Q'

    Q'(n_12|n) = lim_{a→0+} P2(n_12|n) + lim_{a→0+} (1/L) P2(n_1|n)
               = P1(n_12|n) + (1/L) P1(n_1|n)
               ≥ (1/L) [P1(i_1|n)]^M > 0.

Thus, n_12 is accessible from n in one transition. If i_2 = i_A, then by the case (1) argument m_A is accessible in one additional transition. Otherwise, the case (2) argument is repeated for some i_3 ∈ S(i_2) ∋ H(i_3, i_A) = H(i_2, i_A) − 1 = H(i_1, i_A) − 2.
This procedure necessarily terminates within H(i_1, i_A) + 1 applications, and the corresponding state space trajectory is executed with nonzero probability. From the foregoing argument, it follows that state m_A is accessible in some finite number of transitions from every state n ∈ S' − S_A', and thus that Q' is primitive.
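The descent used in case (2) — repeatedly moving to an adjacent string whose Hamming distance to i_A is one less — can be sketched directly. The helper below is illustrative only; the parent strings are arbitrary.

```python
def hamming(i, j):
    return sum(x != y for x, y in zip(i, j))

def descent_path(i1, iA):
    # Mirror the case (2) argument: from i1, flip one bit on which the
    # current string disagrees with iA, until iA is reached.
    path = [i1]
    cur = i1
    while cur != iA:
        b = next(p for p in range(len(cur)) if cur[p] != iA[p])
        cur = cur[:b] + (iA[b],) + cur[b + 1:]
        path.append(cur)
    return path

path = descent_path((0, 1, 1, 0), (1, 1, 0, 1))
# The trajectory reaches iA in exactly H(i1, iA) single-bit moves.
```

Each step corresponds to one transition n → n_12 of the argument above, so the whole trajectory has nonzero probability under Q'.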

Proposition 7.4, the a → 0+ counterpart of Proposition 6.9, is a consequence.

Proposition 7.4: The value of the determinant lim_{a→0+} |P̄_{m_A}' − I'| satisfies, ∀m_A ∈ S_A':

    (1) lim_{a→0+} |P̄_{m_A}' − I'| ≠ 0;
    (2) the algebraic sign of lim_{a→0+} |P̄_{m_A}' − I'| is (−1)^{N'−N+1}.

The conditions asserted in Proposition 7.4 ensure that substitution into Eq. 7.9 and 7.11 yields a determinate form in the a → 0+ limit. Propositions 7.5 and 7.6 below represent the limiting forms, and consequently are respectively the limiting counterparts of Propositions 6.7 and 6.8.

Proposition 7.5: The components of lim_{a→0+} q exist and can be expressed in the form

    lim_{a→0+} q(m) = lim_{a→0+} |P̄_{m_A}' − I'| / Σ_{n_A∈S_A'} lim_{a→0+} |P̄_{n_A}' − I'|,   m = m_A ∈ S_A',
    lim_{a→0+} q(m) = 0,                                                                       m ∈ S' − S_A'.
Proposition 7.6: The components of lim_{a→0+} q exist and can be expressed alternatively as

    lim_{a→0+} q(m) = q_0(m)
        = lim_{a→0+} {|P(m_A)' − I'| − |P_{S_A}' − I'|} / Σ_{n_A∈S_A'} lim_{a→0+} {|P(n_A)' − I'| − |P_{S_A}' − I'|},   m = m_A ∈ S_A',
    q_0(m) = 0,   m ∈ S' − S_A'.

An immediate consequence of Propositions 7.4 and 7.5 is strict positivity of the zero mutation probability limiting stationary distribution components for all absorbing state rows. That is,

    ∀m_A ∈ S_A': q_0(m_A) > 0.

The argument is analogous to that at the conclusion of Section 6.3 concerning strict positivity of all stationary distribution components when a > 0. This result is anticipated by the simulation results in Section 5.5. A consequence is that the required limiting behavior for direct application of the simulated annealing convergence theory to the genetic algorithm model does not follow. However, the results displayed in Section 5.5 and developments produced in Section 9.3 suggest that the limiting distribution can be made arbitrarily close to the desired limiting behavior. Since the a → 0+ limit of the stationary distribution exists, the definition of q_a can be extended to include the point a = 0. That is,

    q_a|_{a=0} = q_0 = lim_{a→0+} q_a,

where the values of the required limits are provided by Proposition 7.5. Proposition 7.7 below follows from this extended definition of q_a and Proposition 7.2.

Proposition 7.7: For all a ≥ 0, the components of q_a are continuous rational functions of the independent variable a.
Proposition 7.3, which concerns the first derivative of q_a, can also be extended to include the limiting case. The extension requires easily obtainable counterparts of Eq. 7.1-3 developed for |P̄_{m_A}' − I'| and Eq. 7.9. The Eq. 7.1 counterpart is

    (1+a)^{ML(N'−N+1)} |P̄_{m_A}' − I'| = θ_{m_A}(a)',                                Eq. 7.13

and that for Eq. 7.2 is

    q_a(m) = θ_{m_A}(a)' / [Θ(a)' + Θ(a)],   m = m_A ∈ S_A',
    q_a(m) = θ_m(a) / [Θ(a)' + Θ(a)],        m ∈ S' − S_A',                           Eq. 7.14

where Θ(a)' is the polynomial counterpart (summed over n_A ∈ S_A') of Θ(a) in Eq. 7.2. Differentiating Eq. 7.14 with respect to a yields a rational function with denominator polynomial [Θ(a)' + Θ(a)]², whose a → 0+ limit is nonzero by Proposition 7.4, Eq. 7.13 and the definition of Θ(a)'. Proposition 7.8 below follows from Proposition 7.3 and these observations.

Proposition 7.8: The components of the first derivative of q_a with respect to a possess limits as a → 0+.

Thus, a zero mutation probability limit exists for the time-homogeneous two and three-operator algorithm variants. The limit is represented by Propositions 7.5 and 7.6. Further, Propositions 7.7 and 7.8 establish some useful ancillary results concerning the stationary distribution behavior at the point a = 0. These latter results are employed in the following section in establishing strong ergodicity of the inhomogeneous genetic algorithm Markov chain. Propositions 7.5 and 7.6 are used in Section 9 to develop a methodology for representing the stationary distribution limit.
SECTION 8
A MONOTONIC MUTATION PROBABILITY ERGODICITY BOUND

8.1 Overview

The annealing schedule bounds for the simulated annealing algorithm, which are reviewed in Section 2.4.2, are derived by requiring that the nonstationary Markov chain which represents the algorithm be strongly ergodic (Definition A13) and then deducing a monotonic lower bound on the algorithm control parameter. The methodology consists of demonstrating that the time-homogeneous Markov chain corresponding to every positive algorithm control parameter value possesses a stationary distribution, that the sequence of stationary distributions corresponding to any sequence of positive control parameter values converges to a limiting distribution if the control parameter sequence converges to zero, and then employing Definitions A11-A13 and Theorems A5-A7 to deduce a sufficient condition (the annealing schedule lower bound) to guarantee that the nonstationary algorithm achieves the limiting distribution (i.e. strong ergodicity). The model development in Section 4 demonstrates that for all mutation probability values in the range 0 < p_m < 1, the Markov chain representing either the two or three-operator time-homogeneous simple genetic algorithm possesses a stationary distribution. Section 7 demonstrates that the stationary distribution approaches a limit as the mutation probability parameter approaches zero. This section proposes and then verifies a monotone decreasing lower bound on the mutation probability sequence of the nonstationary genetic algorithm Markov chain which is sufficient to ensure strong ergodicity.
8.2 A Weak Ergodicity Bound

The following paragraphs propose and then verify a mutation probability parameter bound sufficient to ensure that the Markov chain of the corresponding nonstationary simple genetic algorithm is weakly ergodic (Definition A11). The bound applies to both the two and three-operator algorithms, and it appears in Proposition 8.1 below.

Proposition 8.1: The mutation probability bound given by

    p_m(k) ≥ (1/2) k^{−1/(ML)}

is sufficient to ensure weak ergodicity of the corresponding nonstationary (two or three-operator) simple genetic algorithm Markov chain.

This result is established by using the lower bounds on the two and three-operator conditional probabilities in Eq. 4.21 and 4.31 with Definitions A11 and A12 and Theorems A5 and A6. Applying the lower bound in Eq. 4.21 and 4.31 to τ1(·) of Definition A12 and Theorem A5 yields

    τ1(P) = 1 − min_{n_1,n_2∈S'} Σ_{m∈S'} min(P(m|n_1), P(m|n_2)).
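The schedule of Proposition 8.1 can be sketched directly. The exponent −1/(ML) is taken from the proposition as rendered here; the population and string-length values are arbitrary illustrations.

```python
def pm_bound(k, M, L):
    # Proposition 8.1 (as rendered above): a mutation probability schedule
    # satisfying p_m(k) >= (1/2) * k**(-1/(M*L)) at generation k is
    # sufficient for weak ergodicity.
    return 0.5 * k ** (-1.0 / (M * L))

schedule = [pm_bound(k, M=10, L=8) for k in range(1, 10001)]
# The bound starts at 1/2, decreases strictly monotonically, and approaches
# zero only polynomially slowly (the larger M*L, the slower the decay).
```

This is the genetic algorithm analog of the K/log(k) temperature schedule of simulated annealing: both decay slowly enough that the inhomogeneous chain remains weakly ergodic.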

which can be rewritten as

    P^{(r_v)}(w|v) = C(M;w) Π_{i∈S} [w(i)! / (w(i)−r(i,v))!] P(i|v)^{w(i)−r(i,v)},   ∀i ∈ S: w(i) ≥ r(i,v),
    P^{(r_v)}(w|v) = 0,   otherwise.                                                  Eq. 9.19

Further, by noting that the factor
    M! / Π_{i∈S} [r(i,v)! (w(i)−r(i,v))!]

is a multinomial coefficient, and designating it (via straightforward generalization of the convention introduced in Eq. 4.4) by

    C(M; w, r_v) = M! / Π_{i∈S} [r(i,v)! (w(i)−r(i,v))!],   ∀i ∈ S: w(i) ≥ r(i,v),
    C(M; w, r_v) = 0,   otherwise,                                                    Eq. 9.20

Eq. 9.19 simplifies to

    P^{(r_v)}(w|v) = C(M; w, r_v) Π_{i∈S} r(i,v)! P(i|v)^{w(i)−r(i,v)}.               Eq. 9.21

If row v is undifferentiated (i.e. ||r_v|| = 0), then Eq. 9.21 becomes

    P^{(0)}(w|v) = C(M;w) Π_{i∈S} P(i|v)^{w(i)} = P(w|v).

It is noted in passing that if M < N = 2^L, then it follows from Eq. 9.20 that either

    C(M; w, r_v) = 0   or   ∃w' ∈ S' ∋ C(M; w, r_v) = C(M; w').                       Eq. 9.22

In the latter case, it is also true that C(M; w') ≥ C(M; w).
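The generalized coefficient of Eq. 9.20 (written C(M; w, r) here) transcribes directly:

```python
from math import factorial

def multinom(M, w, r):
    # Eq. 9.20: C(M; w, r) = M! / prod_i [ r(i)! * (w(i)-r(i))! ],
    # defined to be 0 unless w(i) >= r(i) for every i.
    if any(wi < ri for wi, ri in zip(w, r)):
        return 0
    den = 1
    for wi, ri in zip(w, r):
        den *= factorial(ri) * factorial(wi - ri)
    return factorial(M) // den

# An undifferentiated row (r = 0) recovers the ordinary multinomial coefficient:
c = multinom(2, (1, 1), (0, 0))   # C(2; (1,1)) = 2
```

Since the 2N denominator entries r(i) and w(i)−r(i) sum to M, the value is always an integer, so integer division is exact.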
The enabling condition imposes no practical limitation because any algorithm with M ≥ N could be effectively supplanted by exhaustive search over S. Since |P^{(r̄)}|_{K_m} includes every row of P^{(r̄)} which is differentiated to nonzero order (i.e. every n ∈ S' with ||r_n|| > 0), it follows from Eq. 9.18, Eq. 9.20 and Eq. 9.21 that any row n for which ||r_n|| > M introduces an all zero row into |P^{(r̄)}|_{K_m}, making both |P^{(r̄)}|_{K_m} = 0 and C(m,r̄) = 0. Therefore, ||r_n|| ≤ M represents a suitable upper bound on ||r_n|| for n ≠ m. This bound, along with the previously established condition ||r_m|| = 0, implies ||r̄|| ≤ (N'−1) × M and permits the Eq. 9.17 definition of C(m,r̄) to be restricted to arrays satisfying ||r_m|| = 0 and ||r_n|| ≤ M for n ≠ m (with C(m,r̄) = 0 otherwise). Further, the conditions in this result can be expressed in terms of S″, yielding

    C(m,r̄) = (−1)^{N'−k−1} (1/r̄!) |P^{(r̄)}|_{K_m} |_{a=1},   r_m = 0, r_n ∈ S″ − {0} for n ≠ m,
    C(m,r̄) = 0,   otherwise.                                                          Eq. 9.23

At a = 1, using Eq. 9.6 in Eq. 9.21 yields

    P^{(r_v)}(w|v)|_{a=1} = C(M; w, r_v) (1/2^L)^{M−||r_v||} Π_{i∈S} r(i,v)!.          Eq. 9.24

Thus, every element in row v ∈ K_m of |P^{(r̄)}|_{K_m} includes the constant factor

    (1/2^L)^{M−||r_v||} Π_{i∈S} r(i,v)!.

Substituting Eq. 9.24 into Eq. 9.18 and collecting these common row factors outside the determinant yields
    |P^{(r̄)}|_{K_m} |_{a=1} = (1/2^L)^{M(k+1)−||r̄||} × [Π_{v∈K_m} Π_{i∈S} r(i,v)!] × D(m,r̄),

where D(m,r̄) denotes the order k+1 combinatorial determinant whose rows are indexed by the differentiation patterns r_v (v ∈ K_m = {m, n_1, …, n_k}, with r_m = 0), whose columns are indexed by the states w ∈ K_m, and whose (v,w) element is the generalized multinomial coefficient C(M; w, r_v). Also, since

    r_m = 0 ⟹ C(M; w, r_m) = C(M; w, 0) = C(M; w),

the first row of the combinatorial determinant consists of the ordinary multinomial coefficients C(M;w), w ∈ K_m, and since

    Π_{v∈K_m} Π_{i∈S} r(i,v)! = r̄!,

substitution into Eq. 9.23 yields
    C(m,r̄) = (−1)^{N'−k−1} (1/2^L)^{M(k+1)−||r̄||} ×

        det | C(M;m)          C(M;n_1)          …  C(M;n_k)          |
            | C(M;m,r_{n_1})  C(M;n_1,r_{n_1})  …  C(M;n_k,r_{n_1})  |
            |   ⋮                                                    |
            | C(M;m,r_{n_k})  C(M;n_1,r_{n_k})  …  C(M;n_k,r_{n_k})  |.                Eq. 9.25

Note that the condition r_m = 0 is implicitly asserted in this result by the form of the first row of the combinatorial determinant, and that the condition r_n ∈ S″ − {0} for n ≠ m is enforced by the definition in Eq. 9.20. When Eq. 9.25 is employed with Eq. 9.9, an additional simplification becomes available. The simplification obtains by incorporating the factor (1/2^L)^{||r̄||} present in each monomial with the coefficient in Eq. 9.9. That is,

    −|P̄_m − I| = |P(m) − I| − |P − I| = Σ_r̄ C(m,r̄) Π_{n∈S'} Π_{i∈S} (P(i|n) − 1/2^L)^{r(i,n)}
                = Σ_r̄ C(m,r̄) (1/2^L)^{||r̄||} Π_{n∈S'} Π_{i∈S} (2^L P(i|n) − 1)^{r(i,n)}
                = Σ_r̄ C'(m,r̄) Π_{n∈S'} Π_{i∈S} (2^L P(i|n) − 1)^{r(i,n)},             Eq. 9.26

where

    C'(m,r̄) = C(m,r̄) × (1/2^L)^{||r̄||}

is the coefficient of the indicated monomial in the new variables. Substitution of Eq. 9.25 into this expression for C'(m,r̄) yields
    C'(m,r̄) = (−1)^{N'−k−1} (1/2^{ML})^{k+1} ×

        det | C(M;m)          C(M;n_1)          …  C(M;n_k)          |
            | C(M;m,r_{n_1})  C(M;n_1,r_{n_1})  …  C(M;n_k,r_{n_1})  |
            |   ⋮                                                    |
            | C(M;m,r_{n_k})  C(M;n_1,r_{n_k})  …  C(M;n_k,r_{n_k})  |.                Eq. 9.27
P(m_A)' is generated from P' by setting row m_A to q^T. Further, by virtue of their construction (Eq. 7.7 and 7.10), the single row dependence of the matrix elements on the conditional probability array [P(i|n)] employed in developing the results in Section 9.3 for P and P̄_m applies to P(m_A)' and P̄_{m_A}' as well. Thus, Eq. 9.26-27 should extend with very little modification to the determinants

    −|P̄_{m_A}' − I'| = |P(m_A)' − I'| − |P_{S_A}' − I'|,

whose zero mutation probability limits are required by Propositions 7.5-6. The following paragraphs highlight the required modifications and employ the result to examine two simple examples. In the a → 0+ counterpart of Eq. 9.26-27, m is limited to membership in the set of one-operator absorbing states (i.e. m = m_A ∈ S_A'), a consequence of which is that C(M; m_A) = 1. Also, all rows of the determinant other than m_A correspond to nonabsorbing states (i.e. n ∈ K ⊂ S' − S_A'). Thus, the determinant order is N' − N + 1 and the differentiation index array is order (N' − N + 1) × N, with rows corresponding to row indices n ∈ S' − S_A' + {m_A}. The rows of r̄ are limited to r_{m_A} = 0 and r_n ∈ S″ for n ∈ S' − S_A'. Further, if r̄ indicates nonzero order differentiation of any rows which are adjacent to one-operator absorbing states, then the associated columns of the combinatorial determinant must reflect the coefficient contribution from the adjacent absorbing state. Thus, if C_0'(m_A,r̄) denotes the limiting counterpart of C'(m,r̄) in Eq. 9.27 and if K = {n_1, n_2, …, n_k} ⊂ S' − S_A', where (in the state adjacency notation introduced in Section 7.3) n_1 ∈ S(n_{A_1})' for some n_{A_1} ∈ S_A' − {m_A} and where n_2, …, n_k are adjacent to no state in S_A' − {m_A}, then the coefficient of the order k monomial term uniquely identified by r̄ is given by
    C_0'(m_A,r̄) = (−1)^{N'−N−k} (1/2^{ML})^{k+1} ×

        det | C(M;m_A)          C(M;n_1)+C(M;n_{A_1})                   …  C(M;n_k)          |
            | C(M;m_A,r_{n_1})  C(M;n_1,r_{n_1})+C(M;n_{A_1},r_{n_1})   …  C(M;n_k,r_{n_1})  |
            |   ⋮                                                                            |
            | C(M;m_A,r_{n_k})  C(M;n_1,r_{n_k})+C(M;n_{A_1},r_{n_k})   …  C(M;n_k,r_{n_k})  |.   Eq. 9.28

It is noted that Eq. 9.28 is only an example, not a definition. It must be adjusted based upon r̄ to reflect the number and location of the nonzero adjacent state contributions. The values of the determinants −|P̄_{m_A}' − I'| = |P(m_A)' − I'| − |P_{S_A}' − I'| are given by employing Eq. 9.28 in Eq. 9.26 (with r̄ restricted as noted above). Further, the a → 0+ limits of these determinants are provided by using the a → 0+ limits of the factors (2^L P(i|n) − 1) in Eq. 9.26. Those limits are provided by using either P1(i|n) or P2'(i|n) depending upon whether the two or three-operator case is under consideration.

It is instructive to apply these results to a simple example. The following paragraphs do so for the one-bit problem with population size 2. These parameters (L = 1, M = 2) imply that S = {0,1}, N = 2, S' = {(2 0),(1 1),(0 2)}, N' = 3, S_A' = {(2 0),(0 2)} and S' − S_A' = {(1 1)}. Thus r̄ is limited to

    r̄ ∈ { [0 0; 0 0; 0 0], [0 0; 1 0; 0 0], [0 0; 0 1; 0 0], [0 0; 2 0; 0 0], [0 0; 1 1; 0 0], [0 0; 0 2; 0 0] },

where the rows of each array correspond to the states (2 0), (1 1), (0 2) and the columns to the strings 0 and 1, and the combinatorial determinant required for evaluation of the nonzero order C_0'(m_A,r̄)'s for m_A = (2 0) by Eq. 9.28 has the general form
    det | C(2;(2 0))            C(2;(1 1)) + C(2;(0 2))                       |
        | C(2;(2 0),r_{(1 1)})  C(2;(1 1),r_{(1 1)}) + C(2;(0 2),r_{(1 1)})  |.

Evaluation of the zero order coefficient proceeds as follows:

    C_0'((2 0), [0 0; 0 0; 0 0]) = (−1)^{3−2−0} (1/2²)^1 C(2;(2 0)) = (−1) × (1/4) × 1 = −1/4.

The coefficient corresponding to r_{(1 1)} = (1 0) is given by

    C_0'((2 0), [0 0; 1 0; 0 0]) = (−1)^{3−2−1} (1/2²)² det | 1   2+1 |
                                                            | 2   2+0 |
                                 = (1/16) × (2 − 6) = −1/4.
By identical evaluations, the remaining nonzero order coefficients are

    C_0'((2 0), [0 0; 0 1; 0 0]) = 1/4,   C_0'((2 0), [0 0; 1 1; 0 0]) = 1/8,
    C_0'((2 0), [0 0; 2 0; 0 0]) = −3/16,   and   C_0'((2 0), [0 0; 0 2; 0 0]) = 1/16.

With the required coefficients provided above, the value of −|P̄_{(2 0)}' − I'| for m_A = (2 0) can be expressed (by Eq. 9.26) as

    −|P̄_{(2 0)}' − I'| = −1/4 − (1/4)(2P(0|(1 1)) − 1) + (1/4)(2P(1|(1 1)) − 1)
                        + (1/8)(2P(0|(1 1)) − 1)(2P(1|(1 1)) − 1)
                        − (3/16)(2P(0|(1 1)) − 1)² + (1/16)(2P(1|(1 1)) − 1)².        Eq. 9.29

Then, since P(0|(1 1)) + P(1|(1 1)) = 1 ⟹ (2P(1|(1 1)) − 1) = −(2P(0|(1 1)) − 1), Eq. 9.29 simplifies to

    −|P̄_{(2 0)}' − I'| = −1/4 − (1/2)(2P(0|(1 1)) − 1) − (1/4)(2P(0|(1 1)) − 1)²
                        = −(1/4)[1 + (2P(0|(1 1)) − 1)]²                              Eq. 9.30
                        = −P(0|(1 1))².

From the symmetry inherent in the problem, it follows that the m_A = (0 2) counterpart of Eq. 9.30 is

    −|P̄_{(0 2)}' − I'| = −P(1|(1 1))²,                                               Eq. 9.31

and employing Eq. 9.30-31 with Proposition 7.5 yields (for the two-operator case)
    q_0(2 0) = P1(0|(1 1))² / [P1(0|(1 1))² + P1(1|(1 1))²]
and                                                                                   Eq. 9.32
    q_0(0 2) = P1(1|(1 1))² / [P1(0|(1 1))² + P1(1|(1 1))²].

Then, substituting Eq. 4.2 in Eq. 9.32 yields

    q_0(2 0) = R(0)² / [R(0)² + R(1)²]
and                                                                                   Eq. 9.33
    q_0(0 2) = R(1)² / [R(0)² + R(1)²].

The limit for the nonabsorbing state m = (1 1) is known to be zero by Proposition 7.5. An identical result to Eq. 9.33 obtains for the three-operator case because, for the one-bit problem, crossover is nullified and P2'(i|n) = P1(i|n) (see Eq. 4.22). Additional insight into the behavior of the limiting stationary distribution is obtainable by examining the one-bit problem with population size 3. These parameters (L = 1, M = 3) leave S and N unaltered but change the other state space related sets and parameters to S' = {(3 0),(2 1),(1 2),(0 3)}, N' = 4, S_A' = {(3 0),(0 3)} and S' − S_A' = {(2 1),(1 2)}. By retracing the previous development (the M = 2 case) with r̄ limited as indicated by these state space sets, results analogous to Eq. 9.32 and 9.33 are obtained. Thus, the M = 3 counterpart of Eq. 9.32 is

    q_0(3 0) = {P1(0|(2 1))³ [P1(0|(1 2))³ + 3P1(0|(1 2))² P1(1|(1 2))]
              + P1(0|(1 2))³ [P1(1|(2 1))³ + 3P1(0|(2 1)) P1(1|(2 1))²]} / D
and                                                                                   Eq. 9.34
    q_0(0 3) = {P1(1|(1 2))³ [P1(1|(2 1))³ + 3P1(1|(2 1))² P1(0|(2 1))]
              + P1(1|(2 1))³ [P1(0|(1 2))³ + 3P1(1|(1 2)) P1(0|(1 2))²]} / D
where

    D = P1(0|(2 1))³ [P1(0|(1 2))³ + 3P1(0|(1 2))² P1(1|(1 2))]
      + P1(0|(1 2))³ [P1(1|(2 1))³ + 3P1(0|(2 1)) P1(1|(2 1))²]
      + P1(1|(1 2))³ [P1(1|(2 1))³ + 3P1(1|(2 1))² P1(0|(2 1))]
      + P1(1|(2 1))³ [P1(0|(1 2))³ + 3P1(1|(1 2)) P1(0|(1 2))²].

The Eq. 9.33 counterpart is

    q_0(3 0) = {[2R(0)]³ [R(0)³ + 6R(0)²R(1)] + R(0)³ [R(1)³ + 6R(0)R(1)²]} / D'
and                                                                                   Eq. 9.35
    q_0(0 3) = {[2R(1)]³ [R(1)³ + 6R(1)²R(0)] + R(1)³ [R(0)³ + 6R(1)R(0)²]} / D',

where

    D' = [2R(0)]³ [R(0)³ + 6R(0)²R(1)] + R(0)³ [R(1)³ + 6R(0)R(1)²]
       + [2R(1)]³ [R(1)³ + 6R(1)²R(0)] + R(1)³ [R(0)³ + 6R(1)R(0)²].

Again, the three-operator case yields an identical result. These examples suggest two very significant conjectural features of the limiting stationary distribution behavior. First, only order 2 monomial terms survive in the determinant expansions of the M = 2 case and only order 6 terms survive for M = 3. These facts lead to the supposition that, in general, only order M×(N'−N) terms survive. In the M = 2 case, M×(N'−N) = 2×(3−2) = 2, while for M = 3, M×(N'−N) = 3×(4−2) = 6. If this supposition is correct, then the polynomial forms required for evaluating the stationary distribution zero mutation probability limit by Propositions 7.5 and 7.6 are homogeneous order M×(N'−N) polynomials in the P(i|n)'s. Presumably, the corresponding property (i.e. homogeneous order M×(N'−1) polynomial forms) applies to the general case represented by Propositions 6.7 and 6.8.
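The M = 2 limit of Eq. 9.33 and the M = 3 ratio behavior of Eq. 9.35 can be checked numerically. The sketch below builds the two-operator chain for L = 1, M = 2 directly from Eq. 4.2, 4.14 and 4.15 and solves for its stationary distribution at a small mutation parameter a; availability of numpy is assumed.

```python
import numpy as np

def stationary_L1_M2(R0, R1, a):
    # Two-operator chain for L = 1, M = 2; states (2 0), (1 1), (0 2).
    states = [(2, 0), (1, 1), (0, 2)]
    P = np.zeros((3, 3))
    for r, n in enumerate(states):
        tot = n[0] * R0 + n[1] * R1
        p10, p11 = n[0] * R0 / tot, n[1] * R1 / tot   # Eq. 4.2
        p20 = (p10 + a * p11) / (1 + a)               # Eq. 4.14 (H(0,1) = 1)
        p21 = (p11 + a * p10) / (1 + a)
        for c, m in enumerate(states):
            coeff = 2 if m == (1, 1) else 1           # Eq. 4.15 (binomial)
            P[r, c] = coeff * p20 ** m[0] * p21 ** m[1]
    # Stationary distribution: solve q(P - I) = 0 with sum(q) = 1.
    A = np.vstack([P.T - np.eye(3), np.ones(3)])
    b = np.array([0.0, 0.0, 0.0, 1.0])
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return q

def q0_ratio_M3(R0, R1):
    # q0(0 3)/q0(3 0) from the closed forms of Eq. 9.35.
    num03 = (2*R1)**3 * (R1**3 + 6*R1**2*R0) + R1**3 * (R0**3 + 6*R1*R0**2)
    num30 = (2*R0)**3 * (R0**3 + 6*R0**2*R1) + R0**3 * (R1**3 + 6*R0*R1**2)
    return num03 / num30

q = stationary_L1_M2(2.0, 1.0, a=1e-4)
# As a -> 0+, Eq. 9.33 predicts q -> (0.8, 0, 0.2) for R(0) = 2, R(1) = 1,
# and for M = 3 the Eq. 9.35 ratio stays below [R(1)/R(0)]^3.
```

With R(0) = 2, R(1) = 1 the chain's stationary distribution at a = 10⁻⁴ is already close to the predicted limit (0.8, 0, 0.2), and the M = 3 ratio 136/2152 ≈ 0.063 indeed lies below (1/2)³ = 0.125.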
A second conjecture concerns the limiting distribution behavior as a function of the parameter M. The computed limiting distribution entropy results displayed in Section 5.5 suggest that the limiting distribution is dominated by optimal solutions for M sufficiently large. That supposition is supported by the results in Eq. 9.33 and 9.35. In the M = 2 case, it follows immediately from Eq. 9.33 that q_0(0 2)/q_0(2 0) = [R(1)/R(0)]². For M = 3 and R(1) < R(0), it is straightforward to show that a corresponding bounding relationship exists, i.e. q_0(0 3)/q_0(3 0) ≤ [R(1)/R(0)]³. This suggests that the ratio of the probabilities of the uniform population states corresponding to i and j with R(i) < R(j) behaves at or better than

    [R(i)/R(j)]^M                                                                     Eq. 9.36

for M sufficiently large. If this supposition is indeed correct, then the desired limiting distribution behavior for the two-operator simple genetic algorithm (i.e. probability zero for sub-optimal solutions) can be approached as closely as required by selecting M sufficiently large. The corresponding general case (i.e. L > 1) three-operator counterparts of Eq. 9.32 and 9.34 are expressed in terms of the P2'(i|n) array (Eq. 4.22). Thus, the numerator polynomial counterparts of Eq. 9.33 and 9.35 are expressed in terms of complex polynomial functions of the reward function values, and consequently it may be that no general case three-operator counterpart of Eq. 9.36 exists. (It is noted that the design of the reward functions employed in Section 5, in which only length 0-2 schema dependence is incorporated, tends to minimize crossover disruption, which may account for the progression toward optimality indicated by the three-operator results recorded in Figures 5-7 through 5-18.) The simulated annealing global optimality may thus extrapolate onto the simple genetic algorithm only in the p_c → 0 and M → ∞ limiting sense.

9.5 Extending the Stationary Distribution Representation

Eq. 9.26 and 9.27 represent an exact expression of the value of the determinant −|P̄_m − I| = |P(m) − I| − |P − I|, and with Propositions 6.7 and 6.8 constitute an exact representation of the components of the stationary distribution of the two and three-operator
algorithms. Section 9.4 extends those results to the determinants −|P̄_{m_A}' − I'| = |P(m_A)' − I'| − |P_{S_A}' − I'|, whose a → 0+ values are required for use in Propositions 7.5-6. The utility of these representations depends upon the ability to extract useful relationships between the C'(m,r̄)'s from the general form represented by Eq. 9.27 and Eq. 9.28. The following paragraphs examine the combinatorial determinants in the general forms provided by Eq. 9.27 and Eq. 9.28 and deduce some of the key relationships. The purpose of this effort is to provide a foundation for extending the stationary distribution representation methodology developed in Sections 9.2-9.4. First, if the enabling condition for Eq. 9.22 is satisfied (i.e. M < N), then every element in the combinatorial determinant of Eq. 9.27 is either zero or it is the combinatorial determinant corresponding to the order zero coefficient for some state in S'. Thus, every coefficient of the form represented by Eq. 9.27 can be written as sums and products of order zero coefficients. An analogous conclusion applies to Eq. 9.28. Second, it is clear from Eq. 9.27 that nonzero order differentiation of any two or more rows in an identical pattern (e.g. r̄ ∋ 0 ≠ r_{n_1} = r_{n_2} for n_1 ≠ n_2) introduces identical rows into the combinatorial determinant, and thus makes C'(m,r̄) = 0. Consequently, no monomial terms corresponding to any r̄ with identical nonzero rows survive in the expansion of −|P̄_m − I| = |P(m) − I| − |P − I|. An identical conclusion applies to the coefficients of −|P̄_{m_A}' − I'| = |P(m_A)' − I'| − |P_{S_A}' − I'|, of which Eq. 9.28 is an exemplar. A very important class of coefficient identities derives from transpositions of nonzero rows and columns of the differentiation order array. The resulting identities are very closely connected to the algebra of symmetric and alternating polynomials, and to an associated determinant concept called alternants, of which Vandermonde determinants are a special case.
Appendix C is provided to support the following paragraphs. From the form of the combinatorial determinants in Eq. 9.27 and Eq. 9.28, it is clear that exchanging any two of the k rows indexed by row indices n e K is equivalent to
exchanging the corresponding nonzero rows of r̄. If r̄' is derived from r̄ by such a row transposition, then it follows that C'(m,r̄') = −C'(m,r̄). Thus, C'(m,r̄) establishes the value (to within a sign alternation) of the coefficients of k! distinct monomial terms in the expansion of −|P̄_m − I| = |P(m) − I| − |P − I|. An identical result applies to C_0'(m,r̄) and the expansion of −|P̄_{m_A}' − I'| = |P(m_A)' − I'| − |P_{S_A}' − I'|. The collection of monomial terms corresponding to this coefficient identity can be written as the product of C'(m,r̄) (or of C_0'(m,r̄)) and a polynomial function of the form defined in Eq. C.12 of Appendix C. That is, the collection of terms is a quasi-alternating polynomial function in the array of variables (2^L P(i|n) − 1). In addition to the preceding result, the following identity applies to C'(m,r̄) and the expansion of −|P̄_m − I| = |P(m) − I| − |P − I|. For any n ∈ K, transposition of columns m and n in the combinatorial determinant of Eq. 9.27 is equivalent to representing the value of C'(n,r̄'), where r̄' is derived from r̄ by exchanging r_n with r_m = 0. That is,

    n ∈ K ⟹ C'(n,r̄') = −C'(m,r̄).

Thus, the identical quasi-alternating function, evaluated in the new set of variables generated by replacing each P(i|n) with the corresponding P(i|m), is included in the expansion of −|P̄_m − I| = |P(m) − I| − |P − I|. Collectively, these results account for (k+1)! of the coefficients required for representation of the stationary distribution. Another class of coefficient identities derives from transpositions of the columns of r̄ (i.e. transpositions of i,j ∈ S). Let m' be derived from m by setting m(j)' = m(i), m(i)' = m(j), n_1' from n_1 by setting n_1(j)' = n_1(i), n_1(i)' = n_1(j), etc. Then, if r̄' is derived from r̄ by transposition of rows m with m', n_1 with n_1', etc., followed by transposition of columns i and j, it follows from Eq. 9.27 that C'(m',r̄') = C'(m,r̄). An identical result applies to C_0'(m_A',r̄').
The number of distinct coefficients whose values are generated in this fashion from C'(m,r̄) depends upon both the number and form of the nonzero columns in r̄. If the number of nonzero columns is p, then exchanging any of the p nonzero columns with any of the N − p zero columns generates the coefficient of a distinct monomial term. Exchanging a nonzero column with another nonzero column having a different column sum also generates a distinct coefficient. However, exchanging a nonzero column with another nonzero column having identical column sum may or may not generate a distinct coefficient, depending upon the distribution of the nonzero entries in the two columns, because it is possible for the transformation described above to translate one column into the other. The collection of monomial terms corresponding to this coefficient identity can be written as the product of C'(m,r̄) (or of C_0'(m,r̄)) and a polynomial function of the form defined in Eq. C.10 of Appendix C. That is, the collection of terms is a quasi-symmetric polynomial function in the array of variables (2^L P(i|n) − 1). These coefficient identities and their connection to the quasi-symmetric and quasi-alternating polynomials of Appendix C offer a promising mechanism for extending the stationary distribution representation work begun here. Examination of the general form (2^L P(i|n) − 1) reveals that it is zero mean in the sense that

    Σ_{i∈S} (2^L P(i|n) − 1) = 0.

This property, along with the common form of the elements in the conditional probability array [P(i|n)], suggests that the symmetric and alternating polynomial forms required for evaluation of Propositions 6.7-6.8 or 6.5-6.6 may admit to large scale simplifications, and ultimately yield a tractable, explicit closed form expression for the stationary distribution components.
SECTION 10
CONCLUSIONS AND FUTURE DIRECTION

10.1 Summary

This dissertation reports an effort to establish an analytical framework for the simple genetic algorithm, based upon the asymptotic probability distribution of the generated solution sequences. The mechanism employed herein is extrapolation of the extensive existing theoretical foundation of the simulated annealing algorithm onto the genetic algorithm. That foundation is based upon the asymptotic behavior of a nonstationary Markov chain simulated annealing algorithm model.

The simulated annealing literature is reviewed in Section 2, with particular emphasis on the methodology employed to develop the key theoretical results. Those results include a demonstration that, provided a lower bound of the form K/log(k) on the algorithm parameter corresponding to absolute temperature is observed, the asymptotic probability distribution over the algorithm state space is zero for all states corresponding to sub-optimal solutions. Thus, the simulated annealing algorithm obtains (asymptotically) a globally optimal solution.

The genetic algorithm literature is reviewed in Section 3. The significant conclusion of that section is that while certain important theoretical results exist, notably the so-called schema theorem and some work on a problem construct referred to as the minimal deceptive problem, no genetic algorithm model or accompanying convergence theory comparable in scope to that of simulated annealing exists in the literature. The fundamental purpose of the work described herein is to provide such an analytical framework by extrapolating the known simulated annealing theory onto the genetic algorithm. An essential first step toward that goal is development of a nonstationary Markov chain algorithm model for the genetic algorithm. That task is accomplished in Section 4.
The product of that effort is a very general nonstationary Markov chain model for variants of the algorithm incorporating combinations of the three fundamental genetic algorithm operators. The model is tailored to resemble the model employed in the analysis of the simulated annealing algorithm convergence behavior, with the mutation probability algorithm parameter playing a role analogous to absolute temperature in simulated annealing.

Additionally, some salient features of the model state behavior are pointed out in Section 4. In particular, the one-operator (reproduction only) simple genetic algorithm is shown to possess exactly 2^L absorbing states, one for each possible uniform population, while the two-operator (reproduction/mutation) and three-operator (reproduction/mutation/crossover) variants possess a unique stationary distribution. The expected value of the absorption time for the one-operator algorithm is finite, and an upper bound is provided by Eq. 4.8. The probability distribution of the final solution state produced by the one-operator simple genetic algorithm depends upon the initial state, m₀.

The inclusion of the mutation operator is shown in Section 4 to provide a significant additional dimension to the state behavior of the time-homogeneous (stationary) two and three-operator variants of the algorithm: the existence of a unique stationary distribution. The significance of the unique stationary distribution is that the asymptotic state behavior is independent of the starting state. It is completely determined by the objective function and the algorithm parameters.

In Section 5, the genetic algorithm model is employed to generate some computer simulation results. Specifically, a combinatorial interpretation of the model state space is explored numerically in Section 5.2, and the limiting stationary distribution of the three-operator algorithm is approximated for a variety of algorithm parameter sets in Section 5.5.
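The one-operator absorption behavior summarized above is easy to observe in simulation. The following is a minimal sketch, not the dissertation's model itself: population size, string length, and the strictly positive reward function are illustrative choices, and fitness-proportional reproduction is the only operator applied.

```python
import random

random.seed(1)

L = 4          # bit-string length (illustrative)
M = 8          # population size (illustrative)

def reward(s):
    # Hypothetical strictly positive reward; any R > 0 exhibits absorption.
    return 1 + sum(s)

# Random initial population of M binary strings of length L.
pop = [tuple(random.randint(0, 1) for _ in range(L)) for _ in range(M)]

generations = 0
while len(set(pop)) > 1:           # iterate until the population is uniform
    weights = [reward(s) for s in pop]
    # Reproduction only: fitness-proportional selection of M descendants.
    pop = random.choices(pop, weights=weights, k=M)
    generations += 1

# The chain has been absorbed into a uniform population: with no mutation,
# no operator can ever reintroduce a lost string.
assert len(set(pop)) == 1
```

Which of the 2^L uniform populations is reached varies with the random seed and the initial state, mirroring the dependence on m₀ noted above.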
A very significant feature of the limiting stationary distribution is suggested by the Section 5.5 results and later verified theoretically (in Section 7). It is that the limiting two and three-operator algorithm stationary distribution behavior necessary for extrapolating
the simulated annealing asymptotic global optimality result does not follow. The limiting distribution is nonzero for all states corresponding to uniform populations (one-operator absorbing states), including those representing sub-optimal solutions. This complication precludes an exact extrapolation of the simulated annealing convergence theory onto the simple genetic algorithm. The Section 5 results do, however, reinforce the intuitive notion that increasing the algorithm population size parameter biases the limiting distribution towards the desired limiting behavior.

Section 6 employs the Perron-Frobenius Theorem (which is summarized in Appendix B) to formulate the time-homogeneous two and three-operator algorithm unique stationary distribution existence argument into a system of equations whose solution is the stationary distribution components. The solution is formulated in terms of Cramer's Rule and is not explicitly solved; however, the Section 6 results provide a remarkable degree of insight into the form of the solution and its behavior with respect to the algorithm parameters. Those results provide the foundation for the remaining sections.

The unique stationary distribution existence argument for the stationary two and three-operator algorithm variants only applies when the mutation probability parameter is strictly greater than zero. A one-operator (zero mutation probability) stationary distribution exists, but as demonstrated in Section 4.3.1 it is not unique. A very important requirement for extrapolation of the simulated annealing convergence theory onto the simple genetic algorithm is existence of a zero mutation probability limit for the stationary distribution. Section 7 is devoted to resolving that question affirmatively. It is based upon the results developed in Section 6, and it also verifies the Section 5.5 observation concerning the nonzero limit for all states corresponding to uniform populations.
A very significant theoretical contribution of this work is developed in Section 8. It is a monotonic mutation probability bound sufficient to guarantee strong ergodicity of the nonstationary two and three-operator simple genetic algorithm Markov chains. The parameter bound is analogous to the simulated annealing temperature schedule bound.
The bound is asserted in Proposition 8.1, and its form (a power law in the iteration index k) is asymptotically superior to the K/log(k) bound associated with the simulated annealing algorithm. It is very noteworthy that the same bound applies both to the two and three-operator algorithm variants. At least in terms of the Section 8 bound, the crossover operator does not expedite convergence. All of the results developed in Sections 7 and 8 are obtained without explicitly solving the stationary distribution system.

Section 9 attacks the problem of explicit solution. It is a very extensive and somewhat tedious development. The product of that work is an expression for the general term in a multivariate Taylor's series expansion of the determinant form required for explicit solution of the stationary distribution equations. The results are expressed in Eq. 9.26 and 9.27 for the general nonzero mutation probability case, augmented by Eq. 9.28 for the zero mutation probability limit. These results stop short of a usable answer, but they do provide some insight into the nature of the solution. Further, Section 9.5 provides some intriguing ideas for extending the work started in Section 9.

The attempt to extrapolate the simulated annealing convergence theory onto the genetic algorithm fails in the sense that the zero mutation probability stationary distribution limits of the two and three-operator simple genetic algorithm variants do not satisfy the required form for extrapolation of the simulated annealing global optimality result. However, evidence is provided which suggests that for the two-operator algorithm variant, the required behavior can be approached by increasing the population size parameter (Eq. 9.36). The question is more complicated for the three-operator case, and as pointed out in Section 9.4, implementation of crossover with nonzero p_c may indeed preclude convergence to global optimality even in the infinite population size limiting sense of Eq. 9.36.
The latter observation concerning crossover, along with the equivalence of the mutation probability sequence bounds for the two and three-operator cases noted
previously, poses some significant questions concerning the role of the crossover operator. Indeed, from the results developed herein, it is not clear that any desirable effect on the asymptotic algorithm behavior obtains from application of the crossover operator, though it may have a desirable effect in expediting convergence in real (finite time) applications. The resolution of these questions, along with a host of other applications questions such as optimum population size, mutation and crossover probability parameter selection, number of iterations required to achieve acceptable results, etc., requires further progress on the stationary distribution representation task begun in Section 9.

10.2 Contributions of the Research

The research reported herein establishes a framework for modeling the genetic algorithm in terms of the asymptotic probability distribution of the solution sequences which it produces. Specific significant accomplishments include the following:

(1) A very general nonstationary Markov chain model of one, two and three-operator variants of the genetic algorithm, and a framework for analysis of the operators based upon their impact on the state space of the Markov chain

(2) Demonstration of the existence of a unique stationary distribution for the time-homogeneous (stationary) two and three-operator algorithm variants

(3) A stationary distribution solution in terms of the characteristic polynomials of matrices derived from the state transition matrix

(4) Demonstration of the existence of a zero mutation probability stationary distribution limit for the time-homogeneous two and three-operator algorithms

(5) A mutation probability schedule bound (analogous to the annealing schedule bound of simulated annealing) sufficient for the nonstationary two and three-operator genetic algorithm variants to achieve the limiting distribution
(6) A methodology for representing the two and three-operator stationary distribution components at all consistent values of mutation probability (including the zero mutation probability limit), and a proposed approach for extending that methodology to produce an explicit result.

10.3 Future Direction

In order to achieve the stated goal of this work, a complete analytical framework for the simple genetic algorithm, additional progress must be made on the stationary distribution solution effort begun in Section 9. The coefficient relationships noted in Section 9.5, especially the coefficient identities which attend transpositions of rows and columns in the differentiation order array and their connection with the quasi-symmetric and quasi-alternating polynomial notions presented in Appendix C, provide a foundation for proceeding with this effort. An explicit representation of the functional form of the stationary distribution, reduced to a rational function expression in the algorithm parameters and objective function, would provide a very valuable theoretical tool for use in the analysis of genetic algorithm performance, and is the ultimate goal. However, even if an explicit solution is not attainable, it may prove possible to deduce very useful bounds on the stationary distribution components from continuation of the Section 9 development.

A second promising area for continuation of this work concerns the mutation probability parameter sequence bound provided in Section 8. It is based on very simple lower bounds (Eq. 4.21 and 4.31) which exist for the conditional probabilities which compose the state transition matrix, and it only employs the one-step transition matrix in the (1 − τ₁(P)) sequence employed to establish weak ergodicity (Section 8.2). Some preliminary work not reported in Section 8 suggests that employing two-step transition matrices in summing the (1 − τ₁(P)) sequence may allow a refinement of the bound to something of the form k^(−1).
It also appears from that preliminary work that the same bound applies for both the two and three-operator algorithm variants.
APPENDIX A
DISCRETE TIME FINITE STATE MARKOV CHAINS

A.1 Introduction

The following paragraphs establish some definitions and theorems on discrete time finite state Markov chains and related stochastic matrix concepts. These results fall into three main categories: (1) elementary definitions, (2) definitions and theorems concerning the state space and asymptotic behavior of time-homogeneous (stationary) Markov chains and (3) some more advanced ergodicity definitions and theorems necessary for the analysis of the asymptotic behavior of inhomogeneous Markov chains. These results are presented without proof or elaboration, but the foundation required for the more elementary of them can be obtained from [Cinl75, IsMa76] or many other references on Markov chains. The ergodicity related results can be found in [IsMa76, Sene81].

Although some of the results discussed here apply to continuous time and/or denumerably infinite state space Markov chains as well, the intention is to restrict consideration to the discrete time finite state case. All references herein to Markov chains are understood to mean discrete time finite state Markov chains.

In the following, let K = {0, 1, 2, …} be the set of nonnegative integers, let X = {X_k : k ∈ K} be a discrete time (i.e. discrete sequence index) stochastic process with finite cardinality state space E, and let i, j ∈ E.

A.2 Elementary Definitions

Definition A1: If for every i₀, i₁, …, i_k, j ∈ E and every k ∈ K it follows that

Pr{X_{k+1} = j | X₀ = i₀, X₁ = i₁, …, X_k = i_k} = Pr{X_{k+1} = j | X_k = i_k},

then X is a Markov chain.
Definition A2: Any row vector q^T = [q(i)], i ∈ E, satisfying the conditions

(1) ∀i ∈ E : q(i) ≥ 0
(2) q^T 1 = Σ_{i∈E} q(i) = 1

is called a probability vector.

Definition A3: Any square matrix P whose rows are all composed of probability vectors is called a stochastic matrix, or sometimes more explicitly a row stochastic matrix. The row sum constraint on a row stochastic matrix can be written as P1 = 1, so 1 is an eigenvalue of every stochastic matrix.

Definition A4: The stochastic matrix

P_k = [p_k(i,j)] = [Pr{X_{k+1} = j | X_k = i}]

is the one-step transition probability matrix or state transition matrix of the Markov chain X. If the probability vectors q_k and q_{k+1} are respectively the probability distributions of X_k and X_{k+1}, then q_{k+1}^T = q_k^T P_k. Similarly, the stochastic matrix

P_m^(n) = [p_m^(n)(i,j)] = [Pr{X_k = j | X_m = i}] = P_m P_{m+1} ⋯ P_{k−1} = ∏_{l=m}^{k−1} P_l,

where k = m + n, m, n ∈ K, n > 0, is the n-step transition probability matrix of X, and

q_k^T = q_m^T P_m^(n) = q_m^T ∏_{l=m}^{k−1} P_l.

Definition A5: Let P_k be the state transition matrix of the Markov chain X at time (sequence index) k. Then, X is time-homogeneous if and only if ∀k ∈ K it follows that P_k = P, where P is a constant state transition matrix.
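Definitions A2-A4 can be sketched directly. The three-state chain below is purely illustrative; the code checks the row-stochastic constraint of Definition A3 and applies the one-step update q_{k+1}^T = q_k^T P of Definition A4.

```python
# A small illustrative three-state stochastic matrix (values are arbitrary).
P = [[0.5, 0.3, 0.2],
     [0.1, 0.8, 0.1],
     [0.2, 0.2, 0.6]]

# Definition A3: every row of a stochastic matrix is a probability vector.
for row in P:
    assert all(p >= 0 for p in row)          # Definition A2, condition (1)
    assert abs(sum(row) - 1.0) < 1e-12       # Definition A2, condition (2)

def step(q, P):
    # Definition A4: q_{k+1}^T = q_k^T P_k (row vector times matrix).
    n = len(q)
    return [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]

q = [1.0, 0.0, 0.0]          # initial distribution concentrated on state 0
for _ in range(3):
    q = step(q, P)

assert abs(sum(q) - 1.0) < 1e-12   # probability mass is conserved
```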
A.3 Time-Homogeneous Markov Chains

The time-homogeneous Markov chain X is completely specified by its initial probability distribution, q₀, and state transition matrix, P. The probability distribution of X_k, k ≥ 1, is given by

q_k^T = q₀^T P^k.

The following definitions and theorems concern the asymptotic behavior of the chain and some conditions on the state space which make the asymptotic behavior independent of q₀. In the following, let the i, j ∈ E element of P^k be denoted by p^(k)(i,j).

Definition A6: A subset E₀ of the state space E of the Markov chain X is called closed if ∀i ∈ E₀, ∀j ∉ E₀ it follows that p(i,j) = 0. If the closed set E₀ contains the single state i, so that p(i,i) = 1, then the state i is called an absorbing state.

Definition A7: A Markov chain is called irreducible if there exists no nonempty closed subset of its state space E other than E itself.

Definition A8: The states i and j are said to intercommunicate if ∃k_i, k_j ∈ K such that p^(k_i)(i,j) > 0 and p^(k_j)(j,i) > 0.

Theorem A1: A Markov chain is irreducible if and only if all pairs of states intercommunicate.

Definition A9: State i ∈ E of the Markov chain X has period d if the following two conditions hold:

(1) p^(k)(i,i) = 0 unless k = md for some positive integer m, and
(2) d is the largest integer with property (1).

If d = 1, state i is called aperiodic. The Markov chain X is aperiodic if and only if all i ∈ E are aperiodic.

Theorem A2: If X is irreducible and if ∃i ∈ E such that p(i,i) > 0, then X is aperiodic.

Definition A10: Any probability vector q over the state space of the time-homogeneous Markov chain X and satisfying

q^T P = q^T

is called a stationary distribution of X. It is not necessarily unique.

Theorem A3: If the Markov chain X is time-homogeneous, irreducible, aperiodic and has a finite state space, then a stationary distribution exists for X. Further, the stationary distribution is unique and is determined by

q^T P = q^T and q^T 1 = 1.

Theorem A4: If the time-homogeneous Markov chain X possesses a unique stationary distribution, q, then for every probability vector x with compatible dimensions, it follows that

lim_{k→∞} x^T P^k = q^T.
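Theorems A3-A4 can be observed numerically by power iteration on a small irreducible, aperiodic chain (the matrix below is illustrative): the iterates converge to the same vector q from any starting distribution, and that limit satisfies q^T P = q^T.

```python
# Illustrative irreducible, aperiodic three-state chain.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.8, 0.1],
     [0.2, 0.2, 0.6]]

def step(q, P):
    # One application of q^T -> q^T P.
    n = len(q)
    return [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]

def limit(q, P, iters=2000):
    # Approximates lim_{k->inf} q^T P^k by repeated one-step updates.
    for _ in range(iters):
        q = step(q, P)
    return q

qa = limit([1.0, 0.0, 0.0], P)
qb = limit([0.0, 0.0, 1.0], P)

# Theorem A4: the limit is the same for every starting distribution,
assert all(abs(a - b) < 1e-10 for a, b in zip(qa, qb))
# and the limit is stationary (Definition A10): q^T P = q^T.
assert all(abs(a - b) < 1e-10 for a, b in zip(step(qa, P), qa))
```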
A.4 Inhomogeneous Markov Chains

Complete specification of the inhomogeneous Markov chain X requires its initial probability distribution, q₀, and the infinite sequence of state transition matrices, {P_k}, k ≥ 0. The probability distribution of X_k, k ≥ 1, is given by

q_k^T = q₀^T ∏_{n=0}^{k−1} P_n.

If the chain is asymptotically independent of q₀, then it is said to be ergodic. Two classes of ergodicity must be distinguished. The following definitions and theorems elaborate.

Definition A11: The inhomogeneous Markov chain X is weakly ergodic if ∀i, j, l ∈ E, ∀m ∈ K,

lim_{k→∞} (p_m^(k)(i,l) − p_m^(k)(j,l)) = 0.

Weak ergodicity does not require that either lim_{k→∞} p_m^(k)(i,l) or lim_{k→∞} p_m^(k)(j,l) exist.

Definition A12: Any scalar function τ(·), continuous on the set of n×n stochastic matrices P and satisfying 0 ≤ τ(P) ≤ 1, is called a coefficient of ergodicity. If in addition τ(P) = 0 ⇔ P = 1q^T, where q is any probability vector with compatible dimensions (i.e. when all rows of P are identical probability vectors), then τ is said to be proper. Weak ergodicity is equivalent to τ(P_m^(k)) → 0 as k → ∞ for every m ≥ 0, where τ is a proper coefficient of ergodicity.

Theorem A5: Let P be an n×n stochastic matrix and let

τ₁(P) = 1 − min_{i,j} Σ_{k=1}^{n} min(p(i,k), p(j,k)).

Then, τ₁(P) is a proper coefficient of ergodicity.
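The coefficient in Theorem A5 is computable directly from its definition. A minimal sketch with illustrative matrices, checking the two extremes (a matrix with identical rows is exactly the proper case τ₁ = 0; disjoint row supports give the maximal value 1):

```python
def tau1(P):
    # Theorem A5: tau1(P) = 1 - min_{i,j} sum_k min(p(i,k), p(j,k)).
    n = len(P)
    best = min(
        sum(min(P[i][k], P[j][k]) for k in range(n))
        for i in range(n) for j in range(n)
    )
    return 1.0 - best

# Identical rows (P = 1 q^T): the proper coefficient vanishes.
assert abs(tau1([[0.2, 0.8], [0.2, 0.8]])) < 1e-12
# Disjoint row supports (here, the identity matrix): tau1 attains 1.
assert abs(tau1([[1.0, 0.0], [0.0, 1.0]]) - 1.0) < 1e-12
```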
Theorem A6: The inhomogeneous Markov chain X is weakly ergodic if and only if there exists a strictly increasing sequence of positive numbers {k_l}, l ∈ K, such that

Σ_{l=0}^{∞} (1 − τ₁(P_{k_l}^(k_{l+1} − k_l))) = ∞.
APPENDIX B
THE PERRON-FROBENIUS THEOREM AND STOCHASTIC MATRICES

B.1 Introduction

A matrix possessing the property that all of its components are nonnegative is referred to as a nonnegative matrix. For the matrix T, this condition is indicated by T ≥ 0. The case in which all components of T are strictly positive is indicated by T > 0. This notation extends in the obvious manner to expressions such as T ≥ B ⇔ T − B ≥ 0 relating nonnegative matrices with compatible dimensions.

The definitions, theorems and corollary in Section B.2 below concern nonnegative matrices. They are the foundation for the Markov chain stationary distribution existence and representation theorem and related results summarized in Appendix A and employed in Sections 2, 4, 7 and 8. They are extracted from [Sene81], and are specialized in Section B.3 from the case of finite nonnegative matrices to the case of finite stochastic matrices. They are employed extensively in Sections 6 and 7.

B.2 The Perron-Frobenius Theorem and Ancillary Results for Primitive Matrices

Theorem B2 below is called the strong version of the Perron-Frobenius theorem. It applies to a class of nonnegative matrices referred to as primitive. A version of the theorem which applies to the wider class of irreducible nonnegative matrices is usually invoked for applications involving stochastic matrices, but the flexibility of the more general version is not required for the purposes herein. The connection of these results to those of Appendix A is provided by Theorem B1. It asserts that primitivity (Definition B1) is equivalent to the combination of irreducibility and aperiodicity, as defined in Appendix A.
Definition B1: A square nonnegative matrix, T, is primitive if there exists a positive integer k such that T^k > 0.

Theorem B1: If the n×n nonnegative matrix T is irreducible (Definition A7) and aperiodic (Definition A9), then T is primitive, and conversely.

Theorem B2: Let T be an n×n nonnegative primitive matrix. Then there exists an eigenvalue r of T such that

(a) r is real, r > 0;
(b) r has corresponding left and right eigenvectors with strictly positive components;
(c) r > |λ| for any eigenvalue λ ≠ r;
(d) the eigenvectors associated with r are unique to constant multiples;
(e) if 0 ≤ B ≤ T and β is an eigenvalue of B, then |β| ≤ r; moreover, |β| = r implies B = T;
(f) r is a simple root of the characteristic polynomial of T.

Definition B2: The eigenvalue r asserted in Theorem B2 is called the Perron-Frobenius eigenvalue of the nonnegative primitive matrix T.

Corollary B1: Let T_{i,j} be the components of a nonnegative primitive matrix T having Perron-Frobenius eigenvalue r. Then

min_i Σ_j T_{i,j} ≤ r ≤ max_i Σ_j T_{i,j},

with equality on either side implying equality throughout (i.e. r can only be equal to the maximal or minimal row sum if all row sums are equal). A similar proposition holds for column sums.
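Definition B1 suggests a direct (if naive) primitivity test: raise T to successive powers and check for strict positivity. The check is conclusive because, by a standard result not stated in this appendix (Wielandt's bound), an n×n primitive matrix already satisfies T^k > 0 for some k ≤ (n−1)² + 1. A sketch:

```python
def mat_mul(A, B):
    # Dense matrix product; adequate for the tiny matrices used here.
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_primitive(T):
    # Definition B1: T is primitive if T^k > 0 for some positive integer k.
    # Checking exponents up to (n-1)^2 + 1 suffices (Wielandt's bound).
    n = len(T)
    M = [row[:] for row in T]
    for _ in range((n - 1) ** 2 + 1):
        if all(x > 0 for row in M for x in row):
            return True
        M = mat_mul(M, T)
    return False

# Irreducible and aperiodic (self-loop at state 0): primitive (Theorem B1).
assert is_primitive([[0.5, 0.5], [1.0, 0.0]])
# A period-2 permutation matrix is irreducible but not aperiodic: not primitive.
assert not is_primitive([[0.0, 1.0], [1.0, 0.0]])
```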
Theorem B3: Let T be an n×n nonnegative primitive matrix with Perron-Frobenius eigenvalue r, let v and w be strictly positive left and right eigenvectors respectively of T corresponding to r, with v and w normed so that v^T w = 1, and let the t ≤ n distinct eigenvalues of T be ordered such that r > |λ₂| ≥ |λ₃| ≥ ⋯ ≥ |λ_t|, with the additional condition that λ₂ has multiplicity m₂ equal to or greater than the multiplicity of any other eigenvalue λ_s for which |λ_s| = |λ₂|. It follows that

(a) if λ₂ ≠ 0, then T^k = r^k w v^T + O(k^s |λ₂|^k) elementwise as k → ∞, where s = m₂ − 1;
(b) if λ₂ = 0, then for k ≥ n − 1, T^k = r^k w v^T.

B.3 The Perron-Frobenius Theory for Stochastic Matrices

A stochastic matrix (e.g. the state transition matrix of a Markov chain) is a special case of a square nonnegative matrix in which all row sums are equal to the constant 1. The following results specialize those of Section B.2 to the case of T an n×n stochastic primitive matrix, P.

Theorem B4: Let P be an n×n stochastic primitive matrix. Then

(a) r = 1 is an eigenvalue of P;
(b) r = 1 has corresponding left and right eigenvectors with strictly positive components;
(c) r = 1 > |λ| for any eigenvalue λ ≠ 1;
(d) the eigenvectors associated with r = 1 are unique to constant multiples;
(e) if 0 ≤ B ≤ P and β is an eigenvalue of B, then |β| ≤ r = 1; moreover, |β| = r = 1 implies B = P;
(f) r = 1 is a simple root of the characteristic polynomial of P.

This theorem follows immediately from Theorem B2 by application of Corollary B1 with T a stochastic primitive matrix, P. Among its consequences are the following.

Proposition B1: The right eigenvector asserted in Theorem B4(b) and (d) can be selected as the vector 1.

This result follows from the row sum constraint on n×n stochastic matrices, which can be expressed as P1 = 1. Thus, 1 is a right eigenvector, corresponding to eigenvalue 1, of every n×n stochastic matrix. Theorem B4 asserts that for finite primitive stochastic matrices, it is unique to within a nonzero scalar multiple.

Proposition B2: Let the vector q be the left eigenvector asserted in Theorem B4(b) and (d). Then, the additional constraint q^T 1 = 1 is consistent and makes q unique.

Since the left eigenvector asserted in Theorem B4(b) and (d) has strictly positive components, its inner product with the vector 1 is a strictly positive (nonzero) number. Consequently, that inner product can be used to normalize the eigenvector to produce a q which satisfies both requirements, and Proposition B2 follows.

Proposition B3: If P is an n×n stochastic primitive matrix, then

lim_{k→∞} P^k = 1 q^T,

where q is the unique vector asserted in Proposition B2.
This result follows from Theorem B3 by specializing it to n×n stochastic primitive matrices via Theorem B4, Proposition B1 and Proposition B2. A very significant consequence is the following.

Proposition B4: If P is an n×n stochastic primitive matrix and x an arbitrary n-dimensional probability vector, then

lim_{k→∞} x^T P^k = x^T 1 q^T = q^T,

where q is the unique vector asserted in Proposition B2.

Definition B3: If the n×n stochastic primitive matrix P in Theorem B4 and Propositions B1-B4 is the state transition matrix of a time-homogeneous Markov chain, then the unique vector q asserted in Proposition B2 is the stationary distribution of the Markov chain.
APPENDIX C
VANDERMONDE DETERMINANTS, SYMMETRIC AND ALTERNATING POLYNOMIALS

C.1 Introduction

An order n determinant whose i, j element is given by φ_j(x_i) for some set of n scalar functions φ_j and a companion set of n scalar variables x_i, i.e.

A(x) = |φ_j(x_i)| =
| φ₁(x₁)  φ₂(x₁)  ⋯ |
| φ₁(x₂)  φ₂(x₂)  ⋯ |
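For the classical choice φ_j(x) = x^(j−1), this determinant is the Vandermonde determinant, which factors as ∏_{i<j} (x_j − x_i). A quick exact-arithmetic check of that identity, with illustrative node values:

```python
from fractions import Fraction
from itertools import permutations

def det(A):
    # Leibniz expansion; fine for the tiny matrices used here.
    n = len(A)
    total = Fraction(0)
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):          # permutation sign via inversion count
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        term = Fraction(1)
        for i in range(n):
            term *= A[i][perm[i]]
        total += sign * term
    return total

x = [Fraction(v) for v in (2, 3, 5, 7)]        # illustrative nodes
n = len(x)
V = [[xi ** j for j in range(n)] for xi in x]  # phi_j(x) = x^(j-1), j = 1..n

product = Fraction(1)
for i in range(n):
    for j in range(i + 1, n):
        product *= x[j] - x[i]

assert det(V) == product
```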