The criterion. As in all time series modeling, we must be able to predict the data one
step ahead from previous data, which means that some measure of the prediction error has
to remain small. Some adaptive systems use the instantaneous error ep(n) as the error
measure. Unfortunately, ep(n) cannot be used for our application, since it has been shown
that the sequence ep is an insufficient statistic [12]: different experts can produce
identical error statistics (that is, there is a many-to-one mapping from different regimes
to identical prediction error statistics). This also explains why segmentation cannot be
performed by tracking the data with a single adaptive system and monitoring ep(n).
Besides, when the return maps of several regimes are close and the data is noisy, using
the instantaneous prediction error to measure the performance of the experts is likely to
cause spurious switching (false alarms) in the segmentation, even when the data is known
to switch less frequently.
As the trivial example in Figure 2-2 shows, if the segmentation is performed using ep(n)
as the criterion, Model 1 is still seen as the best predictor in areas A and B even though
it is not a good predictor of the data at all. To avoid this kind of "rattling" from one
model to another, a solution that brings memory to the performance criterion is
introduced [1].
The criterion has to measure the average performance at a given point in time, so a good
criterion is the expected value of the squared error, E[ep^2]. With the useful lie of
assuming that the system is ergodic, this expectation can be estimated by a time average,
the mean square error (MSE).
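
The effect of adding memory to the criterion can be illustrated with a minimal sketch.
It is not the method of [1]; it only assumes that the MSE is estimated by a causal moving
average of ep^2(n) over the last N samples. The window length, the noise levels, the
synthetic errors, and all function names are hypothetical choices made for illustration.
Two experts predict data from a single regime, so every change of winner is a false alarm.

```python
import numpy as np


def winner_instantaneous(errors):
    """At each step n, pick the expert with the smallest squared error ep(n)^2."""
    return np.argmin(errors ** 2, axis=0)


def winner_windowed_mse(errors, window):
    """At each step n, pick the expert with the smallest squared error
    averaged over the current and the previous window - 1 samples."""
    squared = errors ** 2
    kernel = np.ones(window) / window
    # Causal moving average. The first window - 1 outputs average fewer
    # samples, but every expert is scaled identically, so the argmin is
    # unaffected by that scaling.
    mse = np.stack(
        [np.convolve(row, kernel, mode="full")[: row.size] for row in squared]
    )
    return np.argmin(mse, axis=0)


def count_switches(winners):
    """Number of time steps at which the winning expert changes."""
    return int(np.count_nonzero(np.diff(winners)))


rng = np.random.default_rng(0)
n_samples = 2000

# Hypothetical prediction errors (rows = experts, columns = time).
# Expert 0 is the better predictor throughout, but both error
# sequences are noisy and their levels are fairly close.
errors = np.stack(
    [rng.normal(0.0, 1.0, n_samples), rng.normal(0.0, 1.5, n_samples)]
)

inst = winner_instantaneous(errors)
wind = winner_windowed_mse(errors, window=50)

print("switches, instantaneous criterion:", count_switches(inst))
print("switches, windowed MSE criterion: ", count_switches(wind))
```

Under these assumptions the instantaneous criterion changes winner far more often than the
windowed one, although the data never leaves the regime of expert 0. The price of the
memory is a delay in detecting a genuine switch, which grows with the window length.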