reliable and less representative of the population mean than an estimate based upon multiple measurements (Fleiss, 1986). This may explain why the infrequently occurring category of Physical Negative showed excellent reliability in this study and poor reliability in Bessmer's (1996) study. The base rate, or the probability that a behavior will occur, likely affects observers in several ways. They may have less practice coding low base rate behaviors and less chance to become experienced with them during the actual coding. In addition, observers' preparedness for infrequent categories differs from their preparedness for more frequently occurring categories. Because the majority of these categories are likely to have a low frequency of occurrence in both clinic-referred and normal samples, observers may require additional training in observing these categories and more frequent reliability checks during coding to ensure adequate reliability.

The parent categories of Yell, Whine, Destructive, Physical Negative and Warning, and the child categories of Destructive and Physical Negative, were not coded in the comparison group of this study because they occurred so infrequently (or not at all) in Bessmer's (1996) study that no estimate could be made of their reliability. It remains possible, however, that these behaviors occasionally occur in a pretreatment clinic-referred sample. In addition, Contingent Praise and Labeled Praise rarely occur in pretreatment samples (clinic-referred or nonreferred) and are more likely to occur in a posttreatment sample. Therefore, these categories should remain in the system until their usefulness in studies with a pre- and posttreatment design has been evaluated.