The confusion matrices produced in the process of calculating kappa were very helpful in identifying which pairs of categories were mistaken for one another. In this study, the majority of the DPICS II categories demonstrated adequate reliability by at least one of the two estimates. Those categories that occurred frequently in both the clinic-referred and comparison groups, such as Acknowledgment, Answer, and Information Question, tended to have the highest reliability estimates for fathers and children by both methods. One exception was the parent category of Information Descriptions, which was classified as having "fair" reliability despite its high frequency of occurrence. Examination of the confusion matrices for parent verbalizations suggests that this category was most often confused with Direct and Indirect Commands, two other frequently occurring parent verbalization categories.

Only three parent categories (i.e., Whine, Yell, Destructive) and two child categories (i.e., Destructive, Labeled Praise) were classified as having "poor" reliability. These categories, which also demonstrated poor reliability in Bessmer's (1996) study with mother-child dyads, had a low frequency of occurrence. When reliability estimates are based on behaviors that occur infrequently both within and across subjects, the estimates are likely to be affected by restricted variance. Because there are so few opportunities to code infrequently occurring variables, coders must be essentially 100% accurate across subjects; a single disagreement can shift the estimate substantially. The resulting reliability estimates are therefore likely to be overestimates or underestimates simply because of restricted variance. Reliability estimates for infrequently occurring behaviors, and estimates for behaviors occurring in only a few subjects, are also likely to be inconsistent across studies. Basic statistical principles indicate that an estimate based on a single measurement is less
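To make the two ideas above concrete, the following sketch (not taken from the study; all counts are hypothetical) computes Cohen's kappa from a coder-by-coder confusion matrix, in which cell [i][j] counts the intervals that Coder 1 assigned to category i and Coder 2 to category j. The off-diagonal cells are what reveal which pairs of categories are mistaken for one another, and a category with very few observations contributes almost no variance, so one disagreement moves its kappa sharply.

```python
def cohens_kappa(matrix):
    """Return Cohen's kappa for a square confusion matrix of counts.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    (the diagonal) and p_e is chance agreement (product of marginals).
    """
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    # Observed agreement: proportion of intervals both coders categorized identically.
    p_o = sum(matrix[i][i] for i in range(k)) / n
    # Chance agreement: for each category, the product of the two coders'
    # marginal proportions, summed over categories.
    p_e = sum(
        (sum(matrix[i]) / n) * (sum(row[i] for row in matrix) / n)
        for i in range(k)
    )
    return (p_o - p_e) / (1 - p_e)

# Hypothetical counts for three categories. The off-diagonal mass between
# categories 0 and 1 is the kind of pattern that flags two categories as
# frequently confused with each other.
m = [[40, 8, 2],
     [9, 30, 1],
     [1, 2, 7]]
print(round(cohens_kappa(m), 3))  # prints 0.603
```

Note how category 2 occurs only about 10 times in 100 intervals: moving just a few of its counts off the diagonal changes kappa far more than the same change would for categories 0 or 1, which is the restricted-variance problem described above.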