The results of this study point to the importance of accurately identifying those dyads in which inappropriate behaviors occur, because these behaviors have the power to discriminate between children with significant behavior problems and those without. The poor reliability estimates for some of the inappropriate behavior categories (i.e., Destructive, parent Yell), found both in this study and in Bessmer's (1996) study, indicate that these categories need further definition and/or that coders need additional training to consistently attain accepted standards of reliability. One possible solution, which may be more appropriate for some research studies than for others, would be to combine the inappropriate behavior categories and code them as a single variable. While this option might improve reliability estimates, it would not necessarily improve observer accuracy. Because inappropriate physical behaviors (i.e., Destructive, Physical Negative) and vocalizations (Whine, Yell) appear to be the most problematic for observers, additional training might consist of coding selected videotapes of dyads in which these categories occur frequently. Observers would have to reach a specified criterion (i.e., kappa estimates > .60) on these categories before coding videotapes for research. The use of videotapes to supplement the training manual and workbook is particularly important for training in coding physical behaviors, which must be observed, and vocalizations, which must be heard, if they are to be coded accurately. Coding physical behaviors and vocalizations is also difficult because observers must simultaneously attend to verbalizations, of which there are many to discriminate among. Observer reliability and accuracy might improve if observers focused on one