## Agreement Between Raters

There are a number of statistics that can be used to assess reliability between raters. Different statistics are suited to different types of measurement. Options include the joint probability of agreement, Cohen's kappa, Scott's pi and the related Fleiss' kappa, inter-rater correlation, the concordance correlation coefficient, the intraclass correlation coefficient, and Krippendorff's alpha.

Kappa is similar to a correlation coefficient in that it cannot exceed +1.0 or fall below -1.0. Because it is used as a measure of agreement, only positive values are expected in most situations; negative values would indicate systematic disagreement between raters. Kappa can reach very high values only when agreement is good and the rate of the target condition is close to 50%, because it incorporates the base rate into the calculation of joint probabilities. Several authorities have proposed "rules of thumb" for interpreting the degree of agreement, many of which broadly agree, although the terms they use are not identical. [8] [9] [10] [11]

The analytical approach used and discussed here serves as an example of the evaluation of ratings and rating instruments applicable to a wide range of developmental and behavioural characteristics. It assesses and documents the differences and similarities between rater subgroups and rated subgroups by combining different statistical analyses. If future reports succeed in distinguishing between the concepts of rater agreement, reliability, and correlation, and if the statistical approaches required for each aspect are applied appropriately, greater comparability of research results, and thus greater transparency, will be achieved.
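The properties of kappa described above (bounded by ±1, chance-corrected via the raters' base rates) can be illustrated with a minimal sketch. This is a generic implementation of Cohen's kappa for two raters over nominal labels, not code from the study; the rater data in the usage example are invented for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance from each
    rater's marginal label frequencies (the base rates).
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in counts_a)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings: 5 of 6 items agree, so p_o = 5/6, but with
# these base rates p_e = 0.5, giving kappa = 2/3 rather than 5/6.
a = ["yes", "yes", "no", "no", "yes", "no"]
b = ["yes", "no", "no", "no", "yes", "no"]
print(cohens_kappa(a, b))  # ~0.667
```

Note how the chance-correction term p_e depends on the base rates of the labels: the same raw agreement yields a lower kappa when one category dominates, which is the base-rate sensitivity mentioned above.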
In summary, this report has two main objectives: to provide a methodological tutorial for assessing inter-rater reliability, agreement and linear correlation of rating pairs, and to assess whether the German parent questionnaire ELAN (Bockmann and Kiese-Himmel, 2006) can also be used reliably by Kita (daycare) teachers to assess the early development of expressive vocabulary.

We compared mother-father and parent-teacher assessments in terms of agreement, correlation and reliability. We also examined which child- and rater-related factors influence rater agreement and reliability. In a relatively homogeneous group of middle-class families and high-quality Kita (daycare) settings, we expected high agreement and strong linear correlation between assessments. There are several operational definitions of "inter-rater reliability" that reflect different views of what reliable agreement between raters is. [1] There are three operational definitions of agreement, expressed in terms of σ²_bt, the variance of assessments between children, σ²_in, the variance within children, and k, the number of raters. . . .
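A reliability coefficient built from the variance components named above (σ²_bt between children, σ²_in within children, k raters) can be sketched as a one-way intraclass correlation. This is an illustrative reconstruction under that standard variance decomposition, not necessarily the exact definition the original report goes on to give; the ratings in the test data are invented.

```python
def icc_single(ratings):
    """One-way random-effects ICC for a single rater.

    ratings[i][j] is the rating of child i by rater j (k raters per
    child). Estimates sigma2_bt (between-children variance) and
    sigma2_in (within-children variance) from mean squares, then
    returns ICC = sigma2_bt / (sigma2_bt + sigma2_in).
    """
    n = len(ratings)      # number of children
    k = len(ratings[0])   # number of raters per child
    grand = sum(sum(row) for row in ratings) / (n * k)
    # Between-children mean square (variability of child means).
    msb = k * sum((sum(row) / k - grand) ** 2 for row in ratings) / (n - 1)
    # Within-children mean square (rater disagreement about a child).
    msw = sum((x - sum(row) / k) ** 2
              for row in ratings for x in row) / (n * (k - 1))
    sigma2_bt = (msb - msw) / k
    sigma2_in = msw
    return sigma2_bt / (sigma2_bt + sigma2_in)

# Perfect agreement between two raters gives ICC = 1.0; a constant
# offset of one point between raters lowers it.
print(icc_single([[1, 1], [2, 2], [3, 3]]))  # 1.0
print(icc_single([[1, 2], [2, 3], [3, 4]]))  # 0.6
```

The reliability of the mean of k raters would instead use σ²_in / k in the denominator (the Spearman-Brown adjusted form), which is one way the "several operational definitions" mentioned above differ.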