Advanced Reliability Calculator

Calculate the Intraclass Correlation Coefficient (ICC) in SAS

Use this interactive calculator to estimate intraclass correlation coefficients from ANOVA mean squares, the same way analysts often verify SAS output for one-way random, two-way random absolute-agreement, and two-way mixed consistency ICC models.

ICC Calculator

ICC model

Choose the Shrout and Fleiss / McGraw and Wong form that matches your SAS design.

MSR: Mean square for subjects/rows

MSC: Mean square for raters/columns

MSE: Residual / error mean square

n: Number of subjects

k: Number of raters/replicates

Tip: In SAS, these mean squares typically come from a two-way ANOVA table using subjects and raters as factors, or from mixed-model output when you back-calculate variance components.

How to calculate the intraclass correlation coefficient (ICC) in SAS

The intraclass correlation coefficient, usually shortened to ICC, is one of the most useful reliability statistics in applied research. It tells you how strongly measurements of the same target, patient, image, classroom, or specimen resemble one another when scored by multiple raters or on repeated occasions. If you need to calculate ICC in SAS, the first thing to understand is that there is no single ICC. There are several ICC forms, each tied to a specific study design and interpretation goal. That is why many analysts get inconsistent answers: they choose a model that does not match the design, or they take mean squares from an ANOVA table fitted to a different model.

This page gives you both an interactive calculator and a practical guide to the SAS side of the workflow. The calculator above uses the standard ANOVA-based ICC equations that are commonly referenced from Shrout and Fleiss and later clarified by McGraw and Wong. In a SAS setting, these formulas are often used to verify output from PROC GLM, PROC MIXED, or custom code built from estimated variance components.

What ICC measures in practice

ICC quantifies the proportion of total variation attributable to true differences between subjects rather than measurement noise, rater disagreement, or residual error. A higher ICC means the observed scores are more stable and more reproducible across raters or repeated assessments. Researchers use ICC in clinical measurement, psychology, education, imaging, laboratory methods, biomechanics, and survey scoring.

ICC below 0.50 indicates poor reliability; values near 0 suggest little reliability beyond random variation.
ICC from 0.50 to 0.75 usually indicates moderate reliability.
ICC from 0.75 to 0.90 is often treated as good reliability.
ICC above 0.90 is often considered excellent, the level usually expected when a high-stakes measure needs very strong reproducibility.

A widely cited reliability interpretation paper from experts in medicine and measurement is available through the National Library of Medicine. For statistical background on ANOVA and mean squares, many analysts also use university resources such as Penn State STAT course materials. If you need SAS coding examples, the UCLA Statistical Methods and Data Analytics site is another strong academic reference.

The key ICC models used with SAS output

To calculate ICC correctly, you must know whether your raters are random or fixed, whether you care about absolute agreement or only consistency, and whether you want reliability for a single rating or the mean of multiple ratings.

ICC(1,1) and ICC(1,k): one-way random-effects models. These are used when each subject is rated by a different set of randomly selected raters, so rater cannot be modeled as a separate crossed factor in a two-way layout.
ICC(2,1) and ICC(2,k): two-way random-effects models for absolute agreement. These are common when all subjects are rated by the same raters and those raters are treated as sampled from a broader population.
ICC(3,1) and ICC(3,k): two-way mixed-effects models for consistency. These are used when the specific raters are the only raters of interest and systematic differences among raters are not treated as reliability failure.
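The three design questions above can be written down as a small lookup. The following Python sketch is only an illustration of that decision logic, using the pairing on this page (random raters with absolute agreement, fixed raters with consistency). The function name `icc_label` and its arguments are hypothetical and are not part of SAS or of the calculator.

```python
def icc_label(two_way: bool, raters_random: bool, k_averaged: int = 1) -> str:
    """Return the Shrout and Fleiss style label implied by the design choices.

    two_way       -- True when every subject is rated by the same set of raters
    raters_random -- True when raters are treated as a sample from a larger pool
    k_averaged    -- 1 for a single rating, otherwise the number of ratings averaged
    """
    if k_averaged < 1:
        raise ValueError("k_averaged must be at least 1")
    if not two_way:
        model = 1   # one-way random effects
    elif raters_random:
        model = 2   # two-way random effects, absolute agreement
    else:
        model = 3   # two-way mixed effects, consistency
    return f"ICC({model},{k_averaged})"


print(icc_label(two_way=True, raters_random=True))
print(icc_label(two_way=True, raters_random=False, k_averaged=3))
print(icc_label(two_way=False, raters_random=True))
```

Writing the choice down this way forces the design decision to be made before any mean squares are read from SAS output.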

In SAS, analysts often fit a two-way ANOVA-like structure and then use the resulting mean squares:

MSR: mean square for subjects, sometimes called rows.
MSC: mean square for raters, sometimes called columns or judges.
MSE: residual or error mean square.
n: number of subjects.
k: number of raters or repeated measures.

Formulas used by the calculator

The calculator above applies these standard equations:

ICC(1,1) = (MSR – MSE) / (MSR + (k – 1) × MSE)
ICC(1,k) = (MSR – MSE) / MSR
ICC(2,1) = (MSR – MSE) / (MSR + (k – 1) × MSE + k × (MSC – MSE) / n)
ICC(2,k) = (MSR – MSE) / (MSR + (MSC – MSE) / n)
ICC(3,1) = (MSR – MSE) / (MSR + (k – 1) × MSE)
ICC(3,k) = (MSR – MSE) / MSR

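Notice that some formulas look similar. For example, ICC(1,1) and ICC(3,1) share the same algebraic expression, but they are not interchangeable. The study design assumptions differ, and so does the error term: in the one-way forms, MSE stands for the within-subject mean square from a one-way ANOVA with subject as the only factor, whereas in the two-way forms it is the residual left after both subject and rater effects are removed. In a balanced design the within-subject mean square equals (MSC + (n – 1) × MSE) / n, so the two error terms coincide only when MSC equals MSE. That is why a correct SAS workflow starts with design selection, not with formula hunting.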
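If you want to check the six equations outside SAS, they fit in one short function. This is a minimal Python sketch, not the calculator's actual source code. The function name `icc_from_mean_squares`, the form strings, and the input values are assumptions made for illustration.

```python
def icc_from_mean_squares(form, msr, msc, mse, n, k):
    """Apply one of the six ANOVA-based ICC formulas listed above.

    form is one of "1,1", "1,k", "2,1", "2,k", "3,1", "3,k".
    For the one-way forms, pass the within-subject mean square as mse;
    msc and n are ignored by every form except "2,1" and "2,k".
    """
    numerator = msr - mse
    if form in ("1,1", "3,1"):
        return numerator / (msr + (k - 1) * mse)
    if form in ("1,k", "3,k"):
        return numerator / msr
    if form == "2,1":
        return numerator / (msr + (k - 1) * mse + k * (msc - mse) / n)
    if form == "2,k":
        return numerator / (msr + (msc - mse) / n)
    raise ValueError(f"unknown ICC form: {form}")


# Illustrative inputs only; they do not come from a SAS run.
for form in ("2,1", "2,k", "3,1", "3,k"):
    value = icc_from_mean_squares(form, msr=10.0, msc=4.0, mse=2.0, n=10, k=2)
    print(f"ICC({form}) = {value:.4f}")
```

Keeping all forms in one function makes it easy to see which inputs each form actually uses: only the two-way random forms depend on MSC and n.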

Example interpretation table with realistic reliability ranges

ICC Range	Koo and Li 2016	Cicchetti guideline	Typical interpretation in applied work
< 0.50	Poor	Poor (< 0.40) to fair (0.40 to 0.59)	Measurement error or disagreement dominates true subject differences.
0.50 to 0.75	Moderate	Fair (0.40 to 0.59) to good (0.60 to 0.74)	Usable for exploratory studies, but often not ideal for clinical decisions.
0.75 to 0.90	Good	Excellent (0.75 and above)	Generally strong reliability for many research applications.
> 0.90	Excellent	Excellent	Very high reproducibility, often desirable for diagnostic or high-stakes use.
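The Koo and Li column of the table can be applied automatically when you report many coefficients. The Python sketch below is one possible mapping. The function name `koo_li_label` is hypothetical, and the handling of values that fall exactly on a cutoff is an assumption of this sketch rather than something the guideline specifies.

```python
def koo_li_label(icc: float) -> str:
    """Map an ICC estimate to the Koo and Li (2016) bands in the table above.

    Assumption: a value exactly on a cutoff goes to the band that lists it
    as a lower bound, except 0.90, which stays in "Good" because the table
    defines "Excellent" as strictly greater than 0.90.
    """
    if icc < 0.50:
        return "Poor"
    if icc < 0.75:
        return "Moderate"
    if icc <= 0.90:
        return "Good"
    return "Excellent"


for value in (0.31, 0.62, 0.80, 0.95):
    print(value, koo_li_label(value))
```

A label is only a reading aid. The confidence interval and the ICC form still need to be reported alongside it.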

Realistic ANOVA example and ICC results

Suppose a study has 30 subjects and 3 raters, all rating the same subjects. A two-way ANOVA in SAS yields the following mean squares: subjects MSR = 32.45, raters MSC = 4.20, and residual MSE = 6.85. These are realistic values in a moderate-to-good reliability setting. Using those numbers:

Model	Formula basis	Computed ICC	Interpretation
ICC(2,1)	Two-way random, absolute agreement, single	0.558	Moderate reliability
ICC(2,3)	Two-way random, absolute agreement, average of 3 raters	0.791	Good reliability
ICC(3,1)	Two-way mixed, consistency, single	0.555	Moderate reliability
ICC(3,3)	Two-way mixed, consistency, average of 3 raters	0.789	Good reliability

This table shows a practical point that matters in SAS reporting: the average-measures ICC is often much higher than the single-measure ICC. That does not mean the single rater suddenly became more accurate. It means the average of multiple raters smooths out error and increases reliability. If your study uses the mean score across raters in actual decision making, then the average-measures ICC is usually the one you should emphasize.
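The gain from averaging can be reproduced directly from the example mean squares. The Python sketch below recomputes the four coefficients and then applies the Spearman-Brown formula, which algebraically links each single-measure form above to its average-measures counterpart. The helper name `spearman_brown` is an assumption for illustration.

```python
# Mean squares from the worked example: 30 subjects, 3 raters.
msr, msc, mse = 32.45, 4.20, 6.85
n, k = 30, 3

icc_2_1 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
icc_2_k = (msr - mse) / (msr + (msc - mse) / n)
icc_3_1 = (msr - mse) / (msr + (k - 1) * mse)
icc_3_k = (msr - mse) / msr


def spearman_brown(single: float, k: int) -> float:
    """Reliability of the mean of k ratings, given the single-rating reliability."""
    return k * single / (1 + (k - 1) * single)


print(f"ICC(2,1) = {icc_2_1:.3f}   ICC(2,{k}) = {icc_2_k:.3f}")
print(f"ICC(3,1) = {icc_3_1:.3f}   ICC(3,{k}) = {icc_3_k:.3f}")
print(f"Stepped up from ICC(2,1): {spearman_brown(icc_2_1, k):.3f}")
print(f"Stepped up from ICC(3,1): {spearman_brown(icc_3_1, k):.3f}")
```

The stepped-up values match the average-measures coefficients, which shows that the increase comes from averaging out error, not from any change in how well a single rater performs.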

How to obtain the ingredients in SAS

There are several ways to estimate ICC-related quantities in SAS, depending on your design and your need for confidence intervals or variance components.

1. PROC GLM for ANOVA mean squares

When your design is balanced and straightforward, PROC GLM can produce the ANOVA table needed for the formulas shown above. You model the rating as the outcome and include subject and rater effects. For many users, this is the most direct way to extract MSR, MSC, and MSE.

proc glm data=mydata;
  class subject rater;
  model score = subject rater;
run;
quit;

From the ANOVA output, read the mean square values for subject, rater, and error. Then apply the formula matching your chosen ICC type. If your design is unbalanced, GLM-based mean squares can become less straightforward, and PROC MIXED is often a better route.
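To see what the ANOVA table is computing, you can reproduce the three mean squares by hand for a small balanced dataset. The following Python sketch assumes a complete subjects-by-raters table with no missing cells. The function name `two_way_mean_squares` and the ratings are hypothetical.

```python
def two_way_mean_squares(scores):
    """Return (MSR, MSC, MSE) for a balanced table with no missing cells.

    scores is a list of rows, one per subject, each holding one rating per rater.
    """
    n = len(scores)       # number of subjects (rows)
    k = len(scores[0])    # number of raters (columns)
    if any(len(row) != k for row in scores):
        raise ValueError("every subject needs exactly one rating per rater")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return msr, msc, mse


# Hypothetical ratings: 4 subjects (rows) scored by 3 raters (columns).
ratings = [
    [7, 8, 7],
    [5, 5, 6],
    [9, 9, 8],
    [4, 6, 5],
]
msr, msc, mse = two_way_mean_squares(ratings)
print(f"MSR = {msr:.4f}, MSC = {msc:.4f}, MSE = {mse:.4f}")
```

For balanced data like this, the results should agree with the subject, rater, and error mean squares in the PROC GLM table, so the sketch can serve as a cross-check on which row of the output you are reading.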

2. PROC MIXED for variance components

For more complex data structures, repeated measures, missing data, or random effects modeling, PROC MIXED is usually preferred. With variance components in hand, you can derive an ICC as the proportion of between-subject variance over total variance. This is especially common in multilevel modeling, cluster-randomized studies, and longitudinal settings.

proc mixed data=mydata method=reml;
  class subject rater;
  model score = / solution;
  random subject rater;
run;

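In a mixed model, the exact ICC formula depends on your specification. For a simple random-intercept model with subject as the only random effect, ICC is often:

ICC = subject variance / (subject variance + residual variance)

When rater is also declared as a random effect, as in the PROC MIXED code above, the single-measure absolute-agreement ICC adds the rater variance to the denominator:

ICC = subject variance / (subject variance + rater variance + residual variance)

That mixed-model ICC is conceptually related to the ANOVA approach, but it is not always identical because the estimation framework differs. For example, PROC MIXED constrains variance components to be non-negative by default, whereas ANOVA mean squares can imply a negative component.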
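For balanced data, the link between the two approaches can be made explicit by solving the expected mean squares for the variance components (the method of moments). The Python sketch below uses the example mean squares from this page. The function name `variance_components` is hypothetical, and truncating the rater component at zero is only a rough stand-in for a bounded estimator, since an actual REML fit would re-estimate the other components as well.

```python
def variance_components(msr, msc, mse, n, k):
    """Method-of-moments estimates for a balanced two-way random design."""
    var_subject = (msr - mse) / k
    var_rater = (msc - mse) / n
    var_error = mse
    return var_subject, var_rater, var_error


msr, msc, mse, n, k = 32.45, 4.20, 6.85, 30, 3
var_s, var_r, var_e = variance_components(msr, msc, mse, n, k)

# Absolute agreement keeps the rater component in the denominator.
icc_agreement = var_s / (var_s + var_r + var_e)

# The same coefficient from the mean-square formula for ICC(2,1).
icc_2_1 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Crude illustration of a bounded estimate: a negative component is set to 0.
icc_bounded = var_s / (var_s + max(var_r, 0.0) + var_e)

print(f"subject = {var_s:.4f}, rater = {var_r:.4f}, error = {var_e:.4f}")
print(f"from components: {icc_agreement:.4f}, from mean squares: {icc_2_1:.4f}")
print(f"with rater variance truncated at zero: {icc_bounded:.4f}")
```

Here MSC is smaller than MSE, so the implied rater variance is slightly negative. That is exactly the kind of case where a bounded mixed-model estimate and the ANOVA formula can give different answers.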

3. Deciding between absolute agreement and consistency

This is one of the biggest reporting mistakes. If two raters differ systematically, absolute agreement penalizes that difference, while consistency can still be high if they rank subjects similarly. In medical or clinical contexts where the actual score matters, absolute agreement is usually the safer choice. In some psychology or education settings where rank ordering matters more than equal scoring level, consistency can be appropriate.
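A small constructed example makes the difference concrete. In the Python sketch below, the second rater always scores exactly 2 points higher than the first, so the ranking of subjects is identical but the levels disagree. The data are invented for illustration.

```python
from statistics import mean

rater_a = [1, 2, 3, 4, 5]
rater_b = [score + 2 for score in rater_a]   # same ranking, constant offset of 2
n, k = len(rater_a), 2

grand = mean(rater_a + rater_b)
row_means = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
col_means = [mean(rater_a), mean(rater_b)]

ss_rows = k * sum((m - grand) ** 2 for m in row_means)
ss_cols = n * sum((m - grand) ** 2 for m in col_means)
ss_total = sum((x - grand) ** 2 for x in rater_a + rater_b)

msr = ss_rows / (n - 1)
msc = ss_cols / (k - 1)
mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

consistency = (msr - mse) / (msr + (k - 1) * mse)                      # ICC(3,1)
agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)  # ICC(2,1)

print(f"consistency ICC(3,1) = {consistency:.3f}")  # prints 1.000
print(f"agreement   ICC(2,1) = {agreement:.3f}")    # prints 0.556
```

The consistency coefficient is perfect because the offset is absorbed by the rater effect, while the absolute-agreement coefficient drops because that same offset is counted as disagreement.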

Common mistakes when calculating ICC in SAS

Using Pearson correlation instead of ICC. Correlation can be high even when raters disagree in absolute level.
Reporting ICC without specifying the model. Saying only “ICC = 0.81” is incomplete.
Using single-measure ICC when the final reported score is an average across raters.
Ignoring whether raters are random or fixed. This changes the model choice.
Applying balanced ANOVA formulas to a noticeably unbalanced dataset without checking assumptions.
Failing to report confidence intervals. Point estimates alone can overstate precision.
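Several of these mistakes can be caught before any model is fitted. As one example, the Python sketch below checks whether long-format data are balanced, meaning every subject has exactly one score from every expected rater. The function name `balance_report` and the records are hypothetical.

```python
from collections import Counter


def balance_report(records, expected_raters):
    """Return subjects whose ratings are missing or duplicated.

    records is an iterable of (subject, rater, score) tuples in long format.
    """
    seen = {}
    for subject, rater, _score in records:
        seen.setdefault(subject, Counter())[rater] += 1

    problems = {}
    for subject, counts in seen.items():
        missing = sorted(set(expected_raters) - set(counts))
        duplicated = sorted(r for r, c in counts.items() if c > 1)
        if missing or duplicated:
            problems[subject] = {"missing": missing, "duplicated": duplicated}
    return problems


# Hypothetical long-format data: subject 2 lacks rater C, subject 3 has rater A twice.
records = [
    (1, "A", 7), (1, "B", 8), (1, "C", 7),
    (2, "A", 5), (2, "B", 5),
    (3, "A", 9), (3, "A", 9), (3, "B", 9), (3, "C", 8),
]
print(balance_report(records, expected_raters=["A", "B", "C"]))
```

An empty result suggests the balanced ANOVA formulas are applicable. Anything else is a signal to resolve the data issue or move to a variance-components approach.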

Recommended reporting language

A strong methods statement should identify the ICC form, effects structure, agreement type, and whether the estimate refers to a single rater or average of raters. For example:

“Interrater reliability was evaluated with a two-way random-effects intraclass correlation coefficient for absolute agreement, single measurement, ICC(2,1). Based on ANOVA mean squares from SAS, the ICC was 0.56, indicating moderate reliability.”

Or, if the study decision is based on the average of three raters:

“Because final scores were defined as the mean of three raters, we reported ICC(2,3), yielding an ICC of 0.79, consistent with good reliability.”

When negative ICC values appear

Some datasets produce negative ICC values, especially when the between-subject variability is very small relative to error variance. Mathematically, this occurs when MSR is less than MSE in formulas that depend on their difference. In interpretation, a negative ICC usually means reliability is essentially absent, and many applied reports truncate the value to zero for practical discussion, though the exact numeric result can still be shown in supplementary output.
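A quick numerical case shows how this happens. The Python sketch below uses invented mean squares in which subjects vary less than the residual. The function name `icc_3_1` is hypothetical, and truncating at zero is shown only as the reporting convention described above.

```python
def icc_3_1(msr: float, mse: float, k: int) -> float:
    """Single-measure consistency ICC from mean squares."""
    return (msr - mse) / (msr + (k - 1) * mse)


raw = icc_3_1(msr=4.0, mse=6.0, k=3)   # MSR < MSE, so the numerator is negative
reported = max(raw, 0.0)               # value truncated to zero for discussion

print(f"raw ICC = {raw:.3f}, reported as {reported:.3f}")
```

Keeping the raw value available alongside the truncated one lets readers see how far below zero the estimate fell.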

How this calculator helps verify SAS output

If SAS gives you ANOVA mean squares but not the exact ICC label you want, use this calculator as a validation layer. Select the ICC model, enter MSR, MSC, MSE, n, and k, and compare the result to your SAS-based derivation. The chart visualizes the relative size of subject, rater, and residual variability alongside the final ICC value, making it easier to explain the reliability structure to collaborators.

Final takeaways

To calculate ICC in SAS correctly, always begin with the study design. Ask whether raters are random or fixed, whether absolute agreement or consistency matters, and whether the reported score comes from a single rating or an average. Then obtain the correct ANOVA mean squares or variance components in SAS, apply the matching formula, and report the ICC with a clear label. If you do those steps in order, your reliability analysis will be statistically defensible and much easier for readers to interpret.

For technical guidance and validation, consult high-quality sources like the National Institutes of Health archive on reliability reporting, Penn State statistics resources, and the UCLA SAS examples library. Those references, combined with careful model selection, will help you produce ICC results that are both accurate and publishable.
