Alpha

evalica.alpha(data, distance='nominal', solver=SOLVER)

Compute Krippendorff's alpha.

Quote

Krippendorff, K.: Content Analysis: An Introduction to Its Methodology. Sage Publications, Thousand Oaks, CA (2018).

Parameters:

data (DataFrame, required): Ratings by observer (rows) and unit (columns).
distance (DistanceFunc[T_distance_contra] | DistanceName, default 'nominal'): Distance metric (nominal, ordinal, interval, ratio) or a custom function.
solver (SolverName, default SOLVER): The solver to use (naive or pyo3).

Returns:

AlphaResult: The alpha value, the observed and expected disagreement, and the solver used.

Source code in evalica/__init__.py
def alpha(
    data: pd.DataFrame,
    distance: DistanceFunc[T_distance_contra] | DistanceName = "nominal",
    solver: SolverName = SOLVER,
) -> AlphaResult:
    """
    Compute Krippendorff's alpha.

    Quote:
        Krippendorff, K.: Content Analysis: An Introduction to Its Methodology.
        Sage Publications, Thousand Oaks, CA (2018).

    Args:
        data: Ratings by observer (rows) and unit (columns).
        distance: Distance metric (nominal, ordinal, interval, ratio) or a custom function.
        solver: The solver to use (naive or pyo3).

    Returns:
        The alpha result.

    """
    matrix = _as_unit_matrix(data)
    codes, unique_values = _factorize_matrix(matrix)

    if solver == "pyo3":
        if not PYO3_AVAILABLE:
            raise SolverError(solver)

        numeric_values = np.asarray(unique_values, dtype=np.float64)

        if callable(distance):
            distance_matrix = _custom_distance(distance, unique_values)
            _alpha, observed, expected = _brzo.alpha(codes, numeric_values, distance_matrix)
        else:
            _alpha, observed, expected = _brzo.alpha(codes, numeric_values, distance)

        if expected == 0.0:
            _alpha = 1.0 if observed == 0.0 else 0.0
    else:
        _alpha, observed, expected = _alpha_naive(codes, unique_values, distance)

    return AlphaResult(
        alpha=_alpha,
        observed=observed,
        expected=expected,
        solver=solver,
    )
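
To make the roles of the observed and expected disagreement concrete, here is a minimal pure-Python sketch of the textbook coincidence-matrix computation, alpha = 1 - D_o / D_e. It is an illustration under stated assumptions, not evalica's implementation: the names `krippendorff_alpha_sketch` and `nominal_distance` are hypothetical, missing ratings are assumed to be encoded as None, and the normalization of the two disagreement terms may differ from what the library's solvers report. The convention for a zero expected disagreement is copied from the pyo3 branch above.

```python
from collections import Counter
from itertools import permutations


def nominal_distance(left, right):
    """Nominal metric: 0 for identical values, 1 otherwise."""
    return 0.0 if left == right else 1.0


def krippendorff_alpha_sketch(ratings, distance=nominal_distance):
    """Return (alpha, observed, expected) for ratings[observer][unit].

    Hypothetical helper: None marks a missing rating.
    """
    # Coincidence counts: every ordered pair of ratings within a unit,
    # weighted by 1 / (number of ratings in that unit - 1).
    coincidences = Counter()
    for unit in zip(*ratings):  # columns are units
        values = [value for value in unit if value is not None]
        if len(values) < 2:
            continue  # a unit with fewer than two ratings has no pairs
        weight = 1.0 / (len(values) - 1)
        for left, right in permutations(values, 2):
            coincidences[left, right] += weight

    # Marginal totals per value and the overall number of pairable ratings.
    totals = Counter()
    for (left, _), count in coincidences.items():
        totals[left] += count
    n = sum(totals.values())
    if n < 2:
        raise ValueError("at least one unit needs two or more ratings")

    observed = sum(
        count * distance(left, right)
        for (left, right), count in coincidences.items()
    ) / n
    expected = sum(
        totals[left] * totals[right] * distance(left, right)
        for left in totals
        for right in totals
    ) / (n * (n - 1))

    if expected == 0.0:
        # Same convention as the pyo3 branch shown above.
        return (1.0 if observed == 0.0 else 0.0), observed, expected
    return 1.0 - observed / expected, observed, expected


# Two observers (rows) rating four units (columns); they differ on the last unit.
ratings = [
    [1, 2, 3, 3],
    [1, 2, 3, 4],
]
alpha, observed, expected = krippendorff_alpha_sketch(ratings)
print(alpha, observed, expected)
```

Passing a different two-argument callable as `distance` changes the metric without touching the counting logic, which is the same separation the `distance` parameter above expresses.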

evalica.AlphaResult dataclass

The result of Krippendorff's alpha.

Attributes:

alpha (float): The alpha value.
observed (float): The observed disagreement.
expected (float): The expected disagreement.
solver (SolverName): The solver used.

Source code in evalica/__init__.py
@dataclass(frozen=True)
class AlphaResult:
    """
    The result of Krippendorff's alpha.

    Attributes:
        alpha: The alpha value.
        observed: The observed disagreement.
        expected: The expected disagreement.
        solver: The solver used.

    """

    alpha: float
    observed: float
    expected: float
    solver: SolverName
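
Because the dataclass is declared with `frozen=True`, a result is read through plain attributes and cannot be modified after creation. The sketch below demonstrates this with a stand-in class so that it runs without evalica installed; `AlphaResultSketch` is hypothetical, and typing `solver` as `str` is an assumption about `SolverName`.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class AlphaResultSketch:
    """Hypothetical stand-in that mirrors the fields of AlphaResult."""

    alpha: float
    observed: float
    expected: float
    solver: str  # assumption: SolverName is a string literal type


result = AlphaResultSketch(alpha=0.75, observed=0.2, expected=0.8, solver="naive")

# asdict() gives a plain dictionary view that is easy to log or serialize.
record = asdict(result)

# Frozen dataclasses reject attribute assignment.
try:
    result.alpha = 1.0
    mutated = True
except FrozenInstanceError:
    mutated = False
```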

evalica.alpha_bootstrap(data, distance='nominal', solver=SOLVER, *, n_resamples=5000, confidence_level=0.95, random_state=None)

Compute confidence intervals for Krippendorff's alpha using KALPHA-style bootstrap.

Quote

Krippendorff, K.: Bootstrapping Distributions for Krippendorff's Alpha. (2006). https://www.asc.upenn.edu/sites/default/files/2021-03/Algorithm%20for%20Bootstrapping%20a%20Distribution%20of%20Alpha.pdf.

Quote

Hayes, A.F.: Statistical Methods and Macros for SPSS, SAS, and R. https://afhayes.com/spss-sas-and-r-macros-and-code.html.

Parameters:

data (DataFrame, required): Ratings by observer (rows) and unit (columns).
distance (DistanceFunc[T_distance_contra] | DistanceName, default 'nominal'): Distance metric (nominal, ordinal, interval, ratio) or a custom function.
solver (SolverName, default SOLVER): The solver to use (naive or pyo3).
n_resamples (int, default 5000): Number of bootstrap samples; must be positive.
confidence_level (float, default 0.95): Confidence level of the interval; must lie strictly between 0 and 1.
random_state (int | None, default None): The random seed (non-negative integer or None).

Returns:

AlphaBootstrapResult: The alpha estimate together with the bounds of its bootstrap confidence interval and the bootstrap distribution.

Source code in evalica/__init__.py
def alpha_bootstrap(
    data: pd.DataFrame,
    distance: DistanceFunc[T_distance_contra] | DistanceName = "nominal",
    solver: SolverName = SOLVER,
    *,
    n_resamples: int = 5000,
    confidence_level: float = 0.95,
    random_state: int | None = None,
) -> AlphaBootstrapResult:
    """
    Compute confidence intervals for Krippendorff's alpha using KALPHA-style bootstrap.

    Quote:
        Krippendorff, K.: Bootstrapping Distributions for Krippendorff's Alpha. (2006).
        <https://www.asc.upenn.edu/sites/default/files/2021-03/Algorithm%20for%20Bootstrapping%20a%20Distribution%20of%20Alpha.pdf>.

    Quote:
        Hayes, A.F.: Statistical Methods and Macros for SPSS, SAS, and R.
        <https://afhayes.com/spss-sas-and-r-macros-and-code.html>.

    Args:
        data: Ratings by observer (rows) and unit (columns).
        distance: Distance metric (nominal, ordinal, interval, ratio) or a custom function.
        solver: The solver to use (naive or pyo3).
        n_resamples: Number of bootstrap samples.
        confidence_level: The confidence level.
        random_state: The random seed (non-negative integer or None).

    Returns:
        The alpha bootstrap result.

    """
    if n_resamples <= 0:
        msg = "n_resamples must be a positive integer"
        raise ValueError(msg)
    if not 0.0 < confidence_level < 1.0:
        msg = "confidence_level must be in (0, 1)"
        raise ValueError(msg)
    if random_state is not None and random_state < 0:
        msg = "random_state must be a non-negative integer or None"
        raise ValueError(msg)

    random_seed = random_state

    matrix = _as_unit_matrix(data)
    codes, unique_values = _factorize_matrix(matrix)

    if solver == "pyo3":
        if not PYO3_AVAILABLE:
            raise SolverError(solver)

        numeric_values = np.asarray(unique_values, dtype=np.float64)

        if callable(distance):
            distance_matrix = _custom_distance(distance, unique_values)
            _alpha, observed, expected, distribution = _brzo.alpha_bootstrap(
                codes,
                numeric_values,
                distance_matrix,
                n_resamples,
                random_seed,
            )
        else:
            _alpha, observed, expected, distribution = _brzo.alpha_bootstrap(
                codes,
                numeric_values,
                distance,
                n_resamples,
                random_seed,
            )
    else:
        _alpha, observed, expected = _alpha_naive(codes, unique_values, distance)
        distribution = _alpha_bootstrap_naive(
            codes,
            unique_values,
            distance,
            n_resamples,
            random_seed,
        )

    distribution = np.asarray(distribution, dtype=np.float64)
    alpha_tail = (1.0 - confidence_level) / 2.0
    low_quantile, high_quantile = np.quantile(distribution, [alpha_tail, 1.0 - alpha_tail])
    low = float(low_quantile)
    high = float(high_quantile)

    return AlphaBootstrapResult(
        alpha=float(_alpha),
        observed=float(observed),
        expected=float(expected),
        low=low,
        high=high,
        distribution=pd.Series(distribution, name="alpha"),
        n_resamples=len(distribution),
        confidence_level=confidence_level,
        solver=solver,
    )
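
The final step of the function above turns the bootstrap distribution into a two-sided percentile interval by cutting off `(1 - confidence_level) / 2` of the mass on each side. The following pure-Python sketch reproduces only that step. `percentile_interval` is a hypothetical name, the input list is a stand-in rather than a real bootstrap distribution, and the interpolation rule is intended to match the default linear method of `np.quantile`.

```python
import math


def percentile_interval(distribution, confidence_level=0.95):
    """Return (low, high) percentile bounds of a bootstrap distribution."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be in (0, 1)")
    values = sorted(distribution)
    if not values:
        raise ValueError("distribution must not be empty")

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        position = q * (len(values) - 1)
        lower = math.floor(position)
        upper = math.ceil(position)
        fraction = position - lower
        return values[lower] * (1.0 - fraction) + values[upper] * fraction

    tail = (1.0 - confidence_level) / 2.0
    return quantile(tail), quantile(1.0 - tail)


# Stand-in distribution; real values would come from bootstrap resampling.
fake_distribution = [i / 100 for i in range(101)]
low, high = percentile_interval(fake_distribution, confidence_level=0.90)
print(low, high)
```

A higher `confidence_level` shrinks the tails that are cut off and therefore widens the interval.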

evalica.AlphaBootstrapResult dataclass

Bases: AlphaResult

The bootstrap result of Krippendorff's alpha.

Attributes:

alpha (float): The alpha value.
observed (float): The observed disagreement.
expected (float): The expected disagreement.
low (float): The lower bound of the confidence interval.
high (float): The upper bound of the confidence interval.
distribution (Series): The bootstrap alpha distribution.
n_resamples (int): The number of bootstrap samples used.
confidence_level (float): The confidence level.
solver (SolverName): The solver used.

Source code in evalica/__init__.py
@dataclass(frozen=True)
class AlphaBootstrapResult(AlphaResult):
    """
    The bootstrap result of Krippendorff's alpha.

    Attributes:
        alpha: The alpha value.
        observed: The observed disagreement.
        expected: The expected disagreement.
        low: The lower bound of the confidence interval.
        high: The upper bound of the confidence interval.
        distribution: The bootstrap alpha distribution.
        n_resamples: The number of bootstrap samples used.
        confidence_level: The confidence level.
        solver: The solver used.

    """

    low: float
    high: float
    distribution: pd.Series
    n_resamples: int
    confidence_level: float

evalica.DistanceName = Literal['interval', 'nominal', 'ordinal', 'ratio'] module-attribute
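
For orientation, the squared-difference functions that Krippendorff defines for three of these metric names can be written in a few lines. This is a sketch of the textbook definitions, not a copy of evalica's solvers: the function names are hypothetical, returning 0 when the ratio metric's denominator is zero is an assumption, and the ordinal metric is omitted because it depends on how often each value occurs in the data.

```python
def nominal(left, right):
    """Nominal metric: values either match or they do not."""
    return 0.0 if left == right else 1.0


def interval(left, right):
    """Interval metric: squared difference of the two values."""
    return float((left - right) ** 2)


def ratio(left, right):
    """Ratio metric: squared difference relative to the sum of the values."""
    total = left + right
    if total == 0:
        return 0.0  # assumption: treat the undefined 0/0 case as no distance
    return float(((left - right) / total) ** 2)


# The same pair of ratings is judged differently by each metric.
pair = (1, 3)
distances = {
    "nominal": nominal(*pair),
    "interval": interval(*pair),
    "ratio": ratio(*pair),
}
print(distances)
```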

evalica.DistanceFunc

Bases: Protocol[T_distance_contra]

The distance function protocol.

Source code in evalica/__init__.py
class DistanceFunc(Protocol[T_distance_contra]):
    """The distance function protocol."""

    def __call__(self, left: T_distance_contra, right: T_distance_contra, /) -> float:
        """
        Compute the distance between the values.

        Args:
            left: The left-hand side value.
            right: The right-hand side value.

        Returns:
            The non-negative distance between the values.

        """
        ...

__call__(left, right)

Compute the distance between the values.

Parameters:

left (T_distance_contra, required): The left-hand side value.
right (T_distance_contra, required): The right-hand side value.

Returns:

float: The non-negative distance between the values.

Source code in evalica/__init__.py
def __call__(self, left: T_distance_contra, right: T_distance_contra, /) -> float:
    """
    Compute the distance between the values.

    Args:
        left: The left-hand side value.
        right: The right-hand side value.

    Returns:
        The non-negative distance between the values.

    """
    ...
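
Any callable that takes two positional values and returns a non-negative float fits this protocol. As a minimal sketch, the function below measures the squared gap between labels on an ordered scale. The `LEVELS` mapping and both function names are hypothetical, and the pairwise table is only an assumption about what a helper such as `_custom_distance` might precompute from the unique values; whether non-numeric labels are accepted by every solver is not confirmed here.

```python
LEVELS = {"low": 0, "medium": 1, "high": 2}  # hypothetical label scale


def level_distance(left: str, right: str, /) -> float:
    """Squared gap between two labels on the hypothetical LEVELS scale."""
    return float((LEVELS[left] - LEVELS[right]) ** 2)


def pairwise_distances(distance, values):
    """Evaluate a distance callable on every ordered pair of values."""
    return [[distance(left, right) for right in values] for left in values]


labels = ["low", "medium", "high"]
matrix = pairwise_distances(level_distance, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

A well-behaved distance is symmetric and returns 0 for identical values, so the resulting table has a zero diagonal and mirrors itself across it.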