Migration

Crowd-Kit

Users of the Crowd-Kit library can easily switch to Evalica by replacing their label item references with the corresponding Winner values, enjoying the faster and cleaner code.

>>> import pandas as pd
>>> from crowdkit.aggregation import BradleyTerry
>>> df = pd.DataFrame(
...     [
...         ['item1', 'item2', 'item1'],
...         ['item3', 'item2', 'item2']
...     ],
...     columns=['left', 'right', 'label']
... )
>>> agg_bt = BradleyTerry(n_iter=100).fit_predict(df)

Evalica is not bound to the specific column names, reducing the potentially expensive operation of building a data frame, while remaining fully compatible with NumPy and pandas.

>>> import pandas as pd
>>> from evalica import bradley_terry, Winner
>>> df = pd.DataFrame(
...     [
...         ['item1', 'item2', Winner.X],
...         ['item2', 'item3', Winner.Y]
...     ],
...     columns=['left', 'right', 'label']
... )
>>> scores = bradley_terry(df['left'], df['right'], df['label'], limit=100)

Or simply:

>>> from evalica import bradley_terry, Winner
>>> scores = bradley_terry(
...     ['item1', 'item2'],
...     ['item2', 'item3'],
...     [Winner.X, Winner.Y],
...     limit=100,
... )

NLTK

Users of the NLTK library computing Krippendorff’s alpha via AnnotationTask can switch to Evalica for a cleaner and more efficient interface.

>>> from nltk.metrics import binary_distance
>>> from nltk.metrics.agreement import AnnotationTask
>>> data = [
...     ('coder1', 'item1', 1),
...     ('coder1', 'item2', 2),
...     ('coder2', 'item1', 1),
...     ('coder2', 'item2', 3),
...     ('coder3', 'item1', 2),
...     ('coder3', 'item2', 2),
... ]
>>> task = AnnotationTask(data, distance=binary_distance)
>>> task.alpha()

Evalica accepts a pandas DataFrame with observers as rows and units as columns, avoiding the need to construct (coder, item, label) triples manually. The built-in distance metrics are specified by name.

>>> import pandas as pd
>>> from evalica import alpha
>>> df = pd.DataFrame(
...     [[1, 2], [1, 3], [2, 2]],
... )
>>> result = alpha(df, distance="nominal")
>>> result.alpha

NLTK’s binary_distance corresponds to Evalica’s "nominal" and interval_distance corresponds to "interval".