Task

The second component of Emmental’s pipeline is building the learning task.

Task Class

The following describes the elements used for creating a Task.

Emmental task.

class emmental.task.Action(name, module, inputs=None)[source]

Bases: object

An action to execute in an EmmentalTask task_flow.

Action is the object that populates the task_flow sequence. It has three attributes: name, module and inputs, where name is the name of the action, module is the name of the module used in this action and inputs is the inputs to the action. By introducing a class for specifying actions in the task_flow, we standardize its definition. Moreover, Action gives users more flexibility in specifying a task flow, as it supports a wider range of formats for the inputs attribute, as follows:

1. It supports a str as inputs (e.g., inputs="input1"), which means taking input1’s output as the input of the current action.

2. It also supports None as inputs, which takes all modules’ outputs as input.

3. It also supports a list as inputs, whose items can take three different formats:

a) x (x is a str), which takes the whole output of x as input: this enables users to pass all outputs from one module to another without having to manually specify every input to the module.

b) (x, y) (y is an int), which takes x’s y-th output as input.

c) (x, y) (y is a str), which takes x’s output named y as input.

Parameters
  • name (str) – The name of the action.

  • module (str) – The name of the module used in this action.

  • inputs (Union[str, Sequence[Union[str, Tuple[str, str], Tuple[str, int]]], None]) – The inputs of the action. Details can be found above.
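The input formats listed above can be illustrated with a small, self-contained sketch. This is not Emmental's implementation: resolve_inputs, the InputSpec alias, the action names "raw" and "encoder", and the choice to pass the whole output dict for None are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union

# Hypothetical alias mirroring the documented annotation of `inputs`.
InputSpec = Union[str, Sequence[Union[str, Tuple[str, str], Tuple[str, int]]], None]


def resolve_inputs(inputs: InputSpec, outputs: Dict[str, Any]) -> List[Any]:
    """Illustrative resolver (an assumption, not Emmental's code)."""
    if inputs is None:
        # None: hand over the outputs of all earlier actions.
        return [outputs]
    if isinstance(inputs, str):
        # "x": the whole output of action x.
        return [outputs[inputs]]
    resolved = []
    for item in inputs:
        if isinstance(item, str):
            # Format a): the whole output of x.
            resolved.append(outputs[item])
        else:
            # Formats b) and c): x's y-th output (int) or x's output named y (str).
            name, key = item
            resolved.append(outputs[name][key])
    return resolved


# Hypothetical outputs collected so far, keyed by action name.
outputs = {
    "raw": {"feature": [1.0, 2.0]},
    "encoder": ([0.5, 0.25], "aux"),
}

whole = resolve_inputs("encoder", outputs)             # str
by_index = resolve_inputs([("encoder", 0)], outputs)   # (x, y) with int y
by_name = resolve_inputs([("raw", "feature")], outputs)  # (x, y) with str y
mixed = resolve_inputs(["encoder", ("encoder", 1)], outputs)
```

Mixing formats in one list, as in the last call, is what lets an action take one module's full output alongside a single element of another.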

class emmental.task.EmmentalTask(name, module_pool, task_flow, loss_func, output_func, scorer=None, action_outputs=None, module_device={}, weight=1.0, require_prob_for_eval=True, require_pred_for_eval=True)[source]

Bases: object

Task class that defines a task in an Emmental model.

Parameters
  • name (str) – The name of the task (Primary key).

  • module_pool (ModuleDict) – A dict of modules used in the task.

  • task_flow (Sequence[Action]) – The task flow among modules to define how the data flows.

  • loss_func (Callable) – The function to calculate the loss.

  • output_func (Callable) – The function to generate the output.

  • scorer (Optional[Scorer]) – The class of metrics to evaluate the task, defaults to None.

  • action_outputs (Union[str, Sequence[Union[str, Tuple[str, str], Tuple[str, int]]], None]) – The action outputs to return, defaults to None.

  • module_device (Dict[str, Union[int, str, device]]) – The dict of module device specification, defaults to {}.

  • weight (Union[float, int]) – The weight of the task, defaults to 1.0.

  • require_prob_for_eval (bool) – Whether to require prob for evaluation, defaults to True.

  • require_pred_for_eval (bool) – Whether to require pred for evaluation, defaults to True.
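To make the relationship between module_pool, task_flow, loss_func and output_func more concrete, here is a minimal sketch that uses plain Python callables instead of real modules. Everything in it is hypothetical: run_task_flow, the step tuples, the "raw" input key, and the signatures chosen for loss_func and output_func are assumptions for illustration and may differ from what Emmental expects.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

InputRef = Union[str, Tuple[str, Union[str, int]]]
Step = Tuple[str, str, List[InputRef]]  # (action name, module name, inputs)

# Hypothetical module pool: plain callables stand in for real modules.
module_pool: Dict[str, Callable[..., Any]] = {
    "scale": lambda xs: [2.0 * x for x in xs],
    "total": lambda xs: sum(xs),
}

# Each step names an action, the module it runs, and where its inputs come from.
task_flow: List[Step] = [
    ("scaled", "scale", [("raw", "x")]),
    ("pred", "total", ["scaled"]),
]


def run_task_flow(raw: Dict[str, Any], pool, flow) -> Dict[str, Any]:
    """Run the steps in order, storing each result under its action name."""
    outputs: Dict[str, Any] = {"raw": raw}
    for name, module_name, inputs in flow:
        args = [
            outputs[ref] if isinstance(ref, str) else outputs[ref[0]][ref[1]]
            for ref in inputs
        ]
        outputs[name] = pool[module_name](*args)
    return outputs


def loss_func(outputs: Dict[str, Any], gold: float) -> float:
    # Assumed signature: squared error between the final output and the gold value.
    return (outputs["pred"] - gold) ** 2


def output_func(outputs: Dict[str, Any]) -> float:
    # Assumed signature: expose the final action's output as the task output.
    return outputs["pred"]


outputs = run_task_flow({"x": [1.0, 2.0, 3.0]}, module_pool, task_flow)
loss = loss_func(outputs, 10.0)
prediction = output_func(outputs)
```

In this sketch the flow is strictly sequential, so an action can only refer to the raw input or to actions that appear earlier in the list.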

Task Utilities

These utilities are used to build a task.

Emmental scorer.

class emmental.scorer.Scorer(metrics=[], customize_metric_funcs={})[source]

Bases: object

A class to score tasks.

Parameters
  • metrics (List[str]) – A list of names of metrics provided by Emmental (e.g., accuracy), defaults to [].

  • customize_metric_funcs (Dict[str, Callable]) – A dict of custom metrics where the key is the metric name and the value is the metric function, which takes golds, preds, probs, uids as input, defaults to {}.

score(golds, preds, probs, uids=None)[source]

Calculate the score.

Parameters
  • golds (Union[ndarray, List[ndarray]]) – Ground truth values.

  • probs (Union[ndarray, List[ndarray]]) – Predicted probabilities.

  • preds (Union[ndarray, List[ndarray]]) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.

Return type

Dict[str, float]

Returns

Score dict.
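As a rough illustration of a custom metric function and of the score dict, the sketch below defines a metric that takes golds, preds, probs, uids and a tiny stand-in for the scoring step. The names error_rate and score_all are hypothetical, plain lists replace ndarrays to keep the example dependency-free, and returning a bare float from the metric is an assumption rather than a documented requirement.

```python
from typing import Callable, Dict, List, Optional, Sequence


def error_rate(
    golds: Sequence[int],
    preds: Sequence[int],
    probs: Sequence[Sequence[float]],
    uids: Optional[List[str]] = None,
) -> float:
    """Hypothetical custom metric: fraction of wrong predictions."""
    wrong = sum(1 for g, p in zip(golds, preds) if g != p)
    return wrong / len(golds)


def score_all(
    metric_funcs: Dict[str, Callable[..., float]],
    golds: Sequence[int],
    preds: Sequence[int],
    probs: Sequence[Sequence[float]],
    uids: Optional[List[str]] = None,
) -> Dict[str, float]:
    """Stand-in for a scorer: call every metric and collect a name -> value dict."""
    return {
        name: float(func(golds, preds, probs, uids))
        for name, func in metric_funcs.items()
    }


golds = [1, 0, 1, 1]
preds = [1, 0, 0, 1]
probs = [[0.2, 0.8], [0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]

result = score_all({"error_rate": error_rate}, golds, preds, probs)
```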

Metrics

This section shows the metrics included with Emmental. These metrics can be used alone or combined to define how to evaluate the task.

Emmental metric module.

emmental.metrics.accuracy_f1_scorer(golds, probs, preds, uids=None, pos_label=1)[source]

Average of accuracy and f1 score.

Parameters
  • golds (ndarray) – Ground truth values.

  • probs (Optional[ndarray]) – Predicted probabilities.

  • preds (ndarray) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.

  • pos_label (int) – The positive class label, defaults to 1.

Return type

Dict[str, float]

Returns

Average of accuracy and f1.

emmental.metrics.accuracy_scorer(golds, probs, preds, uids=None, normalize=True, topk=1)[source]

Accuracy classification score.

Parameters
  • golds (ndarray) – Ground truth values.

  • probs (Optional[ndarray]) – Predicted probabilities.

  • preds (Optional[ndarray]) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.

  • normalize (bool) – Normalize the results or not, defaults to True.

  • topk (int) – Top K accuracy, defaults to 1.

Return type

Dict[str, Union[float, int]]

Returns

Accuracy. If normalize is True, returns the fraction of correctly predicted samples (float); otherwise, returns the number of correctly predicted samples (int).
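The effect of normalize and topk can be sketched in plain Python. This is an illustration only, not Emmental's code: the function name accuracy is hypothetical, and it assumes gold labels are class indices into the columns of probs and that top-k accuracy counts a sample as correct when its gold class is among the k highest-probability classes.

```python
from typing import Optional, Sequence, Union


def accuracy(
    golds: Sequence[int],
    probs: Optional[Sequence[Sequence[float]]],
    preds: Optional[Sequence[int]],
    normalize: bool = True,
    topk: int = 1,
) -> Union[float, int]:
    """Illustrative top-k accuracy over plain lists."""
    if topk == 1 and preds is not None:
        correct = sum(1 for g, p in zip(golds, preds) if g == p)
    else:
        correct = 0
        for g, row in zip(golds, probs):
            # Class indices sorted from highest to lowest probability.
            ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
            if g in ranked[:topk]:
                correct += 1
    return correct / len(golds) if normalize else correct


golds = [0, 1, 2, 1]
probs = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.8, 0.1],
]
preds = [0, 0, 1, 1]

top1 = accuracy(golds, probs, preds)                        # fraction correct
top1_count = accuracy(golds, probs, preds, normalize=False)  # number correct
top2 = accuracy(golds, probs, preds, topk=2)
```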

emmental.metrics.f1_scorer(golds, probs, preds, uids=None, pos_label=1)[source]

F-1 score.

Parameters
  • golds (ndarray) – Ground truth values.

  • probs (Optional[ndarray]) – Predicted probabilities.

  • preds (ndarray) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.

  • pos_label (int) – The positive class label, defaults to 1.

Return type

Dict[str, float]

Returns

F-1 score.

emmental.metrics.fbeta_scorer(golds, probs, preds, uids=None, pos_label=1, beta=1)[source]

F-beta score is the weighted harmonic mean of precision and recall.

Parameters
  • golds (ndarray) – Ground truth values.

  • probs (Optional[ndarray]) – Predicted probabilities.

  • preds (ndarray) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.

  • pos_label (int) – The positive class label, defaults to 1.

  • beta (int) – Weight of recall relative to precision in the harmonic mean (beta > 1 favors recall), defaults to 1.

Return type

Dict[str, float]

Returns

F-beta score.
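For reference, precision, recall and the F-beta score can be computed from the confusion counts as sketched below. The helper names are hypothetical and the code uses the standard definition F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); it is a sketch of the formula, not a copy of Emmental's implementation.

```python
from typing import Sequence, Tuple


def precision_recall(
    golds: Sequence[int], preds: Sequence[int], pos_label: int = 1
) -> Tuple[float, float]:
    """Precision and recall for the positive class."""
    tp = sum(1 for g, p in zip(golds, preds) if p == pos_label and g == pos_label)
    fp = sum(1 for g, p in zip(golds, preds) if p == pos_label and g != pos_label)
    fn = sum(1 for g, p in zip(golds, preds) if p != pos_label and g == pos_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def fbeta(
    golds: Sequence[int], preds: Sequence[int], pos_label: int = 1, beta: float = 1
) -> float:
    """Weighted harmonic mean of precision and recall."""
    precision, recall = precision_recall(golds, preds, pos_label)
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0


golds = [1, 1, 1, 1, 0, 0]
preds = [1, 1, 0, 0, 1, 0]

precision, recall = precision_recall(golds, preds)  # 2/3 and 1/2
f1 = fbeta(golds, preds)                            # beta = 1 gives the F-1 score
f2 = fbeta(golds, preds, beta=2)                    # leans toward recall
f_half = fbeta(golds, preds, beta=0.5)              # leans toward precision
```

With these inputs precision is higher than recall, so the beta = 2 score is lower than F-1 and the beta = 0.5 score is higher.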

emmental.metrics.matthews_correlation_coefficient_scorer(golds, probs, preds, uids=None)[source]

Matthews correlation coefficient (MCC).

Parameters
  • golds (ndarray) – Ground truth values.

  • probs (Optional[ndarray]) – Predicted probabilities.

  • preds (ndarray) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.

Return type

Dict[str, float]

Returns

Matthews correlation coefficient score.
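For binary labels, the Matthews correlation coefficient can be written in terms of the confusion counts as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below assumes labels are 0 and 1 with 1 as the positive class; the function name mcc and the convention of returning 0.0 for a zero denominator are illustrative choices.

```python
import math
from typing import Sequence


def mcc(golds: Sequence[int], preds: Sequence[int]) -> float:
    """Binary Matthews correlation coefficient (labels assumed to be 0/1)."""
    tp = fp = fn = tn = 0
    for g, p in zip(golds, preds):
        if p == 1 and g == 1:
            tp += 1
        elif p == 1:
            fp += 1
        elif g == 1:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used in this sketch: a zero denominator yields 0.0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


golds = [1, 1, 1, 0, 0, 0]
preds = [1, 1, 0, 0, 0, 1]

value = mcc(golds, preds)          # TP=2, TN=2, FP=1, FN=1
perfect = mcc(golds, golds)        # identical labels
```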

emmental.metrics.mean_squared_error_scorer(golds, probs, preds, uids=None)[source]

Mean squared error regression loss.

Parameters
  • golds (ndarray) – Ground truth values.

  • probs (ndarray) – Predicted probabilities.

  • preds (Optional[ndarray]) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.

Return type

Dict[str, float]

Returns

Mean squared error regression loss.
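As a quick reference, mean squared error is the average of the squared differences between gold and predicted values. The sketch below is a plain-Python illustration with a hypothetical function name; which argument of the Emmental scorer carries the predicted values in a regression task is not assumed here.

```python
from typing import Sequence


def mean_squared_error(golds: Sequence[float], values: Sequence[float]) -> float:
    """Average squared difference between gold and predicted values."""
    return sum((g - v) ** 2 for g, v in zip(golds, values)) / len(golds)


golds = [3.0, -0.5, 2.0, 7.0]
values = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(golds, values)  # (0.25 + 0.25 + 0 + 1) / 4
```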

emmental.metrics.pearson_correlation_scorer(golds, probs, preds, uids=None, return_pvalue=False)[source]

Pearson correlation coefficient and the p-value.

Parameters
  • golds (ndarray) – Ground truth values.

  • probs (ndarray) – Predicted probabilities.

  • preds (Optional[ndarray]) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.

  • return_pvalue (bool) – Whether to return the p-value, defaults to False.

Return type

Dict[str, float]

Returns

Pearson correlation coefficient with pvalue if return_pvalue is True.

emmental.metrics.pearson_spearman_scorer(golds, probs, preds, uids=None)[source]

Average of Pearson and Spearman rank-order correlation coefficients.

Parameters
  • golds (ndarray) – Ground truth values.

  • probs (ndarray) – Predicted probabilities.

  • preds (Optional[ndarray]) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.

Return type

Dict[str, float]

Returns

The average of Pearson correlation coefficient and Spearman rank-order correlation coefficient.
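The two correlation coefficients and their average can be sketched without external dependencies. The helper names below are hypothetical; Spearman's coefficient is computed as the Pearson correlation of average ranks, which is its standard definition, and p-values are left out of this sketch.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


golds = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [1.0, 4.0, 9.0, 16.0, 25.0]  # monotone but not linear in golds

p = pearson(golds, scores)
s = spearman(golds, scores)
pearson_spearman = (p + s) / 2
```

Because the scores are a monotone but non-linear function of the gold values, the rank-based coefficient is 1 while the Pearson coefficient is slightly below 1.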

emmental.metrics.precision_scorer(golds, probs, preds, uids=None, pos_label=1)[source]

Precision.

Parameters
  • golds (ndarray) – Ground truth values.

  • probs (Optional[ndarray]) – Predicted probabilities.

  • preds (ndarray) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.

  • pos_label (int) – The positive class label, defaults to 1.

Return type

Dict[str, float]

Returns

Precision.

emmental.metrics.recall_scorer(golds, probs, preds, uids=None, pos_label=1)[source]

Recall.

Parameters
  • golds (ndarray) – Ground truth values.

  • probs (Optional[ndarray]) – Predicted probabilities.

  • preds (ndarray) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.

  • pos_label (int) – The positive class label, defaults to 1.

Return type

Dict[str, float]

Returns

Recall.

emmental.metrics.roc_auc_scorer(golds, probs, preds, uids=None)[source]

ROC AUC.

Parameters
  • golds (ndarray) – Ground truth values.

  • probs (ndarray) – Predicted probabilities.

  • preds (Optional[ndarray]) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.


Return type

Dict[str, float]

Returns

ROC AUC score.
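For binary labels, ROC AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by comparing all positive/negative pairs; the function name roc_auc and the assumption of 0/1 labels with one score per sample are illustrative, and this quadratic approach is meant for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(golds: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise ROC AUC for 0/1 labels and one positive-class score per sample."""
    pos = [s for g, s in zip(golds, scores) if g == 1]
    neg = [s for g, s in zip(golds, scores) if g == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


golds = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(golds, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
```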

emmental.metrics.spearman_correlation_scorer(golds, probs, preds, uids=None, return_pvalue=False)[source]

Spearman rank-order correlation coefficient and the p-value.

Parameters
  • golds (ndarray) – Ground truth values.

  • probs (ndarray) – Predicted probabilities.

  • preds (Optional[ndarray]) – Predicted values.

  • uids (Optional[List[str]]) – Unique ids, defaults to None.

  • return_pvalue (bool) – Whether to return the p-value, defaults to False.

Return type

Dict[str, float]

Returns

Spearman rank-order correlation coefficient with pvalue if return_pvalue is True.