AI Lifecycle Governance


The Need for Governance


Enterprises creating AI services (we use the term "service" to mean a service or application) are being challenged by an emerging problem: how to effectively govern the creation and deployment of these services. Enterprises want to understand and gain control over their current AI lifecycle processes, often motivated by internal policies or external regulation.

View a FactSheet captured from a sample AI lifecycle.


The AI Lifecycle


The AI lifecycle involves a variety of roles, performed by people with different specialized skills and knowledge who collectively produce an AI service. Each role contributes in a unique way, using different tools. Figure 1 shows some common roles.
Figure 1: The roles in the AI lifecycle

The process starts with a business owner who requests the construction of an AI model (or service). The request includes the purpose of the model or service, how to measure its effectiveness, and any other constraints, such as bias thresholds, appropriate datasets, or levels of explainability and robustness required.
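
One way to make such a request actionable downstream is to record it in machine-readable form. The following sketch is purely illustrative; the field names and threshold values are hypothetical and not part of any prescribed FactSheet schema.

```python
# Illustrative sketch: a business owner's model request captured as a
# machine-readable specification. All field names and thresholds are
# hypothetical examples, not a prescribed FactSheet schema.
model_request = {
    "purpose": "Predict mortgage approval",
    "risk_level": "High",
    "effectiveness_metric": "balanced_accuracy",
    "constraints": {
        "min_disparate_impact": 0.8,               # illustrative fairness threshold
        "max_statistical_parity_difference": 0.1,  # illustrative fairness threshold
        "explainability_required": True,
        "approved_datasets": ["HMDA"],
    },
}
```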

The data scientist uses this information to construct a candidate model, most typically through a machine learning process. This iterative process includes selecting and transforming the dataset, discovering the best machine learning algorithm, tuning the algorithm's parameters, and so on. The goal is to produce a model that best satisfies the requirements set by the business owner.
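
A minimal sketch of this step is shown below, assuming scikit-learn and a synthetic stand-in for the prepared features and labels; the algorithm, hyperparameter grid, and scoring choice are illustrative rather than a recommended recipe.

```python
# Minimal sketch of the data scientist's iterative step: try a few candidate
# hyperparameters and keep the model that best meets the business owner's
# effectiveness metric. The data here is a synthetic stand-in for the real dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [2, 3]},
    scoring="balanced_accuracy",   # the effectiveness metric named in the request
    cv=5,
)
search.fit(X_train, y_train)
candidate_model = search.best_estimator_
```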

Before this model is deployed it must be tested by an independent person, referred to as a model validator in Figure 1. This role, often falling within the scope of model risk management, is similar to a testing role in traditional software development. A person in this role applies a different dataset to the model and independently measures metrics defined by the business owner. If the validator approves the model it can be deployed.
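
Continuing the sketch above, the validation step might look like the following, where X_val and y_val stand for the validator's independent dataset (assumed, not shown) and the metric set mirrors the facts reported later in Table 2.

```python
# Sketch of the validator's independent measurement: apply a dataset the data
# scientist never saw and recompute the metrics named in the request.
# X_val, y_val, and candidate_model are assumed from the surrounding sketches.
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, roc_auc_score)

y_hat = candidate_model.predict(X_val)
y_score = candidate_model.predict_proba(X_val)[:, 1]

validation_facts = {
    "accuracy": accuracy_score(y_val, y_hat),
    "balanced_accuracy": balanced_accuracy_score(y_val, y_hat),
    "auc": roc_auc_score(y_val, y_score),
    "f1": f1_score(y_val, y_hat),
}
```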

The AI operations engineer is responsible for deploying and monitoring the model in production to ensure it operates as expected. This can include monitoring its performance metrics, as defined by the business owner. If some metrics do not meet expectations, the operations engineer is responsible for taking action and informing the appropriate roles.
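
A hypothetical monitoring check is sketched below: metrics computed on live traffic are compared against agreed floors, and anything out of bounds generates an alert for the appropriate roles. The metric names and thresholds are illustrative.

```python
# Hypothetical operations check: flag any live metric that falls below the
# floor agreed with the business owner. Names and thresholds are illustrative.
def check_deployment_metrics(live_metrics, thresholds):
    alerts = []
    for name, floor in thresholds.items():
        value = live_metrics.get(name)
        if value is not None and value < floor:
            alerts.append(f"{name}={value:.2f} fell below the agreed floor of {floor:.2f}")
    return alerts

alerts = check_deployment_metrics(
    live_metrics={"balanced_accuracy": 0.61, "disparate_impact": 0.95},
    thresholds={"balanced_accuracy": 0.60, "disparate_impact": 0.80},
)
```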

We use this example to illustrate the AI Governance scenario. Real lifecycles will likely have additional roles. A common pattern is for a model to be combined with other models or human-written code to form a service. In such a case the validator's role will be extended to also validate the full service.

Real lifecycles will also involve iteration, both within a role (a data scientist building many models before passing one to a validator) and between roles (an operations engineer sending a model back to a data scientist because it is performing poorly).


AI Facts


A key requirement for enabling AI Governance is the ability to collect model Facts throughout the AI lifecycle. This set of facts can be used to create a FactSheet for the model or service. Figure 2 illustrates how facts can be captured from the activities of the various lifecycle roles.
Figure 2: AI model facts being generated by different AI lifecycle roles

Capturing model facts in a common place provides shareable knowledge about the model. These facts can then be queried by any role in the lifecycle or outside of it, as shown in Figure 3. These queries can be conveniently made using FactSheet Templates to obtain a customized view of the Facts appropriate to the role. Some examples might be:
  • a business owner wanting to see all of the facts about the model they requested
  • a model validator comparing the facts of the data scientist's model to those of a simpler challenge model
  • a manager presenting a summary of the facts in a slide format to an executive
Figure 3: Model facts consumed by lifecycle roles and enterprise-wide roles to meet their specific needs
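
The sketch below illustrates this capture-and-query pattern in a few lines of Python. It is not the actual FactSheets API; the function names and the idea of a template as a simple list of fact names are assumptions made for illustration.

```python
# Illustrative sketch (not the actual FactSheets API): facts from different
# roles accumulate in one shared record, and a "template" selects the subset
# a given consumer cares about.
facts = {}

def report_fact(role, name, value):
    """Record a fact along with the role that produced it."""
    facts[name] = {"value": value, "reported_by": role}

def render_view(template):
    """Return only the facts named by a template (e.g. a business-owner view)."""
    return {name: facts[name]["value"] for name in template if name in facts}

report_fact("business_owner", "model_purpose", "Predict mortgage approval")
report_fact("data_scientist", "accuracy", 0.95)
report_fact("validator", "accuracy_validation", 0.94)

business_owner_template = ["model_purpose", "accuracy", "accuracy_validation"]
print(render_view(business_owner_template))
```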

Providing this level of visibility into the AI lifecycle can help enterprises understand the effectiveness of their current processes and identify how to improve them. The same information can be used to enforce enterprise governance policies by not allowing the process to advance to the next stage if a fact condition is not met, providing a needed level of control to minimize risk and ensure compliance with external regulations.
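
As a sketch of such a policy gate, the hypothetical check below refuses to advance a model to the next stage unless the required facts exist and satisfy the policy; the fact names, operators, and thresholds are illustrative.

```python
# Hypothetical policy gate: block promotion to the next lifecycle stage unless
# every required fact is present and satisfies the enterprise policy.
def may_advance(facts, policy):
    for fact_name, (op, threshold) in policy.items():
        value = facts.get(fact_name)
        if value is None:
            return False                      # a missing fact blocks advancement
        if op == ">=" and not value >= threshold:
            return False
        if op == "<=" and not value <= threshold:
            return False
    return True

deployment_policy = {"disparate_impact": (">=", 0.8), "accuracy_validation": (">=", 0.9)}
print(may_advance({"disparate_impact": 0.97, "accuracy_validation": 0.94}, deployment_policy))  # True
```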


Enabling Governance


As described above, AI Facts and FactSheets can be the cornerstone of AI governance by providing enterprises with the ability to
  • capture AI lifecycle facts, enabling greater visibility and automated documentation
  • perform analysis of these facts to improve business outcomes, increase overall efficiency, and learn best practices
  • specify enterprise policies to be enforced during the AI development and deployment lifecycle
  • facilitate communication and collaboration among the diverse lifecycle roles and other stakeholders

Discover more about IBM's AI Governance platform.


An Example


To make these concepts more concrete, we show some example facts for each of the four lifecycle roles mentioned above: business owner, data scientist, model validator, and AI operations engineer. The facts capture information about the development and deployment of a model for assisting mortgage approval. We leverage the Home Mortgage Disclosure Act dataset from the US Consumer Financial Protection Bureau, which contains information about each mortgage applicant (the features) and the approval outcome (the label). The label is used for training and for evaluating the trained model.

We leverage open source toolkits for measuring the fairness (AI Fairness 360), adversarial robustness (AI Adversarial Robustness 360), and explainability (AI Explainability 360) of the model. In practice, we expect the volume of facts to be much larger than what we show below and to be customized to a particular enterprise's interests. We present a subset of facts to illustrate how facts are provided by the various roles and accrue over the course of the lifecycle.

Table 1 shows the first set of facts in the lifecycle, which comes from the business owner: the purpose of the model and its risk level to the company. The model purpose fact can be useful for determining whether the model might be applicable in a different circumstance, such as a car loan approval scenario. The risk level fact can be useful for determining the level of validation needed for the model; higher risk models should be validated more thoroughly.

Business Owner Fact    Value
Model purpose          Predict mortgage approval
Risk level             High
Table 1: Sample facts from business owner

For the data scientist, model validator, and AI operations engineer, we focus on a set of facts related to the model's predictive performance, fairness, adversarial robustness, and explainability. Focusing on these four dimensions demonstrates the diversity of facts.

Table 2 shows the facts for various measures of predictive performance, fairness, adversarial robustness, and explainability of the model (1st column). The definitions of these metrics can be found in the machine learning literature (predictive performance) or in the open source toolkits mentioned above (fairness, robustness, explainability). The table provides values for these metrics measured on the data scientist's dataset (2nd column) and on the model validator's independent dataset (3rd column). The small differences between the two columns are consistent with a successful validation of the model.

Fact                             Data Scientist's Dataset    Validator's Dataset
Predictive Performance
  Accuracy                       0.95                        0.94
  Balanced Accuracy              0.63                        0.63
  AUC                            0.79                        0.78
  F1                             0.97                        0.97
Fairness
  Disparate Impact               0.97                        0.97
  Statistical Parity Difference  -0.03                       -0.03
Adversarial Robustness
  Empirical Robustness           0.02                        0.01
Explainability
  Faithfulness Mean              0.31                        0.36
Table 2: Model facts from data scientist and validator
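
As a concrete illustration of two of the fairness facts above, the sketch below computes disparate impact and statistical parity difference directly from model predictions and a binary protected attribute, following their standard definitions; the arrays are made-up examples, and in practice a toolkit such as AI Fairness 360 would compute these for you.

```python
# Disparate impact and statistical parity difference computed by hand from
# predictions (favorable outcome = 1) and a protected-attribute group flag
# (1 = privileged group). The arrays are illustrative, not HMDA data.
import numpy as np

y_pred = np.array([1, 1, 0, 1, 0, 1, 1, 0])
privileged = np.array([1, 1, 1, 0, 0, 0, 1, 0])

rate_priv = y_pred[privileged == 1].mean()
rate_unpriv = y_pred[privileged == 0].mean()

disparate_impact = rate_unpriv / rate_priv                 # ratio of favorable rates
statistical_parity_difference = rate_unpriv - rate_priv    # difference of favorable rates
```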

Another common test performed by model validators is to determine whether the performance of the created model can be replicated by a simpler "challenge" model. If it can, an enterprise will likely prefer the simpler model because it is easier to understand and thus poses less risk. Table 3 compares the performance of the data scientist's model (2nd column) with a challenge model (3rd column) created by the model validator on the same dataset. The table shows that the data scientist's model performs better and is therefore a good candidate for deployment.

Fact                             Data Scientist's Model    Challenge Model
Predictive Performance
  Accuracy                       0.94                      0.89
  Balanced Accuracy              0.63                      0.62
  AUC                            0.78                      0.62
  F1                             0.97                      0.93
Fairness
  Disparate Impact               0.97                      0.94
  Statistical Parity Difference  -0.03                     -0.06
Adversarial Robustness
  Empirical Robustness           0.01                      0.13
Explainability
  Faithfulness Mean              0.36                      0.87
Table 3: Model facts from data scientist's model and the validator's challenge model
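
The sketch below shows what such a challenge-model comparison might look like, continuing the earlier illustrative sketches (candidate_model, X_val, y_val are assumed); the choice of logistic regression as the simpler model and the single comparison metric are assumptions made for illustration.

```python
# Sketch of a validator's challenge-model test: fit a simpler, more
# interpretable model and compare a headline metric on held-out data.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_val, y_val, test_size=0.3, random_state=0)
challenge_model = LogisticRegression(max_iter=1000).fit(Xc_train, yc_train)

candidate_score = balanced_accuracy_score(yc_test, candidate_model.predict(Xc_test))
challenge_score = balanced_accuracy_score(yc_test, challenge_model.predict(Xc_test))

# The enterprise may prefer the simpler model if it comes close; in Table 3
# the comparison favors the data scientist's model, so it proceeds.
prefer_candidate = candidate_score > challenge_score
```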

Our final set of facts comes from the AI operations engineer and is shown in Table 4. As in Table 2, the same model created by the data scientist is evaluated on the data scientist's dataset (2nd column) and the validator's dataset (3rd column). The deployment facts (4th column) represent the model's performance on input seen by the deployed system. We see that the model's performance degrades somewhat on the deployment data. An AI operations engineer would continue to monitor the performance of the model, using a platform like Watson OpenScale, and if performance degrades further could contact the data scientist for investigation or the business owner to decide whether the model should be retired.

Fact                             Data Scientist's Dataset    Validator's Dataset    Deployment Data
Predictive Performance
  Accuracy                       0.95                        0.94                   0.92
  Balanced Accuracy              0.63                        0.63                   0.61
  AUC                            0.79                        0.78                   0.77
  F1                             0.97                        0.97                   0.96
Fairness
  Disparate Impact               0.97                        0.97                   0.95
  Statistical Parity Difference  -0.03                       -0.03                  -0.04
Adversarial Robustness
  Empirical Robustness           0.02                        0.01                   0.02
Explainability
  Faithfulness Mean              0.31                        0.36                   0.35
Table 4: Model facts from data scientist, validator, and AI operations engineer
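
A sketch of the degradation check an operations engineer might run periodically is shown below, using the headline numbers from Table 4; the tolerance and the choice of facts to compare are illustrative.

```python
# Compare each deployment-time fact to its validation-time value and flag
# drops larger than a tolerance. Values are taken from Table 4; the tolerance
# is an illustrative choice.
validation_facts = {"accuracy": 0.94, "balanced_accuracy": 0.63, "disparate_impact": 0.97}
deployment_facts = {"accuracy": 0.92, "balanced_accuracy": 0.61, "disparate_impact": 0.95}

TOLERANCE = 0.05
degraded = {
    name: (validation_facts[name], deployment_facts[name])
    for name in validation_facts
    if validation_facts[name] - deployment_facts[name] > TOLERANCE
}
if degraded:
    print("Escalate to the data scientist / business owner:", degraded)
```

At this tolerance none of the Table 4 drops are flagged, which is consistent with the operations engineer continuing to monitor rather than escalating.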

The facts shown in Table 4, although only a subset of the many facts that could be collected, illustrate the value of collecting facts throughout the lifecycle. In particular, all four roles can use this information to make better-informed decisions going forward.
  • The business owner can easily see how the model that they requested performed after creation (2nd column), during validation (3rd column), and in live deployment (4th column). Large discrepancies in these metrics between stages might trigger further analysis (of the model and, possibly, of the development processes used throughout the lifecycle).   
  • The data scientist and model validator can use the information to understand how effective their approaches are to developing and validating models that perform well when deployed.
  • The AI operations engineer can use the information to understand the relationships between metrics generated prior to deployment and the metrics generated in production (leading, perhaps, to improvements in the metrics used by the model developer and validator).
 
Each of the tables in this section can be considered at least a portion of a FactSheet: a collection of facts about a model, tailored to the consumer's needs by leveraging the Templates mentioned in the Introduction section. The Examples section of this website shows more complete FactSheets for several real models.


IBM Offerings