In this blog post, we will explore the risks of ML-based workflow architecture through the lens of ISO 42001.
A machine learning pipeline, also known as an ML workflow, is a process for codifying and automating the steps needed to create a machine learning model. Running these steps in a fixed order gives you a complete ML pipeline. The four primary steps are data ingestion and preparation, model training, model deployment, and monitoring. Intermediate processes between these steps round out a comprehensive pipeline.
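To make the four steps concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the function names, the `x`/`y` row layout, and the choice of a one-variable least-squares line as the "model". A real pipeline would swap each function for a much larger component.

```python
from statistics import mean

def ingest_and_prepare(raw_rows):
    """Step 1: drop rows with missing values, then split into features and labels."""
    clean = [r for r in raw_rows if r["x"] is not None and r["y"] is not None]
    return [r["x"] for r in clean], [r["y"] for r in clean]

def train(xs, ys):
    """Step 2: fit a least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": y_bar - slope * x_bar}

def deploy(model):
    """Step 3: expose the trained model as a callable prediction function."""
    return lambda x: model["slope"] * x + model["intercept"]

def monitor(predict, xs, ys):
    """Step 4: report the mean absolute error of the deployed model."""
    return mean(abs(predict(x) - y) for x, y in zip(xs, ys))

def run_pipeline(raw_rows):
    """Run the four steps in order and return the predictor and its error."""
    xs, ys = ingest_and_prepare(raw_rows)
    model = train(xs, ys)
    predict = deploy(model)
    # In this sketch the training data stands in for fresh production data.
    return predict, monitor(predict, xs, ys)
```

Calling `run_pipeline` with a list of `{"x": ..., "y": ...}` rows returns the prediction function together with its monitoring error, so each stage can be replaced or audited on its own.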
Why is an ML workflow important?
For those familiar with ETL tools such as Oozie, Talend, Pentaho, or Apache NiFi, an ML workflow can be thought of as a more complicated version of an ETL workflow: a model runs on top of the data pipeline, and a post-processing workflow follows the model run.
There are several benefits of having a continuous machine learning workflow:
Standardize preprocessing and model deployment
Scale features and models through sub-workflows and custom workflow deployments
Allow companies to run A/B tests across several models and determine the best-fit model
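The model comparison in the last point can be sketched as follows. This is an offline simplification, not a full A/B test (which would split live traffic between models): every candidate is scored on the same held-out examples and the highest scorer wins. The two threshold "models" and the `holdout` data are hypothetical.

```python
def accuracy(model, examples):
    """Fraction of examples where the model's prediction matches the label."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)

def pick_best_model(candidates, examples):
    """Score every candidate on the same examples and return the winner."""
    scores = {name: accuracy(model, examples) for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Two hypothetical candidates: simple threshold rules on a single feature.
candidates = {
    "model_a": lambda x: x > 0.5,
    "model_b": lambda x: x > 0.3,
}

# Hypothetical held-out examples as (feature, expected_label) pairs.
holdout = [(0.2, False), (0.4, True), (0.6, True), (0.9, True)]

best_name, scores = pick_best_model(candidates, holdout)
```

Keeping the scoring function separate from the selection step means the metric can be changed (for example to precision or cost) without touching the workflow that runs the candidates.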
Does an ML workflow play a significant role in fulfilling an AI audit framework?
Yes. In fact, a robust ML architecture helps fulfill the control and policy requirements of AI frameworks.
On the surface, all three well-known frameworks give high priority to risk and robust system design.
ISO 42001 | NIST AI RMF | EU AI Act |
Control B.6.2.7: AI system technical documentation; Control B.8.2: System documentation and information for users | Map 4.2: Internal risk controls for components of the AI system are identified and documented | Objective CO-06: The applicant should perform a functional analysis of the system |
Things to consider in architecture design under ISO 42001
Acquisition of Data - Data Preparation - Data Quality (ISO 42001 B.7.3, B.7.4)
Data acquisition and preparation form the first gateway into an ML workflow. Data entering an ML workflow can typically be classified as:
streaming data (real-time) and
batch data (non-real-time)
The data then needs to be pre-processed into a format the algorithm can read. The ISO 42001 standard places strong emphasis on data consistency and data reliability.
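One way to back up a consistency claim is to check every incoming record against an expected schema before it reaches the model. The sketch below is an assumption, not something ISO 42001 prescribes: the field names in `EXPECTED_SCHEMA` are hypothetical, and the check covers only field presence and type. `check_record` suits streaming data (one event at a time), while `check_batch` wraps it for batch data.

```python
# Hypothetical schema: field name -> required Python type.
EXPECTED_SCHEMA = {"customer_id": int, "amount": float, "event_date": str}

def check_record(record, schema=EXPECTED_SCHEMA):
    """Return a list of consistency problems found in one input record."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems

def check_batch(records):
    """Map the index of each inconsistent record to its list of problems."""
    report = {}
    for index, record in enumerate(records):
        problems = check_record(record)
        if problems:
            report[index] = problems
    return report
```

An empty report means every record matched the schema; a non-empty one gives an auditor a concrete list of which records failed and why.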
From an audit perspective, some questions to consider are:
How do you make sure that the input data is consistent?
What kind of sanity tests do you run, such as regression tests or integration tests?
How does failover to a secondary server (disaster recovery) work?
Did you apply any normalization, such as null handling, imputation, or date format handling?
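The normalization question can be illustrated with a short sketch. The list of accepted date formats and the choice of median imputation are assumptions for this example; the right choices depend on the data source and the model.

```python
from datetime import datetime
from statistics import median

# Assumed input formats; extend this tuple to match the real data sources.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")

def normalize_date(value):
    """Convert a date string in a known format to ISO 8601, or None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except (ValueError, TypeError):
            # ValueError: wrong format. TypeError: value is None or not a string.
            continue
    return None

def impute_median(values):
    """Replace None entries with the median of the observed values.

    Raises statistics.StatisticsError if every value is missing.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]
```

Returning `None` for unparseable dates, rather than guessing, keeps null handling explicit: downstream steps can count and report the failures instead of silently training on bad values.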
AI system validation and result evaluation (ISO 42001 B.6.2.4)
Depending on the model, its output also needs to be validated, and captured so that it can be used to train a better model.
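As a rough sketch of what output validation could look like, the code below computes two separate signals: accuracy against known labels, and the gap in positive-prediction rate between groups. The record layout (`group`, `predicted`, `actual`), the thresholds, and the use of a rate gap as a bias signal are all assumptions for illustration; a rate gap alone does not prove or rule out bias.

```python
def label_accuracy(pairs):
    """pairs: list of (predicted, actual) labels. Returns the fraction that match."""
    return sum(1 for predicted, actual in pairs if predicted == actual) / len(pairs)

def positive_rate_gap(rows):
    """rows: list of (group, predicted) with predicted in {0, 1}.

    Returns the largest difference in positive-prediction rate between groups.
    """
    totals, positives = {}, {}
    for group, predicted in rows:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + predicted
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

def validate_outputs(records, min_accuracy=0.8, max_gap=0.2):
    """records: list of dicts with 'group', 'predicted', 'actual'.

    The default thresholds are illustrative, not recommended values.
    """
    acc = label_accuracy([(r["predicted"], r["actual"]) for r in records])
    gap = positive_rate_gap([(r["group"], r["predicted"]) for r in records])
    return {
        "accuracy": acc,
        "gap": gap,
        "quality_ok": acc >= min_accuracy,
        "bias_ok": gap <= max_gap,
    }
```

Reporting the two checks separately matters: a model can treat every group identically and still be wrong most of the time, in which case `bias_ok` is true while `quality_ok` is false.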
From an audit risk perspective, there are several questions to consider, such as:
How do you validate that the model produces unbiased output? What edge-case tests are used to validate the output?
Does the output produce better-than-expected results that meet business needs? This is a different question from the first one: results can be unbiased and still be bad output.
What SLA (Service Level Agreement) needs to be met? This is important when the business requirement is time-sensitive or when the ML system needs to send data to a third-party vendor.
How long do you need to store the output? This is important because, for example, in the US financial sector companies are required to store 7 years' worth of AI predictions.
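The SLA and retention questions can both be answered from the same stored record, as in this sketch. The `PredictionRecord` class and the 2-second `SLA_SECONDS` target are hypothetical; the seven-year window follows the example above and is approximated here as 7 x 365 days, ignoring leap days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7 * 365)  # assumed: roughly seven years
SLA_SECONDS = 2.0                    # assumed latency target

@dataclass
class PredictionRecord:
    request_id: str
    output: float
    created_at: datetime
    latency_seconds: float

    def met_sla(self, sla_seconds=SLA_SECONDS):
        """True if the prediction was returned within the latency target."""
        return self.latency_seconds <= sla_seconds

    def delete_after(self, retention=RETENTION):
        """Earliest moment at which this record may be purged."""
        return self.created_at + retention

def purgeable(records, now):
    """Return the records whose retention period has fully elapsed."""
    return [r for r in records if r.delete_after() <= now]
```

Storing the latency and creation time alongside each prediction means an auditor can later reconstruct both SLA compliance and retention status without relying on separate logs.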
Next, in part 3, we will look more closely at risk mitigation and prevention in ISO 42001.