Add AutoML functionality with Amazon SageMaker Autopilot across accounts
AutoML is a powerful capability, provided by Amazon SageMaker Autopilot, that allows non-experts to create machine learning (ML) models to invoke in their applications.
The problem that we want to solve arises when, due to governance constraints, Amazon SageMaker resources can’t be deployed in the same AWS account where they are used.
Examples of such a situation are:
A multi-account enterprise setup of AWS where the Autopilot resources must be deployed in a specific AWS account (the trusting account), and should be accessed from trusted accounts
A software as a service (SaaS) provider that offers AutoML to its users and deploys the resources in the customer’s AWS account so that billing is associated with the end customer
This post walks through an implementation using the SageMaker Python SDK. It’s divided into two sections:
Create the AWS Identity and Access Management (IAM) resources needed for cross-account access
Perform the Autopilot job, deploy the top model, and make predictions from the trusted account accessing the trusting account
The solution described in this post is provided in the Jupyter notebook available in this GitHub repository.
For a full explanation of Autopilot, you can refer to the examples available in GitHub, particularly Top Candidates Customer Churn Prediction with Amazon SageMaker Autopilot and Batch Transform (Python SDK).
We have two AWS accounts:
Customer (trusting) account – Where the SageMaker resources are deployed
SaaS (trusted) account – Drives the training and prediction activities
You have to create a user for each account, with programmatic access enabled and the IAMFullAccess managed policy associated.
You have to configure the user profiles in the .aws/credentials file:
customer_config for the user configured in the customer account
saas_config for the user configured in the SaaS account
To update the SageMaker SDK, run the following command in your Python environment:
The procedure has been tested in the SageMaker environment conda_python3.
Common modules and initial definitions
Import common Python modules used in the script:
Let’s define the AWS Region that will host the resources:
and the reference to the dataset for the training of the model:
Set up the IAM resources
On the customer account, we define the single role customer_trusting_saas, which consolidates the permissions for Amazon Simple Storage Service (Amazon S3) and SageMaker access needed for the following:
The local SageMaker service that performs the Autopilot actions
The principal in the SaaS account that initiates the actions in the customer account
On the SaaS account, we define the following:
The AutopilotUsers group with the policy required to assume the customer_trusting_saas role via AWS Security Token Service (AWS STS)
The saas_user, which is a member of the AutopilotUsers group and is the actual principal triggering the Autopilot actions
For additional security, in the cross-account trust relationship, we use the external ID to mitigate the confused deputy problem.
Let’s proceed with the setup.
For each of the two accounts, we complete the following tasks:
Create the Boto3 session with the profile of the respective configuration user.
Retrieve the AWS account ID by means of AWS STS.
Create the IAM client that performs the configuration steps in the account.
For the customer account, use the following code:
Use the following code in the SaaS account:
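The three tasks above are the same for both accounts, so they can be sketched as one helper called once per profile. This is a minimal sketch, not the notebook code: the function name account_clients and the session_factory parameter are hypothetical, and the profile names customer_config and saas_config are the ones configured earlier.

```python
def account_clients(profile_name, session_factory=None):
    """Return (session, account_id, iam_client) for a named credentials profile.

    session_factory is a hypothetical hook for testing; by default the
    Boto3 Session class is used.
    """
    if session_factory is None:
        import boto3  # imported lazily so the helper can be tested without Boto3
        session_factory = boto3.session.Session

    # 1. Boto3 session bound to the configuration user's profile
    session = session_factory(profile_name=profile_name)
    # 2. Account ID of the caller, retrieved through AWS STS
    account_id = session.client("sts").get_caller_identity()["Account"]
    # 3. IAM client used for the configuration steps in that account
    iam = session.client("iam")
    return session, account_id, iam
```

For example, account_clients("customer_config") would return the session, account ID, and IAM client for the customer account, and account_clients("saas_config") the same for the SaaS account.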
Set up the IAM entities in the customer account
Let’s first define the role needed to perform cross-account tasks from the SaaS account in the customer account.
For simplicity, the same role is also the one that the SageMaker service assumes in the customer account. Ideally, consider splitting it into two roles with fine-grained permissions, in line with the principle of least privilege.
The role name and the references to the ARN of the SageMaker AWS managed policies are as follows:
The following customer managed policy gives the role the permissions to access the Amazon S3 resources that are needed for the SageMaker tasks and for the cross-account copy of the dataset.
We restrict the access to the S3 buckets dedicated to SageMaker in the AWS Region for the customer account. See the following code:
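A policy of this kind might look like the following sketch. It assumes the SageMaker default bucket naming scheme sagemaker-<region>-<account-id>; the function name, the exact list of actions, and the optional source_bucket parameter (read access to the SaaS bucket for the cross-account copy) are illustrative assumptions rather than the notebook's actual policy.

```python
def sagemaker_s3_policy(region, account_id, source_bucket=None):
    """Build an IAM policy document limited to the SageMaker bucket of one Region."""
    bucket_arn = f"arn:aws:s3:::sagemaker-{region}-{account_id}"
    statements = [
        {   # bucket-level actions on the SageMaker default bucket
            "Effect": "Allow",
            "Action": ["s3:CreateBucket", "s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": bucket_arn,
        },
        {   # object-level actions inside that bucket
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"{bucket_arn}/*",
        },
    ]
    if source_bucket is not None:
        # Assumption: reading the dataset from the SaaS bucket during the copy
        statements.append({
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{source_bucket}/*",
        })
    return {"Version": "2012-10-17", "Statement": statements}
```

The returned dictionary can be serialized with json.dumps and passed as the PolicyDocument when the customer managed policy is created.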
Then we define the external ID to mitigate the confused deputy problem:
The trust relationships policy allows the principals from the trusted account and SageMaker to assume the role:
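As an illustration, a trust policy with these two principals could be built as follows. This is a sketch under assumptions: the function name is hypothetical, the trusted principal is expressed as the SaaS account root, and the external ID is whatever shared value the two parties have agreed on.

```python
def trust_policy(saas_account_id, external_id):
    """Trust policy: the SaaS account (with external ID) and SageMaker may assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # principals of the trusted (SaaS) account, only with the agreed external ID
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{saas_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            },
            {   # the SageMaker service in the customer account
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            },
        ],
    }
```

The external ID condition applies only to the cross-account statement, because the SageMaker service does not pass an external ID when it assumes the role.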
For simplicity, we don’t include the management of the exceptions in the following snippets. See the Jupyter notebook for the full code.
We create the customer managed policy in the customer account, create the new role, and attach the two policies. We use the maximum session duration parameter to manage long-running jobs. See the following code:
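The sequence just described might be sketched as below, with exception handling omitted as in the rest of the post. The function and parameter names are hypothetical; the IAM calls (create_policy, create_role, attach_role_policy) are standard Boto3 IAM client operations, and 43200 seconds (12 hours) is the largest value IAM accepts for MaxSessionDuration.

```python
import json

def create_cross_account_role(iam, role_name, trust_doc, policy_name, policy_doc,
                              managed_policy_arns, max_session_seconds=43200):
    """Create the customer managed policy and the role, then attach all policies."""
    policy_arn = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_doc),
    )["Policy"]["Arn"]

    role_arn = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_doc),
        MaxSessionDuration=max_session_seconds,  # allows long-running sessions
    )["Role"]["Arn"]

    # Attach the customer managed policy and the AWS managed policies
    for arn in [policy_arn, *managed_policy_arns]:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return role_arn, policy_arn
```

Here iam would be the IAM client of the customer account, and managed_policy_arns a list such as ["arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"].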
Set up IAM entities in the SaaS account
We define the following in the SaaS account:
A group of users allowed to perform the Autopilot job in the customer account
A policy associated with the group for assuming the role defined in the customer account
A policy associated with the group for uploading data to Amazon S3 and managing bucket policies
A user that is responsible for the implementation of the Autopilot jobs – the user has programmatic access
A user profile to store the user access key and secret in the file for the credentials
Let’s start with defining the name of the group (AutopilotUsers):
The first policy refers to the customer account ID and the role:
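A policy of this shape could be built as in the following sketch; the function name is hypothetical, and the role name is expected to be customer_trusting_saas as defined for the customer account.

```python
def assume_role_policy(customer_account_id, role_name):
    """Policy allowing group members to assume one specific role in the customer account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                # Only this role in the customer account, nothing broader
                "Resource": f"arn:aws:iam::{customer_account_id}:role/{role_name}",
            }
        ],
    }
```

Scoping the Resource to a single role ARN keeps the group from assuming any other role in the customer account.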
The second policy is needed to download the dataset, and to manage the Amazon S3 bucket used by SageMaker:
For simplicity, we give the same value to the user name and user profile:
Now we create the two new managed policies. Next, we create the group, attach the policies to the group, create the user with programmatic access, and insert the user into the group. See the following code:
Update the credentials file
Create the user profile for saas_user in the .aws/credentials file:
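One way to do this from Python is with the standard configparser module, as in the sketch below. The function name is hypothetical, and the key pair is assumed to come from the AccessKey element returned by the IAM create_access_key call for saas_user. Note that rewriting the file with configparser drops any comments it contained.

```python
import configparser
from pathlib import Path

def write_profile(credentials_path, profile, access_key_id, secret_access_key):
    """Add or replace one profile in an AWS credentials file."""
    path = Path(credentials_path)
    config = configparser.ConfigParser()
    config.read(path)  # silently ignored if the file does not exist yet
    config[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w") as handle:
        config.write(handle)
```

For example, write_profile(Path.home() / ".aws" / "credentials", "saas_user", key_id, secret) would store the new user's keys under the saas_user profile while keeping the existing profiles.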
This completes the configuration of IAM entities that are needed for the cross-account implementation of the Autopilot job.
Autopilot cross-account access
This is the core objective of the post, where we demonstrate the main differences with respect to the single-account scenario.
First, we prepare the dataset the Autopilot job uses for training the models.
We reuse the same dataset adopted in the SageMaker example: Top Candidates Customer Churn Prediction with Amazon SageMaker Autopilot and Batch Transform (Python SDK).
For a full explanation of the data, refer to the original example.
We skip the data inspection and proceed directly to the focus of this post, which is the cross-account Autopilot job invocation.
Download the churn dataset with the following AWS Command Line Interface (AWS CLI) command:
Split the dataset for the Autopilot job and the inference phase
After you load the dataset, split it into two parts:
80% for the Autopilot job to train the top model
20% for testing the model that we deploy
To test how well each candidate algorithm predicts data it has not been trained on, Autopilot applies a cross-validation resampling procedure to the dataset passed as input.
Split the dataset with the following code:
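The split itself is a seeded shuffle followed by a cut at 80%. The following is a standard-library sketch of that idea, not the notebook's code; the function name and the seed are assumptions, and the same result could be obtained on a pandas DataFrame with its sample method.

```python
import random

def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)                  # copy, so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Only the data rows should be passed in; the header row is kept aside and written back on top of the training file.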
Let’s save the training data into a file locally that we pass to the fit method of the AutoML estimator:
Autopilot training job, deployment, and prediction overview
The following are the steps for the cross-account invocation:
Initiate a session as saas_user in the SaaS account and load the profile from the credentials.
Assume the role in the customer account via the AWS STS.
Set up and train the AutoML estimator in the customer account.
Deploy the top candidate model proposed by AutoML in the customer account.
Invoke the deployed model endpoint for the prediction on test data.
Initiate the user session in the SaaS account
The setup procedure of IAM entities, explained at the beginning of the post, created the saas_user, identified by the saas_user profile in the .aws/credentials file. We initiate a Boto3 session with this profile:
The saas_user inherits from the AutopilotUsers group the permission to assume the customer_trusting_saas role in the customer account.
Assume the role in the customer account via AWS STS
AWS STS provides the credentials for a temporary session that is initiated in the customer account:
The default session duration (the DurationSeconds parameter) is 1 hour. We set it to the maximum session duration configured for the role. If the session expires, you can recreate it by performing the following steps again. See the following code:
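The steps can be sketched as two small helpers. The names are hypothetical; assume_role with RoleArn, RoleSessionName, ExternalId, and DurationSeconds is the standard AWS STS call, and the STS client is assumed to come from the saas_user session.

```python
def assume_customer_role(sts, role_arn, external_id, duration_seconds,
                         session_name="autopilot-session"):
    """Assume the customer role and return keyword arguments for a Boto3 session."""
    creds = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,      # hypothetical session name
        ExternalId=external_id,            # must match the role's trust policy
        DurationSeconds=duration_seconds,  # at most the role's maximum session duration
    )["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }

def customer_sagemaker_session(session_kwargs, region):
    """Wrap the temporary credentials in a SageMaker session for the customer account."""
    import boto3
    import sagemaker
    boto_session = boto3.session.Session(region_name=region, **session_kwargs)
    return sagemaker.Session(boto_session=boto_session)
```

Calling both helpers again is enough to obtain a fresh session after the temporary credentials expire.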
The sagemaker_session parameter is needed for using the high-level AutoML estimator.
Set up and train the AutoML estimator in the customer account
We use the AutoML estimator from the SageMaker Python SDK to invoke the Autopilot job to train a set of candidate models for the training data.
The setup of the AutoML object is similar to the single-account scenario, but with the following differences for the cross-account invocation:
The role for SageMaker access in the customer account is CUSTOMER_TRUST_SAAS_ROLE_ARN
The sagemaker_session is the temporary session created by AWS STS
See the following code:
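A minimal sketch of the estimator setup follows. The function name, the max_candidates default, and the automl_cls hook are assumptions made for illustration; role, target_attribute_name, sagemaker_session, and max_candidates are parameters of the AutoML class in the SageMaker Python SDK.

```python
def build_automl(role_arn, sagemaker_session, target_attribute_name,
                 max_candidates=10, automl_cls=None):
    """Create an AutoML estimator that runs in the customer account."""
    if automl_cls is None:
        from sagemaker import AutoML  # imported lazily; requires the SageMaker SDK
        automl_cls = AutoML
    return automl_cls(
        role=role_arn,                        # role in the customer account
        target_attribute_name=target_attribute_name,
        sagemaker_session=sagemaker_session,  # temporary session from AWS STS
        max_candidates=max_candidates,        # hypothetical cap on candidate models
    )
```

The target attribute is the label column of the training data, and role_arn is the ARN of the customer_trusting_saas role.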
We now launch the Autopilot job by calling the fit method of the AutoML estimator in the same way as in the single-account example. We consider the following alternative options for providing the training dataset to the estimator.
First option: upload a local file and train by fit method
We simply pass the training dataset by referring to the local file that the fit method uploads into the default Amazon S3 bucket used by SageMaker in the customer account:
Second option: cross-account copy
Most likely, the training dataset is located in an Amazon S3 bucket owned by the SaaS account. We copy the dataset from the SaaS account into the customer account and refer to the URI of the copy in the fit method.
Upload the dataset into a local bucket of the SaaS account. For convenience, we use the SageMaker default bucket in the Region.
To allow the cross-account copy, we set the following policy in the local bucket, only for the time needed for the copy operation:
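Such a temporary policy could be handled as in this sketch, which grants the customer role read access to a single object and removes the bucket policy when the block exits. The helper names are hypothetical; put_bucket_policy and delete_bucket_policy are standard Boto3 S3 client calls. The sketch assumes the bucket has no other policy, because it overwrites and then deletes whatever policy is present.

```python
import json
from contextlib import contextmanager

def cross_account_read_policy(bucket, key, role_arn):
    """Bucket policy letting one role in another account read one object."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCrossAccountCopy",
            "Effect": "Allow",
            "Principal": {"AWS": role_arn},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/{key}",
        }],
    }

@contextmanager
def temporary_bucket_policy(s3_client, bucket, policy):
    """Apply the policy for the duration of the with-block, then delete it."""
    s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
    try:
        yield
    finally:
        # Warning: removes the whole bucket policy, not just our statement
        s3_client.delete_bucket_policy(Bucket=bucket)
```

Inside the with-block, an S3 client created from the assumed-role session would perform the copy, for instance with copy_object.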
Then the copy is performed by the assumed role in the customer account:
Delete the bucket policy so that access is granted only for the time of the copy:
Finally, we launch the Autopilot job, passing the URI of the object copy:
Another option is to refer directly to the URI of the source dataset in the bucket in the SaaS account, without copying it.
In this case, the bucket policy must remain in place for the whole duration of the training and allow the s3:ListBucket action on the source bucket, including a statement like the following:
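As a sketch, such a statement could be generated as follows; the function name and the Sid are hypothetical. The s3:ListBucket action applies to the bucket itself, so the Resource is the bucket ARN without a trailing /*.

```python
def list_bucket_statement(bucket, role_arn):
    """Bucket policy statement letting the customer role list the source bucket."""
    return {
        "Sid": "AllowListSourceBucket",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": "s3:ListBucket",
        "Resource": f"arn:aws:s3:::{bucket}",  # bucket ARN, not an object ARN
    }
```

This statement would be appended to the Statement list of the bucket policy, next to the statement that allows s3:GetObject on the dataset object.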
We can use the describe_auto_ml_job method to track the status of our SageMaker Autopilot job:
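A simple polling loop around that call might look like the sketch below. The function name and the polling interval are assumptions; describe_auto_ml_job and the AutoMLJobStatus and AutoMLJobSecondaryStatus response fields belong to the SageMaker API, and the client is assumed to be created from the assumed-role session.

```python
import time

def wait_for_automl_job(sm_client, job_name, poll_seconds=60, sleep=time.sleep):
    """Poll the Autopilot job until it reaches a terminal status."""
    while True:
        description = sm_client.describe_auto_ml_job(AutoMLJobName=job_name)
        status = description["AutoMLJobStatus"]
        print(status, "-", description.get("AutoMLJobSecondaryStatus"))
        if status in ("Completed", "Failed", "Stopped"):
            return description
        sleep(poll_seconds)  # InProgress or Stopping: wait and ask again
```

The sleep parameter exists only so that the wait can be replaced in tests; in normal use the default is enough.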
Because an Autopilot job can take a long time, if the session token expires during the fit, you can create a new session following the steps described earlier and retrieve the current Autopilot job reference by implementing the following code:
Deploy the top candidate model proposed by AutoML
The Autopilot job trains and returns a set of trained candidate models, identifying among them the top candidate that optimizes the evaluation metric related to the ML problem.
In this post, we only demonstrate the deployment of the top candidate proposed by AutoML, but you can choose a different candidate that better fits your business criteria.
First, we review the performance achieved by the top candidate in the cross-validation:
If the performance is good enough for our business criteria, we deploy the top candidate in the customer account:
The instance is deployed and billed to the customer account.
Prediction on test data
Finally, we access the model endpoint for the prediction of the label output for the test data:
If the session token expires after the deployment of the endpoint, you can create a new session following the steps described earlier and connect to the already deployed endpoint by implementing the following code:
To avoid incurring unnecessary charges, delete the endpoints and resources that were created when deploying the model after they are no longer needed.
Delete the model endpoint
The model endpoint is deployed in a container that is always active. We delete it first to avoid incurring further charges:
Delete the artifacts generated by the Autopilot job
Delete all the artifacts created by the Autopilot job, such as the generated candidate models, scripts, and notebook.
We use the high-level resource for Amazon S3 to simplify the operation:
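With the Boto3 S3 resource, all objects under a prefix can be removed in one chained call, as in this sketch. The function name is hypothetical, and the bucket and prefix are assumed to be the output location of the Autopilot job in the customer account.

```python
def delete_prefix(s3_resource, bucket_name, prefix):
    """Delete every object under the given prefix of the bucket."""
    bucket = s3_resource.Bucket(bucket_name)
    # The collection's delete() issues batched DeleteObjects requests
    bucket.objects.filter(Prefix=prefix).delete()
```

The s3_resource argument would be created from the assumed-role session, for example with its resource("s3") method, so that the deletion runs with the customer role's permissions.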
Delete the training dataset copied into the customer account
Delete the training dataset in the customer account with the following code:
Clean up IAM resources
We delete the IAM resources in reverse order to the creation phase.
Remove the user from the group, remove the user profile from the credentials file, and delete the user:
Detach the policies from the group in the SaaS account, and delete the group and policies:
Detach the AWS policies from the role in the customer account, then delete the role and the policy:
This post described a possible implementation, using the SageMaker Python SDK, of an Autopilot training job, model deployment, and prediction in a cross-account configuration. The originating account owns the training data and delegates the activities to the account hosting the SageMaker resources.
You can use the API calls shown in this post to incorporate AutoML capabilities into a SaaS application, by delegating the management and billing of SageMaker resources to the customer account.
SageMaker decouples the environment where the data scientist drives the analysis from the containers that perform each phase of the ML process.
This capability simplifies other cross-account scenarios. For example: a SaaS provider who owns sensitive data, instead of sharing its data with the customer, could expose certified training algorithms and generate models on behalf of the customer. The customer will receive the trained model at the end of the Autopilot job.
For more examples of how to integrate Autopilot into SaaS products, see the following posts:
About the Authors
Francesco Polimeni is a Sr Solutions Architect at AWS with focus on Machine Learning. He has over 20 years of experience in professional services and pre-sales organizations for IT management software solutions.
Mehmet Bakkaloglu is a Sr Solutions Architect at AWS. He has vast experience in data analytics and cloud architecture, having provided technical leadership for transformation programs and pre-sales activities in a variety of sectors.