Third-Party Credentials
To provide easy and secure access to data stores or services on other cloud providers, you can configure access credentials here and later attach them to jobs or use them to load datasets. Although configuring a third-party credential is not mandatory, it is the most streamlined way of ensuring your job environment has access to the services it needs.
Once you configure a third-party credential, you will no longer be able to retrieve the credential's secret. If a configured credential's secret is blank when revisiting the page, this does not mean the secret is lost. However, if you edit the credential and click save without re-entering the credential secret, it will be erased.
Third-party credentials are configured on a per-project basis so that different projects can be assigned granular access to their own third-party resources. The account settings page allows you to configure the third-party credentials for your Personal project. You can configure the credentials for other projects you own from the projects page.
AWS
proxiML uses temporary security credentials for all resource access. The IAM user and its access keys are only used to authenticate proxiML during the assume role operation.
Create a new IAM user for the proxiML platform to authenticate with. It is not recommended to reuse an existing IAM user. Follow the AWS documentation for this step. While creating the user, ensure Provide user access to the AWS Management Console is unchecked. Do not set any permissions on this user.
Once the user is created, click on the username to view the user details. Copy the user's ARN, as this will be required in subsequent steps. Click the Security credentials tab and navigate to the Access keys section. Click Create access key. Click Other under Access key best practices & alternatives and click Next. Add a description if desired and click Create access key. Obtain the Access Key and Secret Access Key and store them in a secure location. You will need these in a later step.
Next, you must create the policy and role that proxiML will use to access resources in your account. When designing the policy, keep it very specific: allow access only to the data or services needed by your proxiML workloads. As an example, the following policy allows downloading a dataset from a specific path of one bucket and uploading the final output to a specific path in another bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt0",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::<name of bucket with data>",
        "arn:aws:s3:::<name of bucket with data>/path/to/data/*"
      ]
    },
    {
      "Sid": "Stmt1",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::<name of bucket for model outputs>/path/to/outputs/*"
      ]
    }
  ]
}
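If you maintain several such policies, generating them from a small script can reduce copy-and-paste mistakes. The following is a minimal sketch that reproduces the structure of the example policy above. The function name `build_s3_policy`, the Sid values, and the bucket names are illustrative assumptions, not part of proxiML or AWS.

```python
import json


def build_s3_policy(data_bucket: str, data_prefix: str,
                    output_bucket: str, output_prefix: str) -> dict:
    """Return a narrowly scoped S3 policy document as a plain dict."""
    # Normalize prefixes so "/path/" and "path" produce the same ARN.
    data_prefix = data_prefix.strip("/")
    output_prefix = output_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadInputData",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{data_bucket}",
                    f"arn:aws:s3:::{data_bucket}/{data_prefix}/*",
                ],
            },
            {
                "Sid": "WriteModelOutputs",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{output_bucket}/{output_prefix}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Bucket names and prefixes below are placeholders, not real resources.
    policy = build_s3_policy("example-data-bucket", "path/to/data",
                             "example-output-bucket", "path/to/outputs")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the policy editor in the AWS console.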
Create a new role and attach this policy to it. When creating the role, select An AWS Account as the Trusted Entity Type and leave the Account setting as This account. Click Next and search for the policy created in the previous step on the Add permissions page. Select that policy and click Next. In the Select Trusted Entities section, replace the default root account in the Principal/AWS section with the ARN of the IAM user you created earlier.
If your IAM configuration does not allow you to edit the Trusted Entities policy, select Custom Trust Policy instead of An AWS Account on the first step and use a policy like the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement1",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<full ARN of user created previously>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
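A common mistake at this step is leaving the account root ARN as the principal instead of the IAM user's ARN. As a hedged illustration, the sketch below builds the same trust policy and rejects anything that does not look like an IAM user ARN. The helper name `build_trust_policy` and the sample ARN are assumptions made for this example.

```python
import json
import re

# IAM user ARN shape: arn:<partition>:iam::<12-digit account id>:user/<path/name>
_USER_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:user/.+$")


def build_trust_policy(user_arn: str) -> dict:
    """Return a trust policy that lets exactly one IAM user assume the role."""
    if not _USER_ARN.match(user_arn):
        raise ValueError(f"not an IAM user ARN: {user_arn!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Statement1",
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # The account ID and user name are placeholders.
    example_arn = "arn:aws:iam::123456789012:user/proximl-auth"
    print(json.dumps(build_trust_policy(example_arn), indent=2))
```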
Once the role is created, click on its name to view the details. Copy the role ARN, as this will be required in the next step.
Once the role and user credentials are created, go back to the proxiML third-party credential configuration page and select AWS from the Add menu under Third-Party Keys. Input the Access Key ID, Secret Key, and Role ARN into the relevant fields and click the check button.
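Before entering the values, you may want to confirm that the new user can actually assume the role. The sketch below shows one way to do that. It is not proxiML's implementation; it only illustrates the assume-role flow described at the start of this section. The function name and session name are assumptions, and `sts_client` is expected to behave like a boto3 STS client created with the new user's access key pair.

```python
from typing import Any


def assume_role_check(sts_client: Any, role_arn: str,
                      session_name: str = "proximl-credential-check") -> dict:
    """Assume the role and return non-secret facts about the temporary credentials."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    # Report only non-secret details; never print the secret key or token.
    return {
        "access_key_id": creds["AccessKeyId"],
        "expiration": creds["Expiration"],
        "has_session_token": bool(creds.get("SessionToken")),
    }
```

With boto3 installed, a client built as `boto3.client("sts", aws_access_key_id=..., aws_secret_access_key=...)` from the new user's key pair can be passed as `sts_client`. If the call fails with an access-denied error, recheck the principal in the role's trust policy.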
Azure
Never provide proxiML (or anyone, for that matter) credentials for a user with admin privileges.
Create a new service principal for proxiML in the Azure account that contains the data or services you want the proxiML platform to interact with. In the app registration, select Single Tenant as the Supported Account Type and leave the Redirect URI unconfigured.
On the App Registrations overview page for the newly created app, locate and note the Application (client) ID and Directory (tenant) ID fields for later. To finish generating the credentials, create an application secret for the application by following the instructions here.
Once the service principal credentials have been obtained, grant access to the principal by attaching one or more roles to it in the Access Control (IAM) section of the Azure Subscription. Instructions can be found here. To allow proxiML access to Azure Blob Storage, you need to assign the Storage Blob Data Contributor role. To utilize the Azure Container Registry for private job images, you need to assign the AcrPull role.
proxiML recommends that you add custom conditions to the role assignments to further restrict access to data within your account. For example, to restrict read access to a specific storage account container and write access to a specific path within that container, add the following condition to the role assignment:
(
ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'}
AND
@resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEqualsIgnoreCase '<name of storage account container with data>'
)
OR
(
ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write'}
AND
@resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs:path] StringStartsWith 'path/to/output/data'
AND
@resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEqualsIgnoreCase '<name of storage account container with data>'
)
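If you apply the same condition to several containers, a small template can keep the expression consistent. The sketch below renders the condition shown above for a given container and write path. It copies the attribute names from that example verbatim; the function name, constants, and sample values are assumptions for illustration only.

```python
BLOBS = "Microsoft.Storage/storageAccounts/blobServices/containers/blobs"
CONTAINER_NAME = "Microsoft.Storage/storageAccounts/blobServices/containers:name"


def build_blob_condition(container: str, write_prefix: str) -> str:
    """Render the read/write role-assignment condition for one container."""
    for value in (container, write_prefix):
        if "'" in value:
            raise ValueError("single quotes are not handled by this sketch")
    container_check = (
        f"@resource[{CONTAINER_NAME}] StringEqualsIgnoreCase '{container}'"
    )
    lines = [
        "(",
        f"ActionMatches{{'{BLOBS}/read'}}",
        "AND",
        container_check,
        ")",
        "OR",
        "(",
        f"ActionMatches{{'{BLOBS}/write'}}",
        "AND",
        f"@resource[{BLOBS}:path] StringStartsWith '{write_prefix}'",
        "AND",
        container_check,
        ")",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Container name and path are placeholders.
    print(build_blob_condition("example-container", "path/to/output/data"))
```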
Once the service principal is configured, go back to the proxiML third-party credential configuration page and select Azure from the Add menu under Third-Party Keys. Copy the Client ID, Tenant ID, and Secret from the previous steps to the relevant fields and click the check button.
Docker
Docker credentials are used when pulling custom docker environments from private DockerHub repositories.
Do not use your regular DockerHub account credentials. Instead, generate an Access Token for proxiML and provide that so it can be easily scoped and revoked.
Create a new Docker Access Token from the account that has access to the private repository you wish to use by following the instructions here. If you have a Pro or Team plan, proxiML recommends that you create the token with Read Only access permissions.
Once the access token is created, go back to the proxiML third-party credential configuration page and select Docker from the Add menu under Third-Party Keys. Enter your DockerHub username as the Key ID and the generated access token as the Key Secret, then click the check button.
GCP
Never provide proxiML (or anyone, for that matter) credentials for a Google service account with admin privileges.
Create a new service account in the GCP project that contains the data or services you want the proxiML platform to interact with. When creating the account, ensure you configure permissions very narrowly and allow access only to the specific data or services needed for the model training process. For example, if you want to download data from the /data path of the input-data-bucket bucket, assign the Storage Object Viewer role with a condition of type Name, an operator of Starts With, and a value of projects/_/buckets/input-data-bucket/objects/data/. If you want to upload data to the /results path of the artifacts-bucket bucket, assign the Storage Object Viewer role with a condition of type Name, an operator of Starts With, and a value of projects/_/buckets/artifacts-bucket, and also assign the Storage Object Creator role with a condition of type Name, an operator of Starts With, and a value of projects/_/buckets/artifacts-bucket/objects/results/. Full read access is required for the output bucket because gsutil requires bucket-level read access in order to copy objects. For more details about condition and resource names on buckets and objects, review the GCP documentation.
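The resource-name values above follow a fixed pattern, so they can be generated rather than typed by hand. The sketch below builds the prefix for a bucket and optional path, along with the equivalent expression for the condition editor (a Name / Starts With condition corresponds to `resource.name.startsWith(...)`). The helper names are assumptions for this example.

```python
def object_prefix(bucket: str, path: str = "") -> str:
    """Resource-name prefix for a bucket, or for objects under `path` in it."""
    path = path.strip("/")
    if not path:
        # Bucket-level prefix, as used for the output bucket's read access.
        return f"projects/_/buckets/{bucket}"
    return f"projects/_/buckets/{bucket}/objects/{path}/"


def starts_with_expression(prefix: str) -> str:
    """Expression equivalent of a Name / Starts With condition."""
    return f'resource.name.startsWith("{prefix}")'


if __name__ == "__main__":
    for bucket, path in [("input-data-bucket", "/data"),
                         ("artifacts-bucket", ""),
                         ("artifacts-bucket", "/results")]:
        print(starts_with_expression(object_prefix(bucket, path)))
```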
Once the service account is created, create and download the service account key JSON file. Go to the proxiML third-party credential configuration page and select GCP from the Add menu under Third-Party Keys. Click the Upload Json File button, select the JSON file you downloaded, and click the check button.
In order to use the GCP credentials to access services from a worker, you must first activate them in the job environment by including the following command in your script prior to accessing GCP services:
gcloud auth activate-service-account --key-file ${GOOGLE_APPLICATION_CREDENTIALS}
Alternatively, if you're using the Python SDK directly, you can activate the service account credentials using the from_service_account_json function and specify the location of the key file using the environment variable GOOGLE_APPLICATION_CREDENTIALS.
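As a hedged example of the Python approach, the sketch below reads the key file location from GOOGLE_APPLICATION_CREDENTIALS, checks that it looks like a service account key, and passes it to `from_service_account_json`. It assumes the `google-cloud-storage` package is installed in the job environment; other Google Cloud client classes offer the same constructor. The helper names are illustrative.

```python
import json
import os


def service_account_key_path() -> str:
    """Return the key file path from the environment after a basic sanity check."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    if key.get("type") != "service_account":
        raise RuntimeError(f"{path} is not a service account key file")
    return path


def make_storage_client():
    """Build a Cloud Storage client from the key file."""
    # Imported lazily so the path check above works without the package.
    from google.cloud import storage

    return storage.Client.from_service_account_json(service_account_key_path())
```

Calling `make_storage_client()` early in your script surfaces a missing or malformed key file before any data transfer starts.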
Git
If you want to run jobs using private git repositories, you must create an SSH key for the proxiML platform to use when connecting to your repository. Click the Generate button to create a new key. Once the key is created, copy the entire public key, starting from ssh-ed25519 and continuing through the email-style comment at the end of the line, and attach this key to a user in your private repository. For example, the instructions for adding an SSH key to your GitHub account can be found here.
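A partially copied key is a frequent cause of failed repository access. The sketch below is an optional check that a pasted line has all three parts of an OpenSSH public key (type, base64 data, trailing comment) and that the data matches the declared ssh-ed25519 type. The function name is an assumption for this example.

```python
import base64
import struct


def check_ed25519_public_key(line: str) -> str:
    """Validate a one-line OpenSSH ed25519 public key and return its comment."""
    fields = line.strip().split(None, 2)
    if len(fields) != 3:
        raise ValueError("expected '<type> <base64 key> <comment>'; key looks truncated")
    key_type, encoded, comment = fields
    if key_type != "ssh-ed25519":
        raise ValueError(f"unexpected key type: {key_type}")
    blob = base64.b64decode(encoded, validate=True)
    if len(blob) < 4:
        raise ValueError("key data is too short")
    # The decoded data starts with a length-prefixed copy of the key type.
    (name_len,) = struct.unpack(">I", blob[:4])
    if blob[4:4 + name_len] != b"ssh-ed25519":
        raise ValueError("key data does not match the declared key type")
    return comment
```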
Hugging Face
To integrate with Hugging Face, first create a User Access Token with these instructions. If you plan to only download data, create a read token. If you plan to upload results back to Hugging Face, create a write token. Once you have the token, go back to the proxiML third-party credential configuration page and select Hugging Face from the Add menu under Third-Party Keys. Enter your Hugging Face account name as the Key ID and the generated token as the Key Secret, then click the check button.
Kaggle
To enable Kaggle integration in the proxiML platform, you must first generate a Kaggle API token. Instructions to generate a new token can be found here. If you are already using the Kaggle CLI tool on your local computer, the API token is usually located at $HOME/.kaggle/kaggle.json.
Once you have the kaggle.json file for your account, go to the proxiML third-party credential configuration page and select Kaggle from the Add menu under Third-Party Keys. Click the Upload Json File button, select the JSON file you downloaded, and click the check button. If the file is successfully uploaded, you should see Credentials File: kaggle.json next to the trophy icon.
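If an upload fails, it can help to confirm that the file is well formed. The sketch below assumes the usual kaggle.json layout with `username` and `key` fields. It returns only the username, so the key is never printed. The function name is an assumption for this example.

```python
import json
from pathlib import Path

DEFAULT_TOKEN_PATH = Path.home() / ".kaggle" / "kaggle.json"


def check_kaggle_token(path: Path = DEFAULT_TOKEN_PATH) -> str:
    """Confirm the file holds a username and key; return the username."""
    token = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return token["username"]
```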
NVIDIA NGC
NVIDIA NGC credentials are used for pulling NVIDIA-maintained or private registry images from NVIDIA NGC.
Create an API key for your NGC account that has access to the images you wish to use. Once you have the API key, go back to the proxiML third-party credential configuration page and select NVIDIA NGC from the Add menu under Third-Party Keys. Enter the API key in the NGC API Key field and click the check button.
Wasabi
A recommended but not required first step is to create a policy that restricts access for the proxiML integration to just the buckets and bucket paths required. Create a new policy using the Wasabi documentation. The following example policy restricts both reading and writing to specific paths within a single bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<name of bucket with data>"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<name of bucket with data>/path/to/data/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<name of bucket with data>/path/to/outputs/*"
    }
  ]
}
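Because the goal of this policy is to keep access narrow, a quick review for wildcard grants can be useful before attaching it. The sketch below flags Allow statements whose action or resource is a blanket wildcard. It is a simple illustrative check, not a full policy analyzer, and the function name is an assumption.

```python
BROAD_ACTIONS = {"*", "s3:*"}
BROAD_RESOURCES = {"*", "arn:aws:s3:::*"}


def broad_statements(policy: dict) -> list:
    """Return Allow statements that grant a wildcard action or resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        # Both fields may be a single string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if BROAD_ACTIONS.intersection(actions) or BROAD_RESOURCES.intersection(resources):
            flagged.append(statement)
    return flagged
```

An empty result means no statement uses a blanket wildcard; path-scoped resources such as `.../path/to/data/*` are not flagged.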
Once you have built the policy for the integration, create a new user for the proxiML platform to use. Follow the Wasabi documentation for this step. Create the user with Programmatic access only, and select the policy you created in the previous step. Once the user is created, you will be prompted to download the access keys.
Go to the proxiML third-party credential configuration page and select Wasabi from the Add menu under Third-Party Keys. Input the Access Key ID and the Secret Key you just downloaded into the relevant fields and click the check button.