SCIM (System for Cross-domain Identity Management)

Posted in Technical | September 19, 2022 | 8 min read

The identity team at Cloudera has been working to add System for Cross-domain Identity Management (SCIM) support to Cloudera Data Platform (CDP), and we’re happy to announce the general availability of SCIM on Azure Active Directory! Part One, CDP SCIM Support for Active Directory, discusses the core elements of CDP’s SCIM support for Azure AD.

SCIM (System for Cross-domain Identity Management): an Introduction

SCIM (System for Cross-domain Identity Management) is a protocol spec for managing identities (users and groups) on the web. The SCIM protocol spec defines a series of endpoints, payloads, and responses that web products can implement in order to exchange identity information. “Managing identities” means managing the full life cycle of an identity, which, again, is either a user or a group. The life cycle of an identity includes the following stages:

Create: when the identity is new to the system and needs to be entered into an identity database (such as when a new employee is onboarded),
Read: when an authorized application wants to know more about the identity (such as when a query is run),
Update/Modify: when an attribute of the identity (such as email address) has changed and needs to be updated, and
Delete: when an identity needs to be deleted (such as when an employee is terminated).

The SCIM standard allows an identity provider to create, retrieve/discover, update, and delete user and group state in web applications through the use of REST API calls. Hence, SCIM replaces a lot of manual effort around managing identities.

The power of SCIM is best illustrated with an example:

Acme Inc. is a company and Alice manages their identity provider. Back in the day, when Acme was a startup with a couple of employees and used only a few web products, Alice would manually do all user management in both the identity provider and all of their web products. When someone joined Acme, Alice would manually create their account in the identity provider. She would then send them invite links to create an account/password in all of the various web applications Acme used. This was a manual process and Acme had very little control over user permissions in those applications.

As Acme grew, the organization required more granular control over the permissions their employees had in the web applications they were using; they had outgrown the “just give everyone root” phase of their company’s growth. So Alice did what most companies do and moved account management to a single sign-on (SSO) provider. This meant that for all SSO-supported applications, Acme employees no longer needed to remember their application-specific usernames and passwords. Instead, they could just log in to their SSO provider and click the “Login with SSO” button. Under the hood this also simplified Alice’s life: every time someone clicked the “Login with SSO” button, an updated user state (user and group information) was sent to that application. This meant that if an Acme employee moved organizations and needed a new set of groups, all they needed to do was log in again with SSO and everything would be updated.

SSO fixed a lot of manual work for Alice, but it didn’t cover all situations. To name a few:

When new employees joined Acme they had to manually log in via SSO to create their accounts in each web application.
Each web application had different session time-outs, so Acme employees needed to learn that they had to log in again in order to get their updates into the application. This also meant that if someone was given temporary admin access in an application they would continue to have that admin access until either Alice manually revoked it, or they logged in again and their permissions were updated.
Similarly, when an employee was fired they would still have access to their accounts in the web applications until either Alice manually removed them, or their session expired.

To work around these drawbacks Alice wrote custom code to update users and groups for each product and hooked it into Acme’s identity provider webhooks. But the code was fragile: always out of date and under constant maintenance as APIs changed and new web products were added. Internal SLAs for managing user/group state (especially for terminated employees) would constantly interrupt her work. In other words, Alice was spending a significant amount of time keeping the custom code working correctly.

Through the use of SCIM (and an identity provider that supports SCIM), all these headaches go away or at least are greatly reduced for Alice. All she needs to do is to set up SCIM for each of Acme’s web products that support it, and she doesn’t need to worry about user/group state in those applications any more. She still needs to manually manage user/group state in web products that do not support SCIM (which is why there’s still a bit of a headache), but overall this is still a huge net positive for her.

Under the hood, Acme’s identity provider will follow the SCIM spec, sending payloads to each web application whenever there’s a user/group change. Someone gets added to a new group in the identity provider? The identity provider kicks off a series of “add user X to group Y” SCIM calls to all of the web applications, and the user is updated without needing to log in again. Someone gets fired? The identity provider kicks off “delete user X” SCIM calls to those applications. With just a couple of minutes of configuration Alice reduced her work to near zero for all applications that supported SCIM.
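To make those two calls concrete, here is a minimal sketch of what the identity provider might send. The schema URN and the PATCH/DELETE shapes follow the SCIM 2.0 protocol (RFC 7644); the function names and the IDs are made up for illustration, and a real identity provider may structure its requests differently.

```python
import json

PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def group_add_request(group_id, user_id):
    """'Add user X to group Y' as a (method, path, body) triple."""
    body = {
        "schemas": [PATCH_OP_SCHEMA],
        "Operations": [
            # Add one member reference to the group's "members" attribute.
            {"op": "add", "path": "members", "value": [{"value": user_id}]}
        ],
    }
    return "PATCH", f"/Groups/{group_id}", body


def user_delete_request(user_id):
    """'Delete user X' as a (method, path, body) triple; DELETE has no body."""
    return "DELETE", f"/Users/{user_id}", None


# Hypothetical IDs, purely for illustration.
method, path, body = group_add_request("group-y", "user-x")
print(method, path)
print(json.dumps(body, indent=2))
print(*user_delete_request("user-x")[:2])
```

The identity provider would repeat these requests against every SCIM-connected application, each with its own base URL and credentials.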

SCIM, however, is not a silver bullet. The biggest limitation is that many web applications do not support it. For web applications that do support it, SCIM is extremely useful.

How SCIM works under the hood

This section is a little technical, and walks the reader through:

SCIM from the point of view of the identity provider.
SCIM from the point of view of the web product.
A few limitations.

The identity provider

A company’s identity provider is the source of truth for users and groups. For this context it’s also important to note that not all identity providers support SCIM, so keep that in mind if you want to use SCIM with Cloudera Data Platform (two common identity providers that support SCIM are Azure AD and Okta).

The core of the SCIM protocol spec is divided into two parts: user create, read, update, and delete (CRUD) operations and group CRUD operations. For the most part it’s what you would expect from a RESTful spec: there’s a series of endpoints and payloads that an identity provider can send to the web product, and a series of responses to those requests that let an identity provider know if they were successful or not. When a web product responds with an error to a SCIM call, the identity provider has two options: retry (with some back-off strategy), or alert (email) a human who can try to fix it. Because of this it’s important that web products respond to errors with a human-actionable message.
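The retry-or-alert choice can be sketched as a small loop. This is only an illustration of the idea, not how any particular identity provider implements it: the function name, the exponential back-off schedule, and the `send`/`alert`/`sleep` callables are all assumptions. Here the two options are combined, so a human is alerted only once retries are exhausted.

```python
import time


def sync_with_retry(send, alert, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a 2xx status; alert a human if it never does.

    send  -- hypothetical callable that performs one SCIM request and
             returns the HTTP status code
    alert -- hypothetical callable that notifies a human (e.g. sends email)
    sleep -- injectable so the back-off can be observed without waiting
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if 200 <= status < 300:
            return True
        if attempt < max_attempts - 1:
            # Exponential back-off: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    alert(f"SCIM call failed after {max_attempts} attempts (last status {status})")
    return False


# Simulated web product: two failures, then success. No real sleeping.
responses = iter([503, 503, 201])
delays = []
ok = sync_with_retry(lambda: next(responses), alert=print, sleep=delays.append)
print(ok, delays)
```

The alert is only as useful as the error text the web product returned, which is why a human-actionable message matters.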

SCIM user CRUD operations:

Create users (POST)
Retrieve users (GET)
Retrieve a specific user (GET)
Update a user (PUT/PATCH)
Delete a user (DELETE)

SCIM group CRUD operations:

Create groups (POST)
Retrieve groups (GET)
Retrieve a specific group (GET)
Update a specific group name (PUT/PATCH)
Update specific group membership (PUT/PATCH)
Delete a group (DELETE)
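The two lists above map onto HTTP methods and paths roughly as follows. The `/Users` and `/Groups` paths come from the SCIM 2.0 protocol; the table names, the operation labels, and the `request_line` helper are illustrative assumptions.

```python
from typing import NamedTuple


class ScimOp(NamedTuple):
    method: str
    path: str


USER_OPS = {
    "create": ScimOp("POST", "/Users"),
    "list": ScimOp("GET", "/Users"),
    "get": ScimOp("GET", "/Users/{id}"),
    "replace": ScimOp("PUT", "/Users/{id}"),    # full replacement
    "modify": ScimOp("PATCH", "/Users/{id}"),   # partial update
    "delete": ScimOp("DELETE", "/Users/{id}"),
}

# Groups use the same verbs; name and membership updates are both
# PUT/PATCH calls against /Groups/{id}.
GROUP_OPS = {
    name: ScimOp(op.method, op.path.replace("/Users", "/Groups"))
    for name, op in USER_OPS.items()
}


def request_line(ops, name, resource_id=None):
    """Render an operation as 'METHOD /path' for a given resource id."""
    op = ops[name]
    if "{id}" in op.path and resource_id is None:
        raise ValueError(f"operation {name!r} needs a resource id")
    return f"{op.method} {op.path.format(id=resource_id)}"


print(request_line(USER_OPS, "get", "2819c223"))
print(request_line(GROUP_OPS, "modify", "e9e30dba"))
```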

SCIM also defines a couple of batch-style actions beyond the basic CRUD operations (like “remove all users from a group,” and “replace all users in a group”), along with different query parameters that can be sent to narrow down results. There are also a few extra endpoints that most identity providers (and most web products) choose not to implement (/Me, /Schemas, /ServiceProviderConfig, /ResourceTypes).
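As an example of those query parameters, here is a sketch of building a filtered, paginated list request. The `filter`, `startIndex`, and `count` parameter names and the `eq` operator are defined by the SCIM 2.0 protocol; the base URL and the helper name are placeholders.

```python
from urllib.parse import urlencode


def list_users_url(base_url, filter_expr=None, start_index=1, count=100):
    """Build a GET /Users URL with optional filtering and pagination."""
    params = {"startIndex": start_index, "count": count}
    if filter_expr:
        params["filter"] = filter_expr
    # urlencode percent-encodes the quotes and '@' in the filter expression.
    return f"{base_url}/Users?{urlencode(params)}"


# Hypothetical base URL; a real one is issued by the web product.
url = list_users_url(
    "https://scim.example.com/v2",
    filter_expr='userName eq "alice@example.com"',
)
print(url)
```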

There are a lot of nuances with user data and how to slice it. One is which fields should be sent to the web product (for example, CDP needs an email, but doesn’t need a street address). The fields sent also determine which query parameters the identity provider can use to try to narrow down search results. Query parameters themselves are also nuanced, as not all web products support narrowing results by every field. For example, a web product may store a last modified time, but it may not support filtering users by it.
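Field selection can be pictured as a projection from the identity provider’s directory record onto only the SCIM attributes a given product wants. In this sketch the attribute names (`userName`, `emails`, `addresses`) and the schema URN come from the SCIM core User schema; the directory record keys and the function name are invented for illustration.

```python
USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"


def to_scim_user(record, wanted):
    """Build a SCIM User payload containing only the wanted attributes."""
    candidates = {"userName": record.get("login")}
    if record.get("email"):
        candidates["emails"] = [{"value": record["email"], "primary": True}]
    if record.get("street"):
        candidates["addresses"] = [{"streetAddress": record["street"]}]

    user = {"schemas": [USER_SCHEMA]}
    for attr, value in candidates.items():
        if attr in wanted and value is not None:
            user[attr] = value
    return user


# Hypothetical directory record.
record = {"login": "alice", "email": "alice@example.com", "street": "1 Main St"}

# A product that needs an email but not a street address.
print(to_scim_user(record, wanted={"userName", "emails"}))
```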

An identity provider that supports SCIM has to maintain individual state for each SCIM-connected web product, in addition to maintaining the source of truth for all users and groups for the organization. The individual state for each SCIM-connected web product is important and complex: say Acme uses three products, A, B, and C. If product C has an outage, the identity provider needs to be able to keep track of what it believes C’s current state is, and sync up C when it comes back online, no matter how long the outage and how many user/group changes have happened. Or, if B doesn’t support the full SCIM spec, the identity provider needs to do back-off retries for the operations that are erroring (in case B decides to add support for that part of the spec in the future) while still syncing all other user/group changes in the meantime. The identity provider also needs to handle user/group changes in the web product that did not originate in the identity provider (i.e., when someone updates user/group information only in the web product). These are just a couple of examples, but they give you an idea of the complexity of the identity provider’s state machine.
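A heavily simplified way to picture the per-product state is a record of what each product last acknowledged, diffed against the source of truth whenever the product is reachable. Everything in this sketch (the class, the change tuples, the group-only model) is a hypothetical illustration; real identity providers track far more, including users, retries, and out-of-band changes.

```python
from dataclasses import dataclass, field


@dataclass
class ProductSync:
    """What the identity provider believes one web product currently holds."""
    name: str
    known_groups: dict = field(default_factory=dict)  # group -> set of members

    def pending_changes(self, source_of_truth):
        """Diff the source of truth against the last acknowledged state."""
        changes = []
        for group in sorted(source_of_truth):
            if group not in self.known_groups:
                changes.append(("create_group", group, None))
            wanted = source_of_truth[group]
            have = self.known_groups.get(group, set())
            changes += [("add", group, user) for user in sorted(wanted - have)]
            changes += [("remove", group, user) for user in sorted(have - wanted)]
        for group in sorted(self.known_groups.keys() - source_of_truth.keys()):
            changes.append(("delete_group", group, None))
        return changes

    def acknowledge(self, source_of_truth):
        """Record that the product accepted everything up to this state."""
        self.known_groups = {g: set(m) for g, m in source_of_truth.items()}


# Product C was last synced before an outage; the source of truth moved on.
truth = {"admins": {"alice", "bob"}, "eng": {"carol"}}
product_c = ProductSync("C", {"admins": {"alice", "dave"}, "old": {"erin"}})

pending = product_c.pending_changes(truth)
for change in pending:
    print(change)

product_c.acknowledge(truth)  # after C comes back and accepts the calls
print(product_c.pending_changes(truth))
```

Because the diff is computed from state rather than from a log of events, the length of the outage and the number of intermediate changes do not matter in this model.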

The web product

A web product (like CDP) has to have:

A mechanism to authenticate/authorize the SCIM calls.
The SCIM endpoints.
Internal user/group CRUD operations that are SCIM-compatible.

The authentication mechanism is typically some type of access token or access token secret that is generated by the web product and given to the identity provider during a setup phase. These are usually long-lived, revocable, and contain enough information to perform authorization. Some web products double dip by reusing user access tokens for SCIM authentication/authorization. That has a downside: the token will stop working if the user is deleted (i.e., the user leaves the company). It also has a double downside: sometimes that user is managed by SCIM, so a SCIM update could delete the user, which deletes their token, which breaks SCIM syncs until a new trust is set up. For CDP, we implemented authentication/authorization as access tokens that:

Have a custom lifetime.
Are revocable.
Do not belong to the user who creates them (so they live outside the life cycle of any single user in the system).
Are scoped to SCIM endpoints.
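Those four properties can be illustrated with a small token model. This is not CDP’s implementation: the class, the field names, the `"scim"` scope string, and the `/scim/` path prefix are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ScimToken:
    token_id: str
    expires_at: datetime                       # custom lifetime
    scopes: frozenset = frozenset({"scim"})    # scoped to SCIM endpoints
    revoked: bool = False                      # revocable
    # Deliberately no owner field: the token outlives any single user.


def is_authorized(token, path, now=None):
    """Allow a request only if the token is live and the path is a SCIM path."""
    now = now or datetime.now(timezone.utc)
    if token.revoked or now >= token.expires_at:
        return False
    # "/scim/" is a hypothetical prefix for the product's SCIM endpoints.
    return "scim" in token.scopes and path.startswith("/scim/")


now = datetime(2022, 9, 19, tzinfo=timezone.utc)
token = ScimToken("tok-1", expires_at=now + timedelta(days=90))

print(is_authorized(token, "/scim/v2/Users", now))   # SCIM endpoint
print(is_authorized(token, "/api/clusters", now))    # outside the scope
token.revoked = True
print(is_authorized(token, "/scim/v2/Users", now))   # revoked
```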

The web product’s SCIM endpoints need to be able to parse the payloads that the identity provider sends, and then map them to internal operations. There is, however, likely not a 1:1 mapping between SCIM endpoints and internal endpoints, so SCIM requests will need to be translated into internal API calls. For example, SCIM defines an operation to “replace all users in a group.” This may need to be transformed by the web product to a series of internal API calls like:

List all users in a group.
Remove all those users from the group.
Add all the new users to the group.
Get group info and return it in the response.
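The four steps above can be sketched against a stand-in for the internal API. `GroupStore` and its methods are hypothetical; a real web product would call its own services here, and would also need to think about what happens if a step fails partway through, since the sequence is not atomic.

```python
class GroupStore:
    """In-memory stand-in for a web product's internal group API."""

    def __init__(self):
        self._members = {}

    def list_members(self, group):
        return sorted(self._members.get(group, set()))

    def remove_member(self, group, user):
        self._members[group].discard(user)

    def add_member(self, group, user):
        self._members.setdefault(group, set()).add(user)

    def get_group(self, group):
        return {
            "displayName": group,
            "members": [{"value": user} for user in self.list_members(group)],
        }


def replace_group_members(store, group, new_members):
    """Translate SCIM 'replace all users in a group' into internal calls."""
    for user in store.list_members(group):   # 1. list all users in the group
        store.remove_member(group, user)     # 2. remove all those users
    for user in new_members:                 # 3. add all the new users
        store.add_member(group, user)
    return store.get_group(group)            # 4. return the group info


store = GroupStore()
store.add_member("admins", "alice")
store.add_member("admins", "bob")
print(replace_group_members(store, "admins", ["bob", "carol"]))
```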

And sometimes the SCIM spec defines things that are not possible in the web product. A common example is that most web products treat group names as immutable, yet the SCIM spec defines a payload that should update a group name. In this case the only thing a web product can do is return a human-actionable error and hope that the identity provider will notify a human that things are now out of sync.
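Here is a sketch of what such an error might look like. The error schema URN, the string-valued `status`, and the `mutability` error type are defined by the SCIM 2.0 protocol; the handler, its return convention, and the wording of the message are assumptions for illustration.

```python
ERROR_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:Error"


def scim_error(status, detail, scim_type=None):
    """Build a SCIM error body; 'detail' is the human-readable part."""
    body = {"schemas": [ERROR_SCHEMA], "status": str(status), "detail": detail}
    if scim_type:
        body["scimType"] = scim_type
    return body


def handle_group_patch(group_name, operations):
    """Reject attempts to rename a group; return (http_status, body)."""
    for op in operations:
        touches_name = op.get("path") == "displayName"
        if touches_name and op.get("op", "").lower() in ("add", "replace"):
            detail = (
                f"Group '{group_name}' cannot be renamed in this product. "
                "Create a new group with the desired name and assign it instead."
            )
            return 400, scim_error(400, detail, "mutability")
    # Other operations would be applied here; omitted in this sketch.
    return 204, None


status, body = handle_group_patch(
    "admins", [{"op": "replace", "path": "displayName", "value": "superusers"}]
)
print(status, body)
```

The detail text is written for the person who eventually reads the identity provider’s alert, so it says what went wrong and what to do about it.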

A few limitations

A notable limitation of the SCIM spec, from a user experience point of view, is the lack of bi-directional syncing of user/group data. That is to say that the source of truth is always in the identity provider, and all web products are “downstream.” So for whichever web products you start using SCIM with, you should stop managing user information in those products because you’ll get out of sync with the source of truth in your identity provider.

Identity providers typically don’t sync changes to web applications in real time; they operate in “sync cycles.” This means that user/group changes may take a little bit of time to propagate (typically this can take up to an hour). So if your internal SLAs are shorter than the time between sync cycles, SCIM may not work for you. Or, if your SLA is for specific scenarios (for example, terminated employees) you may be able to use SCIM for everything else, and just have a small amount of code to cover those specific scenarios.

A few final thoughts

I hope this has been a helpful overview of SCIM. If you want to read more, the jumping-off point is: http://www.simplecloud.info/.

If your organization uses Azure AD and you’d like to use SCIM with Cloudera Data Platform then head to our docs to get started.

If your organization uses Okta and you’d like to start using SCIM with CDP then contact your Cloudera rep to get added to the waitlist—Okta support is coming soon.