Source: Okta Official Blog
Identity is foundational to modern security strategy. Identity-based attacks are on the rise, and most data breaches are caused by stolen credentials. With more than 18,000 customers and an Identity platform that performs billions of authentications every month, Okta is at the frontline of most of these Identity-based attacks.
In the past month alone, Okta blocked around 2.38 billion malicious requests (more than 20% of all login attempts to the Okta workforce Identity Cloud) with a pipeline that separates malicious traffic from legitimate traffic in real time.
In this blog, we’ll describe the details of the various layers in this pipeline with particular focus on the AI-based components. We’ll also describe how customers can use these features to protect their users from various types of Identity-based attacks.
Defense in depth
We need a multi-layered, defense-in-depth strategy to defend against Identity-based attacks for the following reasons.
Many Identity-based attacks are sophisticated, and no single layer of defense has enough information to separate malicious requests from legitimate ones. We learn more about the legitimacy of a request as it progresses through the stack.
Automated pipelines to detect attacks will always have false positives (legitimate requests flagged as malicious) and false negatives (malicious requests flagged as legitimate). Organizations vary in their tolerance for each. We need layers in the pipeline that organizations can configure themselves to optimize for fewer false positives or fewer false negatives.
There are many types of Identity-based attacks — password spray, brute force, phishing, session hijacking, etc. A single detection and enforcement strategy doesn't work for all attacks. We need layers in the pipeline that deal with specific types of attacks.
There are two main aspects to the detections at every layer in this stack: quality and latency.
Quality: The quality of detections can be determined by combining the rates of false positives and false negatives. While we try to optimize for both, we prioritize one based on the two criteria below (remediation options and configuration for exemptions), which differ from layer to layer.
Remediation options: If blocking is the only remediation option (as opposed to prompting for multi-factor authentication (MFA)), reducing false positives is critical.
Configuration for exemptions: If administrators can't disable detections themselves, reducing false positives is critical.
Latency: It’s critical to run all the detections without impacting the response times for legitimate users.
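To make the trade-off between false positives and false negatives concrete, here is a minimal Python sketch. It assumes a hypothetical detector that assigns each request a score, and a made-up labeled sample; neither comes from Okta's pipeline. It only shows how moving a threshold trades one error type for the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    score: float     # hypothetical detector score in [0, 1]; higher = more suspicious
    malicious: bool  # ground-truth label, available only in an offline evaluation


def error_rates(requests: list[Request], threshold: float) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) at a score threshold."""
    legit = [r for r in requests if not r.malicious]
    bad = [r for r in requests if r.malicious]
    # A false positive is a legitimate request flagged as malicious.
    false_pos = sum(1 for r in legit if r.score >= threshold)
    # A false negative is a malicious request that was not flagged.
    false_neg = sum(1 for r in bad if r.score < threshold)
    fpr = false_pos / len(legit) if legit else 0.0
    fnr = false_neg / len(bad) if bad else 0.0
    return fpr, fnr


# Made-up sample data, for illustration only.
SAMPLE = [
    Request(0.10, False), Request(0.30, False), Request(0.55, False), Request(0.70, False),
    Request(0.40, True), Request(0.60, True), Request(0.80, True), Request(0.95, True),
]

strict = error_rates(SAMPLE, threshold=0.5)    # (0.5, 0.25): more false alarms, fewer misses
lenient = error_rates(SAMPLE, threshold=0.75)  # (0.0, 0.5): no false alarms, more misses
```

A layer whose only remediation is blocking would lean toward the lenient setting, while a layer that can fall back to MFA can afford the stricter one.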
Let’s look at the details of all the layers in this pipeline. The layers are described in the order in which they are invoked in the path of a login request. Note that the order of these layers and the functionality supported in each layer may change in the future.
Okta edge
This is the entry point of a request and the first line of defense. There are many controls in this layer to detect and protect against DDoS attacks. To help protect against large-scale credential-based attacks, we built a pipeline to identify malicious IPs that are blocked for all tenants at the edge. Okta customers can’t configure any part of the functionality supported at the Okta edge.
Blocklist zones
Customers can block specific combinations of IPs, locations, IP service categories, and autonomous system numbers (ASNs) by creating blocklist zones. Requests matching these zones are blocked from accessing any Okta Workforce Identity Cloud endpoint.
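The matching logic behind a blocklist zone can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Okta's implementation: the `RequestContext` and `BlocklistZone` types, the field names, and the category label are hypothetical, and the sketch assumes that a zone matches only when every criterion configured on it matches.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    ip: str
    country: str           # assumed to be resolved from IP metadata
    asn: int
    service_category: str  # hypothetical label, e.g. "anonymizing_proxy"


@dataclass(frozen=True)
class BlocklistZone:
    name: str
    cidrs: tuple[str, ...] = ()
    countries: frozenset[str] = frozenset()
    asns: frozenset[int] = frozenset()
    service_categories: frozenset[str] = frozenset()

    def matches(self, ctx: RequestContext) -> bool:
        """Assumption: all configured criteria must match; unset criteria are ignored."""
        checks: list[bool] = []
        if self.cidrs:
            addr = ipaddress.ip_address(ctx.ip)
            checks.append(any(addr in ipaddress.ip_network(c) for c in self.cidrs))
        if self.countries:
            checks.append(ctx.country in self.countries)
        if self.asns:
            checks.append(ctx.asn in self.asns)
        if self.service_categories:
            checks.append(ctx.service_category in self.service_categories)
        # A zone with nothing configured matches nothing.
        return bool(checks) and all(checks)


def is_blocked(ctx: RequestContext, zones: list[BlocklistZone]) -> bool:
    """A request is blocked if it matches any blocklist zone."""
    return any(zone.matches(ctx) for zone in zones)


# Example zones using documentation-reserved IP ranges and ASNs.
ZONES = [
    BlocklistZone(name="block-proxies", service_categories=frozenset({"anonymizing_proxy"})),
    BlocklistZone(name="block-range-in-asn", cidrs=("203.0.113.0/24",), asns=frozenset({64500})),
]
```

For example, `is_blocked(RequestContext("198.51.100.7", "US", 64496, "anonymizing_proxy"), ZONES)` returns `True` because the first zone matches on service category alone.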
Many credential-based attacks involve requests routed through anonymizing proxy services. We recommend customers turn on the recently released self-service Early Access feature that creates a default zone to block all anonymizing proxies. If customers want to add exemptions for certain IP service categories, they can use another recently released self-service Early Access feature that introduces support for enhanced dynamic network zones.
Over the past month, Okta blocked around 318 million malicious web requests based on zone configurations. Okta relies on multiple external vendors to resolve the location, IP service categories, and ASN associated with an IP.
We continue to scale zones by focusing on the following two aspects.
Quality: To reduce false positives and negatives resulting from stale data feeds, we built data pipelines to refresh external data feeds within 24 hours of their availability.
Latency: We resolve IP metadata for every web request. To deal with this scale, we continuously make improvements to maintain a very low latency (p95 of less than 50 milliseconds) in resolving metadata from all the external providers.
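For readers less familiar with percentile latency targets, the sketch below shows one common way to compute a p95 (the nearest-rank method) and check it against a 50 millisecond budget. The sample latencies are invented, and nothing here reflects how Okta actually measures latency.

```python
def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Made-up lookup latencies in milliseconds, including one slow outlier.
latencies_ms = [12, 15, 9, 22, 31, 18, 14, 11, 47, 16,
                13, 19, 120, 17, 10, 21, 25, 14, 12, 20]

p95 = percentile(latencies_ms, 95)
meets_target = p95 < 50
```

With these 20 samples, the p95 is 47 ms, so the target is met even though one request took 120 ms. A p95 target tolerates a small slow tail, which is why it is usually tracked alongside higher percentiles.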
ThreatInsight
ThreatInsight is Okta’s native AI-driven feature to detect and protect against large credential-based attacks. It has two components: a detection pipeline and an enforcement pipeline.
Detection pipeline
We built streaming and batch data pipelines to detect malicious IPs involved in large-scale Identity-based attacks like password spray, credential stuffing, and brute force. We detect both cross-tenant and tenant-specific malicious IPs.
The detection pipeline also includes heuristics and ML models that flag tenants under large credential-based attacks. These models detect anomalies in login failures at the tenant level within minutes after an attack starts. Based on the output from these models, we notify customers through the System Log and automatically flag malicious IPs more aggressively for the tenant under attack. These models help Okta notify hundreds of tenants under attack every month.
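As a rough illustration of tenant-level anomaly detection on login failures, the sketch below uses a simple z-score heuristic over per-minute failure counts. The post does not describe the actual heuristics or models, so the function names, thresholds, and the idea of lowering a per-IP failure threshold during an attack are all assumptions.

```python
from statistics import mean, pstdev


def is_under_attack(baseline_failures: list[int], current_failures: int,
                    z_threshold: float = 4.0, min_failures: int = 50) -> bool:
    """Flag a tenant when the current minute's failures are far above its baseline.

    Hypothetical heuristic: baseline_failures holds per-minute login failure
    counts from a recent quiet period for the same tenant.
    """
    if current_failures < min_failures:
        return False  # ignore small absolute volumes to limit false positives
    mu = mean(baseline_failures)
    sigma = pstdev(baseline_failures)
    if sigma == 0:
        return current_failures > mu
    return (current_failures - mu) / sigma >= z_threshold


def ip_failure_threshold(under_attack: bool) -> int:
    """Assumed policy: flag IPs after fewer failures while the tenant is under attack."""
    return 5 if under_attack else 20


# Made-up per-minute failure counts for one tenant.
baseline = [20, 25, 18, 22, 24, 19, 21, 23, 20, 22]
```

Calling `is_under_attack(baseline, 900)` returns `True`, while `is_under_attack(baseline, 26)` returns `False` because the volume is below the minimum. The tenant-level flag then feeds the stricter per-IP threshold.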
Enforcement pipeline
We built a low-latency enforcement pipeline that performs these two actions (in this order):
Block or log requests from malicious IPs. Customers can choose to configure ThreatInsight in log or block mode.
Flag suspicious requests based on many attributes of the request. We run multiple heuristics and ML models to flag suspicious requests based on attributes such as the IP, location, and user agent of the request. The models output a score and a threat level that are used to determine whether the request should be counted towards the tenant’s rate limit counters or towards rate limit counters isolated for suspicious requests.
Running ThreatInsight checks, including the flagging of suspicious requests, before rate limiting reduces the chance that legitimate users run into rate limit violations because of large credential-based attacks.
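The ordering of the two enforcement actions can be sketched as follows. This is a simplified illustration, not Okta's code: the `Mode` and `Outcome` names, the single `suspicion_score` input, and the 0.7 cutoff are assumptions standing in for the score and threat level that the real models produce.

```python
from enum import Enum


class Mode(Enum):
    LOG = "log"
    BLOCK = "block"


class Outcome(Enum):
    BLOCKED = "blocked"
    SUSPICIOUS_COUNTER = "suspicious_counter"  # counted against the isolated counters
    TENANT_COUNTER = "tenant_counter"          # counted against the tenant's normal counters


def enforce(ip: str, suspicion_score: float, malicious_ips: set[str], mode: Mode,
            event_log: list[str], suspicious_threshold: float = 0.7) -> Outcome:
    # Step 1: block or log requests from known-malicious IPs.
    if ip in malicious_ips:
        event_log.append(f"malicious ip {ip} ({mode.value} mode)")
        if mode is Mode.BLOCK:
            return Outcome.BLOCKED
    # Step 2: decide which rate limit counter the request is charged to.
    if suspicion_score >= suspicious_threshold:
        return Outcome.SUSPICIOUS_COUNTER
    return Outcome.TENANT_COUNTER
```

In log mode a request from a malicious IP is recorded but still proceeds to step 2, where a high score routes it to the isolated counters so that it does not consume the budget shared by legitimate users.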
Over the past month, ThreatInsight blocked around 2.08 billion requests and 3.4 million IPs associated with over 150 endpoints for more than 10k tenants. During some high-volume attacks, ThreatInsight has blocked >200k IPs every hour at a rate of >100k requests every minute. More than 31k IPs were seen in attacks involving multiple Okta tenants.
We recommend setting ThreatInsight to block mode to protect your tenant against large-scale credential-based attacks.
We continue to scale ThreatInsight by focusing on the following two aspects:
Quality: To reduce false positives and false negatives, we continuously improve our data pipelines and detections so that we identify malicious IPs within seconds of suspicious activity and make those IPs available for enforcement within a few minutes after detection.
Latency: We run ThreatInsight checks for every single web request that hits Okta Workforce Identity Cloud and we do that even before rate-limiting checks kick in. To do this at this scale, we made many enhancements to the way we cache and store the features for ML model scoring to ensure that the p95 latency of ThreatInsight evaluations is below 50 milliseconds.
Rate limiting
The Okta platform enforces rate limits at the tenant level, combining Okta-managed rate limits and customer-configurable rate limits. This layer doesn’t directly deal with detecting and blocking Identity-based attacks. The one component of this layer that is relevant to Identity-based attacks is its support for maintaining separate rate limit counters for legitimate and malicious requests.
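To show why separate counters matter, here is a minimal fixed-window rate limiter in Python that keeps one counter for suspicious requests and another for everything else. The class name, limits, and window size are hypothetical, and a fixed window is only one of several possible algorithms. The post does not say which one Okta uses.

```python
from collections import defaultdict


class IsolatedRateLimiter:
    """Fixed-window limiter with separate budgets for suspicious and normal traffic."""

    def __init__(self, tenant_limit: int, suspicious_limit: int,
                 window_seconds: int = 60) -> None:
        self.limits = {"tenant": tenant_limit, "suspicious": suspicious_limit}
        self.window_seconds = window_seconds
        # Key: (tenant, bucket, window index). Old windows are never purged here;
        # a real limiter would expire them.
        self._counts: dict[tuple[str, str, int], int] = defaultdict(int)

    def allow(self, tenant: str, suspicious: bool, now: float) -> bool:
        bucket = "suspicious" if suspicious else "tenant"
        key = (tenant, bucket, int(now // self.window_seconds))
        if self._counts[key] >= self.limits[bucket]:
            return False
        self._counts[key] += 1
        return True


limiter = IsolatedRateLimiter(tenant_limit=3, suspicious_limit=2)
# Five flagged requests arrive first; only two fit in the isolated budget.
attack = [limiter.allow("acme", suspicious=True, now=0.0) for _ in range(5)]
# Legitimate requests in the same window still get the full tenant budget.
legit = [limiter.allow("acme", suspicious=False, now=1.0) for _ in range(4)]
```

Because the flagged requests are charged to their own counter, the attack exhausts only the isolated budget and the first three legitimate requests in the window are still allowed.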
Policy evaluation
We have the complete context (user, device, etc.) of the request at the policy evaluation layer.
Zones
Customers can configure IP, dynamic, and enhanced dynamic network zones based on various combinations of IP, geolocation, IP service categories, and ASNs, and use these zones in various types of policies (e.g., global session policy, authentication policy). When zones are not blocklisted but are configured in policies, they’re not enforced for every single web request. They’re only enforced for the requests that are scoped to that policy.
Behavior detection
Behavior detection analyzes patterns in user behavior to detect anomalous user activities. Okta supports multiple types of anomalous behavior detections (e.g., New IP, New Country, New Geo-Location, New Device, Impossible Velocity) scoped to the user. Customers can define what is risky for their tenant by combining these behaviors in global session and authentication policies. For some tenants, any login from a country that the user has not logged in from in their last 10 attempts may be suspicious. For other tenants, any login from a device and an IP that the user has not logged in from in their last 100 attempts may be suspicious. Behavior detection provides a rules engine that lets customers customize these anomalies so that they can optimize for false positives or false negatives.
To support behavior detection, we built a data pipeline to build user profiles based on historical activity of the users.
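A per-user profile that supports "new country in the last 10 attempts" style checks can be sketched with a bounded history. This is a toy model under stated assumptions: the `Login` and `UserProfile` types and the two rule functions are hypothetical, and the real pipeline builds profiles from historical activity at a very different scale.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Login:
    ip: str
    country: str
    device_id: str


class UserProfile:
    """Keeps only the user's most recent successful logins."""

    def __init__(self, history_size: int) -> None:
        self._recent: deque[Login] = deque(maxlen=history_size)

    def record(self, login: Login) -> None:
        self._recent.append(login)

    def is_new(self, attribute: str, login: Login) -> bool:
        """True if the attribute's value is absent from the retained history.

        With an empty history, every value counts as new.
        """
        value = getattr(login, attribute)
        return all(getattr(past, attribute) != value for past in self._recent)


def rule_new_country(profile: UserProfile, login: Login) -> bool:
    return profile.is_new("country", login)


def rule_new_device_and_ip(profile: UserProfile, login: Login) -> bool:
    return profile.is_new("device_id", login) and profile.is_new("ip", login)
```

A tenant that cares about travel anomalies might pair `UserProfile(history_size=10)` with `rule_new_country`, while a tenant that wants fewer prompts might pair `UserProfile(history_size=100)` with `rule_new_device_and_ip`, which fires only when both signals are new.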
Risk scoring
Risk scoring combines the signals from multiple layers of the pipeline to determine the risk level associated with a login attempt. Risk scoring removes the complexity of configuring behaviors, zones, and other conditions. Customers can simply configure the risk level in policies and set up actions such as MFA. Okta determines what is risky by combining various contexts (location, device, threat, behaviors, etc.).
The risk engine aggregates the risk across the following contexts to determine the risk level of a web request:
ThreatInsight Evaluation — How bad is this request based on ThreatInsight evaluation?
IP Metadata — How bad is this IP based on the metadata from external providers?
IP — How bad is this IP for this specific user, based on the user’s historical patterns?
Tenant — Is the tenant currently under a large credential-based attack?
Geolocation — How bad is this Geolocation for this specific user, based on the user’s historical patterns?
Device — How bad is this device for this specific user, based on the user’s historical patterns?
We use machine learning in this layer to identify the relative weights of the various features across various contexts associated with a request. The models are trained using the users’ MFA access patterns (success, failure, and abandonment of MFA associated with various behavioral signals of the users).
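One simple way to picture the aggregation across contexts is a weighted sum of per-context scores mapped to a risk level. The sketch below is only that picture: the weights are hand-picked placeholders (the post says the real weights are learned from MFA access patterns), and the score ranges, cutoffs, level names, and actions are assumptions.

```python
# Hypothetical weights per context; they sum to 1.0. In the post, relative
# weights are learned by ML models rather than set by hand.
WEIGHTS = {
    "threat_insight": 0.30,
    "ip_metadata": 0.15,
    "ip": 0.15,
    "tenant": 0.10,
    "geolocation": 0.15,
    "device": 0.15,
}


def risk_score(signals: dict[str, float]) -> float:
    """Weighted sum of per-context scores, each assumed to be in [0, 1]."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


def risk_level(signals: dict[str, float]) -> str:
    score = risk_score(signals)
    if score >= 0.6:
        return "HIGH"
    if score >= 0.3:
        return "MEDIUM"
    return "LOW"


def action_for(level: str) -> str:
    """Assumed policy: prompt for MFA on high-risk logins, allow otherwise."""
    return "require_mfa" if level == "HIGH" else "allow"


familiar = {"ip_metadata": 0.1}
new_everything = {"ip": 1.0, "geolocation": 1.0, "device": 1.0}
during_attack = {"threat_insight": 0.9, "ip_metadata": 0.8, "ip": 1.0,
                 "tenant": 1.0, "geolocation": 1.0, "device": 1.0}
```

Under these placeholder weights, a login from a new IP, location, and device lands at medium risk on its own, and reaches high risk only when the threat and tenant contexts also point the same way.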
We are launching a new version of the risk model as part of the newly launched Okta Identity Threat Protection. Customers who buy the Okta Identity Threat Protection SKU benefit from continuous risk evaluations: we run risk evaluations at login and then continuously to determine the risk associated with sessions. Continuous risk evaluation relies on a more sophisticated model that uses more features (e.g., device signals from Okta Verify, anomalous ASN, and anomalous user agent). In addition, we introduced the concept of user risk as part of this product. User risk captures the stateful risk associated with a user identity and aggregates the risk across sessions, devices, and all the signals we get from multiple third-party security providers.
Over the past month, Okta evaluated more than 3 billion login requests for risk. We recommend configuring strong authentication in authentication policies for high-risk login attempts.
We continue to scale the risk engine by focusing on the following two aspects:
Quality: To reduce false positives and false negatives, we continuously improve models used to detect risk across various contexts and how we aggregate the overall risk. We also improve accuracy by scaling our data pipelines to use the latest activity associated with the users. Within a second after a user performs certain actions in the system, the user’s profile is updated with this information.
Latency: We run risk engine checks with a p95 latency below 50 milliseconds for every login attempt that hits Okta Workforce Identity Cloud for Adaptive MFA customers.
Key takeaways
Okta relies on a multi-layered, defense-in-depth strategy to detect and protect against various types of Identity-based attacks.
Blocklist zones is a self-service feature that complements ThreatInsight, an AI-driven feature. Together, these features block billions of malicious requests from large-scale credential-based attacks every month. We strongly recommend customers turn on the default dynamic network zone to block anonymizing proxies and turn on ThreatInsight in block mode.
As the request progresses through the Okta stack, we generate more context. Features like behavior detection and risk scoring use this context to detect more sophisticated account takeover attacks.
Risk scoring is an AI-driven feature that considers various contexts to aggregate risk. Customers can use behavior detection to customize and define risk for their tenants. We strongly recommend customers configure risk-based policies to prompt MFA for high-risk logins.
Okta Identity Threat Protection is a newly released product that takes the capabilities of the risk engine to the next level. It provides continuous evaluation of session and user risk, leveraging more advanced signals and ML models.
Have questions about this blog post? Reach out to us at eng_blogs@okta.com.