Skip to content

Conversation

@kkthxbye-code
Copy link
Contributor

@kkthxbye-code kkthxbye-code commented Oct 28, 2021

Fixes: #7399

This should fix the excessive CPU usage when enabling AUTH_LDAP_FIND_GROUP_PERMS and having a combination of many object permissions and many users in the same group.

First, this is not an optimal fix, feel free to close the PR if a better fix is suggested. The reason for the bad performance is the filter added in 8230099:

class NBLDAPBackend(ObjectPermissionMixin, LDAPBackend_):
def get_permission_filter(self, user_obj):
permission_filter = Q(users=user_obj) | Q(groups__user=user_obj)
if self.settings.FIND_GROUP_PERMS:
permission_filter = permission_filter | Q(groups__name__in=user_obj.ldap_user.group_names)
return permission_filter

When added to the original query:

def get_permission_filter(self, user_obj):
return Q(users=user_obj) | Q(groups__user=user_obj)
def get_object_permissions(self, user_obj):
"""
Return all permissions granted to the user by an ObjectPermission.
"""
# Retrieve all assigned and enabled ObjectPermissions
object_permissions = ObjectPermission.objects.filter(
self.get_permission_filter(user_obj),
enabled=True
).prefetch_related('object_types')

The resulting SQL is essentially:

SELECT "users_objectpermission".*
FROM "users_objectpermission"
LEFT OUTER JOIN "users_objectpermission_users" ON ("users_objectpermission"."id" = "users_objectpermission_users"."objectpermission_id")
LEFT OUTER JOIN "users_objectpermission_groups" ON ("users_objectpermission"."id" = "users_objectpermission_groups"."objectpermission_id")
LEFT OUTER JOIN "auth_group" ON ("users_objectpermission_groups"."group_id" = "auth_group"."id")
LEFT OUTER JOIN "auth_user_groups" ON ("auth_group"."id" = "auth_user_groups"."group_id")
WHERE (("users_objectpermission_users"."user_id" = 1
        OR "auth_user_groups"."user_id" = 1
        OR "auth_group"."name" IN (
                                   'netbox-user',
                                   'group1',
                                   'group2'
                                   ))
       AND "users_objectpermission"."enabled")
ORDER BY "users_objectpermission"."name" ASC;

The outer join for auth_user_group essentially duplicates each users_objectpermission row once for each other user that is allocated to one of the groups not filtered out in the auth_group join.

In my test instance, with 327 objectpermissions in the netbox-user group, 30 other users has that same group, which makes the query return 30 duplicates per objectpermission totalling 9810 permissionobjects. The fix caused a load of the rack page with 50 objects per page to decrease from 1400 ms to 600 ms.

I'm not sure how to fix it in a smart way, as I have a hard time seing how the relationship is supposed to work. Even the original query is iffy I think, as I'm pretty sure it can return duplicate permissions as well if user bound objectpermissions are used.

The fix just makes the postgres remove the duplicates, which fixes the performance regression when using AUTH_LDAP_FIND_GROUP_PERMS. The query time increase should be negligible even in extreme cases.

Still someone should probably rework the queries.

Note: While the fix is pretty straightforward and I can't see how it could mess anything up, I would appreciate if someone else could test it before deciding to merge.

@kkthxbye-code
Copy link
Contributor Author

@tobiasge - Maybe you could have a look, as you are the one that worked on the LDAP part mentioned.

@kkthxbye-code kkthxbye-code changed the title Fix #7399 - LDAP excessive CPU usage when AUTH_LDAP_FIND_GROUP_PERMS is enabled Fix #7399: LDAP excessive CPU usage when AUTH_LDAP_FIND_GROUP_PERMS is enabled Oct 29, 2021
@jeremystretch
Copy link
Member

Nice work, @kkthxbye-code! Tagging @heroin-moose @buzzingbren and @lastorel as well to see if thy can help test per their comments on #7399.

If we can confirm that the fix works for others (e.g. that these aren't actually separate bugs), I don't see an issue with merging this. I may take a stab at optimizing the queries further when I have a chance.

@buzzingbren
Copy link

Hi @kkthxbye-code and @jeremystretch, thanks for linking to this.
Unfortunately In my case (using https://github.com/bb-Ricardo/netbox-sync to fill our Netbox) this fix does not resolve my issue. But I'm also not using AUTH_LDAP_FIND_GROUP_PERMS setting. But since I only have issues when the Nebox-Sync is using the API to access Netbox, a feature in Netbox to set a user as "local account only" or "LDAP-User" and forgoing the LDAP check if the API-token is linked to a local user would help in my case. Bypassing the LDAP problem altogether (for me then, but probably there are other use cases as well).
Just for authentication purposes for the web-interface, I had no problems there with the LDAP authentication.

@tobiasge
Copy link
Member

My first thought was "Why oder_by and disctinct? But after reading about the additional capabilities of Postgres in disctinct I think this is a really good solution. Nice 👍

@kkthxbye-code
Copy link
Contributor Author

@buzzingbren - Yeah, the API performance issue when using LDAP is not related to this fix. The closest issue might be #6926 - but a new issue with your specific case might be in order. I haven't noticed any performance issue with the API, but I'm not really hammering it. If you create an issue or add a comment to #6926 with more details about what your setup is and what you are doing when experiencing slowdown, I'll be happy to try to debug it.

This issue fixed in this PR will manifest if the following is true:

  • AUTH_LDAP_FIND_GROUP_PERMS enabled
  • The user is an LDAP user
  • More than one user is in the same group as you
  • The group has a fair amount of ObjectPermissions

If this is true, the number of ObjectPermissions returned by the query will be equal to ObjectPermissions * Other users in the same group as you - so if you have 200 object permissions and 100 other netbox users are in the same group as you, the query will return 20000 ObjectPermissions, causing the CPU usage when it's iterating them here:

# Create a dictionary mapping permissions to their constraints
perms = defaultdict(list)
for obj_perm in object_permissions:
for object_type in obj_perm.object_types.all():
for action in obj_perm.actions:
perm_name = f"{object_type.app_label}.{action}_{object_type.model}"
perms[perm_name].extend(obj_perm.list_constraints())

@jeremystretch
Copy link
Member

Sorry this fell off the radar for a bit. Seems like there's been a bit of overlap between #7399 and one or more other LDAP issues, bit I'm happy to merge this with the understanding that it addresses the specific issue raised in #7399.

@jeremystretch jeremystretch merged commit 8299845 into netbox-community:develop Nov 18, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

gunicorn spends 80% of each request in get_object_permissions()

4 participants