Some months ago I implemented a new Neutron quota driver. The aim of this new driver, as described in Launchpad bug 1926787 and Bugzilla bug 1955661, was to avoid the persistent database lock contention generated when using Neutron's quota engine. Why was that happening? Follow me…
What do we need to count?
In Neutron, the number of resources is limited per resource type and project. Each (resource_type, project) tuple can be limited (or not), depending on the project administrator's needs. Neutron allows defining limits for networks, subnets, ports, routers, floating IPs, security groups, security group rules and RBAC policies. The following command shows the current project limits:
$ openstack quota show <project>
Currently, three projects can define quotas per resource and project: Nova, Cinder and Neutron.
Quota service “on duty”.
You can take a look at the QuotaDriverAPI to check which operations it supports. In this section I would like to focus on two of them: update_quota_limit and make_reservation.
The first one is called when the administrator changes a quota limit, which means this API method is reachable from the CLI. The administrator can raise or lower the current limit, or remove it by setting the resource quota limit to -1.
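To make these semantics concrete, here is a toy, in-memory model of the limit side of a quota driver. It assumes limits keyed by the (resource_type, project) tuple, a default limit per resource type, and -1 meaning "no limit". QuotaLimits, get_limit and is_within_limit are hypothetical names; only the update_quota_limit name comes from the text above, and its signature here is an assumption, not Neutron's QuotaDriverAPI.

```python
# Toy model of per-(resource_type, project) quota limits.
# Hypothetical names and signatures: this is NOT Neutron's QuotaDriverAPI.
UNLIMITED = -1


class QuotaLimits:
    def __init__(self, defaults):
        # Default limit per resource type, used when a project has no override.
        self._defaults = dict(defaults)
        # Per-project overrides keyed by (resource_type, project).
        self._limits = {}

    def update_quota_limit(self, project, resource_type, limit):
        """Raise, lower or remove (limit == -1) the limit of one tuple."""
        if limit < UNLIMITED:
            raise ValueError("limit must be -1 (unlimited) or >= 0")
        self._limits[(resource_type, project)] = limit

    def get_limit(self, project, resource_type):
        return self._limits.get((resource_type, project),
                                self._defaults.get(resource_type, UNLIMITED))

    def is_within_limit(self, project, resource_type, requested_total):
        """True if the project may own `requested_total` resources of this type."""
        limit = self.get_limit(project, resource_type)
        return limit == UNLIMITED or requested_total <= limit


limits = QuotaLimits({"network": 100, "port": 500})
limits.update_quota_limit("demo", "network", 10)
print(limits.is_within_limit("demo", "network", 11))   # False
limits.update_quota_limit("demo", "network", UNLIMITED)
print(limits.is_within_limit("demo", "network", 11))   # True
```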
The second one is called while a resource is being created. Neutron's Pecan WSGI application has a set of hooks that are called during an HTTP request. One of those hooks is the QuotaEnforcementHook, which implements two of the PecanHook methods: before and after.
The before method is executed (surprise!) before the HTTP request is processed, that is, before the Neutron database is queried. If we are requesting a new resource, the quota engine validates the request against the current resource limits. In this method, the engine creates a Reservation, a database record that represents a temporary resource assignment. When computing the current resource count, the quota engine also adds the active reservations. Once the resource has been created or the request has been rejected, the Reservation record is deleted.
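The check done in the before step boils down to "resources in use, plus active reservations, plus the requested amount, must not exceed the limit". Below is a simplified, single-process sketch of that flow. ReservationEngine, OverQuota, release and the fixed reservation TTL are hypothetical; the sketch only mirrors the steps described above (reserve before the request, drop the reservation once the request ends) and is not Neutron code.

```python
import itertools
import time

UNLIMITED = -1  # same convention as the quota limits: -1 means "no limit"


class OverQuota(Exception):
    """Raised when a reservation would exceed the project limit."""


class ReservationEngine:
    """Hypothetical sketch of a reservation-based quota check (not Neutron code)."""

    def __init__(self, limits, count_used, ttl=120.0):
        self._limits = limits          # {(resource_type, project): limit}
        self._count_used = count_used  # callable(resource_type, project) -> int
        self._ttl = ttl                # reservation lifetime, in seconds
        self._ids = itertools.count(1)
        # reservation_id -> (resource_type, project, amount, expiration)
        self._reservations = {}

    def _reserved(self, resource_type, project, now):
        # Only reservations that have not expired yet are added to the count.
        return sum(amount for rtype, proj, amount, expiration
                   in self._reservations.values()
                   if rtype == resource_type and proj == project
                   and expiration > now)

    def make_reservation(self, resource_type, project, amount=1):
        """'before' step: check usage + reservations + request against the limit."""
        now = time.monotonic()
        limit = self._limits.get((resource_type, project), UNLIMITED)
        in_use = self._count_used(resource_type, project)
        reserved = self._reserved(resource_type, project, now)
        if limit != UNLIMITED and in_use + reserved + amount > limit:
            raise OverQuota(f"{resource_type} quota exceeded for {project}")
        reservation_id = next(self._ids)
        self._reservations[reservation_id] = (
            resource_type, project, amount, now + self._ttl)
        return reservation_id

    def release(self, reservation_id):
        """'after' step: the resource now exists, or the request failed."""
        self._reservations.pop(reservation_id, None)


# Usage: one network already exists and the limit is two.
networks_per_project = {"demo": 1}
engine = ReservationEngine({("network", "demo"): 2},
                           lambda rtype, proj: networks_per_project.get(proj, 0))
first = engine.make_reservation("network", "demo")   # 1 + 0 + 1 <= 2: accepted
try:
    engine.make_reservation("network", "demo")        # 1 + 1 + 1 > 2: rejected
except OverQuota as exc:
    print(exc)                                        # network quota exceeded for demo
engine.release(first)
```

The reservation counts against the limit only while the request is in flight, which is why two concurrent requests cannot both squeeze into the last free slot.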
DbQuotaDriver, the first Neutron quota driver implementation.
The main difference between the first quota driver and the new one is how the resource count is tracked. The original Neutron quota driver uses a database table, QuotaUsage, to track resource usage. Each record belongs to a single (resource_type, project) tuple and has an in_use column holding the number of existing resources for that tuple.
When the driver is about to make a reservation, it first retrieves the resource count by calling get_resource_usage. For tracked resources, that calls TrackedResource.count_used. The first thing this method does is lock the QuotaUsage record of the resource by updating its dirty bit. Because this is a write, the database locks the record for the rest of the transaction, guaranteeing that no other operation modifies the quota usage in the meantime.
Within the same transaction, if requested, the resource usage is updated as well, i.e. the QuotaUsage.in_use column is rewritten.
This implementation relies on a single record to track the quota usage and on the database locking system to isolate the transactions. Unfortunately, that is what caused the database deadlocks detected in some customer deployments. In highly loaded environments, this QuotaUsage record could drive the database into a non-recoverable error, which could degrade the Neutron server. A temporary workaround was to remove the quota limits (set them to -1) and manually remove the database locks.
DbQuotaNoLockDriver, because the name matters.
Exactly, the name matters. The goal of this new driver is the same as the old one: handle the quota limits and check the resource usage. But this driver simplifies how the resource usage is tracked. There is no longer a database table to track the resources. Instead, we count them directly in the database, that's all, using the NeutronDbObject.count method. If each resource is stored in its own database table, why not simply filter the rows by project and count them? Because the count method was inefficient, so it had to be fixed first.
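The idea behind the new driver fits in two read-only queries. One counts the rows that belong to the project, and the other adds the reservations that have not expired yet. Nothing is written beforehand, so no record is locked. Below is a minimal SQLite sketch, assuming simplified networks and reservations tables. The real Neutron schema and the NeutronDbObject.count implementation differ, and build_demo_db and count_network_usage are hypothetical helpers.

```python
import sqlite3
import time


def build_demo_db():
    # Simplified, assumed tables; the real Neutron schema has many more columns.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE networks (id TEXT PRIMARY KEY, project_id TEXT);
        CREATE TABLE reservations (id INTEGER PRIMARY KEY, project_id TEXT,
                                   resource TEXT, amount INTEGER,
                                   expiration REAL);
    """)
    return conn


def count_network_usage(conn, project_id, now=None):
    """Networks owned by the project plus its active reservations.

    Only SELECT statements are issued, so no record is locked for writing.
    """
    now = time.time() if now is None else now
    (in_use,) = conn.execute(
        "SELECT COUNT(id) FROM networks WHERE project_id = ?",
        (project_id,)).fetchone()
    (reserved,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM reservations "
        "WHERE project_id = ? AND resource = 'network' AND expiration > ?",
        (project_id, now)).fetchone()
    return in_use + reserved


conn = build_demo_db()
conn.executemany("INSERT INTO networks VALUES (?, ?)",
                 [("net-1", "demo"), ("net-2", "demo"), ("net-3", "other")])
conn.executemany(
    "INSERT INTO reservations (project_id, resource, amount, expiration) "
    "VALUES (?, ?, ?, ?)",
    [("demo", "network", 1, 2000.0),   # still active at now=1000
     ("demo", "network", 1, 500.0)])   # already expired at now=1000
print(count_network_usage(conn, "demo", now=1000.0))  # 3: two networks + one active reservation
```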
The count method used to retrieve the full Neutron OVO (Oslo Versioned Object), which implies loading the records of other tables associated with the main one. For example, retrieving a "port" required reading not only the ports table but also standardattributes, securitygroupportbindings, portdataplanestatuses, portnumaaffinitypolicies, portdnses, qos_network_policy_bindings, ml2_port_bindings, portsecuritybindings, trunks, subports, portdeviceprofiles and portuplinkstatuspropagation. That was completely unnecessary, and it was addressed in this neutron-lib patch and its counterpart in Neutron. Below are two examples of the database query issued by the count method when called from _is_mac_in_use. This is the query without the optimization:
SELECT ports.project_id AS ports_project_id, ports.id AS ports_id, ports.name AS ports_name, ports.network_id AS ports_network_id, ports.mac_address AS ports_mac_address, ports.admin_state_up AS ports_admin_state_up, ports.`status` AS ports_status, ports.device_id AS ports_device_id, ports.device_owner AS ports_device_owner, ports.ip_allocation AS ports_ip_allocation, ports.standard_attr_id AS ports_standard_attr_id, standardattributes_1.id AS standardattributes_1_id, standardattributes_1.resource_type AS standardattributes_1_resource_type, standardattributes_1.description AS standardattributes_1_description, standardattributes_1.revision_number AS standardattributes_1_revision_number, standardattributes_1.created_at AS standardattributes_1_created_at, standardattributes_1.updated_at AS standardattributes_1_updated_at, securitygroupportbindings_1.port_id AS securitygroupportbindings_1_port_id, securitygroupportbindings_1.security_group_id AS securitygroupportbindings_1_security_group_id, portdnses_1.port_id AS portdnses_1_port_id, portdnses_1.current_dns_name AS portdnses_1_current_dns_name, portdnses_1.current_dns_domain AS portdnses_1_current_dns_domain, portdnses_1.previous_dns_name AS portdnses_1_previous_dns_name, portdnses_1.previous_dns_domain AS portdnses_1_previous_dns_domain, portdnses_1.dns_name AS portdnses_1_dns_name, portdnses_1.dns_domain AS portdnses_1_dns_domain, qos_network_policy_bindings_1.policy_id AS qos_network_policy_bindings_1_policy_id, qos_network_policy_bindings_1.network_id AS qos_network_policy_bindings_1_network_id, qos_port_policy_bindings_1.policy_id AS qos_port_policy_bindings_1_policy_id, qos_port_policy_bindings_1.port_id AS qos_port_policy_bindings_1_port_id, ml2_port_bindings_1.port_id AS ml2_port_bindings_1_port_id, ml2_port_bindings_1.host AS ml2_port_bindings_1_host, ml2_port_bindings_1.vnic_type AS ml2_port_bindings_1_vnic_type, ml2_port_bindings_1.profile AS ml2_port_bindings_1_profile, ml2_port_bindings_1.vif_type AS 
ml2_port_bindings_1_vif_type, ml2_port_bindings_1.vif_details AS ml2_port_bindings_1_vif_details, ml2_port_bindings_1.`status` AS ml2_port_bindings_1_status, portsecuritybindings_1.port_id AS portsecuritybindings_1_port_id, portsecuritybindings_1.port_security_enabled AS portsecuritybindings_1_port_security_enabled, standardattributes_2.id AS standardattributes_2_id, standardattributes_2.resource_type AS standardattributes_2_resource_type, standardattributes_2.description AS standardattributes_2_description, standardattributes_2.revision_number AS standardattributes_2_revision_number, standardattributes_2.created_at AS standardattributes_2_created_at, standardattributes_2.updated_at AS standardattributes_2_updated_at, trunks_1.project_id AS trunks_1_project_id, trunks_1.id AS trunks_1_id, trunks_1.admin_state_up AS trunks_1_admin_state_up, trunks_1.name AS trunks_1_name, trunks_1.port_id AS trunks_1_port_id, trunks_1.`status` AS trunks_1_status, trunks_1.standard_attr_id AS trunks_1_standard_attr_id, subports_1.port_id AS subports_1_port_id, subports_1.trunk_id AS subports_1_trunk_id, subports_1.segmentation_type AS subports_1_segmentation_type, subports_1.segmentation_id AS subports_1_segmentation_id, portdeviceprofiles_1.port_id AS portdeviceprofiles_1_port_id, portdeviceprofiles_1.device_profile AS portdeviceprofiles_1_device_profile, portnumaaffinitypolicies_1.port_id AS portnumaaffinitypolicies_1_port_id, portnumaaffinitypolicies_1.numa_affinity_policy AS portnumaaffinitypolicies_1_numa_affinity_policy, portdataplanestatuses_1.port_id AS portdataplanestatuses_1_port_id, portdataplanestatuses_1.data_plane_status AS portdataplanestatuses_1_data_plane_status, portuplinkstatuspropagation_1.port_id AS portuplinkstatuspropagation_1_port_id, portuplinkstatuspropagation_1.propagate_uplink_status AS portuplinkstatuspropagation_1_propagate_uplink_status FROM ports LEFT OUTER JOIN standardattributes AS standardattributes_1 ON standardattributes_1.id = 
ports.standard_attr_id LEFT OUTER JOIN securitygroupportbindings AS securitygroupportbindings_1 ON ports.id = securitygroupportbindings_1.port_id LEFT OUTER JOIN portdnses AS portdnses_1 ON ports.id = portdnses_1.port_id LEFT OUTER JOIN qos_network_policy_bindings AS qos_network_policy_bindings_1 ON qos_network_policy_bindings_1.network_id = ports.network_id LEFT OUTER JOIN qos_port_policy_bindings AS qos_port_policy_bindings_1 ON ports.id = qos_port_policy_bindings_1.port_id LEFT OUTER JOIN ml2_port_bindings AS ml2_port_bindings_1 ON ports.id = ml2_port_bindings_1.port_id LEFT OUTER JOIN portsecuritybindings AS portsecuritybindings_1 ON ports.id = portsecuritybindings_1.port_id LEFT OUTER JOIN trunks AS trunks_1 ON ports.id = trunks_1.port_id LEFT OUTER JOIN standardattributes AS standardattributes_2 ON standardattributes_2.id = trunks_1.standard_attr_id LEFT OUTER JOIN subports AS subports_1 ON ports.id = subports_1.port_id LEFT OUTER JOIN portdeviceprofiles AS portdeviceprofiles_1 ON ports.id = portdeviceprofiles_1.port_id LEFT OUTER JOIN portnumaaffinitypolicies AS portnumaaffinitypolicies_1 ON ports.id = portnumaaffinitypolicies_1.port_id LEFT OUTER JOIN portdataplanestatuses AS portdataplanestatuses_1 ON ports.id = portdataplanestatuses_1.port_id LEFT OUTER JOIN portuplinkstatuspropagation AS portuplinkstatuspropagation_1 ON ports.id = portuplinkstatuspropagation_1.port_id WHERE ports.network_id IN (%(network_id_1)s) AND ports.mac_address IN (%(mac_address_1)s)
And this is the optimized query:
SELECT ports.id AS ports_id FROM ports WHERE ports.network_id IN (%(network_id_1)s) AND ports.mac_address IN (%(mac_address_1)s)
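The difference between the two queries can be reproduced with a toy schema. In this SQLite sketch, the tables are reduced to a few columns and only two of the joined tables are kept. Both hypothetical functions return the same count. The first one, however, has to join and transfer the associated rows, and it even gets two rows back for a port with two bindings. The second one only reads ports.id, like the optimized query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ports (id TEXT PRIMARY KEY, network_id TEXT, mac_address TEXT);
    CREATE TABLE portdnses (port_id TEXT PRIMARY KEY, dns_name TEXT);
    CREATE TABLE ml2_port_bindings (port_id TEXT, host TEXT,
                                    PRIMARY KEY (port_id, host));
    INSERT INTO ports VALUES ('p1', 'net-1', 'fa:16:3e:00:00:01'),
                             ('p2', 'net-1', 'fa:16:3e:00:00:02'),
                             ('p3', 'net-2', 'fa:16:3e:00:00:01');
    INSERT INTO portdnses VALUES ('p1', 'vm1');
    INSERT INTO ml2_port_bindings VALUES ('p1', 'compute-0'), ('p1', 'compute-1');
""")

# Same filter as the queries above, with positional parameters.
FILTER = " WHERE ports.network_id IN (?) AND ports.mac_address IN (?)"


def count_with_full_objects(network_id, mac_address):
    """Pre-optimization shape: join the associated tables, fetch every column."""
    rows = conn.execute(
        "SELECT ports.*, portdnses.*, ml2_port_bindings.* FROM ports"
        " LEFT OUTER JOIN portdnses ON ports.id = portdnses.port_id"
        " LEFT OUTER JOIN ml2_port_bindings"
        " ON ports.id = ml2_port_bindings.port_id" + FILTER,
        (network_id, mac_address)).fetchall()
    # Several joined rows can belong to one port, so count distinct ports.id.
    return len({row[0] for row in rows})


def count_with_ids_only(network_id, mac_address):
    """Optimized shape: only the primary key of the main table is selected."""
    rows = conn.execute("SELECT ports.id FROM ports" + FILTER,
                        (network_id, mac_address)).fetchall()
    return len(rows)


print(count_with_full_objects("net-1", "fa:16:3e:00:00:01"),
      count_with_ids_only("net-1", "fa:16:3e:00:00:01"))   # 1 1
```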
With that in place, counting any resource became much faster, and the new driver adds no noticeable overhead to an HTTP request.
With this new driver, the affected customers recovered from the database deadlock state and didn't hit it again. The Neutron server API didn't lose any performance, and the potential deadlock trigger was removed. After this, we all went to Homer's Land of Chocolate to celebrate.
By the way, this is the patch that introduced the new driver.
Benchmarking.
Another question related to the resource count may arise: how much does it cost to count the resources during the reservation? Before the new implementation, reading and writing a single record was enough to handle the (resource_type, project) count. Now we need to perform a count operation in the database. Even with the previous improvement, that could be more expensive.
Well, fortunately it isn't. I deployed an environment based on OSP 16.2 with the mentioned fix and the new quota driver. This environment had a single controller hosting both the Neutron server and the database engine. The controller host had 16 threads, all of which I used for the Neutron API workers (the api_workers configuration option).
The next step was to simulate some requests. I wrote a small script that used HTTP calls directly to create and delete networks. I preferred raw HTTP calls to the Python OpenStack Client interface because they were faster and only one call was made per operation: the resource creation or deletion. I chose networks because they are faster to create than subnets or ports. The script ran a loop issuing 16 parallel requests.
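A load generator like the one described could look roughly as follows. This is a hedged sketch, not the script I used. It assumes a Neutron endpoint URL and a pre-issued Keystone token (both placeholders). It relies only on the standard Networking API v2.0 calls POST /v2.0/networks and DELETE /v2.0/networks/{id}, with 16 parallel workers. The create and delete callables are injectable, so the loop can be exercised without a cloud. Note that it measures client-side round trips, while the make_reservation timings below come from the server logs.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

NEUTRON_URL = "http://controller:9696"  # placeholder: your Neutron endpoint
TOKEN = "<keystone-token>"              # placeholder: a pre-issued Keystone token


def _call(method, path, body=None):
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(
        NEUTRON_URL + path, data=data, method=method,
        headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        payload = response.read()
    return json.loads(payload) if payload else None


def create_network(name):
    # POST /v2.0/networks answers with {"network": {"id": ..., ...}}.
    return _call("POST", "/v2.0/networks", {"network": {"name": name}})["network"]["id"]


def delete_network(network_id):
    # DELETE /v2.0/networks/{id} answers 204 with an empty body.
    _call("DELETE", f"/v2.0/networks/{network_id}")


def create_and_delete(index, create, delete):
    """One iteration: create a network, delete it, return the elapsed seconds."""
    start = time.perf_counter()
    delete(create(f"quota-bench-{index}"))
    return time.perf_counter() - start


def run(iterations, parallel=16, create=create_network, delete=delete_network):
    """Run `iterations` create/delete cycles, `parallel` of them at a time."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(lambda i: create_and_delete(i, create, delete),
                             range(iterations)))


# Against a real deployment (placeholders above filled in):
# durations = run(iterations=160)
# print(f"max {max(durations):.3f}s, mean {sum(durations) / len(durations):.3f}s")
```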
With both drivers, the make_reservation method took between 20 and 40 milliseconds to execute. However, with the DbQuotaDriver I saw an initial transition delay where the method took between 4 and 5 seconds, most likely due to the initial resync process.
2022-01-03 15:07:09.188 25 WARNING neutron.quota [req-90ee56b0-f55d-49da-aee6-c3716de80a58 b15115f111bc4fc5a82a44a835e062a2 b5232ed1e96149eeaf009f14265c9ea9 - default default] RAH; reservation time: 0:00:04.343170
2022-01-03 15:07:09.213 33 WARNING neutron.quota [req-d700c38d-d7c4-4df0-8fe0-b135969d7566 b15115f111bc4fc5a82a44a835e062a2 b5232ed1e96149eeaf009f14265c9ea9 - default default] RAH; reservation time: 0:00:04.422759
2022-01-03 15:07:09.614 33 WARNING neutron.quota [req-27c765fd-9fb7-4e43-a98d-3138fb1f6b3e b15115f111bc4fc5a82a44a835e062a2 b5232ed1e96149eeaf009f14265c9ea9 - default default] RAH; reservation time: 0:00:04.738421
2022-01-03 15:07:09.786 33 WARNING neutron.quota [req-ddb765e5-3f12-4417-8138-1d5b7f2c8a70 b15115f111bc4fc5a82a44a835e062a2 b5232ed1e96149eeaf009f14265c9ea9 - default default] RAH; reservation time: 0:00:04.887482
2022-01-03 15:07:09.801 27 WARNING neutron.quota [req-fc2c4ebc-c958-48cf-a6fe-bcb3b59a71b0 b15115f111bc4fc5a82a44a835e062a2 b5232ed1e96149eeaf009f14265c9ea9 - default default] RAH; reservation time: 0:00:04.819090
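The "reservation time" values in these log lines look like the str() format of datetime.timedelta (H:MM:SS.ffffff). Under that assumption, a few lines of Python are enough to pull them out of a log and summarise them. reservation_times is a hypothetical helper; the sample reuses two of the values shown above.

```python
import re
from datetime import timedelta

# "reservation time: H:MM:SS.ffffff", the str() format of datetime.timedelta.
RESERVATION_TIME = re.compile(r"reservation time: (\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def reservation_times(log_text):
    """Return every reservation time found in log_text, in seconds."""
    return [timedelta(hours=int(h), minutes=int(m), seconds=float(s)).total_seconds()
            for h, m, s in RESERVATION_TIME.findall(log_text)]


sample = """\
2022-01-03 15:07:09.188 25 WARNING neutron.quota [...] RAH; reservation time: 0:00:04.343170
2022-01-03 15:07:09.786 33 WARNING neutron.quota [...] RAH; reservation time: 0:00:04.887482
"""
times = reservation_times(sample)
print(f"min {min(times):.3f} s, max {max(times):.3f} s")  # min 4.343 s, max 4.887 s
```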
This first test was run in an "empty" environment, with almost no networks in the testing project. For the second round, I added 2000 networks to the project to increase the database processing time of the count query. The times were very similar, always within the same 20 to 40 millisecond range. Awesome!
But wait, there is more to come.
This first iteration was not perfect, of course. Although some customers saw an improvement, others still experienced a similar issue, this time in the reservations table. It was caused by the record cleanup performed during the reservation process: before counting the resources and the active reservations, the method deleted the inactive (already expired) ones. In highly loaded systems, several API workers clashed while executing this deletion, driving the database (again) into a deadlock.
But the database muses whispered to me how to fix it: a periodic worker, running in a single thread, that cleans up the expired reservations. And here it is. This relieves the locking on the reservations table and doesn't affect the resource count, because expired reservations are already discarded when counting.
As always, I hope you reached this point without falling asleep, and I hope this was helpful.