Debugging ML2/OVN database discrepancies

Last week I had a customer with an issue in an OpenStack deployment running ML2/OVN. Intermittently, when creating a virtual machine, the Nova server returned a timeout during VIF plugging. It took us some time to discover that the IP address assigned to the new port was already assigned to a rogue Logical_Switch_Port that was present in the OVN Northbound database but absent from the Neutron database. The newly created virtual machine port was defined as virtual, and Neutron cannot attach a virtual port to a virtual machine.

This kind of issue can be quickly diagnosed with the neutron-ovn-db-sync-util script. The script compares the Neutron database with the OVN databases to detect this kind of discrepancy and reports the differences between them. Leaving aside how this error came about, I would like to describe how to debug Neutron and OVN database issues locally, on your own computer.

What is needed.

The databases, of course. I know that could sound pedantic, but it is worth mentioning. You need three databases: the Neutron database and the OVN Northbound and Southbound databases.

The Neutron database can be dumped using the mysqldump client:

$ mysqldump --all-databases > openstack.sql   # All OpenStack databases
$ mysqldump neutron > neutron.sql  # Only the Neutron database

The OVN databases can be retrieved directly from the filesystem. You can locate them by searching for the running processes and filtering by ovsdb-server:

$ ps aux | ag ovsdb
root      508586  0.2  0.0  12104  5944 ?        S    Jul12   0:40 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/var/run/ovn/ovnnb_db.sock --pidfile=/var/run/ovn/ovnnb_db.pid --unixctl=/var/run/ovn/ovnnb_db.ctl --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db

The last file, the one with the .db extension, is the database file. If you are running the databases inside containers, you’ll need to get into them and execute the same command.
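
If several ovsdb-server processes are running, picking the file names out of the process list by eye gets tedious. The helper below is only a sketch: ovsdb_db_files is a name I made up, and it assumes the database file is passed as an absolute path ending in .db, as in the output above.

```shell
# Print the database file (an absolute path ending in .db) found in each
# ovsdb-server command line read from stdin, one path per line.
ovsdb_db_files() {
    awk '/ovsdb-server/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^\/.*\.db$/) print $i
    }'
}
```

It can be fed from the process list, for example with `ps -eo args | ovsdb_db_files`.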

You will also need the Neutron configuration files, which by default are /etc/neutron/neutron.conf and /etc/neutron/plugins/ml2/ml2_conf.ini.

Run the Neutron database locally.

We are going to run the dumped database locally, but in a disposable way: using containers. Once the debugging is done, the container can be deleted and nothing will remain on the local system. Thus the first step is to install podman and download the MariaDB container image. Remember that from this point on, all commands should be executed as root:

$ dnf install podman
$ podman pull docker.io/library/mariadb:latest

To run a container using this downloaded image you just need to execute:

$ podman run --name=neutron_db -e MYSQL_ROOT_PASSWORD=pass -p 3306:3306 -d mariadb:latest
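
The database server needs a few seconds to initialize after the container starts, so a command issued right away may be refused. A small retry helper can bridge that gap. This is a generic sketch; wait_for is a hypothetical name, not something podman or MariaDB provides.

```shell
# Retry a command once per second until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <command> [args...]
wait_for() {
    local attempts=$1 i=1
    shift
    until "$@" >/dev/null 2>&1; do
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep 1
    done
}
```

For example, `wait_for 30 mysql -h 127.0.0.1 -P 3306 -uroot -ppass -e "select 1"` returns as soon as the server accepts connections, or fails after 30 attempts.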

Now it is possible to perform any mysql command against this instance using the port (3306), user (“root”) and password (“pass”) defined. For example, we can load the OpenStack database file and perform any command:

$ mysql -h 127.0.0.1 -P 3306 -uroot -ppass < openstack.sql
$ mysql -h 127.0.0.1 -P 3306 -uroot -ppass -e "use neutron; select * from ports;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----------------------------------+--------------------------------------+------+--------------------------------------+-------------------+----------------+--------+----------------------------------------------+--------------------------+------------------+---------------+
| project_id                       | id                                   | name | network_id                           | mac_address       | admin_state_up | status | device_id                                    | device_owner             | standard_attr_id | ip_allocation |
+----------------------------------+--------------------------------------+------+--------------------------------------+-------------------+----------------+--------+----------------------------------------------+--------------------------+------------------+---------------+
| b9b53fc1293f42bc9717df55969ead9b | 73a79d2b-d052-48a9-bfd2-afb454e16f68 |      | 2d826f9f-1a47-4de4-b2d0-7de9108ea824 | fa:16:3e:1c:b0:bf |              1 | DOWN   | ovnmeta-2d826f9f-1a47-4de4-b2d0-7de9108ea824 | network:distributed      |               13 | none          |
|                                  | 77c3e112-8169-49eb-bdae-670f8ec4bbc3 |      | d58576fe-1d6e-4db0-85e2-e2cb71dc430f | fa:16:3e:8a:a1:20 |              1 | ACTIVE | 394896dd-4a76-40bd-85f6-e6efd28c8e53         | network:router_gateway   |               26 | immediate     |
| b9b53fc1293f42bc9717df55969ead9b | adeb7213-e97e-4f20-b26a-064593b0c944 |      | 2d826f9f-1a47-4de4-b2d0-7de9108ea824 | fa:16:3e:45:dc:63 |              1 | ACTIVE | 394896dd-4a76-40bd-85f6-e6efd28c8e53         | network:router_interface |               24 | immediate     |
| 9668cbc179c547d2ba70d9c6f48da7d2 | c23d0070-88c8-4ef1-8539-f283b290b50e |      | d58576fe-1d6e-4db0-85e2-e2cb71dc430f | fa:16:3e:d9:a5:96 |              1 | DOWN   | ovnmeta-d58576fe-1d6e-4db0-85e2-e2cb71dc430f | network:distributed      |               23 | none          |
| 9668cbc179c547d2ba70d9c6f48da7d2 | dc83f22a-c5bd-4315-9378-6b2e020f413c |      | cd0b4dc9-df3e-42a0-ae7f-2fb75f506d88 | fa:16:3e:77:07:82 |              1 | DOWN   | ovnmeta-cd0b4dc9-df3e-42a0-ae7f-2fb75f506d88 | network:distributed      |               29 | none          |
+----------------------------------+--------------------------------------+------+--------------------------------------+-------------------+----------------+--------+----------------------------------------------+--------------------------+------------------+---------------+

Loading a big database can take time. To speed up the loading process, it can be faster to copy the SQL file into the container and load it from there in a single transaction:

$ podman cp openstack.sql neutron_db:/tmp/openstack.sql
$ podman exec -uroot -it neutron_db bash
# mysql -uroot -ppass
> set autocommit=0; source /tmp/openstack.sql; commit;
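
With the dump loaded, the problem from the introduction (an IP address that may already be in use) can be checked from the Neutron side by joining the ports and ipallocations tables. The function below is a sketch under the assumptions of this post: the local container on 127.0.0.1:3306 with root/pass. The name ports_with_ip is mine.

```shell
# List the Neutron ports that hold a given IP address.
# Assumes the local MariaDB container started above (127.0.0.1:3306, root/pass).
ports_with_ip() {
    # Accept only characters valid in an IPv4/IPv6 address before
    # placing the value inside the SQL statement.
    case $1 in
        ''|*[!0-9A-Fa-f.:]*)
            echo "usage: ports_with_ip <ip-address>" >&2
            return 2 ;;
    esac
    mysql -h 127.0.0.1 -P 3306 -uroot -ppass neutron -e "
        SELECT p.id, p.device_owner, p.status, a.ip_address
        FROM ports p JOIN ipallocations a ON a.port_id = p.id
        WHERE a.ip_address = '$1';"
}
```

An empty result for an address that OVN still has assigned to a Logical_Switch_Port points to the kind of rogue port described at the beginning.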

Run the OVN Northbound and Southbound databases locally.

Same as with the Neutron database, we are going to run the OVN databases in a container. We are going to download the Fedora container image and run a container:

$ podman pull docker.io/library/fedora:latest
$ podman run --name=ovn_nb -p 6641:6641 -d fedora:latest sleep infinity

This procedure must be done twice, once for each OVN database; the steps below describe the OVN NB one. The next steps install the OVN packages inside the container and run the ovsdb-server process. When the local Open vSwitch service is started, both the ovsdb-server and the vswitchd services are started, but the second one fails (we won’t have a virtual switch running in the container). That does not matter; what is important here is the database server. If the database file comes from a RAFT deployment, the ovsdb-tool command converts it to a standalone database.

$ podman cp ovnnb_db.db ovn_nb:/.
$ podman exec -uroot -it ovn_nb bash
$$ dnf install openvswitch ovn procps net-tools -y
$$ /usr/share/openvswitch/scripts/ovs-ctl start
$$ ovsdb-tool cluster-to-standalone ovnnb_db.db.sa ovnnb_db.db
$$ ovsdb-server ovnnb_db.db.sa --remote=ptcp:6641:0.0.0.0 --log-file=ovnnb-server.log --detach
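
Since the same steps are needed for the Southbound database, they can be wrapped in one function that takes the database kind as a parameter. This is only a sketch: start_ovn_db is a hypothetical name, the file name ovnsb_db.db for the Southbound database is an assumption, and port 6642 is the one used later for ovn_sb_connection. Like the commands above, it assumes the file comes from a RAFT deployment; for a standalone file the conversion step should be skipped.

```shell
# Start one OVN database container: "nb" (port 6641) or "sb" (port 6642).
# Assumes the file ovn<kind>_db.db is in the current directory.
start_ovn_db() {
    local kind=$1 port name db
    case $kind in
        nb) port=6641 ;;
        sb) port=6642 ;;
        *)  echo "usage: start_ovn_db nb|sb" >&2; return 2 ;;
    esac
    name="ovn_$kind"
    db="ovn${kind}_db.db"
    # The ovs-ctl exit status is not checked because vswitchd is
    # expected to fail inside the container.
    podman run --name="$name" -p "$port:$port" -d fedora:latest sleep infinity &&
    podman cp "$db" "$name:/$db" &&
    podman exec -u root "$name" sh -c "
        dnf install -y openvswitch ovn procps net-tools
        /usr/share/openvswitch/scripts/ovs-ctl start
        ovsdb-tool cluster-to-standalone /$db.sa /$db &&
        ovsdb-server /$db.sa --remote=ptcp:$port:0.0.0.0 --log-file=/$name-server.log --detach"
}
```

Calling `start_ovn_db nb` and then `start_ovn_db sb` leaves both databases listening on the local ports 6641 and 6642.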

Execute the neutron-ovn-db-sync-util script.

At this point we have the three containers running the databases. It is possible to access any of them using the command line tools mysql, ovn-nbctl and ovn-sbctl. The last step is to install and execute the sync tool. We’ll deploy everything in /tmp because we don’t need to preserve anything after the debug analysis.

$ cd /tmp
$ git clone https://opendev.org/openstack/neutron.git
$ cd neutron
$ python -m venv venv  # creates a Python virtual environment
$ . venv/bin/activate  # activates the virtual environment
$ python -m pip install -Ue .  # locally installs Neutron
$ python -m pip install -U pymysql

Now we have almost everything ready to execute the sync tool. We need a copy of the configuration files, placed in /tmp. In these copies we need to change the database connections: the user and password, the IP address and the port. The connection option lives in the [database] section of neutron.conf, and the two OVN options live in the [ovn] section of ml2_conf.ini:

connection = mysql+pymysql://root:pass@127.0.0.1/neutron?charset=utf8
ovn_sb_connection = tcp:127.0.0.1:6642
ovn_nb_connection = tcp:127.0.0.1:6641
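
Editing the copies by hand is perfectly fine. If you repeat this often, the three substitutions can be scripted. The sketch below assumes that each option is already present and uncommented at the start of a line in the copied files; point_to_local_dbs is a name I made up.

```shell
# Rewrite the database endpoints in copies of the Neutron configuration files.
# $1: copy of neutron.conf, $2: copy of ml2_conf.ini
point_to_local_dbs() {
    sed -e 's|^connection *=.*|connection = mysql+pymysql://root:pass@127.0.0.1/neutron?charset=utf8|' \
        "$1" > "$1.new" && mv "$1.new" "$1" || return 1
    sed -e 's|^ovn_nb_connection *=.*|ovn_nb_connection = tcp:127.0.0.1:6641|' \
        -e 's|^ovn_sb_connection *=.*|ovn_sb_connection = tcp:127.0.0.1:6642|' \
        "$2" > "$2.new" && mv "$2.new" "$2"
}
```

After copying the two files to /tmp, `point_to_local_dbs /tmp/neutron.conf /tmp/ml2_conf.ini` leaves them pointing at the local containers.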

Now, inside the activated virtual environment:

$ neutron-ovn-db-sync-util --config-file /tmp/neutron.conf --config-file /tmp/ml2_conf.ini --ovn-neutron_sync_mode=log --log-file /tmp/log_sync.log

To see the tool in action you can, for example, delete a Logical_Switch_Port from the OVN NB database and run it again. The log file will then contain the following message:

WARNING neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovn_db_sync [None req-6db7eb4b-7a41-4cd8-bde1-947c82814add None None] Port found in Neutron but not in OVN NB DB, port_id=77c3e112-8169-49eb-bdae-670f8ec4bbc3
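
In a large deployment the log can contain many of these lines. A short filter collects the affected port IDs. It is a sketch that only knows the message shown above (other discrepancy messages are not covered), and ports_missing_in_ovn is a hypothetical name.

```shell
# Print the unique port IDs that the sync tool reported as present in
# Neutron but missing in the OVN NB database.
# $1: log file written by neutron-ovn-db-sync-util
ports_missing_in_ovn() {
    grep 'Port found in Neutron but not in OVN NB DB' "$1" |
        sed -n 's/.*port_id=\([0-9a-f-]*\).*/\1/p' |
        sort -u
}
```

Running `ports_missing_in_ovn /tmp/log_sync.log` gives a list that can be checked against the ports table queried earlier.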

Stay tuned for more posts!
