-
Notifications
You must be signed in to change notification settings - Fork 40
Description
Description:
There is an issue in Specify 7 where the renumber_tree function is not properly assigning nodenumber and highestchildnodenumber values, resulting in tree numbering errors that prevent merges from completing successfully. This issue was reported by Tomislav and has been consistently reproducible in their database.
Steps to Reproduce:
-
Query the database with:
select count(*) from taxon t join taxon p on t.parentid = p.taxonid where t.nodenumber not between p.nodenumber and p.highestchildnodenumber;
See that it returns a count of
0 -
Run the
renumber_tree('taxon')function directly via Django shell.❯ docker exec -it specify7 /bin/bash specify@4f497e2137b1:/opt/specify7$ ve/bin/python manage.py shell Python 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. (InteractiveConsole) >>> from specifyweb.specify import tree_extras >>> from django.db import connection >>> tree_extras.renumber_tree('taxon') [15/Sep/2025 12:14:47] [INFO] [specifyweb.specify.tree_extras:643] renumbering tree -
Query the database with:
select count(*) from taxon t join taxon p on t.parentid = p.taxonid where t.nodenumber not between p.nodenumber and p.highestchildnodenumber;
This returns a non-zero count (e.g.,
2274). -
Attempt to merge two taxon nodes (e.g., 'Palaeoniscidae (in Palaeonisciformes)' into 'Palaeoniscidae (in Palaeoniscoidea)').
-
When
validate_tree_numbering('taxon')is run, it throws:AssertionError: found 2274 nodenumbers not nested by parent>>> tree_extras.validate_tree_numbering('taxon') [15/Sep/2025 12:14:48] [INFO] [specifyweb.specify.tree_extras:598] validating tree Traceback (most recent call last): File "<console>", line 1, in <module> File "/opt/specify7/specifyweb/specify/tree_extras.py", line 624, in validate_tree_numbering assert not_nested_count == 0, \ ^^^^^^^^^^^^^^^^^^^^^ AssertionError: found 2274 nodenumbers not nested by parent
Specify 6 Behavior (Working as Expected):
- Running "Update Taxon tree" and "Update Geography tree" in Specify 6 (which triggers the tree renumbering logic) successfully resets the bad count to 0. Merges in this version work without errors after that.
- In Specify 6, further merges do not cause the numbering issue to reappear.
Specify 7 Behavior (Not Working):
- In Specify 7, even after manually correcting the numbering (either by database fix or using external tools), every time a merge is attempted, the application automatically attempts to renumber the tree, re-introducing faulty numbering.
- This results in the error resurfacing with every merge action and the blocking of merges.
- This auto-renumber process makes it impossible to maintain consistent numbering or migrate data as needed.
Impact:
- Users are unable to merge taxonomy trees in Specify 7 databases due to this bug.
- The operations work in Specify 6.
- The bug blocks critical data curation operations.
Environment:
This is happening in v7.11.1. Testing with 6.8.03.
Relevant logs and output:
After running renumber:
[INFO] renumbering tree
After validation:
AssertionError: found 2274 nodenumbers not nested by parent
After running "Update Taxon tree" or "Update Geography tree" in Specify 6:
Count updated to 0. No errors present.
Relevant code:
specify7/specifyweb/specify/tree_extras.py
Lines 597 to 625 in 2148663
| def validate_tree_numbering(table): | |
| logger.info('validating tree') | |
| cursor = connection.cursor() | |
| cursor.execute( | |
| "select count(*), count(distinct nodenumber), count(highestchildnodenumber)\n" | |
| "from {table}".format(table=table) | |
| ) | |
| node_count, nn_count, hcnn_count = cursor.fetchone() | |
| assert node_count == nn_count == hcnn_count, \ | |
| "found {} nodes but {} nodenumbers and {} highestchildnodenumbers" \ | |
| .format(node_count, nn_count, hcnn_count) | |
| cursor.execute(( | |
| "select count(*) from {table} t join {table} p on t.parentid = p.{table}id\n" | |
| "where t.rankid <= p.rankid\n" | |
| "and t.acceptedid is null" | |
| ).format(table=table)) | |
| bad_ranks_count, = cursor.fetchone() | |
| assert bad_ranks_count == 0, \ | |
| "found {} cases where node rank is not greater than its parent." \ | |
| .format(bad_ranks_count) | |
| cursor.execute(( | |
| "select count(*) from {table} t join {table} p on t.parentid = p.{table}id\n" | |
| "where t.nodenumber not between p.nodenumber and p.highestchildnodenumber\n" | |
| ).format(table=table)) | |
| not_nested_count, = cursor.fetchone() | |
| assert not_nested_count == 0, \ | |
| f"found {not_nested_count} nodenumbers not nested by parent" |
specify7/specifyweb/specify/tree_extras.py
Lines 642 to 734 in 2148663
| def renumber_tree(table): | |
| logger.info('renumbering tree') | |
| cursor = connection.cursor() | |
| # make sure rankids are set correctly | |
| cursor.execute(( | |
| "update {table} t\n" | |
| "join {table}treedefitem d on t.{table}treedefitemid = d.{table}treedefitemid\n" | |
| "set t.rankid = d.rankid\n" | |
| ).format(table=table)) | |
| # make sure there are no cycles | |
| cursor.execute(( | |
| "select p.{table}id, p.fullname, t.{table}id, t.fullName, tdef.title\n" | |
| "from {table} t\n" | |
| "join {table} p on t.parentid = p.{table}id\n" | |
| "join {table}treedefitem tdef on t.{table}treedefitemid=tdef.{table}treedefitemid\n" | |
| "where t.rankid <= p.rankid\n" | |
| "and t.acceptedid is null" | |
| ).format(table=table)) | |
| results = cursor.fetchall() | |
| formattedResults = { | |
| "nodeData" : [ | |
| { | |
| "parent" : { | |
| f"{table.capitalize()} ID" : parentID, | |
| "Full Name" : parentName | |
| }, | |
| "child" : { | |
| f"{table.capitalize()} ID" : childID, | |
| "Full Name" : childName, | |
| "Bad Rank" : childRank | |
| }} for parentID, parentName, childID, childName, childRank in results], | |
| "localizationKey" : "badTreeStructureInvalidRanks", | |
| } | |
| bad_ranks_count = cursor.rowcount | |
| formattedResults["badRanks"] = bad_ranks_count | |
| if bad_ranks_count > 0: | |
| # raise AssertionError( # Phasing out node numbering, logging as warning instead of raising an exception | |
| logger.warning( | |
| f"Bad Tree Structure: Found {bad_ranks_count} case(s) where node rank is not greater than its parent", | |
| formattedResults, | |
| ) | |
| # Get the tree ranks in leaf -> root order. | |
| cursor.execute(f"select distinct rankid from {table} order by rankid desc") | |
| ranks = [rank for (rank,) in cursor.fetchall()] | |
| depth = len(ranks) | |
| # Construct a path enumeration for each node and set the | |
| # nodenumbers according to the lexical ordering of the paths. This | |
| # ensures ancestor node numbers come before descendents and | |
| # sibling nodes get consecutive ranges. | |
| cursor.execute("set @rn := 0") | |
| cursor.execute(( | |
| "update {table} t\n" | |
| "join (select @rn := @rn + 1 as nn, p.id as id from \n" | |
| " (select t0.{table}id as id, {path} as path\n" | |
| " from {table} t0\n" | |
| " {parent_joins}\n" | |
| " order by path) p\n" | |
| ") r on t.{table}id = r.id\n" | |
| # Set the highestchildnodenumber to be the same as the | |
| # nodenumber. This is correct for leaves, and interior nodes | |
| # will be adjusted in the next step. | |
| "set t.nodenumber = r.nn, t.highestchildnodenumber = r.nn\n" | |
| ).format( | |
| table=table, | |
| path=path_expr(table, depth), | |
| parent_joins=parent_joins(table, depth), | |
| )) | |
| # Adjust the highestchildnodenumbers working from the penultimate | |
| # rank downward towards the roots. The highest rank cannot have | |
| # any children, so all nodes there correctly have | |
| # highestchildnodenumber = nodenumber as set in the previous step. | |
| # Interior nodes are updated by inner joining against their | |
| # children so that nodes with no children are not updated leaving | |
| # highestchildnodenumber = nodenumber. | |
| for rank in ranks[1:]: | |
| cursor.execute(( | |
| "update {table} t join (\n" | |
| " select max(highestchildnodenumber) as hcnn, parentid\n" | |
| " from {table} where rankid > %(rank)s group by parentid\n" | |
| ") as sub on sub.parentid = t.{table}id\n" | |
| "set highestchildnodenumber = hcnn where rankid = %(rank)s\n" | |
| ).format(table=table), {'rank': rank}) | |
| # Clear the BadNodes and UpdateNodes flags. | |
| from .models import datamodel, Sptasksemaphore | |
| tree_model = datamodel.get_table(table) | |
| tasknames = [name.format(tree_model.name) for name in ("UpdateNodes{}", "BadNodes{}")] | |
| Sptasksemaphore.objects.filter(taskname__in=tasknames).update(islocked=False) |
Reported by:
Tomislav at University of Texas at Austin Specify Help Desk ticket #159