Skip to content

Tree renumbering function introduces invalid numbering, causing merge failures #7423

@grantfitzsimmons

Description

@grantfitzsimmons

Description:
There is an issue in Specify 7 where the renumber_tree function is not properly assigning nodenumber and highestchildnodenumber values, resulting in tree numbering errors that prevent merges from completing successfully. This issue was reported by Tomislav and has been consistently reproducible in their database.

Steps to Reproduce:

  1. Query the database with:

    select count(*) from taxon t join taxon p on t.parentid = p.taxonid
    where t.nodenumber not between p.nodenumber and p.highestchildnodenumber;

    See that it returns a count of 0

  2. Run the renumber_tree('taxon') function directly via Django shell.

    ❯ docker exec -it specify7 /bin/bash
    specify@4f497e2137b1:/opt/specify7$ ve/bin/python manage.py shell
    Python 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    (InteractiveConsole)
    >>> from specifyweb.specify import tree_extras
    >>> from django.db import connection
    >>> tree_extras.renumber_tree('taxon')
    [15/Sep/2025 12:14:47] [INFO] [specifyweb.specify.tree_extras:643] renumbering tree
    
  3. Query the database with:

    select count(*) from taxon t join taxon p on t.parentid = p.taxonid
    where t.nodenumber not between p.nodenumber and p.highestchildnodenumber;

    This returns a non-zero count (e.g., 2274).

  4. Attempt to merge two taxon nodes (e.g., 'Palaeoniscidae (in Palaeonisciformes)' into 'Palaeoniscidae (in Palaeoniscoidea)').

  5. When validate_tree_numbering('taxon') is run, it throws:

    AssertionError: found 2274 nodenumbers not nested by parent
    
     >>> tree_extras.validate_tree_numbering('taxon')
     [15/Sep/2025 12:14:48] [INFO] [specifyweb.specify.tree_extras:598] validating tree
     Traceback (most recent call last):
       File "<console>", line 1, in <module>
       File "/opt/specify7/specifyweb/specify/tree_extras.py", line 624, in validate_tree_numbering
         assert not_nested_count == 0, \
                ^^^^^^^^^^^^^^^^^^^^^
     AssertionError: found 2274 nodenumbers not nested by parent

Specify 6 Behavior (Working as Expected):

  • Running "Update Taxon tree" and "Update Geography tree" in Specify 6 (which triggers the tree renumbering logic) successfully resets the bad count to 0. Merges in this version work without errors after that.
  • In Specify 6, further merges do not cause the numbering issue to reappear.

Specify 7 Behavior (Not Working):

  • In Specify 7, even after manually correcting the numbering (either by database fix or using external tools), every time a merge is attempted, the application automatically attempts to renumber the tree, re-introducing faulty numbering.
  • This results in the error resurfacing with every merge action and the blocking of merges.
  • This auto-renumber process makes it impossible to maintain consistent numbering or migrate data as needed.

Impact:

  • Users are unable to merge taxonomy trees in Specify 7 databases due to this bug.
  • The operations work in Specify 6.
  • The bug blocks critical data curation operations.

Environment:
This is happening in v7.11.1. Testing with 6.8.03.

Relevant logs and output:
After running renumber:

[INFO] renumbering tree

After validation:

AssertionError: found 2274 nodenumbers not nested by parent

After running "Update Taxon tree" or "Update Geography tree" in Specify 6:

Count updated to 0. No errors present.

Relevant code:

def validate_tree_numbering(table):
logger.info('validating tree')
cursor = connection.cursor()
cursor.execute(
"select count(*), count(distinct nodenumber), count(highestchildnodenumber)\n"
"from {table}".format(table=table)
)
node_count, nn_count, hcnn_count = cursor.fetchone()
assert node_count == nn_count == hcnn_count, \
"found {} nodes but {} nodenumbers and {} highestchildnodenumbers" \
.format(node_count, nn_count, hcnn_count)
cursor.execute((
"select count(*) from {table} t join {table} p on t.parentid = p.{table}id\n"
"where t.rankid <= p.rankid\n"
"and t.acceptedid is null"
).format(table=table))
bad_ranks_count, = cursor.fetchone()
assert bad_ranks_count == 0, \
"found {} cases where node rank is not greater than its parent." \
.format(bad_ranks_count)
cursor.execute((
"select count(*) from {table} t join {table} p on t.parentid = p.{table}id\n"
"where t.nodenumber not between p.nodenumber and p.highestchildnodenumber\n"
).format(table=table))
not_nested_count, = cursor.fetchone()
assert not_nested_count == 0, \
f"found {not_nested_count} nodenumbers not nested by parent"

def renumber_tree(table):
logger.info('renumbering tree')
cursor = connection.cursor()
# make sure rankids are set correctly
cursor.execute((
"update {table} t\n"
"join {table}treedefitem d on t.{table}treedefitemid = d.{table}treedefitemid\n"
"set t.rankid = d.rankid\n"
).format(table=table))
# make sure there are no cycles
cursor.execute((
"select p.{table}id, p.fullname, t.{table}id, t.fullName, tdef.title\n"
"from {table} t\n"
"join {table} p on t.parentid = p.{table}id\n"
"join {table}treedefitem tdef on t.{table}treedefitemid=tdef.{table}treedefitemid\n"
"where t.rankid <= p.rankid\n"
"and t.acceptedid is null"
).format(table=table))
results = cursor.fetchall()
formattedResults = {
"nodeData" : [
{
"parent" : {
f"{table.capitalize()} ID" : parentID,
"Full Name" : parentName
},
"child" : {
f"{table.capitalize()} ID" : childID,
"Full Name" : childName,
"Bad Rank" : childRank
}} for parentID, parentName, childID, childName, childRank in results],
"localizationKey" : "badTreeStructureInvalidRanks",
}
bad_ranks_count = cursor.rowcount
formattedResults["badRanks"] = bad_ranks_count
if bad_ranks_count > 0:
# raise AssertionError( # Phasing out node numbering, logging as warning instead of raising an exception
logger.warning(
f"Bad Tree Structure: Found {bad_ranks_count} case(s) where node rank is not greater than its parent",
formattedResults,
)
# Get the tree ranks in leaf -> root order.
cursor.execute(f"select distinct rankid from {table} order by rankid desc")
ranks = [rank for (rank,) in cursor.fetchall()]
depth = len(ranks)
# Construct a path enumeration for each node and set the
# nodenumbers according to the lexical ordering of the paths. This
# ensures ancestor node numbers come before descendents and
# sibling nodes get consecutive ranges.
cursor.execute("set @rn := 0")
cursor.execute((
"update {table} t\n"
"join (select @rn := @rn + 1 as nn, p.id as id from \n"
" (select t0.{table}id as id, {path} as path\n"
" from {table} t0\n"
" {parent_joins}\n"
" order by path) p\n"
") r on t.{table}id = r.id\n"
# Set the highestchildnodenumber to be the same as the
# nodenumber. This is correct for leaves, and interior nodes
# will be adjusted in the next step.
"set t.nodenumber = r.nn, t.highestchildnodenumber = r.nn\n"
).format(
table=table,
path=path_expr(table, depth),
parent_joins=parent_joins(table, depth),
))
# Adjust the highestchildnodenumbers working from the penultimate
# rank downward towards the roots. The highest rank cannot have
# any children, so all nodes there correctly have
# highestchildnodenumber = nodenumber as set in the previous step.
# Interior nodes are updated by inner joining against their
# children so that nodes with no children are not updated leaving
# highestchildnodenumber = nodenumber.
for rank in ranks[1:]:
cursor.execute((
"update {table} t join (\n"
" select max(highestchildnodenumber) as hcnn, parentid\n"
" from {table} where rankid > %(rank)s group by parentid\n"
") as sub on sub.parentid = t.{table}id\n"
"set highestchildnodenumber = hcnn where rankid = %(rank)s\n"
).format(table=table), {'rank': rank})
# Clear the BadNodes and UpdateNodes flags.
from .models import datamodel, Sptasksemaphore
tree_model = datamodel.get_table(table)
tasknames = [name.format(tree_model.name) for name in ("UpdateNodes{}", "BadNodes{}")]
Sptasksemaphore.objects.filter(taskname__in=tasknames).update(islocked=False)

Reported by:
Tomislav at University of Texas at Austin Specify Help Desk ticket #159

Metadata

Metadata

Assignees

Labels

2 - TreesIssues that are related to the tree system and related functionalities.

Type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions