
feat: Improve handling of deep taxonomies (perf + limits + bugfixes)#511

Open
bradenmacdonald wants to merge 9 commits into main from braden/concrete-depth

Conversation

@bradenmacdonald (Contributor) commented Mar 21, 2026

This is a fix for openedx/modular-learning#257.

In short: the API was very inconsistent in how it handled deeply-nested tags, and arguably the behavior was buggy.

Approach: This PR updates the Tag data model to store depth and lineage as columns, rather than computing them dynamically. Then, I rewrote all the queries to support unlimited tag depth. With the depth and lineage columns available, we can perform all the same queries very efficiently without having to hard-code things like parent__parent__parent__... that assume a certain depth limit. Now all the API methods work with an unlimited tag depth.
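For illustration, here is a minimal plain-Python sketch of the idea of storing depth and lineage concretely (this is not the PR's actual Django model; the class, field names, and separator are assumptions):

```python
# Hypothetical sketch: derive "depth" and "lineage" once, from the parent,
# at creation time, instead of recomputing them per query via
# parent__parent__parent__... chains of a fixed length.

LINEAGE_SEP = "\t"  # assumption: a separator that cannot appear in tag values


class Tag:
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent
        # Stored concretely, so queries can filter on depth/lineage directly:
        self.depth = 0 if parent is None else parent.depth + 1
        self.lineage = (
            value if parent is None
            else parent.lineage + LINEAGE_SEP + value
        )


root = Tag("Science")
child = Tag("Physics", parent=root)
grandchild = Tag("Optics", parent=child)
print(grandchild.depth)    # → 2
print(grandchild.lineage)  # → "Science\tPhysics\tOptics"
```

With both values stored as columns, "all descendants of X" becomes a prefix match on lineage and "all tags at level N" becomes an equality filter on depth, regardless of how deep the tree is.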

Actual depth limit: This PR also clarifies the definition of TAXONOMY_MAX_DEPTH and actually enforces it to limit the allowed depth to six levels. (Before this, no limit was enforced when creating tags. A limit of 3 levels was enforced when reading tags from multiple levels at once, but it didn't work well below the root.)
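A hedged sketch of what enforcing such a limit at creation time might look like (the constant's value, zero-indexed depth convention, and error type are assumptions, not the PR's exact code):

```python
TAXONOMY_MAX_DEPTH = 5  # assumption: zero-indexed, i.e. six levels total


def check_new_tag_depth(parent_depth):
    """Refuse to create a tag whose depth would exceed the limit.

    A root tag has depth 0, so the deepest allowed tag has depth
    TAXONOMY_MAX_DEPTH.
    """
    new_depth = 0 if parent_depth is None else parent_depth + 1
    if new_depth > TAXONOMY_MAX_DEPTH:
        raise ValueError(
            f"Cannot create a tag at depth {new_depth}; "
            f"the maximum allowed depth is {TAXONOMY_MAX_DEPTH}."
        )
    return new_depth
```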

Performance: pretty much on par with the main branch in every way I could measure. Significantly faster than my CTE approach #510.

AI disclosure: Claude assisted with this PR.

@openedx-webhooks openedx-webhooks added the open-source-contribution (PR author is not from Axim or 2U) and core contributor (PR author is a Core Contributor, who may or may not have write access to this repo) labels Mar 21, 2026
@openedx-webhooks

Thanks for the pull request, @bradenmacdonald!

This repository is currently maintained by @axim-engineering.

Once you've gone through the following steps, feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

fix: API results are now correct regardless of tag depth
feat: refuse to create tags deeper than TAXONOMY_MAX_DEPTH
perf: make "depth" and "lineage" concrete
@bradenmacdonald
Contributor Author

@ormsbee @kdmccormick do either of you have time to review this tagging backend PR, to help unblock the tag editor UI work?

@mphilbrick211 mphilbrick211 moved this from Needs Triage to Ready for Review in Contributions Mar 23, 2026
@kdmccormick
Member

@bradenmacdonald Sure thing

@kdmccormick (Member) left a comment


Nice, just one question.

Haven't tested it myself--would you like me to?

),
)
lineage = case_insensitive_char_field(
max_length=3006,
Member


why 3006?

@bradenmacdonald (Contributor, Author) commented Mar 24, 2026


I've added the explanation in f5a7a99

(value max_length + 1 for tab character) * (TAXONOMY_MAX_DEPTH + 1) = 501 * 6 = 3006
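The arithmetic can be sanity-checked in a couple of lines (the per-value max_length of 500 is inferred from the formula above, so treat it as an assumption):

```python
VALUE_MAX_LENGTH = 500  # assumption: max_length of a single tag value
SEP_LENGTH = 1          # one tab character per value
LEVELS = 6              # up to six tag values in one lineage string

LINEAGE_MAX_LENGTH = (VALUE_MAX_LENGTH + SEP_LENGTH) * LEVELS
print(LINEAGE_MAX_LENGTH)  # → 3006
```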

@ChrisChV (Contributor) left a comment


@bradenmacdonald Great work! I found some nits and a bug:

This error occurs when importing a taxonomy:

[2026-03-23 19:18:28] Starting execute actions
[2026-03-23 19:18:29] #1: Create a new tag with values (external_id=37153, value=hierarchical taxonomy tag 1, parent_id=None). [Started]
[2026-03-23 19:18:29] AttributeError("'int' object has no attribute 'strip'")

next_ancestor_id = row["parent__parent__parent_id"]
while next_ancestor_id:  # If there are even deeper ancestors, add them (inefficiently):
    next_ancestor_id = Tag.objects.get(pk=next_ancestor_id).parent_id
    matching_ids.append(next_ancestor_id)
Contributor


When reaching a root tag, next_ancestor_id is None, and that value is appended to matching_ids. Is that expected?

Contributor Author


It doesn't really hurt anything, but yeah it's better not to. Updated: 60192ba
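A plausible sketch of the corrected loop: append the id before advancing, so None is never added when a root is reached (the real change is in commit 60192ba; a plain dict stands in for Tag.objects.get(pk=...).parent_id so this runs standalone):

```python
# Hypothetical stand-in for the database: tag id -> parent id (None for a root).
PARENTS = {4: 3, 3: 2, 2: None}

matching_ids = []
next_ancestor_id = PARENTS[4]  # stand-in for row["parent__parent__parent_id"]
while next_ancestor_id:  # If there are even deeper ancestors, add them:
    matching_ids.append(next_ancestor_id)  # append BEFORE advancing, so None is never added
    next_ancestor_id = PARENTS[next_ancestor_id]

print(matching_ids)  # → [3, 2]
```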

@@ -198,15 +200,14 @@ def get_object_tags(
base_qs
# Preload related objects, including data for the "get_lineage" method on ObjectTag/Tag:
.select_related("taxonomy", "tag", "tag__parent", "tag__parent__parent")
Contributor


By removing the previous query, the tag__parent and tag__parent__parent lookups in the select_related would no longer be necessary.

Contributor Author


Thanks, removed: 60192ba

@bradenmacdonald
Contributor Author

@ChrisChV

This error occurs when importing a taxonomy:

The only way I can reproduce that error is using a JSON export where the IDs are changed to integers instead of strings, which is technically invalid. I wasn't able to reproduce it using CSV. So I think the bug was probably there before? But I can add some type coercion to fix it.

@kdmccormick (Member) left a comment


Code looks good to me! Haven't tested, but I'm trusting that you guys have.

@bradenmacdonald
Contributor Author

bradenmacdonald commented Mar 24, 2026

@ChrisChV

This error occurs when importing a taxonomy:

The bug only occurs if you use an invalid JSON export where the IDs are integers. The code before wasn't calling Tag.clean() and was directly using Tag.create() instead of Taxonomy.add_tag(), so it was skipping a lot of validation, although it was working OK because it implicitly coerced to strings when saving into the database. I updated this in f783904 and added more explicit type checking during import. Should be good now :)
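A hedged sketch of the kind of type checking/coercion this refers to (the function name is hypothetical; the actual change is in f783904):

```python
def clean_external_id(raw):
    """Coerce an imported external_id to a stripped string.

    Tolerates technically-invalid integer IDs (as seen in some JSON
    exports) by converting them to strings, and rejects anything else
    that is not already a string.
    """
    if isinstance(raw, int):
        raw = str(raw)  # e.g. 37153 -> "37153"
    if not isinstance(raw, str):
        raise TypeError(
            f"external_id must be a string, got {type(raw).__name__}"
        )
    return raw.strip()
```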
