
Tbain/253 add tags count #506

Open
tbain wants to merge 11 commits into openedx:main from tbain:tbain/253_add_tags_count_rebased

Conversation

@tbain

@tbain tbain commented Mar 18, 2026

Description

This implements openedx/modular-learning#253, the task to add tag usage counts to the tags table under the taxonomies table. The frontend piece, which displays the results of this aggregation, is part of a separate PR to openedx/frontend-app-authoring. This change adds a subquery annotation to the Django query that retrieves tags. The original implementation only counted the raw usage of each tag. The AC for the issue above instead call for an aggregate count that covers each tag and its child tags, with sibling de-duplication for the same usage: when two sibling tags are applied to the same course, module, etc., that still counts as only 1 for any parent or grandparent tag. The original count was therefore replaced with a more complicated subquery that sums tag usage across the courses, sections, modules, and libraries that might use a tag.
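
To make the counting rule concrete, here is a minimal Python sketch of the intended semantics only. It is not the subquery in this PR: the function name, the `parent_of` mapping, and the sample tags and object IDs are all hypothetical, and the real implementation works at the database level.

```python
from collections import defaultdict


def rolled_up_usage_counts(object_tags, parent_of):
    """Count distinct objects per tag, rolling each usage up to every ancestor.

    object_tags: iterable of (object_id, tag) pairs, one per direct usage.
    parent_of:   dict mapping each tag to its parent tag (None for root tags).
    Both structures are assumptions made for this illustration.
    """
    objects_by_tag = defaultdict(set)
    for object_id, tag in object_tags:
        current = tag
        # Walk from the applied tag up to the root. Using a set per tag means
        # an object is counted once per ancestor, however many of that
        # ancestor's descendants are applied to it.
        while current is not None:
            objects_by_tag[current].add(object_id)
            current = parent_of.get(current)
    return {tag: len(objects) for tag, objects in objects_by_tag.items()}


# Hypothetical taxonomy: Animalia -> Chordata -> {Mammalia, Aves}
parent_of = {
    "Animalia": None,
    "Chordata": "Animalia",
    "Mammalia": "Chordata",
    "Aves": "Chordata",
}
# course-1 uses two sibling tags; course-2 uses one of them.
object_tags = [
    ("course-1", "Mammalia"),
    ("course-1", "Aves"),
    ("course-2", "Mammalia"),
]
counts = rolled_up_usage_counts(object_tags, parent_of)
```

In this sample, a raw sum of child usages would give Chordata a count of 3, while the de-duplicated rollup gives 2, because course-1 contributes only once to each ancestor.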

Supporting information

Github issue with AC: openedx/modular-learning#253

Testing instructions

Refer to the AC in the Github Issue. Steps to verify this is implemented and working via UX (Note, depends on the frontend part of this ticket):

  1. Navigate to the "Studio home" page
  2. Navigate into an existing Course (or create a course and navigate into it)
  3. In the "Course Outline" page, add tag(s) from an existing taxonomy to the course, module, or section. Ensure at least one of the tags you add is a sub-tag of a root tag.
  4. Navigate back to the "Studio home" page
  5. Click the "Taxonomies" tab to navigate to the Taxonomies page
  6. Navigate into the Taxonomy that corresponds to the tag you added in step 3
  7. Observe that, if a tag is used, there is now an additional column on the table named "Usage Count" that is populated with bubbles that display the count of tag usages, if applicable
  8. Ensure that the tag you added in step 3 shows the incremented count from its usage, and that the usage count aggregates up the lineage of the sub-tag you selected in step 3

Other information

Include anything else that will help reviewers and consumers understand the change.

  • Does this change depend on other changes elsewhere?
    • This change is backwards compatible with the current implementation in frontend-app-authoring, since by default the frontend does not request the counts.
  • Any special concerns or limitations? For example: deprecations, migrations, security, or accessibility.
    • none at this time

@openedx-webhooks

Thanks for the pull request, @tbain!

This repository is currently maintained by @axim-engineering.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Details
Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Mar 18, 2026
@github-project-automation github-project-automation bot moved this to Needs Triage in Contributions Mar 18, 2026

@jesperhodge jesperhodge left a comment

There seem to be changes missing. For example, src/taxonomy/data/api.ts.
Could you:

  • review this PR and make sure that all necessary changes are in this branch? Compare to the open Unicon PR.
  • review the discussions in the Unicon PR and either resolve them or copy them here to be addressed?
  • fix any pipeline errors?

@mgwozdz-unicon
Contributor

Since we're no longer using recursive SQL for this, is it possible to update the PR description for accuracy?

@mphilbrick211 mphilbrick211 moved this from Needs Triage to In Eng Review in Contributions Mar 23, 2026
@tbain
Author

tbain commented Mar 23, 2026

There seem to be changes missing. For example, src/taxonomy/data/api.ts. Could you

* review this PR and make sure that all necessary changes are in this branch? Compare to the open Unicon PR.

* review discussions in the Unicon PR and either resolve them or copy them here to be addressed here.

* fix any pipeline errors
  ?
  • src/taxonomy/data/api.ts, as an example, is a file in the frontend changes. I compared everything with the backend changes in openedx-core, and this is the correct set of files.
  • All comments and issues from the aforementioned PR have been addressed in this one, so this PR is up to date.
  • Working on that. I had missed a test suite that was affected by the changes, so I addressed that. I am still working on a strange quality issue where it complains about the time the unit test suite takes.


Copilot AI left a comment

Pull request overview

Adds rolled-up, de-duplicated tag usage counts (including ancestor rollups) to the tag listing query so the Taxonomies UI can display accurate “Usage Count” values per tag.

Changes:

  • Replaced the prior per-tag direct usage counting subquery with a dynamic, depth-aware subquery that rolls counts up to ancestors with per-object de-duplication.
  • Updated existing API/model tests to reflect rolled-up counts and added a broader set of usage-count test cases.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

Files and descriptions:

  • src/openedx_tagging/models/base.py: Centralizes and updates include_counts behavior by annotating tag querysets with rolled-up, de-duplicated usage_count via a subquery.
  • tests/openedx_tagging/test_models.py: Updates expected usage counts and adds multiple new test scenarios validating ancestor rollup and sibling de-duplication.
  • tests/openedx_tagging/test_api.py: Updates autocomplete/search test expectations to reflect rolled-up usage counts returned by the API when include_counts=True.

@bradenmacdonald
Contributor

Feel free to ping me for review here once the AC are clarified and the comments from Copilot etc are addressed.

@tbain
Author

tbain commented Mar 28, 2026

Feel free to ping me for review here once the AC are clarified and the comments from Copilot etc are addressed.

@bradenmacdonald I think this is ready for re-review. I resolved all the Copilot issues and added the improvement you suggested: finding the depth via a query rather than depending on the constant.

@bradenmacdonald
Contributor

When I test this using Lightcast Open Skills Taxonomy.csv (which has 4,268 tags in 3 levels), the time to load /api/content_tagging/v1/taxonomies/19/tags/?full_depth_threshold=10000&include_counts=true goes from ~140ms (current main branch after recent optimizations) to ~1,250ms (this branch, with main locally merged in), roughly a 10x slowdown. It's still pretty performant overall, but the slowdown is very noticeable.

"""
Test that the usage count is correct and parent counts are included based on
child tags being added to an object. However, we de-duplicate and only count
1 parent tag towards a course even if 2 children are applied to that course
Contributor

Suggested change
- 1 parent tag towards a course even if 2 children are applied to that course
+ 1 parent tag towards each object even if 2 children are applied to that object
+ (to reflect that the UI for that object would show 1 implicit parent tag and 2
+ explicit child tags).

This description (and the test name) makes it sound like we are aware of the context of objects within courses, but we are not. There is no logic related to "courses" specifically in the query.

Object 1, Course A: grandchild tag
Object 2, Course A: grandchild tag
Object 3, Course B: grandchild tag

Overall usage count for the common parent of "grandchild tag" should be "3", because these are all distinct objects and it doesn't matter that two of them are in the same course.
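
The scenario above can be checked with a small Python sketch. The tuple layout, the `lineage_of` mapping, and the tag names are hypothetical; the only point is that the course never enters the computation.

```python
# Hypothetical rows: (object_id, course, applied_tag). The course column is
# carried along only to show that it plays no part in the count.
usages = [
    ("object-1", "course-A", "grandchild tag"),
    ("object-2", "course-A", "grandchild tag"),
    ("object-3", "course-B", "grandchild tag"),
]

# Hypothetical lineage of the applied tag, from the root down to the tag itself.
lineage_of = {"grandchild tag": ["root tag", "parent tag", "grandchild tag"]}


def usage_count(tag):
    """Number of distinct objects tagged with `tag` or one of its descendants."""
    return len(
        {obj for obj, _course, applied in usages if tag in lineage_of[applied]}
    )


parent_count = usage_count("parent tag")
```

Here `parent_count` is 3: each object is distinct, so two of them sharing Course A makes no difference.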

" Animalia (used: 4, children: 7 + 1)",
" Arthropoda (used: 1, children: 0)",
" Chordata (used: 0, children: 1)",
" Chordata (used: 0, children: 1)", # <<< Chordata has a matching child but we only support searching
Contributor

What does this comment mean? Was it added accidentally during a merge/rebase?

@bradenmacdonald
Contributor

I'm not opposed to this PR as is, but a 10x slowdown isn't great, and I suspect it may be worse if there are more ObjectTags in use (I don't have that many in my test environment).

In order to improve performance, I have two suggestions:

  1. Build the object counts in Python. Basically, in the /taxonomy/:n/tags/ REST API endpoint, once we've evaluated the query to load the tags (along with whatever filtering, pagination, etc. may be in place), you can do a second query to load all related ObjectTags, including tag__lineage (1 simple query, no aggregation at the query level). Then, in Python, you can group the ObjectTags by object, split each lineage up into individual tags, de-duplicate with a set(), and annotate the original query objects with the counts. This also lets you separate implicit counts from explicit counts in the API, which I think would be even better than combining them.

  2. Or, perhaps even better, make "get counts with implicit counts" a separate REST API endpoint. Then you can implement it either the way I described above or your original way, and it doesn't matter if it's a bit slow: the UI can load the counts separately while the rest of the tags load immediately.
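
A rough sketch of suggestion 1, written as plain Python so it runs without Django. It assumes each ObjectTag row has already been loaded as an (object_id, lineage) pair and that the lineage is a delimiter-joined string of tag values from the root to the applied tag. The tab delimiter, the function name, and the sample rows are assumptions, not the actual storage format or API.

```python
from collections import Counter, defaultdict


def count_tag_usages(object_tag_rows, delimiter="\t"):
    """Return (explicit, implicit) Counters keyed by tag value.

    object_tag_rows: iterable of (object_id, lineage) pairs, for example the
    result of one simple query over ObjectTags (hypothetical shape).
    """
    explicit_by_object = defaultdict(set)
    ancestors_by_object = defaultdict(set)
    for object_id, lineage in object_tag_rows:
        parts = [part for part in lineage.split(delimiter) if part]
        if not parts:
            continue
        # The last element is the tag applied directly; the rest are ancestors.
        explicit_by_object[object_id].add(parts[-1])
        ancestors_by_object[object_id].update(parts[:-1])

    explicit, implicit = Counter(), Counter()
    for object_id, applied in explicit_by_object.items():
        explicit.update(applied)
        # Sets de-duplicate ancestors shared by sibling tags on one object.
        # An ancestor that is also applied directly counts as explicit only.
        implicit.update(ancestors_by_object[object_id] - applied)
    return explicit, implicit


# Hypothetical rows: block-1 has two sibling tags, block-2 has their parent.
rows = [
    ("block-1", "Animalia\tChordata\tMammalia"),
    ("block-1", "Animalia\tChordata\tAves"),
    ("block-2", "Animalia\tChordata"),
]
explicit, implicit = count_tag_usages(rows)
```

Each tag loaded by the first query could then be annotated with something like `explicit[tag.value] + implicit[tag.value]`, or the two counts could be exposed separately in the API.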

Thoughts?

Labels

open-source-contribution PR author is not from Axim or 2U

Projects

Status: In Eng Review

Development

Successfully merging this pull request may close these issues.

7 participants