
fix: filter out datasets with inconsistent database and LakeFS records #5171

Open
xuang7 wants to merge 2 commits into apache:main from xuang7:fix/filter-mismatched-datasets

xuang7 (Contributor) commented May 24, 2026

What changes were proposed in this PR?

This PR fixes dataset listings failing when the dataset records in the database are inconsistent with the repositories in LakeFS. The failure breaks the workflow dataset picker and can also affect the Hub dataset listings. The fix changes the dataset listing endpoints to first fetch the names of the existing LakeFS repositories and then drop any dataset record whose repository is missing, so the valid datasets are still returned normally.
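The filtering step amounts to one membership check per record against the set of repository names fetched up front. The Scala below is a minimal sketch under stated assumptions: `DatasetRecord`, its `repositoryName` field, `DatasetListing`, and both helper names are hypothetical, and the code that actually fetches the repository names from LakeFS is not shown.

```scala
// Hypothetical sketch: DatasetRecord, repositoryName, DatasetListing,
// filterConsistent and splitByRepo are illustrative names, not the
// identifiers used in the PR.
final case class DatasetRecord(did: Int, name: String, repositoryName: String)

object DatasetListing {

  /** Keeps only the records whose LakeFS repository is in `existingRepos`.
    * The set is assumed to be fetched once per request, so each record is
    * checked with a constant-time set lookup rather than its own LakeFS call.
    */
  def filterConsistent(
      records: Seq[DatasetRecord],
      existingRepos: Set[String]
  ): Seq[DatasetRecord] =
    records.filter(record => existingRepos.contains(record.repositoryName))

  /** Variant that also returns the dropped records, e.g. so they can be logged. */
  def splitByRepo(
      records: Seq[DatasetRecord],
      existingRepos: Set[String]
  ): (Seq[DatasetRecord], Seq[DatasetRecord]) =
    records.partition(record => existingRepos.contains(record.repositoryName))
}
```

For example, given one record whose repository exists and one whose repository is missing, `filterConsistent` returns only the first record instead of letting the missing repository fail the whole listing.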

Demo:

Before: the dataset listing fails with an error.
After: the dataset picker loads the valid datasets.

Any related issues, documentation, discussions?

Closes #5106

How was this PR tested?

Added two tests.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

github-actions (bot) added the engine, fix, common, and platform (Non-amber Scala service paths) labels on May 24, 2026
codecov-commenter commented May 24, 2026

Codecov Report

❌ Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.66%. Comparing base (0f5f791) to head (a98106b).

Files with missing lines                                 Patch %   Lines
...exera/web/resource/dashboard/hub/HubResource.scala    0.00%     2 Missing ⚠️
.../amber/core/storage/util/LakeFSStorageClient.scala    0.00%     2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5171      +/-   ##
============================================
- Coverage     43.66%   43.66%   -0.01%     
  Complexity     2218     2218              
============================================
  Files          1049     1049              
  Lines         40580    40585       +5     
  Branches       4324     4324              
============================================
+ Hits          17719    17720       +1     
- Misses        21766    21768       +2     
- Partials       1095     1097       +2     
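The headline percentages can be reproduced from the line counts in the diff above, assuming Codecov's usual definition in which partially covered lines count against coverage, i.e. coverage = hits / (hits + misses + partials). A minimal Scala sketch; `CoverageMath` is a hypothetical helper, not part of the project:

```scala
// Sketch of Codecov-style line coverage (assumed definition): partially
// covered lines count as not covered.
object CoverageMath {

  /** Fraction of tracked lines that are fully hit. */
  def coverage(hits: Int, misses: Int, partials: Int): Double =
    hits.toDouble / (hits + misses + partials)

  /** Coverage as a percentage rounded to two decimals, e.g. 43.66. */
  def percent(hits: Int, misses: Int, partials: Int): Double =
    math.round(coverage(hits, misses, partials) * 10000) / 100.0
}
```

With the diff's totals, base (17719 hits out of 40580 lines) and head (17720 out of 40585) both round to 43.66%, and the five added lines (+1 hit, +2 misses, +2 partials) give 20%, consistent with the reported patch coverage.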
Flag                             Coverage Δ                   *Carryforward flag
access-control-service           39.53% <ø> (ø)
agent-service                    33.76% <ø> (ø)               Carriedforward from 0f5f791
amber                            43.92% <0.00%> (-0.03%) ⬇️
computing-unit-managing-service  1.38% <ø> (ø)
config-service                   19.35% <ø> (ø)
file-service                     32.89% <100.00%> (+0.70%) ⬆️
frontend                         35.15% <ø> (ø)               Carriedforward from 0f5f791
python                           90.50% <ø> (ø)               Carriedforward from 0f5f791
workflow-compiling-service       58.39% <ø> (ø)

*This pull request uses carry forward flags.

xuang7 requested a review from aicam on May 24, 2026 00:44

Labels

common engine fix platform Non-amber Scala service paths

Development

Successfully merging this pull request may close these issues.

Dataset file selection fails when LakeFS repository and database records are inconsistent

2 participants