Skip to content

feat(migrate): MongoDB tooling for v1.19.2 -> v2.0.0 (URLs + emails + ACL audit)#1543

Open
re-vlad wants to merge 14 commits into
finos:mainfrom
re-vlad:feat/1535-mongodb-migration
Open

feat(migrate): MongoDB tooling for v1.19.2 -> v2.0.0 (URLs + emails + ACL audit)#1543
re-vlad wants to merge 14 commits into
finos:mainfrom
re-vlad:feat/1535-mongodb-migration

Conversation

@re-vlad
Copy link
Copy Markdown

@re-vlad re-vlad commented May 21, 2026

Description

Upgrade to Git-Proxy v2.0.0 requires database preparation for breaking behavior around exact repo URL lookup (need .git on repos.url) and v2 push authorization based on committer email + User.username ACL entries.
This PR adds operator-run migration scripts with dry-run by default, explicit apply, structured reports, and separate backups so upgrades are safer and repeatable (aligned with issue #1535).

Phase 1 — repo URL normalization: append .git to repos.url where missing (idempotent) via migrate-urls.js + backup-urls.js.
Phase 2 — users & ACL normalization:

  • migrate-users.js always runs email audit + ACL orphan audit (entries in canPush / canAuthorise that don’t resolve to any User.username; no silent ACL rewrite).
  • optional email updates via --apply --csv username,email (lib/csv.js, apply-user-emails.js with post-apply uniqueness simulation).
  • backup-users.js writes backup-users-.json (full snapshot, passwords excluded) and users-email-.csv template for admin edits before apply.
  • docs: The Migration-guide-v1-to-v2.md updated for both phases + csv constraints.

Related Issue

Resolves #1535

Checklist

General

Documentation

  • Documentation has been added/updated for any new features

Configuration

  • If configuration schema (config.schema.json) was modified:
    • TypeScript types regenerated (npm run generate-config-types)
    • Schema reference docs regenerated (npm run gen-schema-doc)

Tests

  • Tests have been added/updated for new functionality
  • Unit tests pass (npm test)
  • Linting and formatting pass (npm run lint and npm run format:check)
  • Type checks pass (npm run check-types)

Note: Migration scripts are one-off operator tooling, not part of the main app bundle. Their unit tests live under scripts/migrate/test/ and are not included in the default npm test run (which targets ./test). Run them separately with:

npm run test:migrate

@re-vlad re-vlad requested a review from a team as a code owner May 21, 2026 14:20
@re-vlad re-vlad self-assigned this May 21, 2026
@netlify
Copy link
Copy Markdown

netlify Bot commented May 21, 2026

Deploy Preview for endearing-brigadeiros-63f9d0 canceled.

Name Link
🔨 Latest commit 1170f0e
🔍 Latest deploy log https://app.netlify.com/projects/endearing-brigadeiros-63f9d0/deploys/6a19a7843e174e0008da4f91

@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla Bot commented May 21, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: re-vlad / name: re-vlad (51384cc)

@re-vlad re-vlad added the enhancement New feature or request label May 21, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.65%. Comparing base (ffd6937) to head (1170f0e).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1543   +/-   ##
=======================================
  Coverage   90.65%   90.65%           
=======================================
  Files          69       69           
  Lines        5690     5690           
  Branches      985      985           
=======================================
  Hits         5158     5158           
  Misses        514      514           
  Partials       18       18           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@re-vlad re-vlad force-pushed the feat/1535-mongodb-migration branch from 51384cc to fb1782a Compare May 21, 2026 14:54
@re-vlad re-vlad changed the title feat: v1.19.2 to v2.0.0 repos.url Idempotent normalization: append .git where missing feat(migrate): MongoDB tooling for v1.19.2 → v2.0.0 (URLs + emails + ACL audit) May 27, 2026

**Goal:** every `repos.url` that v2 will match must include the `.git` suffix where it is missing.

**Why:** v2 resolves repos by **exact** `url`; v1 often relied on `name`. Clients typically send URLs with `.git`; rows without `.git` stop matching.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the "client" you're referring to is the UI, this is inaccurate: the UI enforces the .git ending when creating new repositories. I'd edit this for accuracy/clarity:

Suggested change
**Why:** v2 resolves repos by **exact** `url`; v1 often relied on `name`. Clients typically send URLs with `.git`; rows without `.git` stop matching.
**Why:** v2 resolves repos by **exact** `url`; v1 often relied on `name`. Rows without `.git` no longer match when calling `getRepoByUrl`, which makes processors such as `checkRepoInAuthorisedList` behave unexpectedly.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this is definately my miswording - by "clients" I meant incoming git HTTP traffic (which includes .git in the path), not the admin UI, which already enforces .git on create. I've updated the 'Why..' section to describe the actual failure mode: exact getRepoByUrl lookup vs legacy rows without .git, and the impact on processors like checkRepoInAuthorisedList

Comment thread scripts/migrate/lib/analyze-users.js Outdated
counts,
duplicateGroups,
issues,
blockingIssueCount: counts.missing + counts.blank + counts.duplicate,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be counting emails with invalid formatting counts.invalidFormat here as well?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch: invalid-format is already in issues and counts.invalidFormat, but it was omitted from blockingIssueCount. That meant dry-run could exit 0 while the DB still had malformed emails, which is inconsistent with CSV validation. Now thecounts.invalidFormat included into the blockingIssueCount.

expect(byId.get('u1').status).toBe('missing');
expect(byId.get('u2').status).toBe('blank');
expect(byId.get('u3').status).toBe('invalid-format');
expect(report.blockingIssueCount).toBe(2); // missing + blank
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the previous comment: any specific reason why emails with invalid formatting are not blocking? The script would just proceed with the update even if invalid emails were present.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, changed: missing + blank + invalid-format


for (const repo of repos) {
const currentUrl = (repo.url || '').trim();
const needsUpdate = !currentUrl.endsWith('.git');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this create invalid URLs if the url field wasn't formatted properly before? Some examples:

  • url: "" would turn into url: ".git"
  • url: "https://github.com/finos/git-proxy/ would turn into url: "https://github.com/finos/git-proxy/.git

At least we should report these, so admins can fix them manually.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, thank you @jescalada, the initial implementation could produce invalid values (e.g. "" → ".git" or trailing / → "/.git"). This PR now normalizes only safe cases (trim + strip trailing /) and reports blank/non-http(s) URLs as issues (with url-issues-*.csv) for manual admin fixes; it also exits non‑zero while issues remain so they can’t be missed.

Comment thread scripts/migrate/migrate-urls.js Outdated

for (const change of report.changes) {
try {
const success = await updateRepoUrl(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up: updateRepoUrl is creating a new MongoClient for each repo to update. We probably want to have a single client or do bulk writes instead.

Copy link
Copy Markdown
Author

@re-vlad re-vlad May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed: now we reuse a single Mongo сlient when 'apply' to avoid reconnect per repo

Comment thread scripts/migrate/lib/common.js Outdated
}

async function countReposWithoutGit(mongoUri, dbName) {
const client = new MongoClient(mongoUri);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function might also benefit from having a shared MongoClient

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, now the function accept already prepared collection and not opened the connection internally

Comment thread scripts/migrate/lib/common.js Outdated
const reposCollection = db.collection('repos');

return await reposCollection.countDocuments({
url: { $not: /\.git$/ },
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this match empty strings/null URLs? These would get counted but not actually migrated since we're currently ignoring empty URLs in buildUrlNormalizationReport.

Copy link
Copy Markdown
Author

@re-vlad re-vlad May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The criteria: { url: { $not: /.git$/ } } also matches empty/null and other non-.git URLs.
That's intentional. remaining is a post-apply DB check (how many repos.url values still don't end with .git), not a count of failed migration updates. Blank/unsupported URLs are skipped by normalizeRepoUrl and go to 'issues', not report.changes, but they still belong in this total because they need manual fixes.
We already report them as URL 'issues' and fail the run when urlIssues > 0. I updated the verify warning to note that remaining includes blank/invalid URLs listed under URL issues.

re-vlad added 2 commits May 28, 2026 15:12
- avoid generating invalid repo URLs during v1→v2 migration
- normalizing safe cases (trim + strip trailing slashes)
- reporting blank/non-http(s) values as manual-fix issues.
- refactor to reuse one MongoClient per run
- avoid repeated connect/close cycles
@re-vlad re-vlad changed the title feat(migrate): MongoDB tooling for v1.19.2 → v2.0.0 (URLs + emails + ACL audit) feat(migrate): MongoDB tooling for v1.19.2 -> v2.0.0 (URLs + emails + ACL audit) May 28, 2026
Copy link
Copy Markdown
Contributor

@Andreybest Andreybest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok from my side, please check @jescalada comments.
Only 2 things from my side to make it perfect:

  1. TypeScript in migration scripts would be appreciated.
  2. Tests that are run on virtual mongo would show more reliability by using mongodb-memory-server or it's derivatives.

@re-vlad
Copy link
Copy Markdown
Author

re-vlad commented May 29, 2026

  1. TypeScript in migration scripts would be appreciated.

Agreed. I'll track a follow-up to convert scripts/migrate to TS for consistency with the rest of the repo.

@re-vlad re-vlad requested a review from jescalada May 29, 2026 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proposal: Automated MongoDB Migration Script from v1.19.2 to v2.0.0 upgrade

3 participants