Skip to content

Improve linked symbol anchors with tree-sitter fallback#314864

Merged
dmitrivMS merged 8 commits intomicrosoft:mainfrom
pranavvaid-ac:pranav/copilot-symbol-linkifier-fallback
May 10, 2026
Merged

Improve linked symbol anchors with tree-sitter fallback#314864
dmitrivMS merged 8 commits intomicrosoft:mainfrom
pranavvaid-ac:pranav/copilot-symbol-linkifier-fallback

Conversation

@pranavvaid-ac
Copy link
Copy Markdown
Contributor

Adds an eager tree-sitter fallback for linked-backtick symbol references so links open near the referenced symbol even before an LSP DocumentSymbolProvider result is available. Before this PR, links to referenced symbols would just take you to the top of the file if an LSP was not available.

Changes:

  • Resolve initial linked symbol locations from tree-sitter when LSP symbols are unavailable.
  • Prefer declaration-like tree-sitter symbols before generic identifier matches.
  • Cache parsed file symbols during linkification to avoid repeated parsing for the same file.
  • Keep the existing LSP resolve path so symbol kind and more precise document-symbol locations can still upgrade later.
  • Add tests for exact matches, qualified-name fallback, parser caching, linkifier integration, and LSP kind/location upgrades.

Copilot AI review requested due to automatic review settings May 6, 2026 22:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an eager Tree-sitter-based fallback when linkifying linked-backtick symbol references so that symbol links can open near the referenced declaration even before vscode.executeDocumentSymbolProvider (LSP) results are available, while still allowing LSP-based kind/location upgrades later.

Changes:

  • Inject IParserService into SymbolLinkifier and resolve an initial best-effort symbol location via Tree-sitter (with per-linkify caching).
  • Add findSymbolLocationInFile to parse/query per-file symbols, preferring declaration-like matches and falling back to generic symbol/qualified-name matches.
  • Export extractSymbolNamesInCode and add tests covering Tree-sitter fallback behavior, caching, and LSP upgrade behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
extensions/copilot/src/extension/linkify/vscode-node/symbolLinkifier.ts Uses Tree-sitter-derived initial symbol locations (cached per run) and preserves the existing LSP upgrade path.
extensions/copilot/src/extension/linkify/vscode-node/findWord.ts Implements Tree-sitter symbol location lookup with declaration preference, qualified-name fallback, and a per-URI symbol cache.
extensions/copilot/src/extension/linkify/vscode-node/findSymbol.ts Exposes extractSymbolNamesInCode for reuse by Tree-sitter fallback logic.
extensions/copilot/src/extension/linkify/test/vscode-node/symbolLinkifier.test.ts Adds coverage for initial Tree-sitter locations, cache reuse, and LSP upgrade behavior.
extensions/copilot/src/extension/linkify/test/vscode-node/findWord.test.ts Adds unit tests for findSymbolLocationInFile, including declaration preference, qualified fallback, and cache behavior.

Comment on lines +60 to +86
function ensureTestWorkspace(): MutableTestWorkspace {
const testVscode = vscode as unknown as { workspace?: PartialMutableTestWorkspace };
testVscode.workspace ??= {};
testVscode.workspace.textDocuments ??= [];
testVscode.workspace.fs ??= {};
testVscode.workspace.fs.readFile ??= (async () => { throw new Error('workspace.fs.readFile not mocked in test'); }) as typeof vscode.workspace.fs.readFile;
return testVscode.workspace as MutableTestWorkspace;
}

function ensureTestCommands(): MutableTestCommands {
const testVscode = vscode as unknown as { commands?: Partial<MutableTestCommands> };
testVscode.commands ??= {};
testVscode.commands.executeCommand ??= (async () => undefined) as typeof vscode.commands.executeCommand;
return testVscode.commands as MutableTestCommands;
}

const testWorkspace = ensureTestWorkspace();
const testCommands = ensureTestCommands();
const originalWorkspaceReadFile = vscode.workspace.fs.readFile;
const originalWorkspaceTextDocuments = vscode.workspace.textDocuments;
const originalExecuteCommand = vscode.commands.executeCommand;

afterEach(() => {
testWorkspace.textDocuments = originalWorkspaceTextDocuments;
testWorkspace.fs.readFile = originalWorkspaceReadFile;
testCommands.executeCommand = originalExecuteCommand;
});
Comment on lines +16 to +84
class TestParserService implements Partial<IParserService> {
public parseCount = 0;
public genericSymbolQueryCount = 0;

constructor(
private readonly symbols: readonly TreeSitterExpressionInfo[] = [],
private readonly classDeclarations: readonly TreeSitterExpressionInfo[] = [],
private readonly functionDefinitions: readonly TreeSitterExpressionInfo[] = [],
private readonly typeDeclarations: readonly TreeSitterExpressionInfo[] = [],
) { }

getTreeSitterASTForWASMLanguage(_language: WASMLanguage, _source: string): TreeSitterAST {
this.parseCount++;
const symbols = this.symbols;
const classDeclarations = this.classDeclarations;
const functionDefinitions = this.functionDefinitions;
const typeDeclarations = this.typeDeclarations;
return {
getClassDeclarations: async () => classDeclarations,
getFunctionDefinitions: async () => functionDefinitions,
getTypeDeclarations: async () => typeDeclarations,
getSymbols: async () => {
this.genericSymbolQueryCount++;
return symbols;
},
} as unknown as TreeSitterAST;
}
}

interface MutableTestWorkspace {
textDocuments: typeof vscode.workspace.textDocuments;
fs: {
readFile: typeof vscode.workspace.fs.readFile;
};
}

interface PartialMutableTestWorkspace {
textDocuments?: MutableTestWorkspace['textDocuments'];
fs?: Partial<MutableTestWorkspace['fs']>;
}

function ensureTestWorkspace(): MutableTestWorkspace {
const testVscode = vscode as unknown as { workspace?: PartialMutableTestWorkspace };
testVscode.workspace ??= {};
testVscode.workspace.textDocuments ??= [];
testVscode.workspace.fs ??= {};
testVscode.workspace.fs.readFile ??= (async () => { throw new Error('workspace.fs.readFile not mocked in test'); }) as typeof vscode.workspace.fs.readFile;
return testVscode.workspace as MutableTestWorkspace;
}

const testWorkspace = ensureTestWorkspace();
const originalWorkspaceReadFile = vscode.workspace.fs.readFile;
const originalWorkspaceTextDocuments = vscode.workspace.textDocuments;

afterEach(() => {
testWorkspace.textDocuments = originalWorkspaceTextDocuments;
testWorkspace.fs.readFile = originalWorkspaceReadFile;
});

function setWorkspaceFileContents(contentsByUri: ReadonlyMap<string, string>): void {
testWorkspace.textDocuments = [];
testWorkspace.fs.readFile = async (uri: vscode.Uri) => {
const contents = contentsByUri.get(uri.toString());
if (contents === undefined) {
throw new Error(`File not found: ${uri.toString()}`);
}
return new TextEncoder().encode(contents);
};
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may be worth addressing as I saw this as well

Comment on lines +57 to +84
function ensureTestWorkspace(): MutableTestWorkspace {
const testVscode = vscode as unknown as { workspace?: PartialMutableTestWorkspace };
testVscode.workspace ??= {};
testVscode.workspace.textDocuments ??= [];
testVscode.workspace.fs ??= {};
testVscode.workspace.fs.readFile ??= (async () => { throw new Error('workspace.fs.readFile not mocked in test'); }) as typeof vscode.workspace.fs.readFile;
return testVscode.workspace as MutableTestWorkspace;
}

const testWorkspace = ensureTestWorkspace();
const originalWorkspaceReadFile = vscode.workspace.fs.readFile;
const originalWorkspaceTextDocuments = vscode.workspace.textDocuments;

afterEach(() => {
testWorkspace.textDocuments = originalWorkspaceTextDocuments;
testWorkspace.fs.readFile = originalWorkspaceReadFile;
});

function setWorkspaceFileContents(contentsByUri: ReadonlyMap<string, string>): void {
testWorkspace.textDocuments = [];
testWorkspace.fs.readFile = async (uri: vscode.Uri) => {
const contents = contentsByUri.get(uri.toString());
if (contents === undefined) {
throw new Error(`File not found: ${uri.toString()}`);
}
return new TextEncoder().encode(contents);
};
}
eleanorjboyd
eleanorjboyd previously approved these changes May 6, 2026
@pranavvaid-ac
Copy link
Copy Markdown
Contributor Author

@microsoft-github-policy-service agree

@microsoft-github-policy-service agree

@dmitrivMS dmitrivMS self-requested a review May 6, 2026 23:24
@dmitrivMS
Copy link
Copy Markdown
Contributor

@pranavvaid-ac Unit-tests are failing - please see the logs.

Copy link
Copy Markdown
Contributor

@dmitrivMS dmitrivMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failing

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 6/6 changed files
  • Comments generated: 3

Comment thread extensions/copilot/src/extension/linkify/vscode-node/symbolLinkifier.ts Outdated
Comment thread extensions/copilot/src/extension/linkify/vscode-node/findWord.ts
Comment thread extensions/copilot/src/extension/linkify/test/vscode-node/util.ts
@dmitrivMS
Copy link
Copy Markdown
Contributor

@pranavvaid-ac Please take a look at the new Copilot comments.

Also on perf from AI - what are your thoughts on below?

The PR adds non-trivial work to a code path that previously was constant time (new Location(uri, (0,0))).

  1. Per-streamed-chunk cache thrashing. SymbolLinkifier.linkify() creates a fresh symbolFileCache: SymbolFileCache = new Map() on every call. Chat responses stream and linkify() is invoked on each appended chunk, so a long response with the same Foo link appearing in 30 chunks will reparse src/file.ts 30 times. The fix is straightforward: hoist the cache to LinkifierContext (the context parameter) or to the IContributedLinkifier instance for the request.

  2. Sequential await inside the match loop. symbolLinkifier.ts already awaits resolveInWorkspace per match; this PR adds another await findSymbolLocationInFile(...) in the same loop. For a response with N links to N distinct files, parsing now serializes. A Promise.all over matches would parallelize the new I/O.

  3. Eager three-query fan-out per file. doGetFileSymbols always runs getClassDeclarations, getFunctionDefinitions, and getTypeDeclarations in parallel even when the file is small or the eventual match is in the first list. This is fine — Promise.all runs them concurrently — but for very large files all three traversals over the WASM tree happen up front.

  4. Disk read on every uncached file. openDocument falls back to vscode.workspace.fs.readFile if the document isn't open. This is new I/O on the streaming path. If the LSP executeDocumentSymbolProvider was going to upgrade the location anyway (the file is open and an LSP is available), this work is wasted.

  5. Whole-file getSymbols range. new vscode.Range(0, 0, Number.MAX_SAFE_INTEGER, 0) — pre-existing pattern in this file, but worth noting it scans the entire file when declarations don't match.

AI Suggestion
Hoist symbolFileCache to [LinkifierContext] scope (or the [SymbolLinkifier] instance for the request) so streamed chunks of a single response share parsed symbols, and consider parallelizing the per-match work inside the [for…of text.matchAll(...)] loop.

Comment thread extensions/copilot/src/extension/linkify/vscode-node/symbolLinkifier.ts Outdated
@pranavvaid-ac
Copy link
Copy Markdown
Contributor Author

@pranavvaid-ac Please take a look at the new Copilot comments.

Also on perf from AI - what are your thoughts on below?

The PR adds non-trivial work to a code path that previously was constant time (new Location(uri, (0,0))).

  1. Per-streamed-chunk cache thrashing. SymbolLinkifier.linkify() creates a fresh symbolFileCache: SymbolFileCache = new Map() on every call. Chat responses stream and linkify() is invoked on each appended chunk, so a long response with the same Foo link appearing in 30 chunks will reparse src/file.ts 30 times. The fix is straightforward: hoist the cache to LinkifierContext (the context parameter) or to the IContributedLinkifier instance for the request.
  2. Sequential await inside the match loop. symbolLinkifier.ts already awaits resolveInWorkspace per match; this PR adds another await findSymbolLocationInFile(...) in the same loop. For a response with N links to N distinct files, parsing now serializes. A Promise.all over matches would parallelize the new I/O.
  3. Eager three-query fan-out per file. doGetFileSymbols always runs getClassDeclarations, getFunctionDefinitions, and getTypeDeclarations in parallel even when the file is small or the eventual match is in the first list. This is fine — Promise.all runs them concurrently — but for very large files all three traversals over the WASM tree happen up front.
  4. Disk read on every uncached file. openDocument falls back to vscode.workspace.fs.readFile if the document isn't open. This is new I/O on the streaming path. If the LSP executeDocumentSymbolProvider was going to upgrade the location anyway (the file is open and an LSP is available), this work is wasted.
  5. Whole-file getSymbols range. new vscode.Range(0, 0, Number.MAX_SAFE_INTEGER, 0) — pre-existing pattern in this file, but worth noting it scans the entire file when declarations don't match.

AI Suggestion Hoist symbolFileCache to [LinkifierContext] scope (or the [SymbolLinkifier] instance for the request) so streamed chunks of a single response share parsed symbols, and consider parallelizing the per-match work inside the [for…of text.matchAll(...)] loop.

@dmitrivMS

Addressed points 1, 2, and 5. Points 1 and 5 were related to the other review comments, so addressing those also fixed that. Point 2 makes sense as well - I was relying on the performance of the InlineCodeSymbolLinkifier as a baseline since it also uses a tree sitter, and the implementation for that also parallelizes the matches the same way. I made my implementation also follow this pattern.

On point 3, the three declaration queries let us prefer the declaration-kind matches over generic symbol references which is a better user experience. Performance wise I'm not too worried since the declaration queries are per file and cached for the response.

On point 4, this is consistent with the behavior of the InlineCodeSymbolLinkifier so I would expect similar performance to that. It's hard to tell ahead of time whether the LSP will return useful results without invoking it, so this allows us to make the initial anchor better faster while we wait for the LSP to resolve.

@dmitrivMS dmitrivMS enabled auto-merge (squash) May 9, 2026 22:03
@dmitrivMS dmitrivMS merged commit 9384620 into microsoft:main May 10, 2026
39 of 40 checks passed
@vs-code-engineering vs-code-engineering Bot added this to the 1.120.0 milestone May 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants