Feat/git by suluyana · Pull Request #896 · modelscope/ms-agent

suluyana · 2026-04-07T12:36:37Z

Change Summary

Related issue number

Checklist

The pull request title is a good summary of the changes - it will be used in the changelog
Unit tests for the changes exist
Run pre-commit install and pre-commit run --all-files before git commit, and passed lint check.
Documentation reflects the changes where applicable

- Add snapshot.py: isolated git repo under output_dir/.ms_agent_snapshots, stores message_count per commit for history truncation on rollback - Auto-snapshot on every user turn via on_task_begin (enable_snapshots=True by default) - Add list_snapshots()/rollback() to Agent base and LLMAgent: rollback restores files, truncates saved history, clears _read_cache - Refactor filesystem_tool.py: remove replace_file_contents, rewrite edit_file with old_string→new_string exact replace, quote-normalization fallback, smart delete, trailing-whitespace strip, staleness check, multi-type read (images/binary), read dedup cache - Add smoke tests (20 cases, all offline)

gemini-code-assist

Code Review

This pull request introduces a lightweight snapshot and rollback system for agent output directories using an isolated Git repository, integrated with automatic snapshotting during task initiation. It also refactors the FileSystemTool to consolidate operations into read_file, write_file, and edit_file, incorporating staleness checks and image support. Review feedback suggests improving the robustness of Git hash retrieval, clarifying line-numbering behavior in tool descriptions, and reconsidering the strictness of staleness checks and automatic whitespace stripping. Additionally, more comprehensive error handling for missing Git binaries is recommended.

gemini-code-assist · 2026-04-07T12:38:46Z

ms_agent/utils/snapshot.py

+        result = _git(['commit', '-m', subject],
+                      work_tree=output_dir, git_dir=git_dir)
+
+        commit_hash = None
+        for line in result.stdout.splitlines():
+            if line.startswith('['):
+                before_bracket = line.split(']')[0]
+                commit_hash = before_bracket.split()[-1]
+                break
+        if commit_hash is None:
+            commit_hash = 'ok'


Parsing the commit hash from the git commit output is fragile as it depends on the git version and system locale. A more robust approach is to use git rev-parse --short HEAD after a successful commit. Additionally, defaulting the hash to 'ok' if parsing fails will cause subsequent restore_snapshot calls to fail, as 'ok' is not a valid git reference.

Suggested change

result = _git(['commit', '-m', subject],

work_tree=output_dir, git_dir=git_dir)

commit_hash = None

for line in result.stdout.splitlines():

if line.startswith('['):

before_bracket = line.split(']')[0]

commit_hash = before_bracket.split()[-1]

break

if commit_hash is None:

commit_hash = 'ok'

_git(['commit', '-m', subject], work_tree=output_dir, git_dir=git_dir)

commit_hash = _git(['rev-parse', '--short', 'HEAD'],

work_tree=output_dir, git_dir=git_dir).stdout.strip()

gemini-code-assist · 2026-04-07T12:38:46Z

ms_agent/tools/filesystem_tool.py

+                    description=(
+                        'Read the content of one or more files.\n\n'
+                        '- `paths`: list of relative file paths to read.\n'
+                        '- For image files (png/jpg/jpeg/gif/webp), returns base64-encoded content.\n'
+                        '- `offset`: line number to start reading from (1-based). '
+                        'Only effective when paths has exactly one element. Omit to read from the beginning.\n'
+                        '- `limit`: number of lines to read. '
+                        'Only effective when paths has exactly one element. Omit to read to the end.\n'
+                        '- `abbreviate`: if true, use an LLM to return a condensed summary of each file '
+                        'instead of the raw content. Cached after first call. '
+                        'Use this for a quick structural overview; read the full file if more detail is needed.'
+                    ),


The read_file implementation (lines 401-404) prepends line numbers and tabs to each line of the output. However, this behavior is not mentioned in the tool's description. This is critical because the LLM needs to know that the returned text contains metadata (line numbers) that are not part of the actual file content, especially since edit_file requires exact string matching.

Suggested change

description=(

'Read the content of one or more files.\n\n'

'- `paths`: list of relative file paths to read.\n'

'- For image files (png/jpg/jpeg/gif/webp), returns base64-encoded content.\n'

'- `offset`: line number to start reading from (1-based). '

'Only effective when paths has exactly one element. Omit to read from the beginning.\n'

'- `limit`: number of lines to read. '

'Only effective when paths has exactly one element. Omit to read to the end.\n'

'- `abbreviate`: if true, use an LLM to return a condensed summary of each file '

'instead of the raw content. Cached after first call. '

'Use this for a quick structural overview; read the full file if more detail is needed.'

),

description=(

'Read the content of one or more files. Returns content with line numbers prepended (e.g., "1\tcontent").\n\n'

'- `paths`: list of relative file paths to read.\n'

'- For image files (png/jpg/jpeg/gif/webp), returns base64-encoded content.\n'

'- `offset`: line number to start reading from (1-based). '

'Only effective when paths has exactly one element. Omit to read from the beginning.\n'

'- `limit`: number of lines to read. '

'Only effective when paths has exactly one element. Omit to read to the end.\n'

'- `abbreviate`: if true, use an LLM to return a condensed summary of each file '

'instead of the raw content. Cached after first call. '

'Use this for a quick structural overview; read the full file if more detail is needed.'

),

gemini-code-assist · 2026-04-07T12:38:46Z

ms_agent/tools/filesystem_tool.py

+                        'Edit an existing file by replacing an exact string with new content.\n\n'
+                        'You must provide the exact text to find (`old_string`) and the replacement (`new_string`).\n'
+                        '`old_string` must match the file content EXACTLY — including whitespace and line breaks.\n'
+                        'If `old_string` appears multiple times and `replace_all` is false, the call will fail '
+                        'with the match count so you can add more context to make it unique.\n\n'
+                        'Special case — `old_string=""`:\n'
+                        '- File does not exist: creates the file with `new_string` as its content.\n'
+                        '- File exists and is empty: fills it with `new_string`.\n'
+                        '- File exists and has content: returns an error. Use `write_file` for a full rewrite.'
+                    ),


Since read_file now returns content with line numbers, it is highly likely that the LLM will mistakenly include these line numbers in the old_string parameter. The description should explicitly warn the model to strip line numbers and tabs before using text as an anchor for edit_file.

description=( 'Edit an existing file by replacing an exact string with new content.\n\n' 'You must provide the exact text to find (`old_string`) and the replacement (`new_string`).\n' 'IMPORTANT: `old_string` must match the file content EXACTLY — including whitespace and line breaks. ' 'Do NOT include the line numbers or tabs returned by `read_file` in `old_string`.\n' 'If `old_string` appears multiple times and `replace_all` is false, the call will fail ' 'with the match count so you can add more context to make it unique.\n\n' 'Special case — `old_string=""`:\n' '- File does not exist: creates the file with `new_string` as its content.\n' '- File exists and is empty: fills it with `new_string`.\n' '- File exists and has content: returns an error. Use `write_file` for a full rewrite.' ),

gemini-code-assist · 2026-04-07T12:38:46Z

ms_agent/tools/filesystem_tool.py

+            if not is_markdown:
+                actual_new = self._strip_trailing_whitespace(actual_new)



Automatically stripping trailing whitespace from new_string for all non-markdown files is dangerous. While it helps clean up LLM hallucinations, it can break code that relies on specific formatting or trailing spaces within string literals (e.g., multi-line strings in Python). This should be handled more conservatively or removed to ensure data integrity.

gemini-code-assist · 2026-04-07T12:38:46Z

ms_agent/tools/filesystem_tool.py

+            err = self._check_staleness(real_path)
+            if err:
+                return err


Enforcing a staleness check in write_file means an agent cannot overwrite an existing file unless it has been read in the current session. This is a significant restriction for a tool intended for full rewrites and will increase latency by forcing unnecessary read operations. Consider allowing write_file to proceed without a prior read, as the intent is a complete replacement anyway.

gemini-code-assist · 2026-04-07T12:38:46Z

ms_agent/utils/snapshot.py

+        result = _git(
+            ['log', '--pretty=format:%h\t%ai\t%s'],
+            work_tree=output_dir,
+            git_dir=git_dir,
+            check=False,
+        )
+        if result.returncode != 0:


If the git executable is not found in the system PATH, this call will raise a FileNotFoundError, which is not caught here. While take_snapshot handles this, list_snapshots and restore_snapshot should also gracefully handle the absence of git to avoid crashing the agent.

suluyan added 7 commits February 6, 2026 15:03

fix: video gen exclude edit_file

6baa994

Merge branch 'main' of https://github.com/modelscope/ms-agent

9c8205c

Merge branch 'main' of https://github.com/modelscope/ms-agent

94b833f

Merge branch 'main' of https://github.com/modelscope/ms-agent

dd3f185

Merge branch 'main' of https://github.com/modelscope/ms-agent

e6ad3e9

Merge branch 'main' of https://github.com/modelscope/ms-agent

49919ab

suluyana had a problem deploying to testci April 7, 2026 12:36 — with GitHub Actions Error

gemini-code-assist bot reviewed Apr 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/git#896

Feat/git#896
suluyana wants to merge 7 commits intomodelscope:mainfrom
suluyana:feat/git

suluyana commented Apr 7, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Apr 7, 2026

Uh oh!

gemini-code-assist bot Apr 7, 2026

Uh oh!

gemini-code-assist bot Apr 7, 2026

Uh oh!

gemini-code-assist bot Apr 7, 2026

Uh oh!

gemini-code-assist bot Apr 7, 2026

Uh oh!

gemini-code-assist bot Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if not is_markdown:
		actual_new = self._strip_trailing_whitespace(actual_new)

Conversation

suluyana commented Apr 7, 2026

Change Summary

Related issue number

Checklist

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant