Adapt MindSpeed/Megatron 0.15.3 #25
Conversation
Pull request overview
This PR updates the mcore bridge to better interoperate with MindSpeed/Megatron 0.15.x Transformer Engine (TE) behavior, especially around LoRA adapter injection and TE debug/compat fields.
Changes:
- Add MindSpeed version detection and a helper to build “local” TE linear layers with MindSpeed 0.15.x-compatible semantics on NPU.
- Make LoRA tensor-parallel group resolution robust to MindSpeed TE variants that store the group under `parallel_group` instead of `tp_group`.
- Patch `TELinear.__repr__` to tolerate MindSpeed 0.15.x field name changes on NPU (`input_size`/`output_size`).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/mcore_bridge/tuners/lora.py` | Adds MindSpeed-aware helpers and updates LoRA adapter layer construction / tp group resolution for NPU + MindSpeed 0.15.x. |
| `src/mcore_bridge/patcher.py` | Updates TE `__repr__` patch for MindSpeed NPU field compatibility. |
```python
from contextlib import contextmanager, nullcontext
from importlib import metadata
from megatron.core import parallel_state
```
`from importlib import metadata` introduces a module-level name that collides with the existing `metadata` parameter used later in `sharded_state_dict(...)`. This shadowing is legal but makes the file harder to read and can lead to accidental misuse of the module vs. the parameter. Consider importing with an alias (e.g., `importlib_metadata`) and updating `_get_mindspeed_version()` accordingly.
```python
# semantics.
in_features = getattr(self, 'in_features', getattr(self, 'input_size', None))
out_features = getattr(self, 'out_features', getattr(self, 'output_size', None))
use_bias = getattr(self, 'use_bias', getattr(self, 'bias', None) is not None)
```
On the NPU path, `use_bias = getattr(self, 'use_bias', getattr(self, 'bias', None) is not None)` can misreport bias when `self.bias` is a boolean (e.g., `False` still makes the expression true because it is not `None`). To make `__repr__` robust across TE/MindSpeed variants, compute `use_bias` by first reading `bias_attr = getattr(self, 'bias', None)` and handling the boolean case explicitly (otherwise fall back to `bias_attr is not None`).
Suggested change:

```python
# before
use_bias = getattr(self, 'use_bias', getattr(self, 'bias', None) is not None)

# after
bias_attr = getattr(self, 'bias', None)
if hasattr(self, 'use_bias'):
    use_bias = self.use_bias
elif isinstance(bias_attr, bool):
    use_bias = bias_attr
else:
    use_bias = bias_attr is not None
```
Code Review
This pull request introduces compatibility for MindSpeed 0.15.x on NPU devices by updating the patched `TELinear` representation and refactoring LoRA layer updates. It adds logic to handle NPU-specific attribute names and introduces helper functions to manage version-dependent linear layer instantiation and tensor-parallel group resolution. Feedback focuses on improving the robustness of `tp_size` inference in the patched `__repr__` and on caching the MindSpeed version lookup for performance.
```python
parallel_mode = getattr(self, 'parallel_mode', None)
tp_size = 1 if parallel_mode == 'duplicated' else 'unknown'
```
When `tp_size` is missing, it is inferred from `parallel_mode`. However, `parallel_mode` can be `None` for local layers (as seen in the GPU path or default initialization), which should also imply a `tp_size` of 1. Currently, it defaults to `'unknown'` in this case.
Suggested change:

```python
# before
tp_size = 1 if parallel_mode == 'duplicated' else 'unknown'

# after
tp_size = 1 if parallel_mode in ('duplicated', None) else 'unknown'
```
```python
def _get_mindspeed_version():
    try:
        return version.parse(metadata.version('mindspeed'))
    except metadata.PackageNotFoundError:
        return None
    except Exception:
        return None
```
The `_get_mindspeed_version` function is called multiple times during the LoRA update process. Since `metadata.version` involves filesystem access and parsing, it is inefficient to call it repeatedly. Caching the result would improve performance.
Suggested change (note: the cache must distinguish "not yet probed" from "not installed", and the miss sentinel must be converted back to `None` on every return path, not just the first call):

```python
# before
def _get_mindspeed_version():
    try:
        return version.parse(metadata.version('mindspeed'))
    except metadata.PackageNotFoundError:
        return None
    except Exception:
        return None

# after
_MINDSPEED_VERSION = None

def _get_mindspeed_version():
    global _MINDSPEED_VERSION
    if _MINDSPEED_VERSION is None:
        try:
            _MINDSPEED_VERSION = version.parse(metadata.version('mindspeed'))
        except Exception:
            # Cache a sentinel so a missing package is not re-probed on every call.
            _MINDSPEED_VERSION = False
    return _MINDSPEED_VERSION or None
```