Conversation
Code Review
This pull request updates `gpt_bridge.py` to refine PEFT weight processing and weight exporting. Review feedback highlights that the new `v is not None` check in `export_weights` causes key-name mismatches in distributed environments and should be reverted. Additionally, the adapter-name matching logic in `_set_module` is too broad and should be updated to include a leading dot for more precise matching.
```diff
 with torch.no_grad():
     for k, v in self._convert(mg_models, {}, hf_prefix, False, tqdm_desc=tqdm_desc):
-        if converter:
+        if converter and v is not None:
```
Skipping the converter when `v` is `None` causes inconsistent key names across different ranks in a distributed environment. When `only_master_rank=True` is used, non-master ranks yield `v=None`. If the converter is skipped on these ranks, they will yield the original Megatron keys, while the master rank yields the converted (HuggingFace) keys. This mismatch can break downstream components that expect consistent keys across all ranks. The converter should be called even when `v` is `None`, and it is the responsibility of the converter implementation to handle `None` values.
Suggested change:

```diff
-if converter and v is not None:
+if converter:
```
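A minimal sketch of the failure mode, assuming a hypothetical converter that renames keys (the real converter in `gpt_bridge.py` may differ): key conversion must run unconditionally so every rank emits the same converted key, even when non-master ranks carry `v=None`.

```python
# Hypothetical converter: the key mapping below is illustrative only.
# The point is that key conversion must not depend on the tensor value,
# so all ranks yield identical (converted) key names even when
# non-master ranks hold v=None under only_master_rank=True.

def converter(k, v):
    # Convert the key unconditionally; pass the value through as-is.
    new_k = k.replace('decoder.layers', 'model.layers')
    return new_k, v

# Master rank holds the tensor; a non-master rank holds None.
master_k, _ = converter('decoder.layers.0.weight', object())
worker_k, worker_v = converter('decoder.layers.0.weight', None)

assert master_k == worker_k   # keys stay consistent across ranks
assert worker_v is None       # None is simply passed through
```

Guarding with `v is not None` instead would leave non-master ranks with the unconverted Megatron key, producing the mismatch described above.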
```diff
 for k, v in hf_state_dict.items():
     if self._peft_format:
-        if '.lora_A.' in k or '.lora_B.' in k or '.modules_to_save.' in k:
+        if ('.lora_A.' in k or '.lora_B.' in k or '.modules_to_save.' in k) and f'{self._adapter_name}.' in k:
```
The check `f'{self._adapter_name}.' in k` is potentially too broad and could lead to incorrect matches when one adapter name is contained in another (e.g., `default` and `default_new`). Since keys are dot-separated, it is safer to include the leading dot in the check to ensure it matches the specific adapter component.
Suggested change:

```diff
-if ('.lora_A.' in k or '.lora_B.' in k or '.modules_to_save.' in k) and f'{self._adapter_name}.' in k:
+if ('.lora_A.' in k or '.lora_B.' in k or '.modules_to_save.' in k) and f'.{self._adapter_name}.' in k:
```
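An illustrative sketch of the collision, using hypothetical adapter names (`new`, `default_new`) and a hypothetical key layout: without the leading dot, an adapter name that is a substring of another adapter's name matches that adapter's keys too.

```python
# Hypothetical adapter names and key; the exact key layout in a real
# PEFT state dict may differ.

def broad_match(key: str, adapter_name: str) -> bool:
    return f'{adapter_name}.' in key

def precise_match(key: str, adapter_name: str) -> bool:
    return f'.{adapter_name}.' in key

key = 'model.layers.0.self_attn.q_proj.lora_A.default_new.weight'

# The broad check wrongly matches adapter 'new' against a 'default_new'
# key, because 'new.' is a substring of 'default_new.'.
assert broad_match(key, 'new') is True       # false positive
assert precise_match(key, 'new') is False    # correctly rejected
assert precise_match(key, 'default_new') is True
```

Anchoring the match on both surrounding dots ties the check to a complete dot-separated key component rather than an arbitrary substring.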