Commit 4d74ac0

fix: add warning for TinyStories n_ctx mismatch
TinyStories models were trained with sequence length 512, but the HuggingFace config claims n_ctx=2048. This causes performance degradation for sequences >512 tokens. Added a warning to alert users of this limitation.

Note: we cannot change n_ctx in the config because the pretrained weights have positional embeddings for 2048 positions; changing n_ctx would break weight loading.

Fixes #492
1 parent 7df72ff · commit 4d74ac0

1 file changed: 9 additions & 0 deletions

transformer_lens/loading_from_pretrained.py

@@ -1970,6 +1970,15 @@ def convert_hf_model_config(model_name: str, **kwargs: Any):
     cfg_dict["tokenizer_name"] = official_model_name
     if kwargs.get("trust_remote_code", False):
         cfg_dict["trust_remote_code"] = True
+    # Warn about TinyStories n_ctx mismatch (trained with seq_len=512, but HF config has 2048)
+    # The weights have pos_embed for 2048, so we can't change n_ctx without breaking loading
+    # See: https://github.com/TransformerLensOrg/TransformerLens/issues/492
+    if official_model_name.startswith("roneneldan/TinyStories"):
+        logging.warning(
+            f"TinyStories models were trained with max sequence length 512, but the HuggingFace "
+            f"config reports n_ctx=2048. Performance degrades significantly for sequences longer "
+            f"than 512 tokens. See https://github.com/TransformerLensOrg/TransformerLens/issues/492"
+        )
     return cfg_dict
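For users who hit this warning, a minimal sketch of the workaround it implies, assuming the standard transformer_lens API (the checkpoint name and prompt below are illustrative, not part of this commit): truncate inputs to the trained 512-token window rather than relying on the full n_ctx=2048.

# Minimal sketch (not part of this commit): keep TinyStories inputs within
# the trained 512-token window when running the model in TransformerLens.
from transformer_lens import HookedTransformer

# Illustrative checkpoint; any roneneldan/TinyStories-* model triggers the warning.
model = HookedTransformer.from_pretrained("roneneldan/TinyStories-33M")

tokens = model.to_tokens("Once upon a time, there was a little robot.")
# Positions past 512 were never trained, so quality degrades even though
# n_ctx=2048 makes them technically valid; truncate to the trained length.
tokens = tokens[:, :512]
logits = model(tokens)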
