Hi! I'm working on circuit discovery for low-resource Southeast Asian languages (e.g., Indonesian, Malay, Javanese, Sundanese) and would like to use TransformerLens for multilingual experiments.
It would be great if mT5 could be supported, starting with something like google/mt5-small. Since T5 is already included in TransformerLens, and mT5 follows essentially the same architecture (encoder-decoder with relative positional biases and a shared embedding table), I was wondering whether the current T5 implementation could be extended to cover mT5 with a few adjustments.
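For concreteness, here is a small sketch (assuming a recent `transformers` install with Hub access) that prints the hyperparameters where the two checkpoints differ; I would expect the gap to mostly come down to fields such as the vocabulary size, the gated-GELU feed-forward of T5 v1.1, and whether the output embeddings are tied, but the script below should surface the full list:

```python
# Sketch: compare the HF configs of t5-small and google/mt5-small to see
# which hyperparameters a TransformerLens mT5 loader would need to handle.
# Assumes the `transformers` library is installed and can reach the Hub.
from transformers import AutoConfig

t5_cfg = AutoConfig.from_pretrained("t5-small").to_dict()
mt5_cfg = AutoConfig.from_pretrained("google/mt5-small").to_dict()

# Print every field where the two configs disagree (expected to include
# things like vocab_size, feed_forward_proj, tie_word_embeddings).
for key in sorted(set(t5_cfg) | set(mt5_cfg)):
    t5_val, mt5_val = t5_cfg.get(key), mt5_cfg.get(key)
    if t5_val != mt5_val:
        print(f"{key}: t5-small={t5_val!r}  mt5-small={mt5_val!r}")
```

On the TransformerLens side, I imagine the end state would simply be `HookedEncoderDecoder.from_pretrained("google/mt5-small")`, mirroring the existing T5 support, with the checkpoint registered and the T5 weight conversion extended to handle those differences.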
In addition, Aya 101 by Cohere is built on mT5-XXL, so having mT5 support would enable further work with that model as well.
Let me know if this might be possible to include, or whether there is a guide I could follow to try integrating it on my own.
Thanks again for building such an excellent research tool!