[Proposal] Support mT5 Models #912

@KN-Aini

Description

Hi! I'm working on circuit discovery for low-resource Southeast Asian languages (e.g., Indonesian, Malay, Javanese, Sundanese) and would like to use TransformerLens for multilingual experiments.

It would be great if mT5 could be supported, starting with something like google/mt5-small. Since T5 is already included in TransformerLens, and mT5 follows the same architecture (encoder-decoder with relative positional encodings and shared embeddings), I was wondering if the current implementation could be extended to include mT5 with some adjustments.

In addition, Aya 101 by Cohere is built on mT5-XXL, so having mT5 support would enable further work with that model as well.

Let me know if this might be possible to include, or whether there is a guide I could follow to try integrating it myself?

Thanks for building such an excellent research tool!

Checklist

  • I have checked that there is no similar issue in the repo (required)
