Adding Conformer encoder I/O-styled Transformer encoder by tango4j · Pull Request #15703 · NVIDIA-NeMo/NeMo

tango4j · 2026-05-15T15:58:51Z

What does this PR do?

Follow-up work after the initial TF encoder PR (#15661). Many NeMo Speech AI maintainers have asked for the new Transformer encoder implementation to offer the pre-encode and positional-encoding features of the Conformer encoder.

Aligns the ASR TransformerEncoder module with the offline ConformerEncoder module surface while preserving Transformer-specific attention parameters and behavior.

Streaming encoder and adapter implementations are not included in this PR. These features will be added later on.

Tested LibriSpeech training with Transformer + CTC (BPE). Added transformer_ctc_bpe.yaml with the default configurations.

Collection: ASR

Changelog

Updated TransformerEncoder to inherit NeMo module/export/access mixins and expose Conformer-style input/output type metadata.
Added Conformer-style offline encoder utilities, including input_example, forward_for_export, forward_internal, bypass_pre_encode, feat_out, positional encoding, pad mask toggling, stochastic depth, and inter-CTC tensor capture.
Added Conformer-style pre-encoder options while preserving the Transformer-native FeatureStacking path as subsampling="feature_stacking".
Moved FeatureStacking into the shared ASR subsampling module so it can be imported from nemo.collections.asr.parts.submodules.subsampling.
Added self_attention_model mirroring Conformer's positional-encoding switch: "rel_pos" (default), "abs_pos", and "no_pos" (None is accepted as a YAML alias for "no_pos").
Implemented Transformer-XL relative PE on FlexAttention: the (b)+(d) bias terms are applied via a score_mod closure, and the (c) bias is folded in as Q + pos_bias_u; rel-shift is shared with ConformerEncoder via RelPositionMultiHeadAttention.rel_shift.
Added Transformer encoder tests mirroring relevant Conformer encoder test procedures for stochastic depth and bypass pre-encode behavior.
Added self_attention_model tests, including a T != n_heads regression for pos_bias_{u,v} broadcasting.
Updated Transformer encoder tests to use typed-module keyword arguments and validate output lengths.
Wrapped CPU forward tests in torch.no_grad() so FlexAttention's CPU path doesn't raise under model.train().
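
The rel-shift step named in the Transformer-XL item can be illustrated on a single head without any tensor library. The sketch below is not the NeMo implementation. It assumes the usual layout in which the positional-score matrix has one column per relative offset, ordered from +(T-1) down to -(T-1), and the helper name `rel_shift_2d` is made up for illustration.

```python
def rel_shift_2d(scores):
    """Pad-and-reshape relative shift for one attention head.

    scores[i][k] is the score of query i against relative offset
    (T - 1 - k), so there are 2T - 1 columns for T queries.
    Returns a T x T matrix indexed by [query][key].
    """
    q_len, pos_len = len(scores), len(scores[0])
    # 1) Prepend a zero column to every row and flatten row-major.
    flat = [v for row in scores for v in [0.0] + list(row)]
    # 2) Reinterpret as (pos_len + 1, q_len) and drop the first row,
    #    which means dropping the first q_len flattened entries.
    flat = flat[q_len:]
    # 3) Reinterpret as (q_len, pos_len) and keep the first q_len columns.
    shifted = [flat[i * pos_len:(i + 1) * pos_len] for i in range(q_len)]
    return [row[:q_len] for row in shifted]


T = 3
# Fill each entry with its own relative offset so the result is easy to read.
scores = [[float(T - 1 - k) for k in range(2 * T - 1)] for _ in range(T)]
aligned = rel_shift_2d(scores)
# After the shift, aligned[i][j] holds the score for offset i - j.
```

With offsets used as placeholder scores, entry `[i][j]` of the result equals `i - j`, which is the alignment the attention score matrix needs before the positional bias is added.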

Usage

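import torch

from nemo.collections.asr.modules.transformer_encoder import TransformerEncoder

encoder = TransformerEncoder(
    feat_in=128,
    d_model=512,
    n_heads=8,
    n_layers=17,
    subsampling="feature_stacking",
    subsampling_factor=4,
    self_attention_model="rel_pos",  # one of "rel_pos" | "abs_pos" | "no_pos" (or None)
)

# Dummy inputs for illustration only; assumes the Conformer-style (batch, feat_in, time) layout.
features = torch.randn(2, 128, 400)
feature_lengths = torch.tensor([400, 320])

encoded, encoded_len = encoder(audio_signal=features, length=feature_lengths)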
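
The comment on `self_attention_model` implies a small normalization step for the `None` alias. A sketch of how such a check could look is given here; `resolve_self_attention_model` and `VALID_SELF_ATTENTION_MODELS` are hypothetical names written for this note, not code from the PR.

```python
VALID_SELF_ATTENTION_MODELS = ("rel_pos", "abs_pos", "no_pos")


def resolve_self_attention_model(value="rel_pos"):
    """Map the YAML-friendly None alias to "no_pos" and reject unknown values."""
    if value is None:
        # A YAML `null` arrives as Python None; treat it as "no positional encoding".
        return "no_pos"
    if value not in VALID_SELF_ATTENTION_MODELS:
        raise ValueError(
            f"self_attention_model must be one of {VALID_SELF_ATTENTION_MODELS} "
            f"or None, got {value!r}"
        )
    return value
```

Resolving the alias once, at construction time, would keep later branches free of `None` checks.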

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Follow up work after the initial TF encoder PR (#15661)

Signed-off-by: taejinp <tango4j@gmail.com>

copy-pr-bot · 2026-05-15T15:58:55Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

stevehuang52 · 2026-05-15T17:42:20Z

        pre_block_norm: bool = True,
-        subsampling_factor: int = 4,
+        pos_emb_max_len: int = 5000,
+        xscaling: bool = True,


Shall we set the default of xscaling to False, since we already know that the layernorm will cancel out its effect?

Thanks for pointing this out. Setting the default to False.
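
The point about layernorm can be checked numerically. The sketch below assumes that xscaling multiplies the features by sqrt(d_model), as the Conformer encoder does, and that a layer normalization without learned affine parameters is applied directly to the scaled features; `layer_norm` is a plain-Python stand-in written for this note, not the module used in the PR.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


d_model = 8
x = [0.5, -1.2, 3.0, 0.1, -0.7, 2.2, -2.5, 1.4]
# What xscaling would do under the assumption above.
x_scaled = [v * math.sqrt(d_model) for v in x]

# Largest element-wise gap between the two normalized vectors.
max_diff = max(abs(a - b) for a, b in zip(layer_norm(x), layer_norm(x_scaled)))
```

The two normalized vectors differ only through the `eps` term, so `max_diff` is tiny. The scale factor is divided out by the normalization, which is the reasoning behind the default of False.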

Signed-off-by: taejinp <tango4j@gmail.com>

tango4j · 2026-05-16T23:41:19Z

/ok to test 5398604

github-actions · 2026-05-17T00:39:52Z

[🤖]: Hi @tango4j 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

nithinraok · 2026-05-18T12:36:53Z

Thanks Taejin!

Have you had a chance to run training with this? Does it converge similarly with positional embedding enabled, and how do the results compare to previous runs?

tango4j · 2026-05-19T23:59:38Z

@KunalDhawan is using this part of the PR to train his MoE Transformer experiments.
If time allows, I will also try LibriSpeech-only training with several setups as a sanity check (to test the points you mentioned).

After a recent survey, it appears that the convnet frontend and the positional encoding can each affect performance a lot, so I think we need to test these two configurations separately (ablations).

Signed-off-by: Taejin Park <tango4j@gmail.com>

tango4j · 2026-05-21T06:18:08Z

@nithinraok
I have tested relative positional encoding in a training job, comparing FastConformer CTC against Transformer CTC.
Whether rel_pos is enabled in the Transformer encoder affects performance a lot (roughly 10~20% more error without it), so it needs to be the default.

I also found that the Filterbank Stacking feature is as good as dw_striding (3-level convnet frontend). It is better to switch to Filterbank Stacking to make this model low-precision friendly.

@ipmedenn @KunalDhawan @stevehuang52
The Transformer encoder is now fully equipped with relative positional encoding.
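
For readers unfamiliar with the stacking frontend compared above, the idea is to concatenate consecutive feature frames so that the time axis shrinks by the subsampling factor with no convolution involved. A minimal sketch follows, assuming trailing frames that do not fill a group are zero-padded; the FeatureStacking module in the PR may handle padding, lengths, and projection differently, and `stack_frames` is an illustrative name.

```python
import math


def stack_frames(frames, factor=4):
    """Stack `factor` consecutive frames (lists of floats) into one wider frame."""
    feat_dim = len(frames[0])
    out_len = math.ceil(len(frames) / factor)
    # Zero-pad the tail so the frame count is a multiple of `factor` (assumption).
    padded = frames + [[0.0] * feat_dim] * (out_len * factor - len(frames))
    return [
        [v for frame in padded[t * factor:(t + 1) * factor] for v in frame]
        for t in range(out_len)
    ]


# 10 frames with 2 features each become 3 frames with 8 features each.
frames = [[float(t), float(t) + 0.5] for t in range(10)]
stacked = stack_frames(frames, factor=4)
```

Because the operation is only a reshape of the input features, it adds no convolution weights ahead of the encoder.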

tango4j · 2026-05-21T06:18:36Z

/ok to test 6725930

Signed-off-by: Taejin Park <tango4j@gmail.com>

tango4j · 2026-05-29T19:24:59Z

/ok to test 56423ba

tango4j · 2026-05-29T19:41:48Z

/ok to test 839fd85

tango4j · 2026-05-30T18:36:06Z

/ok to test 3b3ec63

github-actions · 2026-05-31T02:46:01Z

[🤖]: Hi @tango4j 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

Adding Conformer encoder style Transformer encoder

f86e346

Signed-off-by: taejinp <tango4j@gmail.com>

github-actions Bot added the ASR label May 15, 2026

tango4j requested review from KunalDhawan, ipmedenn and stevehuang52 May 15, 2026 15:59

stevehuang52 reviewed May 15, 2026

View reviewed changes

Adding final touch up

ba9d40d

Signed-off-by: taejinp <tango4j@gmail.com>

tango4j requested review from nithinraok and pzelasko May 15, 2026 22:39

Fixing Black issue

5398604

Signed-off-by: taejinp <tango4j@gmail.com>

tango4j marked this pull request as ready for review May 15, 2026 22:52

tango4j added 2 commits May 20, 2026 22:59

Adding relative position encoding and transformer-ctc yaml

c19ca03

Signed-off-by: Taejin Park <tango4j@gmail.com>

Apply black formatting

6725930

Signed-off-by: Taejin Park <tango4j@gmail.com>

tango4j added 2 commits May 29, 2026 12:22

Fixed transformer tests on minimum n_heads

b5c41c1

Signed-off-by: Taejin Park <tango4j@gmail.com>

Merge branch 'main' into add_tf_encoder_asr

56423ba

copy-pr-bot Bot temporarily deployed to public May 29, 2026 19:25 Inactive

copy-pr-bot Bot had a problem deploying to test May 29, 2026 19:26 Error

copy-pr-bot Bot temporarily deployed to public May 29, 2026 19:29 Inactive

copy-pr-bot Bot temporarily deployed to public May 29, 2026 19:30 Inactive

copy-pr-bot Bot temporarily deployed to public May 29, 2026 19:33 Inactive

Merge branch 'main' into add_tf_encoder_asr

839fd85

Merge branch 'main' into add_tf_encoder_asr

3b3ec63
