
Conversation

@kashif (Contributor) commented on Jan 4, 2026

What does this PR do?

Add experimental support for discrete token diffusion methods and pipeline

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@kashif marked this pull request as draft on January 4, 2026 at 23:34
@kashif changed the title from "Discrete diffusion in diffuers" to "Discrete diffusion in diffusers" on Jan 4, 2026
@yiyixuxu (Collaborator) commented

Thanks for this PR!
cc @dg845, can you take a look here? It's related to Dream 7B (#12091), which you are working on.

@dg845 (Collaborator) commented on Jan 23, 2026

Thanks for the PR! Some preliminary design questions and comments:

  1. I think it could be useful to have a natural place to implement logic which is common to discrete diffusion models. Would something like a DiscreteDiffusionPipelineMixin make sense? For example, I think _resolve_start_token_id, _normalize_prefix_ids, _top_p_filtering, etc. could be candidates as mixin methods. (A possible alternative could be to put the methods in DiffusionPipeline, but it feels a little weird to put the methods there because they aren't applicable to continuous diffusion models.) But maybe this is premature, since we might not know what logic will end up being useful for all (or most) discrete diffusion models.
    1. One motivation for this is that we often want to do semi-autoregressive (SAR) sampling for discrete diffusion models, so it would be useful to have autoregressive sampling techniques such as top-$p$ sampling, top-$k$ sampling, etc. implemented and tested in one place, so that new discrete diffusion models that support SAR sampling can use them without having to copy them every time (see the sketch after this list).
  2. Similarly, would it make sense to have a TokenizerTextProcessor class which handles text pre-processing and post-processing, analogous to how VaeImageProcessor handles image pre- and post-processing? It's probably less necessary since we don't need to do as much normalization as for images, but I could see this being useful for handling e.g. chat templates like in the SDAR and LLaDA 2 pipelines.
    1. As an aside, this could also be useful for existing (continuous) diffusion models, some of which have pretty involved text processing, such as pipelines like SanaPipeline that use a _text_preprocessing method:
      # Copied from diffusers.pipelines.deepfloyd_if.pipeline_if.IFPipeline._text_preprocessing
      def _text_preprocessing(self, text, clean_caption=False):
  3. Currently it looks like the pipelines only support denoising models with a transformers-like interface. But we would probably want to implement some discrete diffusion transformers in diffusers, which currently doesn't enforce that interface. So I think we should think about how we can handle both cases gracefully in discrete diffusion pipelines. (One solution could be to simply adopt the transformers interface for all discrete denoising models in diffusers, but that could be unnecessarily restrictive.)
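
For concreteness, a minimal sketch of what such a mixin could look like (the class layout and the `_top_p_filtering` implementation below are illustrative assumptions, not code from this PR):

import torch


class DiscreteDiffusionPipelineMixin:
    """Hypothetical mixin collecting sampling utilities shared by discrete diffusion pipelines."""

    @staticmethod
    def _top_p_filtering(logits: torch.Tensor, top_p: float = 1.0, filter_value: float = float("-inf")) -> torch.Tensor:
        # Nucleus (top-p) filtering over the last dimension of `logits` (shape: ..., vocab_size):
        # keep only the smallest set of tokens whose cumulative probability exceeds `top_p`.
        if top_p >= 1.0:
            return logits
        sorted_logits, sorted_indices = torch.sort(logits, descending=True, dim=-1)
        cumulative_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        sorted_indices_to_remove = cumulative_probs > top_p
        # Shift right so the first token that crosses the threshold is always kept.
        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
        sorted_indices_to_remove[..., 0] = False
        indices_to_remove = sorted_indices_to_remove.scatter(-1, sorted_indices, sorted_indices_to_remove)
        return logits.masked_fill(indices_to_remove, filter_value)

A pipeline supporting SAR sampling could then inherit from both DiffusionPipeline and this mixin and call self._top_p_filtering(logits, top_p) before sampling tokens.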

Comment on lines +73 to +77
self.register_to_config(
    seq_len=seq_len,
    num_inference_steps=num_inference_steps,
    inject_start_token=inject_start_token,
)

Generally we don't register default __call__ arguments to the config, but rather set them as default arguments to the __call__ method:

def __call__(
    self,
    prompt: Union[str, List[str]] = None,
    negative_prompt: Optional[Union[str, List[str]]] = None,
    height: int = 512,
    width: int = 768,
    num_frames: int = 121,
    frame_rate: float = 24.0,
    num_inference_steps: int = 40,
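
Concretely, seq_len, num_inference_steps, and inject_start_token would just become __call__ parameters with defaults, something like the following sketch (the default values here are purely illustrative):

def __call__(
    self,
    prompt: Optional[Union[str, List[str]]] = None,
    seq_len: int = 256,               # illustrative default
    num_inference_steps: int = 50,    # illustrative default
    inject_start_token: bool = True,  # illustrative default
    generator: Optional[torch.Generator] = None,
):
    ...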

Comment on lines +148 to +149
*,
batch_size: int = 1,

diffusers pipelines usually don't set __call__ arguments to be keyword-only. (That's not to say there are no arguments in favor of keyword-only parameters, but because other pipelines allow positional arguments I think the expectation is that discrete diffusion pipelines will allow them as well.)

Comment on lines +185 to +190
if seq_len is None:
seq_len = int(self.config.seq_len)
if num_inference_steps is None:
num_inference_steps = int(self.config.num_inference_steps)
if inject_start_token is None:
inject_start_token = bool(self.config.inject_start_token)

Following up on #12911 (comment), this logic could be removed if we don't register default arguments to the config.

Comment on lines +217 to +221
if infill_mask is not None:
    if infill_mask.shape != (batch_size, seq_len):
        raise ValueError(
            f"`infill_mask` must have shape {(batch_size, seq_len)}, got {tuple(infill_mask.shape)}."
        )

I think input checking and exceptions should be moved to a check_inputs method, which is the usual practice for diffusers pipelines:

def check_inputs(
    self,
    prompt,
    height,
    width,
    prompt_embeds=None,
    callback_on_step_end_tensor_inputs=None,
):
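
For this pipeline, the infill_mask validation above could move into such a method, roughly like this (the parameter list is illustrative):

def check_inputs(self, prompt, seq_len, batch_size, infill_mask=None):
    # Illustrative sketch: collect the validation currently done inline in __call__.
    if infill_mask is not None and infill_mask.shape != (batch_size, seq_len):
        raise ValueError(
            f"`infill_mask` must have shape {(batch_size, seq_len)}, got {tuple(infill_mask.shape)}."
        )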

return int(token_id)
return None

def _init_latents(

We usually name methods which sample latents from the prior distribution `prepare_latents`.

Comment on lines +102 to +118
if hasattr(self.scheduler, "forward_process") and getattr(self.scheduler, "forward_process") == "uniform":
    # Uniform prior over token IDs. Mirror scheduler's exclude-mask behavior.
    if getattr(self.scheduler, "exclude_mask_from_uniform", False) and hasattr(
        self.scheduler, "_sample_uniform_tokens"
    ):
        return self.scheduler._sample_uniform_tokens(
            torch.Size((batch_size, seq_len)),
            device=device,
            dtype=torch.long,
            generator=generator,
        )
    vocab_size = int(getattr(self.scheduler, "vocab_size", 0))
    if vocab_size <= 0:
        raise ValueError("Scheduler must define `vocab_size` for uniform prior sampling.")
    return torch.randint(
        0, vocab_size, (batch_size, seq_len), device=device, dtype=torch.long, generator=generator
    )

Suggestion: maybe it would be cleaner to define a scheduler method called (say) sample_prior which samples from the prior distribution based on the configured forward_process? So if self.forward_process == "uniform", we would call _sample_uniform_tokens under the hood in sample_prior to sample from a uniform prior distribution.

I think this would allow for more graceful support of other possible forward processes, and make the pipeline code cleaner (as most of the logic would be handled inside the scheduler).
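
A rough sketch of what that could look like (the mask_token_id attribute and the exact signature are assumptions, not taken from this PR):

def sample_prior(self, shape, device=None, dtype=torch.long, generator=None):
    # Hypothetical scheduler method: draw x_T from the prior implied by `forward_process`.
    if self.forward_process == "absorbing":
        # Absorbing prior: every position starts as the mask token
        # (`mask_token_id` is an assumed attribute here).
        return torch.full(shape, self.mask_token_id, device=device, dtype=dtype)
    if self.forward_process == "uniform":
        if getattr(self, "exclude_mask_from_uniform", False):
            return self._sample_uniform_tokens(torch.Size(shape), device=device, dtype=dtype, generator=generator)
        return torch.randint(0, self.vocab_size, shape, device=device, dtype=dtype, generator=generator)
    raise ValueError(f"Unsupported forward process: {self.forward_process!r}")

The pipeline's latent preparation would then reduce to something like self.scheduler.sample_prior((batch_size, seq_len), device=device, generator=generator).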

timesteps = torch.linspace(
    self.num_train_timesteps - 1, 0, self.num_inference_steps, dtype=torch.float32
).round()
self.timesteps = timesteps.to(dtype=torch.long, device=device)

Suggestion: we could pre-compute the alpha schedule here once the timestep discretization is fixed. This could be a little more efficient since I think we use the alpha for each timestep twice in the denoising loop (once as alpha_t and once as alpha_prev).
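
For example, set_timesteps could also build the alpha table (a linear alpha schedule is assumed in this sketch; swap in whatever schedule the scheduler actually uses):

timesteps = torch.linspace(
    self.num_train_timesteps - 1, 0, self.num_inference_steps, dtype=torch.float32
).round()
self.timesteps = timesteps.to(dtype=torch.long, device=device)
# Precompute alpha for every discretized timestep, appending the terminal value 1.0 so the
# denoising loop can read alpha_t = self.alphas[i] and alpha_prev = self.alphas[i + 1].
alphas = 1.0 - timesteps / self.num_train_timesteps
self.alphas = torch.cat([alphas, torch.ones(1)]).to(device)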

Comment on lines +441 to +445
if self.forward_process != "absorbing":
    raise ValueError(f"Unsupported forward process for `step()`: {self.forward_process!r}")

# p_denoise = (alpha_prev - alpha_t) / (1 - alpha_t)
denom = (1.0 - alpha_t).clamp_min(torch.finfo(torch.float32).eps)

nit: I find this code structure a little confusing, and think that something like

        if self.forward_process == "uniform":
            ...
        elif self.forward_process == "absorbing":
            ...
        else:
             raise ValueError(f"Unsupported forward process for `step()`: {self.forward_process!r}")

        if not return_dict:
            return (x_prev,)
        return TokenDiffusionSchedulerOutput(prev_sample=x_prev)

would be more readable.

from .scheduling_token_diffusion import TokenDiffusionScheduler, TokenDiffusionSchedulerOutput


class BlockTokenDiffusionScheduler(TokenDiffusionScheduler):

I think we would generally prefer a single-file design rather than inheriting from TokenDiffusionScheduler. CC @yiyixuxu

A possible alternative would be to move the block_mask logic into the add_noise and step methods of TokenDiffusionScheduler, retaining the current default of noising/denoising over the entire input if block_mask is None.
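
A rough sketch of the second option (`_denoise_step` below is a stand-in for the existing absorbing/uniform update logic, not an actual method in this PR):

def step(self, model_output, timestep, sample, block_mask=None, generator=None, return_dict=True):
    # `block_mask` (bool, same shape as `sample`): True where this step may update tokens.
    x_prev = self._denoise_step(model_output, timestep, sample, generator=generator)
    if block_mask is not None:
        # Outside the active block, carry the current tokens through unchanged.
        x_prev = torch.where(block_mask, x_prev, sample)
    if not return_dict:
        return (x_prev,)
    return TokenDiffusionSchedulerOutput(prev_sample=x_prev)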
