@DDEle DDEle commented Jan 23, 2026

Proposed changes

This PR updates the preshuffle format for ABQuant to align with old-ck and AITER implementations by modifying tensor distribution logic, adjusting warp GEMM configurations, and refactoring instance factory functions.

  • Modified tensor distribution calculations to support variable access patterns based on data type sizes
  • Refactored instance factory functions from explicit declarations to static lambda-based initialization
  • Corrected function naming from get_n_words_per_128b() to get_n_dwords_per_128b()
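As a rough illustration of what "static lambda-based initialization" of instance factories can look like, the sketch below registers kernels in a lookup table from immediately invoked lambdas. All names here (`GemmArgs`, `kernel_table`, `run_kernel`, the string keys) are hypothetical and are not taken from the PR; the actual CK_TILE code may be structured quite differently.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the arguments a GEMM instance would receive.
struct GemmArgs
{
    int m;
    int n;
    int k;
};

using KernelFn = std::function<int(const GemmArgs&)>;

// Function-local static: the table is constructed on first use, so
// registrations from any translation unit never see an unconstructed map.
inline std::unordered_map<std::string, KernelFn>& kernel_table()
{
    static std::unordered_map<std::string, KernelFn> table;
    return table;
}

// Each instance registers itself by initializing a static variable from an
// immediately invoked lambda, instead of an explicitly declared factory.
// The kernel bodies are placeholders that only return a dummy value.
[[maybe_unused]] static const bool registered_a = [] {
    kernel_table()["hypothetical_fp8"] = [](const GemmArgs& a) { return a.m * a.n; };
    return true;
}();

[[maybe_unused]] static const bool registered_b = [] {
    kernel_table()["hypothetical_bf8"] = [](const GemmArgs& a) { return a.m * a.n * a.k; };
    return true;
}();

// Look up a kernel by key; returns -1 when no instance was registered.
inline int run_kernel(const std::string& key, const GemmArgs& args)
{
    const auto it = kernel_table().find(key);
    if(it == kernel_table().end())
    {
        return -1;
    }
    return it->second(args);
}
```

Namespace-scope variables with dynamic initializers like these are what Clang's `-Wglobal-constructors` diagnostic reports, which is presumably why a pattern of this kind is paired with a flag that suppresses that warning.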

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation that helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

@DDEle DDEle marked this pull request as ready for review January 26, 2026 07:39
@DDEle DDEle changed the title from "Abquant new preshuffle" to "[CK_TILE] ABQuant New Preshuffle" Jan 26, 2026
@DDEle DDEle requested a review from Copilot January 27, 2026 08:48

Copilot AI left a comment

Pull request overview

This PR updates the preshuffle format for ABQuant to align with old-ck and AITER implementations by modifying tensor distribution logic, adjusting warp GEMM configurations, and refactoring instance factory functions.

Changes:

  • Modified tensor distribution calculations to support variable access patterns based on data type sizes
  • Refactored instance factory functions from explicit declarations to static lambda-based initialization
  • Corrected function naming from get_n_words_per_128b() to get_n_dwords_per_128b()

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 2 comments.

File | Description
--- | ---
gemm_wp_abquant_pipeline_ag_bg_cr_base_policy.hpp | Updated KBPerLoad calculation and warp GEMM configuration with NumAccess parameter
gemm_quant_kernel.hpp | Modified tensor view creation to use separate K split dimensions
wp_pipeline_agmem_bgmem_creg_base_policy.hpp | Added KAccess calculation and updated tile distribution encoding
gemm_universal_pipeline_ag_bg_cr_policy.hpp | Renamed LDS bank width calculation function
tensor_shuffle_utils.hpp | Refactored shuffle logic with new access pattern calculation
arch.hpp | Renamed function to better reflect it returns dword count
run_gemm_quant_example.inc | Simplified K_split calculation and fixed return values
gemm_utils.hpp | Added global kernel lookup table function
gemm_quant_*.cpp | Converted instance factories to static initialization pattern
CMakeLists.txt | Added compiler flag to suppress global constructor warnings
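
To make the dword rename and the "access pattern based on data type sizes" idea concrete, here is a minimal sketch of the arithmetic involved. A dword is 32 bits, so a 128-bit access spans 4 dwords, and the number of elements per access depends on the element size. The helper names (`n_dwords_per_128b`, `elems_per_128b`, `num_accesses`) are hypothetical and chosen for illustration; they are not the signatures used in `arch.hpp` or the pipeline policies, and the real KAccess computation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A dword is 32 bits, so one 128-bit access covers 128 / 32 = 4 dwords.
constexpr std::size_t n_dwords_per_128b() { return 128 / 32; }

// Number of elements of type T that fit in one 128-bit (16-byte) access.
template <typename T>
constexpr std::size_t elems_per_128b()
{
    return 16 / sizeof(T);
}

// Assumed model: number of 128-bit accesses needed to cover
// k_per_thread contiguous elements of type T (rounded up).
template <typename T>
constexpr std::size_t num_accesses(std::size_t k_per_thread)
{
    return (k_per_thread + elems_per_128b<T>() - 1) / elems_per_128b<T>();
}

static_assert(n_dwords_per_128b() == 4, "128 bits is four 32-bit dwords");
static_assert(elems_per_128b<std::uint8_t>() == 16, "sixteen 1-byte elements per access");
static_assert(elems_per_128b<std::uint16_t>() == 8, "eight 2-byte elements per access");
```

Under this model, 32 one-byte elements need two accesses while 32 two-byte elements need four, which is the kind of type-size dependence the summary describes.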

@DDEle DDEle enabled auto-merge (squash) January 28, 2026 07:46
@DDEle DDEle merged commit 8e3d84a into develop Jan 28, 2026
23 checks passed
@DDEle DDEle deleted the abquant-new-preshuffle branch January 28, 2026 07:46
3 participants