refactor : configurations, and introduce CUDA features (#53) by Eamon2009 · Pull Request #53 · Eamon2009/Quadtrix.cpp

Eamon2009 · 2026-05-25T05:13:18Z

Summary

Causal Multi-Head Attention Forward Pass (CUDA)

This PR implements the CUDA forward pass for causal multi-head attention (`attention_forward`). It includes the core GPU kernel, custom block-level reduction primitives, and tensor validation helpers.

Core Attention Kernel (`attention_forward_kernel`):

Computes scaled dot-product attention on an interleaved QKV input tensor structured as [Batch, Time, 3 * Channels].
Causal Masking: Enforces autoregressive constraints by preventing tokens from attending to future time steps ($t2 > t$).
Implements parallelized block_max and block_sum device functions.
Leverages cooperative warp shuffles (`warp_max`, `warp_sum`) and shared memory for numerically stable online softmax normalization.
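For reviewers who want a CPU reference to compare against, the behaviour described above can be sketched in plain Python. This is a minimal sketch, not the kernel in this PR: the function name `causal_attention`, the nested-list tensors, and the assumption that each token stores all of Q, then K, then V inside its `3 * Channels` row are illustrative choices. The loop variable `t2` mirrors the masking condition above.

```python
import math


def causal_attention(qkv, n_head):
    """Reference causal attention. qkv is a nested list [B][T][3*C]; returns [B][T][C].

    Assumed layout (not confirmed by the PR): each token row is Q | K | V,
    and each of those is split into n_head contiguous head slices.
    """
    B, T = len(qkv), len(qkv[0])
    C = len(qkv[0][0]) // 3
    hs = C // n_head                      # per-head channel count
    scale = 1.0 / math.sqrt(hs)
    out = [[[0.0] * C for _ in range(T)] for _ in range(B)]

    for b in range(B):
        for t in range(T):
            for h in range(n_head):
                lo = h * hs
                q = qkv[b][t][lo:lo + hs]
                running_max = float("-inf")   # largest score seen so far
                running_sum = 0.0             # softmax denominator, relative to running_max
                acc = [0.0] * hs              # weighted sum of V, relative to running_max
                for t2 in range(t + 1):       # causal mask: positions t2 > t are never visited
                    k = qkv[b][t2][C + lo:C + lo + hs]
                    v = qkv[b][t2][2 * C + lo:2 * C + lo + hs]
                    score = scale * sum(qi * ki for qi, ki in zip(q, k))
                    new_max = max(running_max, score)
                    # Rescale what was accumulated under the old maximum.
                    correction = math.exp(running_max - new_max)
                    weight = math.exp(score - new_max)
                    running_sum = running_sum * correction + weight
                    acc = [a * correction + weight * vi for a, vi in zip(acc, v)]
                    running_max = new_max
                out[b][t][lo:lo + hs] = [a / running_sum for a in acc]
    return out
```

The online form keeps only a running maximum and a running sum per query, so no row of scores has to be stored before normalizing. A GPU kernel would typically spread the inner loop over threads and combine the partial maxima and sums with block reductions.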

#52
#11
#12
#14
#29

- Increase MAX_SESSIONS to 1000 and set SESSION_TTL_HOURS to 24.
- Update CPP_SERVER_URL to point to localhost:8080.
- Fix a typo in TORCH_CHECKPOINT_PATH (removed trailing space before extension).
- Set REQUEST_TIMEOUT_SECONDS to 60.

- Sync default field values with new environment baseline.
- Remove duplicate `torch_checkpoint_path` definition and fix the trailing space typo in the file extension.
- Update `request_timeout_seconds` to a float default of 60.0.
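The defaults listed in the two commit messages above could be mirrored in a settings object along these lines. This is a hedged sketch only: the `Settings` class, the `load_settings` helper, and the `http://` scheme on the server URL are assumptions for illustration and are not taken from `backend/config.py`. The checkpoint path is left out because its value does not appear in this PR description.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Defaults follow the values named in the commit messages.
    max_sessions: int = 1000
    session_ttl_hours: int = 24
    cpp_server_url: str = "http://localhost:8080"  # scheme is an assumption
    request_timeout_seconds: float = 60.0


def load_settings(env=None):
    """Build Settings from an environment mapping, falling back to the defaults."""
    env = os.environ if env is None else env
    d = Settings()
    return Settings(
        max_sessions=int(env.get("MAX_SESSIONS", d.max_sessions)),
        session_ttl_hours=int(env.get("SESSION_TTL_HOURS", d.session_ttl_hours)),
        cpp_server_url=str(env.get("CPP_SERVER_URL", d.cpp_server_url)).strip(),
        request_timeout_seconds=float(
            env.get("REQUEST_TIMEOUT_SECONDS", d.request_timeout_seconds)
        ),
    )
```

Converting through `float(...)` keeps `request_timeout_seconds` a float even when the environment supplies the string `"60"`.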

- Implement dataset evaluation logic (`estimate_loss`) and standard training loops.
- Add an interactive `--chat` interface mode utilizing token stream generation.
- Configure automatic hardware routing between CUDA and CPU execution environments.

…urations
- Update environment variables configuration and fix the trailing space typo in TORCH_CHECKPOINT_PATH.
- Remove duplicate definition of torch_checkpoint_path in backend/config.py.
- Decrease evaluation intervals (EVAL_INTERVAL) in the C++ engine for quicker validation tracking.
- Add LibTorch C++ execution and interactive chat stream handler in torch_main.cpp.
- Implement state migration (v1) in frontend settings store to default clients to the PyTorch backend.

… architecture
- Replace static `generate_response` with a generator-based `stream_response` utilizing token yielding.
- Update `GPTLanguageModel.generate` to act as an iterator yielding sequential token IDs instead of returning a complete array.
- Implement token-by-token decoding (`decode([token_id])`) to support real-time user-interface updates.
- Keep `generate_response` as a backward-compatible utility that aggregates the token stream.
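The relationship between the three functions named above can be sketched as follows. This is an illustrative outline rather than the code in `llm.py`: the free function `generate` stands in for `GPTLanguageModel.generate`, and the `next_token`, `encode`, and `decode` callables are hypothetical stand-ins for the model and tokenizer.

```python
from typing import Callable, Iterator, List


def generate(next_token: Callable[[List[int]], int],
             prompt_ids: List[int],
             max_new_tokens: int) -> Iterator[int]:
    """Yield token IDs one at a time instead of returning a complete array."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token_id = next_token(ids)   # hypothetical: model picks the next token
        ids.append(token_id)
        yield token_id


def stream_response(next_token, encode, decode, prompt: str,
                    max_new_tokens: int = 16) -> Iterator[str]:
    """Decode each token as soon as it is produced, for live UI updates."""
    for token_id in generate(next_token, encode(prompt), max_new_tokens):
        yield decode([token_id])


def generate_response(next_token, encode, decode, prompt: str,
                      max_new_tokens: int = 16) -> str:
    """Backward-compatible wrapper: aggregate the stream into one string."""
    return "".join(stream_response(next_token, encode, decode, prompt, max_new_tokens))


# Toy usage with a character-level "tokenizer" and a model that counts upward.
toy_encode = lambda text: [ord(ch) for ch in text]
toy_decode = lambda ids: "".join(chr(i) for i in ids)
toy_next = lambda ids: ids[-1] + 1
chunks = list(stream_response(toy_next, toy_encode, toy_decode, "a", 3))
```

One caveat with byte-level BPE vocabularies: a single token may hold only part of a multi-byte character, so decoding strictly one token at a time can emit replacement characters. Buffering undecodable bytes until the character completes is a common workaround.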

- Multi-stage Dockerfile (CPU default, CUDA-ready via BASE_IMAGE arg)
- Single container running FastAPI + React frontend via supervisord
- Model weights mounted as volume at runtime (/app/models)
- docker-compose.yml for local development
- GitHub Actions workflow publishing to ghcr.io on master push and version tags
- .dockerignore to keep build context clean

- Add `attention_forward_kernel` supporting causal masking and online softmax.
- Implement parallelized block reduction helpers `block_sum` and `block_max`.
- Include shared memory utilization for intermediate warp-level reduction.
- Add safety checks for contiguous F32 CUDA tensors and integer bounds.
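The two-level reduction pattern mentioned above (warp shuffles first, then shared memory across warps) can be emulated on the CPU to see how the pieces fit together. This is a simulation under stated assumptions, not the device code from this PR: it models a butterfly exchange of the kind `__shfl_xor_sync` provides, and the Python function bodies, list-based "lanes", and padding scheme are illustrative.

```python
import operator

WARP_SIZE = 32


def warp_reduce(lanes, op):
    """Butterfly reduction over one warp; every lane ends up holding the result."""
    assert len(lanes) == WARP_SIZE
    offset = WARP_SIZE // 2
    while offset > 0:
        # Each lane combines its value with the lane whose index differs in one bit.
        lanes = [op(lanes[i], lanes[i ^ offset]) for i in range(WARP_SIZE)]
        offset //= 2
    return lanes[0]


def block_reduce(values, op, identity):
    """Reduce up to 32 warps: per-warp results go to 'shared memory', then one warp finishes."""
    assert 0 < len(values) <= WARP_SIZE * WARP_SIZE
    padded = list(values) + [identity] * (-len(values) % WARP_SIZE)
    # Stage 1: lane 0 of each warp writes its partial result to shared memory.
    shared = [warp_reduce(padded[i:i + WARP_SIZE], op)
              for i in range(0, len(padded), WARP_SIZE)]
    # Stage 2: the first warp loads the partials (identity for missing warps) and reduces.
    shared += [identity] * (WARP_SIZE - len(shared))
    return warp_reduce(shared, op)


def block_max(values):
    return block_reduce(values, max, float("-inf"))


def block_sum(values):
    return block_reduce(values, operator.add, 0.0)
```

In a softmax, `block_max` would supply the value subtracted from every score before exponentiation, and `block_sum` the normalizer over the exponentiated scores.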

codeaddict-119 · 2026-05-25T05:17:28Z

Eamon2009 added 30 commits May 24, 2026 18:37

chore(train): decrease evaluation interval for faster tracking

ba26fdb

Delete chat.txt

c2f8c40

Delete torch_example.cpp

c197072

Delete torch_main.cpp

0c93c42

refactor(ui): update backend tooltip and fix checkpoint filename typo

6c5a27d

- Remove the legacy typo (trailing space before extension) from the PyTorch checkpoint string in the header's logic.

feat(model): switch tokenizer to o200k_base for expanded vocabulary

62e7b52

- Update the tokenizer configuration in `llm.py` to use the `o200k_base` encoding.
- Expand tokenization capabilities and adjust the `vocab_size` dynamically to support the updated vocabulary baseline.

refactor(build): update Vite script execution with configLoader flag

7f4701a

refactor(main): cleaned code resolved file path added new parameters

61c8f2f

Refactor Docker image name handling in workflow

ce1d32d

ci: update release configuration in relse.yml

55953d6

ci: update release configuration in relse.yml

b6f97e7

ci: update release configuration in relse.yml

0ae4450

ci: update release configuration in relse.yml

9b05dde

ci :Update Docker publish workflow to ignore paths

ac1d865

Ignore specific paths for push and pull request events.

Refactor common.h: Migrate to modern C++ paradigms and scoped namespaces

14fffde

refactor (memory management): Introduce RAII DeviceBuffer and scoped …

90d71a3

…copy utilities

Refactor memory management: Introduce RAII DeviceBuffer and scoped co…

ff1acd3

…py utilities

Eamon2009 added 18 commits May 25, 2026 08:10

Delete chat.py

534cbd9

Delete data-set.py

bf204b9

Delete main.py

77d4d95

Delete run_20260504_143730.txt

fae4268

feat(cuda): implement NcclCommunicator RAII wrapper and all-reduce pr…

bb27044

…imitives

feat : added a test script for memory.cuh runtime.cuh and tensor.cuh

e3f2da8

feat(cuda): implement AdamW optimizer kernel and update host function

6eee5ca

- Add `adamw_kernel` for parallelized parameter updates on CUDA.
- Implement host side `adamw_update` with shape and parameter validation.
- Guard device transitions using `DeviceGuard`.
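For reference, the per-element arithmetic that an AdamW kernel parallelizes can be written out on the CPU. The sketch below follows the standard decoupled-weight-decay formulation; the signature, the default hyperparameters, and the specific validation checks are assumptions and may differ from the `adamw_update` added in this commit.

```python
import math


def adamw_update(params, grads, m, v, step,
                 lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """Update params, m, and v in place. All four lists must share one length."""
    # Host-side validation, analogous to shape and parameter checks before a launch.
    if not (len(params) == len(grads) == len(m) == len(v)):
        raise ValueError("params, grads, m and v must have the same length")
    if step < 1:
        raise ValueError("step must be >= 1")
    if lr < 0 or not (0 <= beta1 < 1) or not (0 <= beta2 < 1):
        raise ValueError("invalid hyperparameters")

    bias1 = 1.0 - beta1 ** step
    bias2 = 1.0 - beta2 ** step
    for i in range(len(params)):          # on the GPU, one thread would handle one i
        g = grads[i]
        m[i] = beta1 * m[i] + (1.0 - beta1) * g        # first moment
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g    # second moment
        m_hat = m[i] / bias1                           # bias correction
        v_hat = v[i] / bias2
        # Decoupled weight decay: applied to the parameter, not folded into the gradient.
        params[i] -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * params[i])
```

Because each element depends only on its own gradient and moments, the loop body maps directly onto independent threads with no reduction required.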

refactor: obsolete utilities and deprecated functions Remove code blo…

96910d4

…at and unused dependencies

refactor: obsolete utilities and deprecated functions Remove code blo…

1871a0c

…at and unused dependencies

refactor: obsolete utilities and deprecated functions Remove code blo…

34530a3

…at and unused dependencies

refactor: obsolete utilities and deprecated functions Remove code blo…

1983d41

…at and unused dependencies

refactor: obsolete utilities and deprecated functions Remove code blo…

7d1738d

…at and unused dependencies

refactor: obsolete utilities and deprecated functions Remove code blo…

79bc035

…at and unused dependencies

refactor: obsolete utilities and deprecated functions Remove code blo…

eaf9ed6

…at and unused dependencies

refactor: obsolete utilities and deprecated functions Remove code blo…

13d5b6e

…at and unused dependencies

refactor: obsolete utilities and deprecated functions Remove code blo…

e42f23c

…at and unused dependencies

feat :tensor management with benchmarks (#51) (#52)

c7a1e01

Eamon2009 requested a review from codeaddict-119 May 25, 2026 05:13

Eamon2009 self-assigned this May 25, 2026

Eamon2009 added the documentation, enhancement, github_actions, python, and cuda labels May 25, 2026

Eamon2009 changed the title ~~Update environment variables, configurations, and introduce CUDA features~~ refactor : configurations, and introduce CUDA features (#53) May 25, 2026

codeaddict-119 self-assigned this May 25, 2026

codeaddict-119 approved these changes May 25, 2026

codeaddict-119 merged commit e766e8c into exp May 25, 2026
9 checks passed

codeaddict-119 merged 58 commits into exp from master
