refactor : configurations, and introduce CUDA features (#53)

Merged: codeaddict-119 merged 58 commits into exp from master on May 25, 2026.

@Eamon2009
Owner

Summary

Causal Multi-Head Attention Forward Pass (CUDA)

This PR implements the CUDA forward pass for causal multi-head attention (`attention_forward`). It includes the core GPU kernel, custom block-level reduction primitives, and tensor validation helpers.

Core Attention Kernel: `attention_forward_kernel`

  • Computes scaled dot-product attention on an interleaved QKV input tensor structured as [Batch, Time, 3 * Channels].
  • Causal Masking: Enforces the autoregressive constraint by preventing a query at position $t$ from attending to keys at future positions ($t_2 > t$).
  • Implements parallel block-level reduction device functions, `block_max` and `block_sum`.
  • Combines cooperative warp shuffles (`warp_max`, `warp_sum`) with shared memory to perform numerically stable online softmax normalization.
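For intuition, the forward pass described above can be written as a slow CPU reference in plain Python. This is a sketch, not the PR's kernel: it assumes each interleaved row of width `3 * C` stores Q, K and V in that order (each `C` wide), that head `h` owns channels `[h * hs, (h + 1) * hs)` with `hs = C / NH`, and that scores are scaled by `1 / sqrt(hs)`. The function names are hypothetical. The online variant keeps a running max, a running normalizer and a running weighted sum of values, rescaling all three whenever a larger score appears; the two-pass version is included only as a cross-check.

```python
import math
from typing import List

Tensor3 = List[List[List[float]]]  # nested lists shaped [B][T][X]


def causal_attention_online(qkv: Tensor3, n_head: int) -> Tensor3:
    """Single-pass (online-softmax) causal attention; returns [B][T][C]."""
    B, T = len(qkv), len(qkv[0])
    C = len(qkv[0][0]) // 3
    hs = C // n_head                               # head size
    scale = 1.0 / math.sqrt(hs)
    out = [[[0.0] * C for _ in range(T)] for _ in range(B)]
    for b in range(B):
        for t in range(T):
            for h in range(n_head):
                lo, hi = h * hs, (h + 1) * hs
                q = qkv[b][t][lo:hi]               # assumed Q slot: channels [0, C)
                m, denom = -math.inf, 0.0          # running max and running normalizer
                acc = [0.0] * hs                   # running weighted sum of V rows
                for t2 in range(t + 1):            # causal mask: every t2 > t is skipped
                    k = qkv[b][t2][C + lo:C + hi]          # assumed K slot: [C, 2C)
                    v = qkv[b][t2][2 * C + lo:2 * C + hi]  # assumed V slot: [2C, 3C)
                    s = scale * sum(qi * ki for qi, ki in zip(q, k))
                    m_new = max(m, s)
                    corr = math.exp(m - m_new)     # rescales old state; exp(-inf) == 0.0
                    p = math.exp(s - m_new)
                    denom = denom * corr + p
                    acc = [a * corr + p * vi for a, vi in zip(acc, v)]
                    m = m_new
                out[b][t][lo:hi] = [a / denom for a in acc]
    return out


def causal_attention_naive(qkv: Tensor3, n_head: int) -> Tensor3:
    """Two-pass reference: masked scores, stable softmax, then weighted V."""
    B, T = len(qkv), len(qkv[0])
    C = len(qkv[0][0]) // 3
    hs = C // n_head
    scale = 1.0 / math.sqrt(hs)
    out = [[[0.0] * C for _ in range(T)] for _ in range(B)]
    for b in range(B):
        for t in range(T):
            for h in range(n_head):
                lo, hi = h * hs, (h + 1) * hs
                q = qkv[b][t][lo:hi]
                scores = [scale * sum(qi * ki for qi, ki in zip(q, qkv[b][t2][C + lo:C + hi]))
                          for t2 in range(t + 1)]
                mx = max(scores)                   # subtract the max for stability
                w = [math.exp(s - mx) for s in scores]
                z = sum(w)
                for d in range(hs):
                    out[b][t][lo + d] = sum(w[t2] * qkv[b][t2][2 * C + lo + d]
                                            for t2 in range(t + 1)) / z
    return out
```

The online form visits each key once and never materializes the full score row, which is why it suits a kernel that streams over $t_2 \le t$; the two-pass form needs every score before it can normalize.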

#52
#11
#12
#14
#29

Eamon2009 added 30 commits May 24, 2026 18:37
- Increase MAX_SESSIONS to 1000 and set SESSION_TTL_HOURS to 24.
- Update CPP_SERVER_URL to point to localhost:8080.
- Fix a typo in TORCH_CHECKPOINT_PATH (remove the trailing space before the file extension).
- Set REQUEST_TIMEOUT_SECONDS to 60.
- Sync default field values with the new environment baseline.
- Remove duplicate `torch_checkpoint_path` definition and fix the trailing space typo in the file extension.
- Update `request_timeout_seconds` to a float default of 60.0.
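The default values listed in these configuration commits can be read as a typed settings object with environment overrides. The sketch below uses a stdlib `dataclass` as a stand-in for whatever `backend/config.py` actually uses; apart from `request_timeout_seconds`, the lowercase field names are assumed to mirror the environment variable names, and `TORCH_CHECKPOINT_PATH` is left out because its value is not given here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Defaults taken from the commit notes; field names are assumptions.
    max_sessions: int = 1000
    session_ttl_hours: int = 24
    cpp_server_url: str = "localhost:8080"
    request_timeout_seconds: float = 60.0  # a float default, not the int 60

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        """Build settings, letting environment variables override defaults."""
        env = os.environ if env is None else env
        d = cls()
        return cls(
            max_sessions=int(env.get("MAX_SESSIONS", d.max_sessions)),
            session_ttl_hours=int(env.get("SESSION_TTL_HOURS", d.session_ttl_hours)),
            cpp_server_url=env.get("CPP_SERVER_URL", d.cpp_server_url),
            request_timeout_seconds=float(
                env.get("REQUEST_TIMEOUT_SECONDS", d.request_timeout_seconds)),
        )
```

Casting through `float(...)` means a value such as `REQUEST_TIMEOUT_SECONDS=60` from the environment still yields `60.0`, matching the float default.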
Implement dataset evaluation logic (`estimate_loss`) and standard training loops.
- Add an interactive `--chat` interface mode utilizing token stream generation.
- Configure automatic hardware routing between CUDA and CPU execution environments.
…urations

- Update environment variables configuration and fix the trailing space typo in TORCH_CHECKPOINT_PATH.
- Remove duplicate definition of torch_checkpoint_path in backend/config.py.
- Decrease the evaluation interval (EVAL_INTERVAL) in the C++ engine so validation metrics are reported more often.
- Add LibTorch C++ execution and interactive chat stream handler in torch_main.cpp.
- Implement state migration (v1) in frontend settings store to default clients to the PyTorch backend.
- Remove the legacy typo (a trailing space before the file extension) from the PyTorch checkpoint path string in the header logic.
- Update the tokenizer configuration in `llm.py` to use the `o200k_base` encoding.
- Expand tokenization capabilities and adjust the `vocab_size` dynamically to support the updated vocabulary baseline.
… architecture

- Replace static `generate_response` with a generator-based `stream_response` utilizing token yielding.
- Update `GPTLanguageModel.generate` to act as an iterator yielding sequential token IDs instead of returning a complete array.
- Implement token-by-token decoding (`decode([token_id])`) to support real-time user-interface updates.
- Keep `generate_response` as a backward-compatible utility that aggregates the token stream.
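The switch from returning a complete array to yielding tokens can be sketched with plain Python generators. Here `next_token` (one model step plus sampling) and `decode` are hypothetical callables standing in for `GPTLanguageModel` and the tokenizer; the signatures are illustrative, not the project's.

```python
from typing import Callable, Iterator, List


def generate(next_token: Callable[[List[int]], int], prompt: List[int],
             max_new_tokens: int) -> Iterator[int]:
    """Yield token IDs one at a time instead of returning the full sequence."""
    ids = list(prompt)                 # copy so the caller's prompt is untouched
    for _ in range(max_new_tokens):
        token_id = next_token(ids)     # hypothetical: forward pass + sampling
        ids.append(token_id)
        yield token_id


def stream_response(next_token, decode: Callable[[List[int]], str],
                    prompt: List[int], max_new_tokens: int) -> Iterator[str]:
    """Decode each token as soon as it is produced, for real-time UI updates."""
    for token_id in generate(next_token, prompt, max_new_tokens):
        yield decode([token_id])


def generate_response(next_token, decode, prompt, max_new_tokens) -> str:
    """Backward-compatible wrapper that aggregates the token stream."""
    return "".join(stream_response(next_token, decode, prompt, max_new_tokens))
```

One caveat worth checking: with byte-level BPE encodings such as `o200k_base`, a single token can hold only part of a multi-byte UTF-8 character, so decoding token by token may emit replacement characters; buffering until a complete character is available avoids that.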
- Multi-stage Dockerfile (CPU default, CUDA-ready via the BASE_IMAGE arg).
- Single container running FastAPI + React frontend via supervisord.
- Model weights mounted as a volume at runtime (/app/models).
- docker-compose.yml for local development.
- GitHub Actions workflow publishing to ghcr.io on master pushes and version tags.
- .dockerignore to keep the build context clean.
Ignore specific paths for push and pull request events.
Eamon2009 added 18 commits May 25, 2026 08:10
- Add `adamw_kernel` for parallelized parameter updates on CUDA.
- Implement host-side `adamw_update` with shape and parameter validation.
- Guard device transitions using `DeviceGuard`.
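The per-element arithmetic of an AdamW step is what makes a one-thread-per-parameter kernel possible. The reference below is a Python sketch of the standard AdamW update with bias correction and decoupled weight decay; the argument names, default hyperparameters and validation messages are assumptions, not the signature of the PR's `adamw_update`.

```python
import math
from typing import List


def adamw_update(params: List[float], grads: List[float], m: List[float],
                 v: List[float], step: int, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8,
                 weight_decay: float = 0.0) -> None:
    """One in-place AdamW step. Every index i is independent, which is what
    lets a CUDA kernel assign one thread per parameter."""
    n = len(params)
    # Host-side style validation (hypothetical checks).
    if not (len(grads) == len(m) == len(v) == n):
        raise ValueError("params, grads, m and v must have the same length")
    if step < 1:
        raise ValueError("step must be >= 1 for bias correction")
    if not (0.0 <= beta1 < 1.0 and 0.0 <= beta2 < 1.0):
        raise ValueError("betas must be in [0, 1)")
    bc1 = 1.0 - beta1 ** step          # bias corrections for the two moments
    bc2 = 1.0 - beta2 ** step
    for i in range(n):                 # loop body = work of one CUDA thread
        g = grads[i]
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
        m_hat = m[i] / bc1
        v_hat = v[i] / bc2
        # Decoupled weight decay: applied to the parameter, not folded into g.
        params[i] -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * params[i])
```

On the first step with zero weight decay, the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `lr`.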
- Add `attention_forward_kernel` supporting causal masking and online softmax.
- Implement parallelized block reduction helpers `block_sum` and `block_max`.
- Include shared memory utilization for intermediate warp-level reduction.
- Add safety checks for contiguous F32 CUDA tensors and integer bounds.
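The two-level reduction pattern (warp shuffle, then shared memory, then a final warp) can be emulated on the CPU to see how `block_sum` and `block_max` fit together. This sketch assumes an XOR-butterfly shuffle (as with `__shfl_xor_sync`) and a block size that is a multiple of 32 and at most 1024; the PR's kernels may use `__shfl_down_sync` or a different layout instead.

```python
import math
from typing import Callable, List

WARP_SIZE = 32
Op = Callable[[float, float], float]


def warp_reduce(lanes: List[float], op: Op) -> List[float]:
    """Butterfly reduction: in each round every lane combines its value with
    lane (i ^ offset); after log2(32) rounds all lanes hold the result."""
    vals = list(lanes)
    offset = WARP_SIZE // 2
    while offset > 0:
        vals = [op(vals[i], vals[i ^ offset]) for i in range(WARP_SIZE)]
        offset //= 2
    return vals


def block_reduce(values: List[float], op: Op, identity: float) -> float:
    """Reduce each warp, stash per-warp results in 'shared memory', then let
    the first warp reduce those partials."""
    n = len(values)
    if n % WARP_SIZE != 0 or n // WARP_SIZE > WARP_SIZE:
        raise ValueError("block size must be a multiple of 32 and <= 1024")
    num_warps = n // WARP_SIZE
    shared = []  # stands in for a __shared__ float array of size num_warps
    for w in range(num_warps):
        lanes = values[w * WARP_SIZE:(w + 1) * WARP_SIZE]
        shared.append(warp_reduce(lanes, op)[0])  # lane 0 writes its warp's result
    # A __syncthreads() barrier would sit here in the kernel.
    lanes = [shared[i] if i < num_warps else identity for i in range(WARP_SIZE)]
    return warp_reduce(lanes, op)[0]


def block_sum(values: List[float]) -> float:
    return block_reduce(values, lambda a, b: a + b, 0.0)


def block_max(values: List[float]) -> float:
    return block_reduce(values, max, -math.inf)
```

With the butterfly variant every lane of a warp ends up holding the warp's result, whereas a shuffle-down tree leaves it only in lane 0; either way, the block-level result still has to be broadcast through shared memory before other warps can use it.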
Eamon2009 requested a review from codeaddict-119 on May 25, 2026, 05:13.
Eamon2009 self-assigned this on May 25, 2026.
Eamon2009 added the documentation, enhancement, github_actions, python, and cuda labels on May 25, 2026.
Eamon2009 changed the title from "Update environment variables, configurations, and introduce CUDA features" to "refactor : configurations, and introduce CUDA features (#53)" on May 25, 2026.
codeaddict-119 self-assigned this on May 25, 2026.
@codeaddict-119
Collaborator

[Screenshot 2026-05-25 012846]

codeaddict-119 merged commit e766e8c into exp on May 25, 2026.
9 checks passed.

Labels

cuda; documentation (Improvements or additions to documentation); enhancement (New feature or request); github_actions (Pull requests that update GitHub Actions code); python (Pull requests that update python code)

2 participants