Merged
58 commits
93de130
chore(config): update environment variables and fix typo in .pt
Eamon2009 May 24, 2026
06c2aaa
refactor(config): update default settings and fix duplicate torch path
Eamon2009 May 24, 2026
ba26fdb
chore(train): decrease evaluation interval for faster tracking
Eamon2009 May 24, 2026
c2f8c40
Delete chat.txt
Eamon2009 May 24, 2026
c197072
Delete torch_example.cpp
Eamon2009 May 24, 2026
0c93c42
Delete torch_main.cpp
Eamon2009 May 24, 2026
066ec74
feat(model): introduce torch_main C++ training and chat entry point
Eamon2009 May 24, 2026
e2dc8db
feat(model): introduce torch_main C++ training and chat entry point
Eamon2009 May 24, 2026
65f197c
refactor: migrate model backend from C++ to PyTorch and update config…
Eamon2009 May 24, 2026
6c5a27d
refactor(ui): update backend tooltip and fix checkpoint filename typo
Eamon2009 May 24, 2026
62e7b52
feat(model): switch tokenizer to o200k_base for expanded vocabulary
Eamon2009 May 24, 2026
1e8042c
refactor(inference): convert response generation to a token streaming…
Eamon2009 May 24, 2026
7f4701a
refactor(build): update Vite script execution with configLoader flag
Eamon2009 May 24, 2026
60d9215
chore: add Docker support with GitHub Packages publishing
Eamon2009 May 24, 2026
8376a0f
chore: add Docker support with GitHub Packages publishing - Multi-st…
Eamon2009 May 24, 2026
a8bf880
chore: add Docker support with GitHub Packages publishing - Multi-st…
Eamon2009 May 24, 2026
98be65e
chore: add Docker support with GitHub Packages publishing - Multi-st…
Eamon2009 May 24, 2026
093fae8
chore: add Docker support with GitHub Packages publishing - Multi-st…
Eamon2009 May 24, 2026
0879e2c
chore: add Docker support with GitHub Packages publishing - Multi-st…
Eamon2009 May 24, 2026
61c8f2f
refactor(main): cleaned code resolved file path added new parameters
Eamon2009 May 24, 2026
ce1d32d
Refactor Docker image name handling in workflow
Eamon2009 May 24, 2026
67638c5
chore: add Docker support with GitHub Packages publishing - Multi-st…
Eamon2009 May 24, 2026
55953d6
ci: update release configuration in relse.yml
Eamon2009 May 24, 2026
b6f97e7
ci: update release configuration in relse.yml
Eamon2009 May 24, 2026
0ae4450
ci: update release configuration in relse.yml
Eamon2009 May 24, 2026
9b05dde
ci: update release configuration in relse.yml
Eamon2009 May 24, 2026
ac1d865
ci :Update Docker publish workflow to ignore paths
Eamon2009 May 24, 2026
14fffde
Refactor common.h: Migrate to modern C++ paradigms and scoped namespaces
Eamon2009 May 24, 2026
90d71a3
refactor (memory management): Introduce RAII DeviceBuffer and scoped …
Eamon2009 May 24, 2026
ff1acd3
Refactor memory management: Introduce RAII DeviceBuffer and scoped co…
Eamon2009 May 24, 2026
b9cc837
Delete reduce.cuh
Eamon2009 May 24, 2026
880652e
feat: add runtime utilities Implement RAII DeviceGuard, Stream wrappe…
Eamon2009 May 24, 2026
791506f
refactor :tensor structures: Introduce TensorShape verification, Tens…
Eamon2009 May 24, 2026
5d3423e
refactor :tensor structures: Introduce TensorShape verification, Tens…
Eamon2009 May 24, 2026
d733e07
Enhance README with project overview and architecture
Eamon2009 May 24, 2026
61c60c3
feat(cuda): add AdamW optimizer interface and configuration
Eamon2009 May 25, 2026
27fddb7
feat(cuda): add attention forward and backward pass interfaces
Eamon2009 May 25, 2026
e1e4d51
feat(cuda): implement softmax and causal softmax forward kernels
Eamon2009 May 25, 2026
da06a2f
feat(cuda): implement cuBLAS matmul wrapper and BlasHandle RAII manager
Eamon2009 May 25, 2026
17bc3ba
feat(cuda): implement QKV permutation and unpermutation kernels
Eamon2009 May 25, 2026
534cbd9
Delete chat.py
Eamon2009 May 25, 2026
bf204b9
Delete data-set.py
Eamon2009 May 25, 2026
77d4d95
Delete main.py
Eamon2009 May 25, 2026
fae4268
Delete run_20260504_143730.txt
Eamon2009 May 25, 2026
bb27044
feat(cuda): implement NcclCommunicator RAII wrapper and all-reduce pr…
Eamon2009 May 25, 2026
e3f2da8
feat : added a test script for memory.cuh runtime.cuh and tensor.cuh
Eamon2009 May 25, 2026
6eee5ca
feat(cuda): implement AdamW optimizer kernel and update host function
Eamon2009 May 25, 2026
d24bc3d
feat(cuda): implement causal multi-head attention forward kernel
Eamon2009 May 25, 2026
96910d4
refactor: obsolete utilities and deprecated functions Remove code blo…
Eamon2009 May 25, 2026
1871a0c
refactor: obsolete utilities and deprecated functions Remove code blo…
Eamon2009 May 25, 2026
34530a3
refactor: obsolete utilities and deprecated functions Remove code blo…
Eamon2009 May 25, 2026
1983d41
refactor: obsolete utilities and deprecated functions Remove code blo…
Eamon2009 May 25, 2026
7d1738d
refactor: obsolete utilities and deprecated functions Remove code blo…
Eamon2009 May 25, 2026
79bc035
refactor: obsolete utilities and deprecated functions Remove code blo…
Eamon2009 May 25, 2026
eaf9ed6
refactor: obsolete utilities and deprecated functions Remove code blo…
Eamon2009 May 25, 2026
13d5b6e
refactor: obsolete utilities and deprecated functions Remove code blo…
Eamon2009 May 25, 2026
e42f23c
refactor: obsolete utilities and deprecated functions Remove code blo…
Eamon2009 May 25, 2026
c7a1e01
feat :tensor management with benchmarks (#51) (#52)
Eamon2009 May 25, 2026
35 changes: 35 additions & 0 deletions .dockerignore
@@ -0,0 +1,35 @@
.git
.gitignore
.github
.venv
**/__pycache__
**/*.pyc
**/*.pyo
**/*.pyd
engine/logs/
node_modules
frontend/node_modules
.npm-cache
frontend/.vite
frontend/dist

# Model weights
*.pt
*.bin
models/

# Windows build artifacts
*.exe
quadtrix.exe
*.png
*.jpg
*.jpeg
*.md
LICENSE
contributing.md
SECURITY.md
run.md
.DS_Store
Thumbs.db
.idea
.vscode
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -44,7 +44,7 @@ jobs:
pip install fastapi "uvicorn[standard]" pydantic pydantic-settings httpx redis

- name: Compile Python sources
-run: python -m compileall backend engine iGPU
+run: python -m compileall backend engine

- name: Import FastAPI application
working-directory: backend
82 changes: 82 additions & 0 deletions .github/workflows/docker-publish.yml
@@ -0,0 +1,82 @@
name: Publish Docker image
on:
  push:
    branches:
      - master
    tags:
      - "v*.*.*"
    paths-ignore:
      - 'cuda/**'
      - 'docs/**'
      - '**.md'
  pull_request:
    branches:
      - master
    paths-ignore:
      - 'cuda/**'
      - 'docs/**'
      - '**.md'

env:
  REGISTRY: ghcr.io

jobs:
  build-and-push:
    name: Build & push to ghcr.io
    runs-on: ubuntu-latest

    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set lowercase image name
        id: image
        run: |
          echo "name=$(echo '${{ github.repository }}' | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to ghcr.io
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ steps.image.outputs.name }}
          tags: |
            type=raw,value=latest,enable={{is_default_branch}}
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=ref,event=pr
      - name: Build and push Docker image (CPU)
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./Dockerfile
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          build-args: |
            BASE_IMAGE=ubuntu:24.04
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - name: Image published
        if: github.event_name != 'pull_request'
        run: |
          echo "Image published to GitHub Packages"
          echo ""
          echo "Pull with:"
          echo " docker pull ${{ env.REGISTRY }}/${{ steps.image.outputs.name }}:latest"
          echo ""
          echo "Or via docker-compose:"
          echo " image: ${{ env.REGISTRY }}/${{ steps.image.outputs.name }}:latest"
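The `Set lowercase image name` step is needed because container image repository names must be lowercase, while `github.repository` keeps the original casing (here `Eamon2009/Quadtrix.cpp`). A minimal Python sketch of the same transformation follows; the helper name `image_reference` is hypothetical and is not part of the workflow or the repository.

```python
def image_reference(registry: str, repository: str, tag: str = "latest") -> str:
    """Build a pullable image reference, lowercasing the repository path.

    Mirrors the workflow's `tr '[:upper:]' '[:lower:]'` step. The function
    name and signature are illustrative assumptions, not project code.
    """
    return f"{registry}/{repository.lower()}:{tag}"


# The same value the "Image published" step echoes for this repository.
print(image_reference("ghcr.io", "Eamon2009/Quadtrix.cpp"))
# prints "ghcr.io/eamon2009/quadtrix.cpp:latest"
```

Doing the lowercasing once and exposing it as a step output keeps the metadata step and the final echo lines consistent with each other.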
44 changes: 0 additions & 44 deletions .github/workflows/github-package.yml

This file was deleted.

57 changes: 0 additions & 57 deletions .github/workflows/release.yml

This file was deleted.

25 changes: 0 additions & 25 deletions .npmignore

This file was deleted.

70 changes: 70 additions & 0 deletions Dockerfile
@@ -0,0 +1,70 @@
FROM ubuntu:24.04 AS builder

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
g++ \
python3 \
python3-pip \
python3-venv \
curl \
ca-certificates \
&& curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
&& apt-get install -y --no-install-recommends nodejs \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /build
COPY . .
RUN g++ -std=c++17 -O2 -I. -Iinclude -o quadtrix main.cpp
RUN cd frontend \
&& npm ci \
&& npm run build
RUN python3 -m venv /venv \
&& /venv/bin/pip install --upgrade pip --quiet \
&& /venv/bin/pip install -r backend/requirements.txt --quiet

ARG BASE_IMAGE=ubuntu:24.04
FROM ${BASE_IMAGE:-ubuntu:24.04} AS runtime

LABEL org.opencontainers.image.title="Quadtrix.cpp"
LABEL org.opencontainers.image.description="Local LLM with C++/PyTorch backends and React UI"
LABEL org.opencontainers.image.source="https://github.com/Eamon2009/Quadtrix.cpp"
LABEL org.opencontainers.image.version="1.1.0"
LABEL org.opencontainers.image.licenses="MIT"

ENV DEBIAN_FRONTEND=noninteractive \
PYTHONUNBUFFERED=1 \
PATH="/venv/bin:$PATH"

# Runtime system packages
RUN apt-get update && apt-get install -y --no-install-recommends \
python3 \
supervisor \
curl \
ca-certificates \
&& curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
&& apt-get install -y --no-install-recommends nodejs \
&& npm install -g serve --quiet \
&& rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY --from=builder /venv /venv
COPY --from=builder /build/quadtrix /app/quadtrix
COPY --from=builder /build/frontend/dist /app/frontend/dist
COPY --from=builder /build/backend /app/backend
COPY --from=builder /build/engine /app/engine
COPY supervisord.conf /etc/supervisor/conf.d/quadtrix.conf
COPY docker-entrypoint.sh /app/entrypoint.sh

RUN chmod +x /app/entrypoint.sh /app/quadtrix \
&& mkdir -p /var/log/supervisor /app/models
VOLUME ["/app/models"]
ENV TORCH_CHECKPOINT_PATH=/app/models/best_model.pt \
GPT_MODEL_PATH=/app/models/best_model.bin \
API_PORT=3001 \
CORS_ORIGINS=http://localhost:8080 \
LOG_LEVEL=INFO \
MAX_SESSIONS=1000 \
SESSION_TTL_HOURS=24
EXPOSE 3001 8080

ENTRYPOINT ["/app/entrypoint.sh"]
21 changes: 11 additions & 10 deletions README.md
@@ -1,23 +1,24 @@
# Quadtrix.cpp
<img width="2442" height="1586" alt="run_20260508_110726" src="https://github.com/user-attachments/assets/ef51d1c3-e28e-4674-8a71-5513e753b174" />

Quadtrix.cpp is a local language model project with several execution paths:

- A dependency-free C++17 transformer implementation with manual forward and backward passes.
- A PyTorch training and inference path for faster experimentation on CPU, CUDA, or supported accelerator backends.
- A FastAPI middleware layer for chat sessions, health checks, backend selection, and feedback.
- A React + TypeScript frontend for local chat, settings, session history, and model status.
- Optional package/CLI support through `bin/quadtrix.js`.
---
Quadtrix.cpp is a local large language model project built around a modular, multi-path architecture that allows users to choose the right execution strategy for their hardware and workflow. Whether you are working in a bare-metal embedded environment, running experiments on a GPU cluster, serving a REST API, or interacting through a browser-based chat interface, Quadtrix.cpp provides a coherent and composable foundation for each of those scenarios. The project is designed to be approachable for people who want to read and modify every layer of the stack, while remaining practical enough for people who simply want to spin up a working local model quickly.
> For full technical reference, check the documentation — <a href="https://eamon2009.github.io/LLMs/" style="color:#1a73e8;text-decoration:underline;" target="_blank"> Docs</a>



> [!IMPORTANT]
> Please be aware that several commands listed in this documentation—specifically those involving file paths and directory navigation—should not be directly copied and pasted into your terminal. Because file structures and path syntax (such as / vs \) vary significantly across operating systems like Windows, macOS, and Linux, you must manually adjust these arguments to match your local environment. Ensure you verify your current working directory and replace any placeholder paths with the absolute or relative path specific to your machine to avoid execution errors.

---
## Architecture

The project is designed as a technical learning implementation. The C++ path exposes the transformer internals directly: tensor operations, attention, layer normalization, cross-entropy, analytical gradients, AdamW, checkpointing, and autoregressive generation.
<img width="1016" height="684" alt="image" src="https://github.com/user-attachments/assets/0e9faad4-71a9-4c7f-80e9-1136dfea6e57" />
The diagram shows how tokens enter at the bottom as raw IDs, get converted into vector embeddings with positional information added, then pass upward through a repeated stack of decoder blocks - each block applying masked attention followed by a feed-forward layer, with normalisation wrapping both. At the very top, a linear projection maps those representations to output logits across the vocabulary. The right-hand side zooms into the attention mechanism itself, showing how queries, keys, and values are linearly projected, fed into a scaled dot-product with an optional causal mask and softmax, then concatenated across all heads before being projected back out. The training flow panel on the far right shows this running as a five-step cycle per batch: data loading, forward pass, loss computation, backward pass for gradients, and a weight update. The bottom section confirms the behaviour through training loss, validation loss, and perplexity plots - all three curves descending and converging steadily as steps increase, indicating the model is learning as expected.
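The masked attention step described above (scaled scores, causal mask, softmax) can be sketched in a few lines of dependency-free Python. This is an illustrative sketch only, not the project's C++ or CUDA kernel; the function name `causal_attention_weights` and the toy score matrix are assumptions made for the example.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over a square score matrix with a causal mask.

    scores[i][j] is the scaled dot product of query i with key j.
    Position i may only attend to positions j <= i.
    """
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]                        # drop future positions
        peak = max(visible)                           # subtract max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Masked (future) positions receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights


w = causal_attention_weights([[0.0, 1.0, 2.0],
                              [1.0, 1.0, 5.0],
                              [0.5, 0.5, 0.5]])
print(w[0])  # the first token can only attend to itself: [1.0, 0.0, 0.0]
```

Each row of `w` sums to 1, and the large score `5.0` in the second row has no effect because it belongs to a future position.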


## v1.1.0
<img width="2185" height="829" alt="run_20260430_192930" src="https://github.com/user-attachments/assets/c6db061a-aa8d-4d8d-a1e2-1a81418bb613" />
<img width="2442" height="1586" alt="run_20260508_110726" src="https://github.com/user-attachments/assets/ef51d1c3-e28e-4674-8a71-5513e753b174" />


---

2 changes: 1 addition & 1 deletion backend/.env.example
@@ -5,5 +5,5 @@ LOG_LEVEL=INFO
MAX_SESSIONS=1000
SESSION_TTL_HOURS=24
CPP_SERVER_URL=http://localhost:8080
-TORCH_CHECKPOINT_PATH=../engine/best_model .pt
+TORCH_CHECKPOINT_PATH=../engine/best_model.pt
REQUEST_TIMEOUT_SECONDS=60
2 changes: 1 addition & 1 deletion backend/config.py
@@ -13,7 +13,7 @@ class Settings(BaseSettings):
max_sessions: int = Field(default=1000, alias="MAX_SESSIONS")
session_ttl_hours: int = Field(default=24, alias="SESSION_TTL_HOURS")
cpp_server_url: str = Field(default="http://localhost:8080", alias="CPP_SERVER_URL")
-torch_checkpoint_path: str = Field(default="../engine/best_model .pt", alias="TORCH_CHECKPOINT_PATH")
+torch_checkpoint_path: str = Field(default="../engine/best_model.pt", alias="TORCH_CHECKPOINT_PATH")
request_timeout_seconds: float = Field(default=60.0, alias="REQUEST_TIMEOUT_SECONDS")

model_config = SettingsConfigDict(env_file=".env", extra="ignore", populate_by_name=True)
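To see why the stray space in `best_model .pt` matters, here is a standard-library sketch of how an environment-driven checkpoint path resolves. It does not use the project's `Settings` class; `resolve_checkpoint`, `DEFAULT_CHECKPOINT`, and the `base_dir` argument are hypothetical names introduced for illustration.

```python
import os
from pathlib import Path

# Default taken from the corrected line in this change.
DEFAULT_CHECKPOINT = "../engine/best_model.pt"


def resolve_checkpoint(env=None, base_dir="backend"):
    """Return the checkpoint path a backend started in `base_dir` would look for.

    Reads TORCH_CHECKPOINT_PATH from `env` (defaults to os.environ) and
    falls back to DEFAULT_CHECKPOINT when the variable is unset.
    """
    env = os.environ if env is None else env
    raw = env.get("TORCH_CHECKPOINT_PATH", DEFAULT_CHECKPOINT)
    return Path(base_dir) / raw


old = resolve_checkpoint({"TORCH_CHECKPOINT_PATH": "../engine/best_model .pt"})
new = resolve_checkpoint({})
print(repr(old.name))  # 'best_model .pt' (the space is part of the filename)
print(repr(new.name))  # 'best_model.pt'
```

With the old value, a file saved as `best_model.pt` would not be found, since the two names differ by one character.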