73 changes: 46 additions & 27 deletions docs/wiki/08-01-methodology.md
@@ -1,50 +1,69 @@
# Overview of Methodology
# Overview of Methodology & Risk Exposure Index

> **The Core Philosophy: From Raw Text to Relational Knowledge Graph**
>
> GitGalaxy does not measure subjective "Code Quality," which implies judgment. Instead, it measures objective **Risk Exposures**.
>
> Using deterministic keyword heuristics, the engine parses raw text to build an interconnected knowledge graph of the entire repository. It extracts structural markers from the code (the data) and translates them into risk heatmaps (the projection), so architecture teams can see where their system is drifting into dangerous territory before reading the code line by line.
>
> **The Structural Taxonomy of GitGalaxy**
>
> GitGalaxy assesses functions against 50+ metrics, then rolls these measurements up to the class, file, folder, and repo levels.

## The Universal Risk Spectrum (A11y Standard)

To reduce cognitive load on the user, GitGalaxy abandons distinct color palettes for individual metrics. Instead, we project the knowledge graph's telemetry using a single, unified **High-Contrast Spectrum** for all risk and exposure dashboards.
## The Universal Risk Spectrum

Regardless of the metric being viewed, the visual translation is always the same:
* 🟦 **Deep Blue:** Very Low Exposure (Safe / Cold / Clean)
Regardless of the risk metric being viewed, the visual translation always uses the same color palette:
* 🟦 **Blue:** Very Low Exposure
* 🩵 **Cyan:** Low Exposure
* 🟨 **Yellow:** Moderate / Intermediate Exposure
* 🟧 **Orange:** High Exposure
* 🟥 **Bright Red:** Critical / Extreme Exposure (Hot / Dangerous)

## The Exposure Metrics (Graph Telemetry)

When a user selects a metric from the HUD, the visualizer queries the underlying knowledge graph and recolors the nodes using the Universal Spectrum. The table below defines what the engine's heuristics are actively hunting for, and what a "Red" (Critical) state represents for each mode.

| Labeling Mode | Heuristic Target | The "Red" (Critical) State Indicates... |
| :--- | :--- | :--- |
| **Cognitive Load** | **How hard is it to read?** Scans for deeply nested logic, sprawling methods, and high control-flow complexity. | The logic is incredibly difficult for a human to follow. A prime target for refactoring. |
| **Deep Churn** | **How often does it change?** Identifies files that are constantly being rewritten, patched, or reverted based on Git history. | The file refuses to settle down. It is highly fluid and likely a source of recurring bugs. |
| **Error & Exception Exposure** | **Is it fragile?** Compares the ratio of defensive code (error handling, guards) against aggressive logic. | The file lacks safety nets. It is performing complex logic without adequate exception handling. |
| **Tech Debt** | **Are there shortcuts?** Scans for `TODO`s, `FIXME`s, known hacks, and temporary architectural band-aids. | The file is heavily burdened by unfinished business and documented technical debt. |
| **Documentation Risk** | **Is it explained?** Measures the ratio and quality of instructional literature against the raw executable code. | The file is essentially undocumented. It operates as a "black box" to new developers. |
| **Verification (Tests)** | **Is it proven?** Checks if the file has a corresponding safety net of tests proving it works. | The code is heavily exposed due to a severe lack of testing and verification coverage. |
| **Stability (Heat)** | **Is it fresh?** Shows where work is happening *right now* vs. code that was written months ago via OS `mtime` or Git logs. | The file is "Hot." It has been actively edited or committed in the very recent past. |
| **Graveyard** | **Is there dead code?** Finds massive blocks of code that were commented out and abandoned. | The file is hoarding historical, dead code that needs to be purged. |
| **API Exposure** | **Is it public?** Highlights the entry points and network routers where the system talks to the outside world. | The file serves as a major public endpoint, demanding strict security scrutiny. |
| **Concurrency** | **Is it multitasking?** Highlights complex timing, threads, or asynchronous execution logic. | Heavy reliance on asynchronous timing, introducing severe risks for race conditions. |
| **State Flux** | **Is the data changing?** Highlights variables that are constantly being modified or mutated in memory. | "Boiling" data. The file mutates state aggressively, making it hard to track standard values. |
| **Ownership Entropy** | **Who wrote this?** Measures the Shannon Entropy of Git blame data to see if a file is owned by one person or many. | A "Community" file. It has been touched by so many different developers that no single person truly owns it. |
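
The Ownership Entropy row can be illustrated with a minimal sketch. It assumes the Git blame data has already been reduced to one author name per line of the file; the function name `ownership_entropy` and the base-2 logarithm are illustrative assumptions, not GitGalaxy's confirmed implementation.

```python
import math
from collections import Counter


def ownership_entropy(line_authors):
    """Shannon entropy (in bits) of the author distribution over blamed lines.

    `line_authors` is a hypothetical input: one author name per line of the file.
    """
    counts = Counter(line_authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty file has no ownership signal
    probabilities = [n / total for n in counts.values()]
    return sum(-p * math.log2(p) for p in probabilities)


# One author owns every line: entropy is zero.
single_owner = ownership_entropy(["alice"] * 40)

# Four authors share the file evenly: entropy rises to log2(4) bits.
shared = ownership_entropy(["alice", "bob", "carol", "dave"] * 10)
```

Under this reading, a file with a single dominant author scores near zero, while a file split evenly among many authors scores high, which matches the "Community" file description in the table.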
## Primary Risk Exposures

**Definition:** A primary risk exposure is the numeric output of combining regex heuristic counts, file boundaries, version control counts, and ecosystem multipliers. These metrics are the unified risk calculations applied across the repository.

| Metric | Level 1: Function<br>([count based](08-02-sub-equations.md))<br>(0-infinity) | Level 2: Class<br>([count based](08-02-sub-equations.md))<br>(0-infinity) | Level 3: File<br>([Sigmoid normalized](08-03-transforming-regex-counts.md))<br>(0-100) | Level 4: Folder<br>(Weighted Avg)<br>(0-100) | Level 5: Repo<br>(Weighted Avg)<br>(0-100) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **[Cognitive Load](08-05-cognitive-load.md)** | # of branches, mutations, and danger triggers, adjusted | Sum of function counts + Gini | Normalized via sigmoid function + GuideStar Shield | Mass-Weighted Avg | Mass-Weighted Avg |
| **[State Flux](08-16-state-flux-exposure.md)** | # of variable mutations, adjusted | Sum of function counts + LCOM | Normalized via sigmoid function | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Concurrency](08-15-concurrency-exposure.md)** | # of async/thread operations minus locks, adjusted | Sum of function counts | Normalized via sigmoid function | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Technical Debt](08-08-technical-debt.md)** | # of engineer comments (e.g., HACK, TODO), adjusted | Sum of function counts | Normalized via sigmoid function | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Structural Fortification](08-07-structural-fortification.md)**| # of attacker vs. defender keywords, adjusted | Sum of function counts | Normalized via sigmoid function w/ Breach Floor | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Graveyard](08-13-graveyard-detector.md)** | N/A | N/A | Commented out code lines, normalized via sigmoid function | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Specification Alignment](08-18-specification-alignment.md)**| # of functions with Spec tags, adjusted | Sum of function counts w/ Spec mapping | Percentage of entities lacking spec tags, adjusted | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Civil War (Tabs/Spaces)](08-12-civil-war.md)** | N/A | N/A | Ratio of tab vs space indented lines, percentage | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Ownership Entropy](08-04-ownership-entropy.md)** | N/A | N/A | Distribution of git authors, normalized via Shannon entropy | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Deep Churn](08-06-deep-churn.md)** | N/A | N/A | Total git commits over time, normalized via log curve | Mass-Weighted Avg | Mass-Weighted Avg |
| **[File Stability (Heat)](08-10-file-stability.md)**| N/A | N/A | Time since last commit, normalized via time distance | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Verification Risk (Test Coverage)](08-11-test-coverage.md)**| # of unverified execution paths, adjusted | Sum of function counts | Normalized via sigmoid function * Cross-File Test Tethers | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Documentation Risk Exposure](08-09-documentation-risk.md)**| # of missing docstrings, comments, and readmes, adjusted | Sum of function counts | Normalized via sigmoid function * Bus Factor | Mass-Weighted Avg | Mass-Weighted Avg |
| **[API Exposure](08-14-api-exposure.md)** | # of export/public modifiers, adjusted | Sum of function counts + Public interfaces | Normalized via ratio and volume * Network Blast Radius | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Algorithmic DoS Exposure](08-24-Big-O-Detection.md)** | # of deep loops interacting with data/network, adjusted | Sum of function counts | Normalized via sigmoid function * Network Blast Radius | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Logic Bomb Exposure](08-20-logic-bomb-exposure.md)** | # of conditions leading to payloads, adjusted | Sum of function counts | Normalized via sigmoid function * Taint Flow Tracking | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Injection Surface Exposure](08-21-injection-surface-exposure.md)**| # of inputs flowing to execution, adjusted | Sum of function counts | Normalized via sigmoid function * Taint Flow Tracking | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Memory Corruption Exposure](08-22-memory-corruption-exposure.md)**| # of pointers/allocations minus cleanups, adjusted | Sum of function counts | Normalized via sigmoid function * ML Archetype Map | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Obscured Payload Exposure](08-19-obscured-payload-exposure.md)**| # of obfuscation and intent triggers, adjusted | Sum of function counts | Normalized via sigmoid function * Biaxial Drift | Mass-Weighted Avg | Mass-Weighted Avg |
| **[Hardcoded Secrets Exposure](08-23-hardcoded-secrets-exposure.md)**| N/A | N/A | Detected credential strings, normalized via sigmoid function | Mass-Weighted Avg | Mass-Weighted Avg |
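
The roll-up from unbounded counts to 0-100 scores in the matrix above can be sketched as follows. This is a minimal illustration under stated assumptions: the logistic curve with `midpoint` and `steepness` parameters, their default values, and the use of a per-file "mass" (for example, lines of code) as the weight are all hypothetical. The actual formulas live in the linked sub-equation pages and may differ.

```python
import math


def sigmoid_normalize(raw_count, midpoint=10.0, steepness=0.3):
    """Map an unbounded count onto the open interval (0, 100).

    `midpoint` (the count that scores 50) and `steepness` are assumed
    tuning parameters, not values taken from GitGalaxy.
    """
    return 100.0 / (1.0 + math.exp(-steepness * (raw_count - midpoint)))


def mass_weighted_average(scores_and_masses):
    """Average 0-100 scores, weighting each by its mass.

    `scores_and_masses` is a list of (score, mass) pairs; "mass" is
    assumed here to be a size measure such as lines of code.
    """
    total_mass = sum(mass for _, mass in scores_and_masses)
    if total_mass == 0:
        return 0.0
    return sum(score * mass for score, mass in scores_and_masses) / total_mass


# File level: two raw counts become bounded scores.
small_file_score = sigmoid_normalize(4)
large_file_score = sigmoid_normalize(25)

# Folder level: the larger file dominates the folder's score.
folder_score = mass_weighted_average([(small_file_score, 120), (large_file_score, 900)])
```

Weighting by mass keeps a tiny high-risk file from dominating a folder made up mostly of large low-risk files, and the reverse.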

## Custom Topological Scales

Certain metrics do not represent a "Safe to Dangerous" pipeline, but rather a difference in structural style or identity within the graph. These bypass the Universal Spectrum and use custom rendering palettes.
Certain metrics do not represent a "Safe to Dangerous" pipeline, but rather a difference in structural style within the graph. These bypass the Universal Spectrum and use custom rendering palettes.

* **Civil War (Tabs vs. Spaces):** Checks for indentation consistency across the codebase.
    * 🟩 **Green:** Strictly uses Tabs.
    * 🟨 **Yellow:** Strictly uses Spaces.
    * 🟦 **Blue:** A chaotic, mixed indentation style (The "Warzone").
* **Language Identity:** Colors the file based on its evaluated taxonomy (e.g., JavaScript is Yellow, Python is Blue, Rust is Orange) to create a visual map of the system's tech stack.
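
The Civil War classification can be sketched as a simple line scan. This is an illustrative assumption about how the three states might be derived: it inspects only the first character of each line, and the function name `classify_indentation` and its return labels are hypothetical.

```python
def classify_indentation(source_text):
    """Classify a file as "tabs", "spaces", "mixed", or "none".

    Only the first character of each line is examined; lines with no
    leading whitespace are ignored.
    """
    tab_lines = 0
    space_lines = 0
    for line in source_text.splitlines():
        if line.startswith("\t"):
            tab_lines += 1
        elif line.startswith(" "):
            space_lines += 1

    if tab_lines and space_lines:
        return "mixed"   # the "Warzone" state
    if tab_lines:
        return "tabs"
    if space_lines:
        return "spaces"
    return "none"        # no indented lines at all


style = classify_indentation("def f():\n\tx = 1\n    return x\n")
```

A ratio-based variant, as described in the File column of the matrix, would return `tab_lines / (tab_lines + space_lines)` instead of a label.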


---

## Expanding the Physics Framework

These documents outline the specific variables and mathematical normalization strategies that power the equations detailed in the matrix above.

* [08-02: Sub-Equations (Scanner Variables)](08-02-sub-equations.md)
* [08-03: Transforming Regex Counts (The UEF)](08-03-transforming-regex-counts.md)

<br><br>

@@ -61,4 +80,4 @@ This documentation is part of the [GitGalaxy Ecosystem](https://github.com/squid

---

**[⬅️ Back to Master Index](index.md)**
**[⬅️ Back to Master Index](index.md)**