Conversation

Contributor (Author):
I will update docs after the code gets approved

Contributor (Author):
I changed only docs for
Description
This PR fixes a few bugs related to the LLMs, caused by mixing two approaches - functional (as we pass the whole messages history each time) and stateful (as we keep `pos_` in the runner, representing at which position the KV cache is) - which resulted in 3 bugs:

1. Reasoning tokens were generated and counted towards the KV-cache position (`pos_ += num_generated_tokens`), but in the next turns the jinja template removed these reasoning tokens from the messages history - as a result, the KV cache was incoherent.
2. Old messages were prefilled again on each turn without resetting `pos_` - as a result, tokens were "duplicated" in the KV cache and we were running out of available tokens very fast (exceeding `context_window_length`).
3. Even though the `generate()` method is called in a functional way, it kept internal state in the runner (e.g. `pos_`).

These bugs were fixed by resetting the runner before each generation, which makes it truly functional - old messages are prefilled and the KV cache can still be used during the generation phase.
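Conceptually, the fixed flow looks like the sketch below. This is only an illustration of the pattern, assuming a hypothetical `Runner` interface with `reset`/`prefill`/`generate` methods - it is not the actual native runner API:

```typescript
// Hypothetical runner interface - illustrative only, not the real native API.
interface Runner {
  reset(): void; // drops internal state such as pos_ (the KV-cache position)
  prefill(renderedHistory: string): void; // feeds the templated message history
  generate(onToken: (token: string) => void): string; // decodes from the prefilled cache
}

// Each call is now truly functional: start from a clean runner, prefill the
// whole history, and only then generate.
function generateFunctional(runner: Runner, renderedHistory: string): string {
  runner.reset(); // pos_ starts from 0, nothing stale is left from previous turns
  runner.prefill(renderedHistory); // KV cache is rebuilt from the current history
  return runner.generate(() => {}); // KV cache is still reused during generation
}
```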
Additionally, this PR adds `ContextStrategy` to the `ChatConfig` interface, so it is now possible to define a strategy for managing context (or use one of the already implemented ones, e.g. naive, message-count-based, sliding window). This gives us more flexibility and lets the user decide what is best for their use case. From now on, `SlidingWindowContextStrategy` is also configured as the default.
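To make the idea concrete, a strategy could look roughly like the sketch below. The names and signatures here (`Message`, `apply`, `MessageCountContextStrategy`) are illustrative assumptions, not the actual exported API:

```typescript
// Illustrative sketch of a context management strategy - names and signatures
// are assumptions, not the library's actual interface.
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ContextStrategy {
  // Decides which part of the history is passed to the model.
  apply(messages: Message[]): Message[];
}

// Example: keep only the N most recent non-system messages,
// always preserving the system prompt.
class MessageCountContextStrategy implements ContextStrategy {
  constructor(private readonly maxMessages: number) {}

  apply(messages: Message[]): Message[] {
    const system = messages.filter((m) => m.role === 'system');
    const rest = messages.filter((m) => m.role !== 'system');
    return [...system, ...rest.slice(-this.maxMessages)];
  }
}
```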
Introduces a breaking change?

These changes will not break anything as long as the maximum number of messages is not modified (I removed `contextWindowLength` from `ChatConfig` and replaced it with `contextStrategy`).
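For anyone relying on `contextWindowLength`, the change at the config level is roughly the following. The field shapes below are an illustration of the migration, not the exact `ChatConfig` definition:

```typescript
// Illustrative before/after shapes - the real ChatConfig lives in the library
// and may differ in detail.
interface ContextStrategy {
  apply(messages: { role: string; content: string }[]): { role: string; content: string }[];
}

// Before this PR: context was limited by a raw window length.
interface ChatConfigBefore {
  contextWindowLength: number;
}

// After this PR: context is managed by a pluggable strategy;
// SlidingWindowContextStrategy is used when nothing is specified.
interface ChatConfigAfter {
  contextStrategy?: ContextStrategy;
}
```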
Type of change

Tested on
Testing instructions
Run the example llm app, open the executorch logs (`adb logcat | grep -i "executorch"`, for example) and check that the token counts are properly aligned and that `pos_` is correct.

To test different context management strategies, change `contextStrategy` in the llm app and modify the model configuration.
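For example, switching strategies while testing could look roughly like this. The import path and constructor arguments are assumptions for illustration; only `SlidingWindowContextStrategy` is a name taken from this PR:

```typescript
// Assumed import path and constructor arguments - illustrative only.
import { SlidingWindowContextStrategy } from 'react-native-executorch';

// Default: sliding window over the chat history.
const chatConfig = {
  contextStrategy: new SlidingWindowContextStrategy(),
};

// While testing, swap the instance above for another strategy (e.g. a naive or
// message-count-based one) and watch pos_ and the token counts in the logs.
```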
Screenshots

Related issues
#776
Checklist
Additional notes
Position in KV cache, number of prompt tokens and number of generated tokens for both non-reasoning and reasoning models BEFORE changes.
LLAMA 3.2 1B SPINQUANT (without reasoning)
QWEN 3.0 0.6B QUANTIZED (with reasoning)