Expand base model support to include Mistral, Phi, and Qwen families #35
Open
bbkx226 wants to merge 4 commits into codefuse-ai:main from
- Introduced new configuration file for Phi-3 Mini model.
- Refactored model initialization in `model.py` to support flexible configurations and model factory usage.
- Implemented a `ModelFactory` class to handle dynamic model instantiation and configuration management.
- Created a `ModelRegistry` class to maintain a centralized registry of supported models with detailed configurations.
- Developed a generic tokenizer module to support multiple model families and improve tokenization processes.
- Added validation utilities for testing model loading, tokenization, and embedding generation.
- Updated requirements to ensure compatibility with new features and dependencies.
…odels and add support for Qwen3-4B
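The `ModelRegistry`/`ModelFactory` split described in the commits above can be illustrated with a minimal, standard-library-only sketch. This is not the PR's code: the `ModelSpec` fields, the method names (`register`, `get`, `families`, `build_config`), the registry keys, and the Hugging Face repo IDs are assumptions chosen for illustration.

```python
from dataclasses import asdict, dataclass, replace
from typing import Any, Dict, Set


@dataclass(frozen=True)
class ModelSpec:
    """One registry entry. Field names are illustrative assumptions."""
    name: str                      # short key, e.g. matching a config file stem
    family: str                    # model family such as "qwen", "mistral", "phi"
    hf_repo: str                   # Hugging Face repo ID to load weights from
    torch_dtype: str = "bfloat16"  # default precision, kept as a string here
    use_flash_attention: bool = False


class ModelRegistry:
    """Centralized lookup of supported models (hypothetical API)."""

    def __init__(self) -> None:
        self._specs: Dict[str, ModelSpec] = {}

    def register(self, spec: ModelSpec) -> None:
        # Refuse silent overwrites so two entries cannot claim the same key.
        if spec.name in self._specs:
            raise ValueError(f"model {spec.name!r} is already registered")
        self._specs[spec.name] = spec

    def get(self, name: str) -> ModelSpec:
        if name not in self._specs:
            known = ", ".join(sorted(self._specs))
            raise KeyError(f"unknown model {name!r}; known: {known}")
        return self._specs[name]

    def families(self) -> Set[str]:
        return {spec.family for spec in self._specs.values()}


class ModelFactory:
    """Builds a per-run config from a registry entry plus overrides."""

    def __init__(self, registry: ModelRegistry) -> None:
        self.registry = registry

    def build_config(self, name: str, **overrides: Any) -> Dict[str, Any]:
        # replace() copies the frozen spec; an unknown override key raises TypeError.
        spec = replace(self.registry.get(name), **overrides)
        return asdict(spec)


def default_registry() -> ModelRegistry:
    # Keys and repo IDs below are examples, not necessarily what the PR registers.
    registry = ModelRegistry()
    registry.register(ModelSpec("qwen3-4b", "qwen", "Qwen/Qwen3-4B"))
    registry.register(ModelSpec("mistral-7b", "mistral", "mistralai/Mistral-7B-v0.1"))
    registry.register(ModelSpec("phi3-mini", "phi", "microsoft/Phi-3-mini-4k-instruct"))
    return registry


if __name__ == "__main__":
    factory = ModelFactory(default_registry())
    print(factory.build_config("phi3-mini", torch_dtype="float16"))
```

Keeping each spec frozen and deriving per-run configs with `dataclasses.replace` means an override never mutates the shared registry entry, and a misspelled override key fails fast instead of being silently ignored.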
Resolves #6
This pull request introduces a major refactor to the model initialization system, adding a flexible model factory pattern to support multiple popular base models and simplifying configuration management. It also updates documentation, adds new model configuration files, and improves .gitignore coverage.
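As an illustration of how per-model JSON files can simplify configuration management, here is a minimal sketch that validates one file and merges it over shared defaults. The defaults, the required `model_id` key, and the function names `merge_config` and `load_model_config` are hypothetical; the actual schema of files such as `mistral-7b.json` is not shown in this PR description.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Mapping, Union

# Hypothetical shared defaults; the real schema of the PR's JSON files is not shown.
DEFAULTS: Dict[str, Any] = {"torch_dtype": "bfloat16", "use_flash_attention": False}
REQUIRED_KEYS = ("model_id",)


def merge_config(user_cfg: Mapping[str, Any],
                 defaults: Mapping[str, Any] = DEFAULTS) -> Dict[str, Any]:
    """Validate one model's settings and layer them over the shared defaults."""
    missing = [key for key in REQUIRED_KEYS if key not in user_cfg]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return {**defaults, **user_cfg}  # values from the model file win


def load_model_config(path: Union[str, Path]) -> Dict[str, Any]:
    """Read a per-model JSON file such as mistral-7b.json and merge it."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError(f"{path}: top-level JSON value must be an object")
    return merge_config(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg_path = Path(tmp) / "mistral-7b.json"
        cfg_path.write_text(json.dumps({"model_id": "mistralai/Mistral-7B-v0.1",
                                        "torch_dtype": "float16"}), encoding="utf-8")
        print(load_model_config(cfg_path))
```

With this layout, each model file only needs to state what differs from the defaults, which keeps files like `phi3-mini.json` short.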
Key changes include:
Model initialization and configuration refactor
- Added a new `model_factory.py` module implementing a factory pattern for dynamic model and tokenizer instantiation, with support for different model families (Qwen, LLaMA, Mistral, Phi, Code-LLaMA, Gemma). This centralizes and standardizes model configuration and initialization logic.
- Updated `model.py` to use the new model factory by default, allowing flexible model selection and improved configuration, with a fallback to standard initialization if the factory is unavailable. Added support for additional arguments such as `model_id`, `use_flash_attention`, and `torch_dtype`.
Expanded model support and configuration
- Added new model configuration files `llama2-7b.json`, `llama3-8b.json`, `mistral-7b.json`, `phi3-mini.json`, `code-llama-7b.json`, and `gemma-7b.json`, enabling easy training setup for each model.
Documentation and usability improvements
- Updated `README.md` to document support for 13 base models across 6 families, provide quick-start examples for different models, and clarify the new training and data tokenization steps.
Project hygiene
- Updated `.gitignore` to exclude local training data, outputs, cache directories, and Python cache files, preventing accidental commits of large or sensitive files.
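The `model.py` behavior listed under the model initialization changes (factory by default, fallback to standard initialization, and the `model_id`, `use_flash_attention`, and `torch_dtype` arguments) could be wired up roughly as below. This is a hedged sketch: exposing the arguments as CLI flags, the default model, the import-based availability check for `model_factory`, and the helper names are assumptions. `attn_implementation="flash_attention_2"` is the Hugging Face Transformers keyword for FlashAttention-2 and needs the `flash-attn` package at load time.

```python
import argparse
import importlib
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Flag names mirror the arguments named in the PR; exposing them via argparse is an assumption.
    parser = argparse.ArgumentParser(description="Model initialization options (sketch)")
    parser.add_argument("--model_id", default="Qwen/Qwen3-4B",
                        help="Hugging Face repo ID or local path")
    parser.add_argument("--use_flash_attention", action="store_true")
    parser.add_argument("--torch_dtype", default="bfloat16",
                        choices=["auto", "float32", "float16", "bfloat16"])
    return parser.parse_args(argv)


def from_pretrained_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    # The dtype stays a string here; convert it before loading if your setup needs a torch.dtype.
    kwargs: Dict[str, Any] = {"torch_dtype": args.torch_dtype}
    if args.use_flash_attention:
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def init_path(factory_module: str = "model_factory") -> str:
    """Return "factory" if the factory module imports cleanly, else "standard"."""
    try:
        importlib.import_module(factory_module)
    except ImportError:
        return "standard"  # fall back to plain initialization
    return "factory"


if __name__ == "__main__":
    args = parse_args()
    print(init_path(), args.model_id, from_pretrained_kwargs(args))
```

In the "standard" branch, the kwargs would typically be passed to `AutoModelForCausalLM.from_pretrained(args.model_id, **kwargs)`; depending on the Transformers version, converting the dtype string with `getattr(torch, name)` first (leaving `"auto"` as-is) may be needed.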