Models, Corpus, and Tools
We have released the models and tools developed by LLM-jp. Some of the datasets and documentation produced during development will be released soon.
Open Platforms
- Models: https://huggingface.co/llm-jp
- Tools: https://github.com/llm-jp
Pre-trained Models
- LLM-jp-3
- 13B v2.0
- 13B v1.0
- 1.3B v1.0
Fine-tuned Models
- LLM-jp-3
- 13B v2.0
- 13B v1.1
- 13B v1.0
Corpora for Pre-training
Evaluation and Fine-tuning Datasets
Other data is derived from publicly available datasets; details can be found under “Evaluation Tools” and “Fine-tuning Script” below, respectively.
Tools
- Pre-training Corpus Building Scripts v2.0
- Pre-training Corpus Building Scripts v1.0
- Tokenizer
- Evaluation Tools
- Fine-tuning Script
- DPO Script