rwkv.cpp
This is a port of RWKV-LM by @BlinkDL to the ggml library by @ggerganov. The end goal is to allow 4-bit quantized inference on CPU.
WORK IN PROGRESS: NOTHING WORKS YET! If you know C/C++/ggml, please help!
Converting and loading the model works, but I'm not sure if the element/dimension order is correct -- more debugging needed.
Plan
- Make FP32 inference work
- Validate states and logits against reference implementation by creating a testing script
- Heavily refactor code; optimize where possible
- Make FP16 inference work
- Create fancy interface with sockets/shared memory/pipes/etc.
- Create Python wrapper with sampling and simple chat interface
- Write a good README.md and publish links to this repo
- Make INT4 inference work
- Create pull request to main ggml repo with all improvements made here
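The validation step in the plan could start from something like the sketch below. It is not part of this repo: the helper names (max_abs_diff, assert_close) and the default tolerance of 1e-4 are hypothetical, and it assumes that both the reference implementation and rwkv.cpp outputs have already been loaded into flat sequences of floats.

```python
def max_abs_diff(actual, expected):
    """Return the largest absolute element-wise difference between two sequences."""
    if len(actual) != len(expected):
        raise ValueError(
            f"length mismatch: {len(actual)} values vs {len(expected)} values"
        )
    return max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)


def assert_close(actual, expected, tolerance=1e-4, name="tensor"):
    """Raise AssertionError if any element differs by more than `tolerance`.

    The default tolerance is an arbitrary placeholder, not a value taken
    from the reference implementation.
    """
    diff = max_abs_diff(actual, expected)
    if diff > tolerance:
        raise AssertionError(
            f"{name}: max abs difference {diff} exceeds tolerance {tolerance}"
        )
    return diff
```

A length mismatch is reported separately from a value mismatch, since a wrong element/dimension order and a wrong tensor size are different bugs to chase.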
Structure
This repo is based on the llama.cpp repo. RWKV-related code is in these directories:
- ./rwkv: directory containing Python scripts
- ./examples/main_rwkw: directory containing script that loads and infers RWKV model
Please do not change files in other directories -- this will make pulling recent changes easier.
How to use
Windows
Requirements: git, CMake, MSVC compiler, Python 3.x with PyTorch.
Clone the repo and set it up for build:
git clone https://github.com/saharNooby/rwkv.cpp.git
cd rwkv.cpp
cmake .
Download an RWKV model from Hugging Face and convert it into ggml format:
python convert_pytorch_rwkv_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M.bin float32
Compile and run the script:
cmake --build . --config Release
bin\Release\main_rwkv.exe "C:\rwkv.cpp-169M.bin" 123 "C:\state_in.bin" "C:\state_out.bin" "C:\logits_out.bin"
The script will read the state from state_in.bin, run a single inference step using that state and token 123 as input, then save the new state into state_out.bin and the logits into logits_out.bin.
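To inspect the result from Python, a minimal sketch along these lines may help. It relies on an unconfirmed assumption: that logits_out.bin is a headerless dump of little-endian float32 values. The helper names (build_command, read_float32_file, greedy_token) are hypothetical and not part of this repo; only the argument order in build_command is taken from the command above.

```python
import array
import sys


def build_command(exe, model, token, state_in, state_out, logits_out):
    """Return the argument list for one main_rwkv call, in the order used above."""
    return [exe, model, str(token), state_in, state_out, logits_out]


def read_float32_file(path):
    """Read a whole file as a flat list of floats.

    ASSUMPTION: the file is a raw, headerless sequence of little-endian
    float32 values. This layout is a guess, not something the repo documents.
    """
    values = array.array("f")
    with open(path, "rb") as f:
        # frombytes raises ValueError if the size is not a multiple of 4 bytes.
        values.frombytes(f.read())
    if sys.byteorder != "little":
        values.byteswap()
    return values.tolist()


def greedy_token(logits):
    """Return the index of the largest logit (greedy sampling)."""
    return max(range(len(logits)), key=logits.__getitem__)
```

The list returned by build_command can be passed to subprocess.run. Feeding the state_out.bin of one call in as the state_in.bin of the next would, in principle, process a sequence token by token.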