# rwkv.cpp
This is a port of BlinkDL/RWKV-LM to ggerganov/ggml. The end goal is to allow 4-bit quantized inference on CPU.
WORK IN PROGRESS: NOTHING WORKS YET! If you know C/C++/ggml, please help!
Inference code runs, but outputs all `NaN`s in logits, most probably due to missing operators. Values are correct up to `ln0`: the result of `ln0` matches the reference implementation.
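A single NaN introduced by an unimplemented operator poisons everything downstream, which is consistent with values being correct up to `ln0` while all final logits come out as NaN. A quick NumPy illustration of the propagation (not rwkv.cpp code):

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0], dtype=np.float32)

# Pretend an unimplemented operator produced a NaN in one element.
x[1] = np.nan

# Every later element-wise op propagates it to the output.
y = np.exp(x)
z = np.maximum(y, np.float32(0.0))

print(np.isnan(z))  # the poisoned element stays NaN through exp and max
```

This is why comparing vectors step-by-step against the reference implementation is the fastest way to find the first operator that diverges.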
## Plan

- Make FP32 inference work
  - Implement and use element-wise `max`, `exp`, `sigmoid`
  - Compare vectors step-by-step with reference implementation
- Validate states and logits against reference implementation by creating a testing script
- Heavily refactor code; optimize where possible
- Make FP16 inference work
- Create fancy interface with sockets/shared memory/pipes/etc.
- Create Python wrapper with sampling and simple chat interface
- Write a good README.md and publish links to this repo
- Make INT4 inference work
- Create pull request to main `ggml` repo with all improvements made here
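The validation step could start out as a small NumPy script that diffs dumped logits or state vectors against the reference implementation. The file layout (a raw little-endian float32 array with no header) and the tolerance are assumptions for illustration, not something this repo provides yet:

```python
import numpy as np

def load_f32(path):
    # Assumption: dumps are raw little-endian float32 arrays with no header.
    return np.fromfile(path, dtype="<f4")

def compare_dumps(ours_path, ref_path, atol=1e-4):
    """Return True if our dump matches the reference dump element-wise."""
    ours = load_f32(ours_path)
    ref = load_f32(ref_path)
    if ours.shape != ref.shape:
        raise ValueError(f"size mismatch: {ours.size} vs {ref.size}")
    diff = np.abs(ours - ref)
    print(f"max abs diff: {diff.max():.6f}, mean abs diff: {diff.mean():.6f}")
    return bool(np.all(diff <= atol))
```

Running such a comparison after each operator would narrow down exactly where the outputs start to diverge from the reference.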
## Structure

This repo is based on the llama.cpp repo. RWKV-related code is in these directories:

- `./rwkv`: directory containing Python scripts
- `./examples/main_rwkv`: directory containing the script that loads and infers RWKV model

Please do not change files in other directories; this will make pulling recent changes easier.
## How to use

### Windows

Requirements: git, CMake, MSVC compiler, Python 3.x with PyTorch.
Clone the repo and set it up for build:

```
git clone https://github.com/saharNooby/rwkv.cpp.git
cd rwkv.cpp
cmake .
```
Download an RWKV model from Hugging Face and convert it into `ggml` format:

```
python convert_pytorch_rwkv_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M.bin float32
```
Compile and run the script:

```
cmake --build . --config Release
bin\Release\main_rwkv.exe "C:\rwkv.cpp-169M.bin" 123 "C:\state_in.bin" "C:\state_out.bin" "C:\logits_out.bin"
```
The script will read the state from `state_in.bin`, do a single inference step using that state and token `123` as input, then save the new state into `state_out.bin` and the logits into `logits_out.bin`.
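Assuming `logits_out.bin` is a raw array of float32 values, one per vocabulary token (an assumption about the dump format, not a documented interface), the next token could be picked greedily from it like this:

```python
import numpy as np

def pick_next_token(logits_path):
    # Assumption: the file is a raw little-endian float32 array,
    # one value per vocabulary token.
    logits = np.fromfile(logits_path, dtype="<f4")
    return int(np.argmax(logits))
```

Feeding the chosen token and `state_out.bin` back into the next invocation of `main_rwkv.exe` would give a minimal generation loop.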