# rwkv.cpp
This is a port of BlinkDL/RWKV-LM to ggerganov/ggml. The end goal is to allow 4-bit quantized inference on CPU.
WORK IN PROGRESS: NOTHING WORKS YET! If you know C/C++/ggml, please help!
Inference code runs, but outputs all `NaN`s in logits, most probably due to missing operators. Values are correct up to `ln0`: the result of `ln0` matches the reference implementation.
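A single NaN introduced by an unimplemented operator poisons everything downstream, which is consistent with values being correct up to `ln0` while all final logits come out as NaN. A quick NumPy illustration of the propagation (not rwkv.cpp code):

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0], dtype=np.float32)

# Pretend an unimplemented operator produced a NaN in one element.
x[1] = np.nan

# Every later element-wise op propagates it to the output.
y = np.exp(x)
z = np.maximum(y, np.float32(0.0))

print(np.isnan(z))  # the poisoned element stays NaN through exp and max
```

This is why comparing vectors step-by-step against the reference implementation is the fastest way to find the first operator that diverges.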
## Plan

- Make FP32 inference work
  - Implement and use element-wise `max`, `exp`, `sigmoid`
  - Compare vectors step-by-step with reference implementation
- Validate states and logits against reference implementation by creating a testing script
- Heavily refactor code; optimize where possible
- Make FP16 inference work
- Create fancy interface with sockets/shared memory/pipes/etc.
- Create Python wrapper with sampling and simple chat interface
- Write a good README.md and publish links to this repo
- Make INT4 inference work
- Create pull request to main `ggml` repo with all improvements made here
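The validation step could start out as a small NumPy script that diffs dumped logits or state vectors against the reference implementation. The file layout (a raw little-endian float32 array with no header) and the tolerance are assumptions for illustration, not something this repo provides yet:

```python
import numpy as np

def load_f32(path):
    # Assumption: dumps are raw little-endian float32 arrays with no header.
    return np.fromfile(path, dtype="<f4")

def compare_dumps(ours_path, ref_path, atol=1e-4):
    """Return True if our dump matches the reference dump element-wise."""
    ours = load_f32(ours_path)
    ref = load_f32(ref_path)
    if ours.shape != ref.shape:
        raise ValueError(f"size mismatch: {ours.size} vs {ref.size}")
    diff = np.abs(ours - ref)
    print(f"max abs diff: {diff.max():.6f}, mean abs diff: {diff.mean():.6f}")
    return bool(np.all(diff <= atol))
```

Running such a comparison after each operator would narrow down exactly where the outputs start to diverge from the reference.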
## Structure

This repo is based on the llama.cpp repo. RWKV-related code is in these directories:

- `./rwkv`: directory containing Python scripts
- `./examples/main_rwkv`: directory containing the script that loads and infers RWKV model

Please do not change files in other directories; this will make pulling recent changes easier.
## How to use

### Windows

Requirements: git, CMake, MSVC compiler, Python 3.x with PyTorch.
Clone the repo and set it up for build:

```
git clone https://github.com/saharNooby/rwkv.cpp.git
cd rwkv.cpp
cmake .
```
Download an RWKV model from Hugging Face and convert it into `ggml` format:

```
python convert_pytorch_rwkv_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M.bin float32
```
Compile and run the script:

```
cmake --build . --config Release
bin\Release\main_rwkv.exe "C:\rwkv.cpp-169M.bin" 123 "C:\state_in.bin" "C:\state_out.bin" "C:\logits_out.bin"
```
The script will read the state from `state_in.bin`, do a single inference step using that state and token `123` as input, then save the new state into `state_out.bin` and the logits into `logits_out.bin`.
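Assuming `logits_out.bin` is a raw array of float32 values, one per vocabulary token (an assumption about the dump format, not a documented interface), the next token could be picked greedily from it like this:

```python
import numpy as np

def pick_next_token(logits_path):
    # Assumption: the file is a raw little-endian float32 array,
    # one value per vocabulary token.
    logits = np.fromfile(logits_path, dtype="<f4")
    return int(np.argmax(logits))
```

Feeding the chosen token and `state_out.bin` back into the next invocation of `main_rwkv.exe` would give a minimal generation loop.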