# rwkv.cpp
This is a port of BlinkDL/RWKV-LM to ggerganov/ggml. The end goal is to allow 4-bit quantized inference on CPU.

**WORK IN PROGRESS!** Status: FP32, FP16 and INT4 inference work. INT4 quality is noticeably worse than FP16/FP32; perplexity still needs to be properly measured and compared.
## Plan
- Create Python script with sampling and simple chat interface
- Measure performance and quality of different model sizes and data types
- Clean up the repo (remove llama related files and mentions)
- Write a good `README.md` and publish links to this repo
- Create a pull request to the main `ggml` repo with all improvements made here
## Structure
This repo is based on the llama.cpp repo. RWKV-related code is in these directories:
- `./rwkv`: directory containing Python scripts for conversion, inference and validation
- `./examples/main_rwkv`: directory containing a script that loads and runs inference on an RWKV model
Please do not change files in other directories — this will make pulling recent changes easier.
## How to use
### Windows
Requirements: git, CMake, MSVC compiler, Python 3.x with PyTorch.
1. Clone the repo and build it:

```commandline
git clone https://github.com/saharNooby/rwkv.cpp.git
cd rwkv.cpp
cmake -DBUILD_SHARED_LIBS=ON -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_EXAMPLES=OFF .
cmake --build . --config Release
```
If everything went OK, the file `bin\Release\rwkv.dll` should appear.
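As a quick sanity check before moving on, you can try loading the freshly built library from Python. This is a minimal sketch: it assumes you run it from the repo root, and that `load_rwkv_shared_library()` (the loader used in step 3 below) can locate the freshly built DLL on its own.

```python
# Minimal sanity check: load the shared library built in step 1.
# Assumption: the loader searches the build output directories automatically.
import sys
sys.path.insert(0, 'rwkv')  # make the rwkv Python modules importable from the repo root

import rwkv_cpp_shared_library

library = rwkv_cpp_shared_library.load_rwkv_shared_library()
print('rwkv shared library loaded successfully')
```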
2. Download an RWKV model from Huggingface and convert it into `ggml` format:

```commandline
python rwkv\convert_pytorch_rwkv_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M.bin float32
```
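The last argument selects the data type of the converted file. Since FP16 inference is listed as working in the status line above, `float16` should also be a valid choice and roughly halves the file size; this is an assumption, as only `float32` is shown here, and the output file name below is made up for the example:

```commandline
python rwkv\convert_pytorch_rwkv_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M-float16.bin float16
```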
3. Use the model in Python:

```python
# These files are located in the rwkv directory
import rwkv_cpp_model
import rwkv_cpp_shared_library

model = rwkv_cpp_model.RWKVModel(
    rwkv_cpp_shared_library.load_rwkv_shared_library(),
    r'C:\rwkv.cpp-169M.bin'
)

logits, state = None, None

# Feed tokens one by one; the state carries the model's recurrent context
for token in [1, 2, 3]:
    logits, state = model.eval(token, state)

print(f'Output logits: {logits}')

# Don't forget to free the memory after you are done working with the model
model.free()
```
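Sampling and a chat interface are still on the plan above, so the repo does not generate text yet. As a minimal sketch of what greedy decoding could look like on top of the example above: it assumes `logits` can be viewed as a 1-D float array indexed by token id, and the prompt ids (`[1, 2, 3]` here) are placeholders for the output of a real tokenizer, which is not part of this repo.

```python
import numpy as np

# Placeholder prompt; in practice these ids would come from the tokenizer
# matching the model (not included in this repo yet).
prompt_tokens = [1, 2, 3]

logits, state = None, None

# Feed the prompt through the model to build up the recurrent state.
for token in prompt_tokens:
    logits, state = model.eval(token, state)

# Greedily pick the most likely next token a few times.
for _ in range(10):
    next_token = int(np.argmax(np.asarray(logits)))
    print(next_token)
    logits, state = model.eval(next_token, state)
```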