rwkv.cpp

This is a port of BlinkDL/RWKV-LM to ggerganov/ggml.

Besides the usual FP32, it supports FP16 and quantized INT4 inference on the CPU. This project is CPU-only.
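
To give an intuition for what INT4 quantization does, here is a toy block-wise quantization sketch in Python. This is an illustration of the general idea only (per-block scale plus 4-bit integers); it is not ggml's actual quantization format, and the function names are made up for this example:

```python
import numpy as np

def quantize_int4_blocks(weights, block_size=32):
    """Toy block-wise INT4 quantization: each block of values shares one
    float scale and stores integers in [-8, 7]. Illustrative only."""
    blocks = weights.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_int4_blocks(w)
err = np.abs(dequantize(q, s) - w).max()
print(f'max absolute rounding error: {err:.4f}')
```

The rounding error is bounded by half a quantization step per block, which is the source of the quality loss mentioned below.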

WORK IN PROGRESS! Status: INT4 quality is not great yet; perplexity needs to be properly measured and compared across formats.

Plan

  1. Create a Python script with sampling and a simple chat interface
  2. Measure the performance and quality of different model sizes and data types
  3. Write a good README.md and publish links to this repo
  4. Create a pull request to the main ggml repo with all improvements made here

Structure

  • ./rwkv.h, ./rwkv.cpp: source code of the shared library.
  • ./rwkv: directory containing Python scripts for conversion, inference and validation.

How to use

Windows

Requirements: git, CMake, MSVC compiler, Python 3.x with PyTorch.

1. Clone the repo and build it:

git clone https://github.com/saharNooby/rwkv.cpp.git
cd rwkv.cpp
cmake -DBUILD_SHARED_LIBS=ON .
cmake --build . --config Release

If everything went OK, the file bin\Release\rwkv.dll should appear.

2. Download an RWKV model from Hugging Face and convert it into ggml format:

python rwkv\convert_pytorch_rwkv_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M.bin float32
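Conceptually, the conversion script serializes each tensor from the PyTorch checkpoint into a flat binary file that the C library can read. The sketch below shows the general approach with a hypothetical layout (dimension count, name length, shape, name, raw FP32 data); the real on-disk format is defined in rwkv/convert_pytorch_rwkv_to_ggml.py and differs from this:

```python
import io
import struct
import numpy as np

def write_tensor(f, name, tensor):
    # Hypothetical layout, not rwkv.cpp's real format:
    # dim count, name length, shape, name bytes, raw FP32 data.
    name_bytes = name.encode('utf-8')
    f.write(struct.pack('<ii', tensor.ndim, len(name_bytes)))
    f.write(struct.pack(f'<{tensor.ndim}i', *tensor.shape))
    f.write(name_bytes)
    f.write(tensor.astype(np.float32).tobytes())

def read_tensor(f):
    ndim, name_len = struct.unpack('<ii', f.read(8))
    shape = struct.unpack(f'<{ndim}i', f.read(4 * ndim))
    name = f.read(name_len).decode('utf-8')
    data = np.frombuffer(f.read(4 * int(np.prod(shape))), dtype=np.float32)
    return name, data.reshape(shape)

# Round-trip one toy tensor through an in-memory buffer
buf = io.BytesIO()
w = np.arange(6, dtype=np.float32).reshape(2, 3)
write_tensor(buf, 'blocks.0.att.key.weight', w)
buf.seek(0)
name, restored = read_tensor(buf)
print(name, restored.shape)
```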

3. Use the model in Python:

# These files are located in the rwkv directory
import rwkv_cpp_model
import rwkv_cpp_shared_library

model = rwkv_cpp_model.RWKVModel(
    rwkv_cpp_shared_library.load_rwkv_shared_library(),
    r'C:\rwkv.cpp-169M.bin'
)

logits, state = None, None

for token in [1, 2, 3]:
    logits, state = model.eval(token, state)
    
    print(f'Output logits: {logits}')

# Don't forget to free the memory when you are done working with the model
model.free()
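
The logits returned by eval can be turned into an actual next token. A minimal sampling sketch with NumPy (the helper name sample_logits is my own, not part of rwkv.cpp's API):

```python
import numpy as np

def sample_logits(logits, temperature=1.0):
    """Pick a token id from raw logits. temperature=0 means greedy argmax."""
    if temperature <= 0.0:
        return int(np.argmax(logits))
    # Softmax with a max-shift for numerical stability
    probs = np.exp((np.asarray(logits) - np.max(logits)) / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# Toy logits standing in for a model.eval() result
logits = np.array([0.1, 2.5, -1.0, 0.3], dtype=np.float32)
print(sample_logits(logits, temperature=0.0))  # greedy pick: index 1
```

In a chat loop, the sampled token would be fed back into model.eval, carrying the state between calls as in the example above.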