hypnopump a64aaa81ec
initial addition
2023-04-03 00:52:26 +02:00
.github/workflows Add GitHub workflows file 2023-04-02 16:56:04 +04:00
rwkv initial addition 2023-04-03 00:52:26 +02:00
.gitignore deploy : add a Package.swift for SwiftPM support (#393) 2023-03-28 19:39:01 +03:00
CMakeLists.txt Remove unused files 2023-04-02 12:53:41 +04:00
LICENSE Add LICENSE (#21) 2023-03-12 08:36:03 +02:00
Makefile Remove unused files 2023-04-02 12:53:41 +04:00
README.md initial addition 2023-04-03 00:52:26 +02:00
ggml.c Fix build errors and warnings 2023-04-02 17:23:39 +04:00
ggml.h Implement exp, max, 1_minus_x, sigmoid operators in ggml 2023-03-31 19:04:35 +04:00
rwkv.cpp Remove hardcoded memory requirements table 2023-04-02 18:37:45 +04:00
rwkv.h Remove unused files 2023-04-02 12:53:41 +04:00

rwkv.cpp

This is a port of BlinkDL/RWKV-LM to ggerganov/ggml.

Besides the usual FP32, it supports FP16 and quantized INT4 inference on CPU. This project is CPU only.

RWKV is a novel large language model architecture; the largest model in the family has 14B parameters. In contrast to Transformers with O(n^2) attention, RWKV needs only the state from the previous step to calculate logits. This makes RWKV very CPU-friendly at large context lengths.
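
To make the scaling contrast concrete, here is a toy sketch that counts how many past positions each approach touches while processing a sequence. It only illustrates the O(n^2) versus O(n) argument and is not RWKV's actual computation; the function names are made up for this example.

```python
def attention_steps(n_tokens: int) -> int:
    """Count past positions visited by causal self-attention.

    Token t attends to all t tokens seen so far (including itself),
    so the total grows quadratically with sequence length.
    """
    return sum(t for t in range(1, n_tokens + 1))


def recurrent_steps(n_tokens: int) -> int:
    """Count state updates for a recurrent model such as RWKV.

    Each token only reads and updates a fixed-size state from the
    previous step, so the total grows linearly.
    """
    return n_tokens


for n in (8, 1024):
    print(n, attention_steps(n), recurrent_steps(n))
```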

This project provides a C library rwkv.h and a convenient Python wrapper for it.

TODO:

  1. Measure performance and perplexity of different model sizes and data types
  2. Write a good README.md (motivation, benchmarks, perplexity) and publish links to this repo
  3. Create pull request to main ggml repo with all improvements made here

How to use

1. Clone the repo and build the library

Windows

Requirements: git, CMake, MSVC compiler.

git clone https://github.com/saharNooby/rwkv.cpp.git
cd rwkv.cpp
cmake -DBUILD_SHARED_LIBS=ON .
cmake --build . --config Release

If everything went OK, the file bin\Release\rwkv.dll should appear.

2. Download an RWKV model from Hugging Face like this one and convert it into ggml format

Requirements: Python 3.x with PyTorch.

# Windows
python rwkv\convert_pytorch_to_ggml.py C:\RWKV-4b-Pile-171M-20230202-7922.pth C:\rwkv.cpp-171M.bin float32
# Linux / MacOS
python rwkv/convert_pytorch_to_ggml.py ~/Downloads/RWKV-4b-Pile-171M-20230202-7922.pth ~/Downloads/rwkv.cpp-171M.bin float32

2.1. Optionally, quantize the model

To convert the model into INT4 quantized format, run:

# Windows
python rwkv\quantize.py C:\rwkv.cpp-171M.bin C:\rwkv.cpp-171M-Q4_1.bin 3
# Linux / MacOS
python rwkv/quantize.py ~/Downloads/rwkv.cpp-171M.bin ~/Downloads/rwkv.cpp-171M-Q4_1.bin 3

Pass 2 for Q4_0 format (smaller size, lower quality), 3 for Q4_1 format (larger size, higher quality).
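
The size and quality trade-off between the two formats can be illustrated with a simplified sketch of 4-bit block quantization. This is my reading of the general idea, not ggml's actual code: the Q4_0-style variant stores one scale per block, while the Q4_1-style variant stores a scale and a minimum, which costs one extra number per block but fits asymmetric value ranges better. The block length of 32 and all function names are assumptions made for this example.

```python
import random

BLOCK_SIZE = 32  # assumed block length for this sketch


def quantize_scale_only(block):
    """Q4_0-style: one scale per block, values mapped to integers in [-8, 7]."""
    amax = max(abs(x) for x in block)
    scale = amax / 7 if amax > 0 else 1.0
    codes = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, codes


def dequantize_scale_only(scale, codes):
    return [scale * c for c in codes]


def quantize_scale_and_min(block):
    """Q4_1-style: a scale and a minimum per block, values mapped to [0, 15]."""
    lo, hi = min(block), max(block)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [max(0, min(15, round((x - lo) / scale))) for x in block]
    return scale, lo, codes


def dequantize_scale_and_min(scale, lo, codes):
    return [lo + scale * c for c in codes]


def max_abs_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


rng = random.Random(0)
# A block whose values are all positive: the scale-only scheme spends
# half of its code range on negative numbers that never occur here.
block = [rng.uniform(1.0, 2.0) for _ in range(BLOCK_SIZE)]

err_q4_0 = max_abs_error(block, dequantize_scale_only(*quantize_scale_only(block)))
err_q4_1 = max_abs_error(block, dequantize_scale_and_min(*quantize_scale_and_min(block)))
print(f"scale only:    max error {err_q4_0:.4f}")
print(f"scale and min: max error {err_q4_1:.4f}")
```

The rounding error of each scheme is bounded by half of its step size, so the scale-and-minimum variant has the tighter bound whenever a block's values sit far from zero.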

3. Run the model

Requirements: Python 3.x with PyTorch and tokenizers.

Note: to run the full-weights model, pass the path of the non-quantized model file instead.

To generate some text, run:

# Windows
python rwkv\generate_completions.py C:\rwkv.cpp-171M-Q4_1.bin
# Linux / MacOS
python rwkv/generate_completions.py ~/Downloads/rwkv.cpp-171M-Q4_1.bin

To chat with a bot, run:

# Windows
python rwkv\chat_with_bot.py C:\rwkv.cpp-171M-Q4_1.bin
# Linux / MacOS
python rwkv/chat_with_bot.py ~/Downloads/rwkv.cpp-171M-Q4_1.bin

Edit generate_completions.py or chat_with_bot.py to change prompts and sampling settings.
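
For readers unsure what sampling settings control, here is a minimal, self-contained sketch of temperature and top-p (nucleus) sampling over raw logits. It is not taken from the scripts above; the function name and the parameter names `temperature` and `top_p` are assumptions, so check the scripts for the settings they actually expose.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample a token index from raw logits.

    temperature < 1 sharpens the distribution, > 1 flattens it.
    top_p keeps the smallest set of most likely tokens whose
    cumulative probability reaches top_p (nucleus sampling).
    """
    if temperature <= 0:
        # Treat non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Softmax with temperature, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the most likely tokens until their mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the kept tokens in proportion to their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [2.0, 1.0, 0.1, -1.0]
print(sample_token(logits, temperature=0.8, top_p=0.5, rng=random.Random(0)))
```

Lower values of either setting make the output more deterministic; higher values make it more varied.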


Example of using rwkv.cpp in your custom Python script:

import rwkv_cpp_model
import rwkv_cpp_shared_library

# Replace with one of the model paths used above (quantized or full weights)
model_path = r'C:\rwkv.cpp-171M.bin'

model = rwkv_cpp_model.RWKVModel(
    rwkv_cpp_shared_library.load_rwkv_shared_library(),
    model_path
)

logits, state = None, None

for token in [1, 2, 3]:
    logits, state = model.eval(token, state)

    print(f'Output logits: {logits}')

# Don't forget to free the memory once you're done working with the model
model.free()
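
Building on the `eval(token, state)` call shown above, the following sketch shows how a greedy generation loop might be layered on top of it. `generate_greedy` and `fake_eval` are hypothetical names introduced here; `fake_eval` is a stand-in so the example runs without a model file, and the code assumes `logits` is an indexable sequence of scores.

```python
def generate_greedy(eval_fn, prompt_tokens, n_new_tokens):
    """Feed a prompt token by token, then extend it greedily.

    eval_fn(token, state) must return (logits, new_state), mirroring
    the model.eval call; logits is any indexable sequence of scores.
    """
    if not prompt_tokens:
        raise ValueError("prompt_tokens must not be empty")

    logits, state = None, None
    # Process the prompt; only the final logits and state are needed.
    for token in prompt_tokens:
        logits, state = eval_fn(token, state)

    generated = []
    for _ in range(n_new_tokens):
        # Greedy choice: the index of the largest logit.
        next_token = max(range(len(logits)), key=lambda i: logits[i])
        generated.append(next_token)
        logits, state = eval_fn(next_token, state)
    return generated


def fake_eval(token, state):
    """Stand-in model over a 5-token vocabulary: always favours token + 1."""
    logits = [0.0] * 5
    logits[(token + 1) % 5] = 1.0
    return logits, state


print(generate_greedy(fake_eval, [1, 2, 3], 4))
```

With the real wrapper, `model.eval` would be passed as `eval_fn`. If the logits turn out to be a PyTorch tensor, the tensor's own `argmax` would be much faster than the Python-level `max` used here.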