rwkv.cpp

This is a port of BlinkDL/RWKV-LM to ggerganov/ggml. The end goal is to allow 4-bit quantized inference on CPU.

WORK IN PROGRESS! Status: there is a Python wrapper, and FP32 and FP16 inference work correctly. I'm currently working on INT4 quantization support.

Plan

  1. Make INT4 inference work
  2. Create a Python script with sampling and a simple chat interface
  3. Clean up the repo (remove llama-related files and mentions)
  4. Write a good README.md and publish links to this repo
  5. Create a pull request to the main ggml repo with all improvements made here

Structure

This repo is based on the llama.cpp repo. RWKV-related code is in these directories:

  • ./rwkv: directory containing Python scripts for conversion, inference and validation
  • ./examples/main_rwkv: directory containing an example program that loads an RWKV model and runs inference

Please do not change files in other directories — this will make pulling recent changes easier.

How to use

Windows

Requirements: git, CMake, MSVC compiler, Python 3.x with PyTorch.

1. Clone the repo and build it:

git clone https://github.com/saharNooby/rwkv.cpp.git
cd rwkv.cpp
cmake -DBUILD_SHARED_LIBS=ON -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_EXAMPLES=OFF .
cmake --build . --config Release

If everything went OK, the file bin\Release\rwkv.dll should appear.

2. Download an RWKV model from Hugging Face and convert it into ggml format:

python rwkv\convert_pytorch_rwkv_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M.bin float32
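
FP16 inference also works (see the status note above), so the converter presumably accepts a half-precision data type as well, which roughly halves the file size; the exact argument value below is an assumption based on the float32 example:

python rwkv\convert_pytorch_rwkv_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M-f16.bin float16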

3. Use the model in Python:

# The rwkv_cpp module is located at rwkv/rwkv_cpp.py;
# run this script from that directory or add it to sys.path
import rwkv_cpp

model = rwkv_cpp.RWKVModel(r'bin\Release\rwkv.dll', r'C:\rwkv.cpp-169M.bin')

logits, state = None, None

for token in [1, 2, 3]:
    logits, state = model.eval(token, state)
    
    print(f'Output logits: {logits}')

# Don't forget to free the memory after you're done working with the model
model.free()
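
A sampling/chat script is still on the plan list above, but generation on top of eval() simply means feeding the chosen next token back into the model. Below is a minimal greedy-decoding sketch; it assumes logits is array-like (convertible with NumPy) and that the prompt token ids come from an external RWKV tokenizer, which is not part of this repo:

import numpy as np

import rwkv_cpp

model = rwkv_cpp.RWKVModel(r'bin\Release\rwkv.dll', r'C:\rwkv.cpp-169M.bin')

# Assumed to come from an external tokenizer; plain ids are used here for illustration
prompt_tokens = [1, 2, 3]

logits, state = None, None

# Feed the prompt through the model to build up the state
for token in prompt_tokens:
    logits, state = model.eval(token, state)

# Greedily pick the most likely next token and feed it back in
generated = []

for _ in range(16):
    next_token = int(np.argmax(np.asarray(logits)))
    generated.append(next_token)
    logits, state = model.eval(next_token, state)

print(f'Generated token ids: {generated}')

model.free()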