From 874826cb20d5d65045514dc091d6df0bd79fbb1c Mon Sep 17 00:00:00 2001
From: saharNooby
Date: Sat, 8 Apr 2023 10:45:42 +0400
Subject: [PATCH] Update README.md

---
 README.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index a3ed331..e709bd6 100644
--- a/README.md
+++ b/README.md
@@ -10,9 +10,10 @@ This project provides [a C library rwkv.h](rwkv.h) and [a convinient Python wrap
 
 **TODO (contributions welcome!)**:
 
-1. Measure latency and perplexity of different model sizes (169M to 14B) and data types (FP32, FP16, Q4_0, Q4_1, Q4_1_O)
-2. Test on Linux (including Colab) and MacOS
-3. Make required memory calculation more robust (see #4)
+1. Optimize AVX2 implementation of `Q4_1_O` matmul — currently, it is as slow as `FP32`
+2. Measure latency and perplexity of different model sizes (169M to 14B) and data types (`FP32`, `FP16`, `Q4_0`, `Q4_1`, `Q4_1_O`)
+3. Test on Linux (including Colab) and MacOS
+4. Make required memory calculation more robust (see [#4](https://github.com/saharNooby/rwkv.cpp/issues/4))
 
 ## How to use
 
@@ -88,9 +89,9 @@ python rwkv/quantize.py ~/Downloads/rwkv.cpp-169M.bin ~/Downloads/rwkv.cpp-169M-
 
 Formats available:
 
-- `4`: `Q4_1_O`, preserves outliers, best quality, very slow (as FP32).
-- `3`: `Q4_1`, preserves range, poor quality, very fast (as FP16).
-- `2`: `Q4_0`, worst quality, moderately fast (between FP16 and FP32).
+- `4`: `Q4_1_O`, best quality, very slow (as `FP32`).
+- `3`: `Q4_1`, poor quality, very fast (as `FP16`).
+- `2`: `Q4_0`, worst quality, breaks larger models, moderately fast (between `FP16` and `FP32`).
 
 ### 4. Run the model
 