Update data type info

2023-04-17 19:17:47 +04:00
parent 05825d2370
commit 82e2faa190
1 changed file with 4 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ Loading LoRA checkpoints in [Blealtan's format](https://github.com/Blealtan/RWKV
 **TODO (contributions welcome!)**:
-1. Optimize AVX2 implementation of `Q4_1_O` matmul — currently, it is as slow as `FP32`
+1. Optimize AVX2 implementation of `Q4_1_O` matmul — currently, it is 40% slower than `Q4_1`
 2. Measure latency and perplexity of different model sizes (169M to 14B) and data types (`FP32`, `FP16`, `Q4_0`, `Q4_1`, `Q4_1_O`)
 3. Test on Linux (including Colab) and MacOS
 4. Make required memory calculation more robust (see [#4](https://github.com/saharNooby/rwkv.cpp/issues/4))
@@ -91,9 +91,9 @@ python rwkv/quantize.py ~/Downloads/rwkv.cpp-169M.bin ~/Downloads/rwkv.cpp-169M-
 Formats available:
-- `4`: `Q4_1_O`, best quality, very slow (as `FP32`).
+- `4`: `Q4_1_O`, best quality, slow (30% slower than `FP16`).
-- `3`: `Q4_1`, poor quality, very fast (as `FP16`).
+- `3`: `Q4_1`, poor quality, fast (comparable to `FP16`).
-- `2`: `Q4_0`, worst quality, breaks larger models, moderately fast (between `FP16` and `FP32`).
+- `2`: `Q4_0`, worst quality, breaks larger models, very fast.
 ### 4. Run the model
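For quick reference, the format codes in the updated README lines can be collected into a small lookup table. This is a hypothetical Python sketch, not code from rwkv.cpp. The `FORMATS` dict and the `describe_format` helper are illustrative names, and the quality and speed notes are copied verbatim from the `+` lines of this diff.

```python
# Hypothetical lookup table (not part of rwkv.cpp) summarizing the
# quantization format codes listed in the updated README.
FORMATS = {
    4: ("Q4_1_O", "best quality, slow (30% slower than FP16)"),
    3: ("Q4_1", "poor quality, fast (comparable to FP16)"),
    2: ("Q4_0", "worst quality, breaks larger models, very fast"),
}


def describe_format(code: int) -> str:
    """Return a one-line description such as '4 -> Q4_1_O: ...'."""
    if code not in FORMATS:
        raise ValueError(f"unknown format code: {code}; expected one of {sorted(FORMATS)}")
    name, notes = FORMATS[code]
    return f"{code} -> {name}: {notes}"


if __name__ == "__main__":
    # Print the codes in descending order, matching the README's list order.
    for code in sorted(FORMATS, reverse=True):
        print(describe_format(code))
```

Keeping the notes as plain strings mirrors the README wording exactly; the sketch does not try to encode the relative speed figures numerically.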