Update README.md
parent 85db23c7de
commit 874826cb20
README.md (13 changed lines)
@@ -10,9 +10,10 @@ This project provides [a C library rwkv.h](rwkv.h) and [a convinient Python wrap
 
 **TODO (contributions welcome!)**:
 
-1. Measure latency and perplexity of different model sizes (169M to 14B) and data types (FP32, FP16, Q4_0, Q4_1, Q4_1_O)
-2. Test on Linux (including Colab) and MacOS
-3. Make required memory calculation more robust (see #4)
+1. Optimize AVX2 implementation of `Q4_1_O` matmul — currently, it is as slow as `FP32`
+2. Measure latency and perplexity of different model sizes (169M to 14B) and data types (`FP32`, `FP16`, `Q4_0`, `Q4_1`, `Q4_1_O`)
+3. Test on Linux (including Colab) and MacOS
+4. Make required memory calculation more robust (see [#4](https://github.com/saharNooby/rwkv.cpp/issues/4))
 
 ## How to use
 
@@ -88,9 +89,9 @@ python rwkv/quantize.py ~/Downloads/rwkv.cpp-169M.bin ~/Downloads/rwkv.cpp-169M-
 
 Formats available:
 
-- `4`: `Q4_1_O`, preserves outliers, best quality, very slow (as FP32).
-- `3`: `Q4_1`, preserves range, poor quality, very fast (as FP16).
-- `2`: `Q4_0`, worst quality, moderately fast (between FP16 and FP32).
+- `4`: `Q4_1_O`, best quality, very slow (as `FP32`).
+- `3`: `Q4_1`, poor quality, very fast (as `FP16`).
+- `2`: `Q4_0`, worst quality, breaks larger models, moderately fast (between `FP16` and `FP32`).
 
 ### 4. Run the model
 
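To make the quality differences in the format list above concrete, here is a minimal Python sketch of the idea behind these formats: Q4_1-style quantization stores a per-block minimum and scale and maps each weight to a 4-bit code, while the Q4_1_O idea (it "preserves outliers", per the pre-change format list) additionally keeps the block's largest-magnitude value exact so it does not stretch the quantization range for all the other values. Block size, function names, storage layout and rounding details below are illustrative assumptions, not the actual ggml/rwkv.cpp block format.

```python
# Conceptual sketch only -- not the real ggml/rwkv.cpp kernels, which use
# packed fixed-size blocks and SIMD code. Shown here just to illustrate why
# Q4_1_O trades speed for quality relative to Q4_1.

def quantize_q4_1_like(block):
    """Q4_1-style: store a per-block min and scale, map each value to a 4-bit code (0..15)."""
    lo, hi = min(block), max(block)
    scale = (hi - lo) / 15 or 1.0  # guard against constant blocks
    codes = [round((x - lo) / scale) for x in block]
    return lo, scale, codes

def dequantize_q4_1_like(lo, scale, codes):
    return [lo + scale * c for c in codes]

def quantize_q4_1_o_like(block):
    """Q4_1_O-style idea: keep the largest-magnitude value (the 'outlier') exact,
    so it no longer inflates the quantization range of the rest of the block."""
    outlier_idx = max(range(len(block)), key=lambda i: abs(block[i]))
    rest = [x for i, x in enumerate(block) if i != outlier_idx]
    lo, scale, codes = quantize_q4_1_like(rest)
    return outlier_idx, block[outlier_idx], lo, scale, codes

if __name__ == "__main__":
    block = [0.01, -0.02, 0.03, 0.00, 5.0]  # one outlier dominates the range
    print(dequantize_q4_1_like(*quantize_q4_1_like(block)))
    # The small values all collapse to the same level; the Q4_1_O-style variant
    # keeps the outlier exact and quantizes the rest over a much tighter range.
```

This also hints at why `Q4_1_O` is slower: the extra outlier bookkeeping does not map as cleanly onto the vectorized matmul path, which is exactly the AVX2 optimization listed in the TODO above.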