* chore: add ggml include at the top of rwkv.h
* feat: add cublas support
* feat: update rwkv.cpp
* feat: remove unused change
* chore: fix linux build issue
* chore: sync ggml and offload tensor to gpu
* chore: comment out tensors that cause errors on GPU
* chore: update comment and readme
* chore: update ggml to recent
* chore: add more performance test results
* chore: fix reading of files larger than 2 GB
* chore: merge master
* chore: remove unused comment
* chore: fix for comments
* Update README.md
* Update rwkv.cpp
---------
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Use types from typing for better compatibility with older Python versions
* Split last double end of line token as per BlinkDL's suggestion
* Fix MSVC warnings
* Drop Q4_2 support
* Update ggml
* Bump file format version for quantization changes
* Apply suggestions
* Remove Q4_3 support
* Add Q5_0, Q5_1, Q8_0 support
* Add clearer message when loading Q4_3 model
* Remove Q4_1_O format
* Fix indentation in .gitmodules
* Simplify sanitizer matrix