* chore: add ggml import at the top of rwkv.h
* feat: add cuBLAS support
* feat: update rwkv.cpp
* feat: remove unused change
* chore: fix linux build issue
* chore: sync ggml and offload tensors to GPU
* chore: comment out tensors that cause errors on GPU
* chore: update comment and readme
* chore: update ggml to recent
* chore: add more performance test results
* chore: fix reading of files larger than 2 GB
* chore: merge master
* chore: remove unused comment
* chore: apply fixes for review comments
* Update README.md
* Update rwkv.cpp
---------
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Use types from typing for better compatibility with older Python versions
* Split last double end of line token as per BlinkDL's suggestion
* Fix MSVC warnings
* Drop Q4_2 support
* Update ggml
* Bump file format version for quantization changes
* Apply suggestions
* Update ggml
* Pack only rwkv.dll for Windows releases
Test executables are no longer packed.
* Move test code into a separate file
* Remove redundant zeroing
* Refactor chat script