* chore: add ggml import in the head of rwkv.h
* feat: add cublas support
* feat: update rwkv.cpp
* feat: remove unused change
* chore: fix Linux build issue
* chore: sync ggml and offload tensor to gpu
* chore: comment out tensors that cause errors on GPU
* chore: update comment and readme
* chore: update ggml to recent
* chore: add more performance test results
* chore: fix reading files larger than 2 GB
* chore: merge master
* chore: remove unused comment
* chore: address review comments
* Update README.md
* Update rwkv.cpp
---------
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Use types from typing for better compatibility with older Python versions
* Split last double end of line token as per BlinkDL's suggestion
* Fix MSVC warnings
* Drop Q4_2 support
* Update ggml
* Bump file format version for quantization changes
* Apply suggestions
* Remove Q4_3 support
* Add Q5_0, Q5_1, Q8_0 support
* Add a clearer message when loading a Q4_3 model
* Remove Q4_1_O format
* Fix indentation in .gitmodules
* Simplify sanitizer matrix
* Revert "Delete SHA256SUMS for now (#416)"
This reverts commit 8eea5ae0e5.
* Remove ggml files until they can be verified
* Remove alpaca json
* Add also model/tokenizer.model to SHA256SUMS + update README
---------
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
* Update custom.md
* Removed Model section as it is better placed in README.md
* Updates to README.md model section
* Inserted text that was removed from the issue template about obtaining models from FB, plus links to papers describing the various models
* Removed IPFS download links for the Alpaca 7B models, as these appear to be in the old data format and probably shouldn't be directly linked to anyway
* Updated the perplexity section to point to the Perplexity scores discussion (#406)