* Allow creating multiple contexts per model
This enables parallel inference. I am also preparing to support
sequence mode using a similar method.
* Fix cuBLAS
* Update rwkv.h
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Update rwkv.cpp
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Inherit print_errors from parent ctx when cloning
* Add context cloning test
* Free
* Free ggml context when last rwkv_context is freed
* Free before exit
* int main
* Add explanation of ffn_key_size
* Update rwkv_instance and rwkv_context comments
* Thread safety notes
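
The cloning and freeing behaviour listed above can be illustrated with a minimal reference-counting sketch. It is not the code from this change: `model_instance`, `model_context`, `context_init`, `context_clone`, `context_free` and `instances_freed` are hypothetical names, and the weights are a placeholder buffer. The sketch assumes that cloned contexts share one instance, that a clone inherits `print_errors` from its parent, and that the shared data is released only when the last context is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared model data (not the real rwkv.cpp types). */
struct model_instance {
    uint32_t ref_count; /* number of live contexts that use this instance */
    float * weights;    /* placeholder for the model tensors */
};

/* Hypothetical per-context state; one context is meant for one thread at a time. */
struct model_context {
    struct model_instance * instance;
    bool print_errors;
    uint32_t n_threads;
};

/* Counts released instances so the behaviour is observable in a test. */
static int instances_freed = 0;

/* Creates a fresh instance together with its first context. */
struct model_context * context_init(uint32_t n_threads) {
    struct model_instance * inst = calloc(1, sizeof(*inst));
    struct model_context * ctx = calloc(1, sizeof(*ctx));
    float * weights = calloc(16, sizeof(*weights));
    if (!inst || !ctx || !weights) {
        free(inst); free(ctx); free(weights);
        return NULL;
    }
    inst->ref_count = 1;
    inst->weights = weights;
    ctx->instance = inst;
    ctx->print_errors = true;
    ctx->n_threads = n_threads;
    return ctx;
}

/* Creates another context on the same instance; print_errors is inherited. */
struct model_context * context_clone(struct model_context * parent, uint32_t n_threads) {
    struct model_context * ctx = calloc(1, sizeof(*ctx));
    if (!ctx) return NULL;
    ctx->instance = parent->instance;
    ctx->print_errors = parent->print_errors;
    ctx->n_threads = n_threads;
    /* The counter is not atomic: in this sketch, clone and free calls
       must be serialized by the caller. */
    parent->instance->ref_count++;
    return ctx;
}

/* Frees a context; the shared instance goes away with the last context. */
void context_free(struct model_context * ctx) {
    if (!ctx) return;
    assert(ctx->instance->ref_count > 0);
    if (--ctx->instance->ref_count == 0) {
        free(ctx->instance->weights);
        free(ctx->instance);
        instances_freed++;
    }
    free(ctx);
}
```

In this sketch, each inference thread would own its own context, while creation, cloning and freeing happen on a single thread or under a lock.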
---------
Co-authored-by: Alex <saharNooby@users.noreply.github.com>