* Allow creating multiple contexts per model

  This allows for parallel inference and I am preparing to support sequence mode using a method similar to this
* Fix cuBLAS
* Update rwkv.h

  Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Update rwkv.cpp

  Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Inherit print_errors from parent ctx when cloning
* Add context cloning test
* Free
* Free ggml context when last rwkv_context is freed
* Free before exit
* int main
* add explanation of ffn_key_size
* Update rwkv_instance and rwkv_context comments
* Thread safety notes

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

Name | ||
---|---|---|
CMakeLists.txt | ||
expected_logits.bin | ||
test_context_cloning.c | ||
test_ggml_basics.c | ||
test_tiny_rwkv.c | ||
tiny-rwkv-660K-FP16.bin | ||
tiny-rwkv-660K-FP32.bin |
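
The commit message mentions three behaviors: several contexts can share one model, a cloned context inherits `print_errors` from its parent, and the shared ggml context is freed only when the last context is freed. The following is a minimal sketch of that reference-counting pattern in C. It is not the rwkv.cpp implementation: every name here (`demo_instance`, `demo_context`, `demo_init`, `demo_clone`, `demo_free`, `demo_instances_freed`) is hypothetical, and the shared model data is reduced to a bare counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the data shared by all contexts of one model
 * (in a real library: the weights and the backing allocator). */
struct demo_instance {
    int ref_count; /* number of live contexts that point at this instance */
};

/* Hypothetical stand-in for a per-caller inference context. */
struct demo_context {
    struct demo_instance *instance; /* shared, reference-counted */
    bool print_errors;              /* per-context setting */
};

/* Counts how many instances have been released; for demonstration only. */
static int demo_instances_freed = 0;

/* Creates a fresh instance together with its first context. */
struct demo_context *demo_init(void) {
    struct demo_instance *inst = malloc(sizeof *inst);
    if (inst == NULL) return NULL;

    struct demo_context *ctx = malloc(sizeof *ctx);
    if (ctx == NULL) {
        free(inst);
        return NULL;
    }

    inst->ref_count = 1;
    ctx->instance = inst;
    ctx->print_errors = true;
    return ctx;
}

/* Creates another context over the same instance.
 * The clone inherits print_errors from its parent. */
struct demo_context *demo_clone(struct demo_context *parent) {
    assert(parent != NULL);

    struct demo_context *ctx = malloc(sizeof *ctx);
    if (ctx == NULL) return NULL;

    ctx->instance = parent->instance;
    ctx->instance->ref_count++;
    ctx->print_errors = parent->print_errors;
    return ctx;
}

/* Frees one context; the shared instance goes away with the last one. */
void demo_free(struct demo_context *ctx) {
    if (ctx == NULL) return;

    if (--ctx->instance->ref_count == 0) {
        free(ctx->instance);
        demo_instances_freed++;
    }
    free(ctx);
}
```

In this sketch the counter is a plain `int`, so calls to `demo_clone` and `demo_free` on contexts that share an instance would need to be serialized by the caller (or the counter made atomic) if they can run on different threads. Whether the actual library makes the same choice is not stated in the commit message.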