Commit Graph

7 Commits

Author SHA1 Message Date
YorkZero 241350fde6
Feature add cublas support (#65)
* chore: add ggml import in the head of rwkv.h

* chore: add ggml import in the head of rwkv.h

* feat: add cublas support

* feat: update rwkv.cpp

* feat: remove unused change

* chore: fix linux build issue

* chore: sync ggml and offload tensor to gpu

* chore: comment out tensors which occurs error on GPU

* chore: update comment and readme

* chore: update ggml to recent

* chore: add more performance test results

* chore: add more performance test results

* chore: fix problem of reading file more than 2 gb

* chore: merge master

* chore: remove unused comment

* chore: fix for comments

* Update README.md

* Update rwkv.cpp

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>
2023-05-29 17:10:19 +05:00
Alex dea929f8ca
Various improvements & upgrade ggml (#75)
* Use types from typing for better compatibility with older Python versions

* Split last double end of line token as per BlinkDL's suggestion

* Fix MSVC warnings

* Drop Q4_2 support

* Update ggml

* Bump file format version for quantization changes

* Apply suggestions
2023-05-27 16:02:24 +05:00
LoganDark 9e2a0de843
Add rwkv_set_print_errors and rwkv_get_last_error (#68)
* Add rwkv_set_print_errors and rwkv_get_last_error

Fixes #63

This allows retrieving errors from the library without having to
pipe stderr. Also it was annoying that rwkv.cpp assumed control of
the caller process by doing things like calling abort() when it
shouldn't, so I also fixed that.

The basic way this works is:

1. by default, not much is different, except more errors are caught,
   and rwkv.cpp should never abort the process or throw a C++
   exception.

2. the difference comes when you call rwkv_set_print_errors
   (working title):

   1. errors will no longer be printed to stderr automatically
   2. errors will be assigned to a thread-local variable (during
      init/quantization) or a context-local variable (during eval)
   3. the last error can be retrieved using rwkv_get_last_error

I also overhauled the assert macros so more error cases are
handled:

- the file is now closed if rwkv_init_from_file exits early
- the ggml context is freed if rwkv_init_from_file exits early
- if parameters cannot be found an error will be set about it

I also made some optimizations:

- just use fstat instead of opening the file twice
- deduplicated some code / removed edge cases that do not exist
- switched to ggml inplace operations where they exist

test_tiny_rwkv.c seems to run perfectly fine. The Python scripts
also.

The built DLL is perfectly backwards compatible with existing API
consumers like the python library, because it does not remove or
change any functions, only adds some optional ones.

The sad thing is that this will break every PR because the error
handling in this library was terrible and needed to be totally
redone. But I think it is worth it.

* Fix typo

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* Visual Studio lied and _fileno is incorrect

* Fix trailing comma in assert macros

This was an accident left over from something that didn't pan out,
some compilers do not like when function arguments have a trailing
comma.

* Include header file for fstat

* Remove uses of std::make_unique

* Fix width of format string argument on all platforms

* Use C free for smart pointers

* Revert "Use C free for smart pointers" and try nothrow

* Initialize cgraph to zero

* Fix ggml_cgraph initialization

* Zero-initialize allocations

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>
2023-05-24 16:06:52 +05:00
Alex a3178b20ea
Various improvements (#52)
* Update ggml

* Add link to pre-quantized models in README

* Enable W4 for MSVC

* Fix warnings, clean up code

* Fix LoRA merge script
2023-05-08 14:28:54 +05:00
Alex 1198892888
Add support for Q5_0, Q5_1 and Q8_0 formats; remove Q4_1_O format (#44)
* Remove Q4_3 support

* Add Q5_0, Q5_1, Q8_0 support

* Add more clear message when loading Q4_3 model

* Remove Q4_1_O format

* Fix indentation in .gitmodules

* Simplify sanitizer matrix
2023-04-29 17:39:11 +05:00
Alex 3587ff9e58
Sync ggml with upstream (#38)
* Sync ggml with upstream

* Remove file filters from Actions triggers

* Update ggml

* Add Q4_2 and Q4_3 support

* Improve output of perplexity measuring script

* Add tests for new formats

* Add token limit argument to perplexity measuring script

* Update README

* Update README

* Update ggml

* Use master branch of ggml
2023-04-22 20:25:29 +05:00
Alex 1be9fda248
Add robust automatic testing (#33) 2023-04-20 11:00:35 +05:00