351 Commits

Author SHA1 Message Date
Alex dea929f8ca
Various improvements & upgrade ggml (#75)
* Use types from typing for better compatibility with older Python versions

* Split last double end of line token as per BlinkDL's suggestion

* Fix MSVC warnings

* Drop Q4_2 support

* Update ggml

* Bump file format version for quantization changes

* Apply suggestions
2023-05-27 16:02:24 +05:00
LoganDark 3ca9c7f785
Move graph building into its own function (#69)
step towards #50 and loading models from memory among other things
2023-05-26 17:30:07 +05:00
LoganDark b61d94aef0
Flush output every token in generate_completions.py (#73) 2023-05-26 17:23:58 +05:00
LoganDark 83983bbb84
last second move things over in the error enum (#71)
I realized I didn't give enough space for additional failure modes
to be added in the future, and I should do this as soon as possible
to prevent things from being made that depend on the old constants
2023-05-26 17:22:32 +05:00
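The commit message above does not say how the extra space in the error enum was created. One common way to leave room for future failure modes is to give each error category its own numeric range, so that new codes can be added inside a range without renumbering anything that already exists. The sketch below illustrates that idea in Python; the names `DemoError`, `STRIDE`, and `category` are hypothetical and are not the library's real constants or API.

```python
from enum import IntEnum

# Hypothetical stride: each category owns 0x100 consecutive values.
STRIDE = 0x100


class DemoError(IntEnum):
    """Illustrative error codes with gaps reserved between categories."""
    NONE = 0

    ARGS = 1 * STRIDE

    FILE = 2 * STRIDE
    FILE_OPEN = FILE + 1
    FILE_READ = FILE + 2
    # A later FILE_* code can take FILE + 3 without touching MODEL below.

    MODEL = 3 * STRIDE
    MODEL_PARAM_MISSING = MODEL + 1


def category(err: DemoError) -> DemoError:
    """Map a specific error code back to the category that owns its range."""
    return DemoError(int(err) // STRIDE * STRIDE)


if __name__ == "__main__":
    print(category(DemoError.FILE_READ).name)
    print(hex(DemoError.MODEL_PARAM_MISSING))
```

With this layout, callers that stored or compared a category value keep working when more specific codes are added later, which is the concern the commit message raises about code depending on the old constants.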
LoganDark d26791b5bc
Silence PyTorch warnings by using untyped storage (#72) 2023-05-26 17:21:18 +05:00
LoganDark 7cbfbc55c8
Switch to fstat64 (#70)
2023-05-26 17:20:51 +05:00
LoganDark 9e2a0de843
Add rwkv_set_print_errors and rwkv_get_last_error (#68)
* Add rwkv_set_print_errors and rwkv_get_last_error

Fixes #63

This allows retrieving errors from the library without having to
pipe stderr. Also it was annoying that rwkv.cpp assumed control of
the caller process by doing things like calling abort() when it
shouldn't, so I also fixed that.

The basic way this works is:

1. by default, not much is different, except more errors are caught,
   and rwkv.cpp should never abort the process or throw a C++
   exception.

2. the difference comes when you call rwkv_set_print_errors
   (working title):

   1. errors will no longer be printed to stderr automatically
   2. errors will be assigned to a thread-local variable (during
      init/quantization) or a context-local variable (during eval)
   3. the last error can be retrieved using rwkv_get_last_error

I also overhauled the assert macros so more error cases are
handled:

- the file is now closed if rwkv_init_from_file exits early
- the ggml context is freed if rwkv_init_from_file exits early
- if parameters cannot be found an error will be set about it

I also made some optimizations:

- just use fstat instead of opening the file twice
- deduplicated some code / removed edge cases that do not exist
- switched to ggml inplace operations where they exist

test_tiny_rwkv.c seems to run perfectly fine, and so do the Python
scripts.

The built DLL is perfectly backwards compatible with existing API
consumers like the python library, because it does not remove or
change any functions, only adds some optional ones.

The sad thing is that this will break every PR because the error
handling in this library was terrible and needed to be totally
redone. But I think it is worth it.

* Fix typo

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* Visual Studio lied and _fileno is incorrect

* Fix trailing comma in assert macros

This was an accident left over from something that didn't pan out,
some compilers do not like when function arguments have a trailing
comma.

* Include header file for fstat

* Remove uses of std::make_unique

* Fix width of format string argument on all platforms

* Use C free for smart pointers

* Revert "Use C free for smart pointers" and try nothrow

* Initialize cgraph to zero

* Fix ggml_cgraph initialization

* Zero-initialize allocations

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>
2023-05-24 16:06:52 +05:00
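The error-reporting scheme described in the commit message for #68 can be sketched as follows. This is a Python illustration of the pattern only, not the library's C API: every name here (`DemoError`, `DemoContext`, `set_print_errors`, `get_last_error`, `demo_init_from_file`, `demo_eval`) is a hypothetical stand-in. It assumes that the error is always recorded and that the flag only controls printing to stderr, which the commit message does not state explicitly.

```python
import sys
import threading
from enum import IntEnum
from typing import Optional


class DemoError(IntEnum):
    """Hypothetical error codes, not the library's real constants."""
    NONE = 0
    FILE = 1
    PARAM_MISSING = 2


class DemoContext:
    """Stand-in for a model context; it carries its own last error."""

    def __init__(self) -> None:
        self.last_error = DemoError.NONE
        self.print_errors = True


# Errors raised before a context exists (init/quantization) go here.
_thread_state = threading.local()


def _target(ctx: Optional[DemoContext]):
    """Pick context-local state if a context is given, else thread-local."""
    if ctx is not None:
        return ctx
    if not hasattr(_thread_state, "last_error"):
        _thread_state.last_error = DemoError.NONE
        _thread_state.print_errors = True
    return _thread_state


def set_print_errors(ctx: Optional[DemoContext], enabled: bool) -> None:
    _target(ctx).print_errors = enabled


def get_last_error(ctx: Optional[DemoContext]) -> DemoError:
    return _target(ctx).last_error


def _report(ctx: Optional[DemoContext], err: DemoError, message: str) -> None:
    target = _target(ctx)
    target.last_error = err  # assumption: always recorded
    if target.print_errors:
        print(message, file=sys.stderr)


def demo_init_from_file(path: str) -> Optional[DemoContext]:
    """Never raises or aborts; returns None and records an error instead."""
    try:
        # The with-block closes the file even on an early exit.
        with open(path, "rb") as f:
            f.read(1)
    except OSError as exc:
        _report(None, DemoError.FILE, f"failed to read {path}: {exc}")
        return None
    return DemoContext()


def demo_eval(ctx: DemoContext, has_params: bool) -> bool:
    """Eval-time failures are stored on the context, not the thread."""
    if not has_params:
        _report(ctx, DemoError.PARAM_MISSING, "model parameter not found")
        return False
    return True


if __name__ == "__main__":
    set_print_errors(None, False)  # silence stderr, query errors instead
    if demo_init_from_file("no_such_dir/no_such_model.bin") is None:
        print("init failed:", get_last_error(None).name)
```

Keeping init-time errors in thread-local storage means a failed load on one thread does not overwrite the error seen by another, while eval-time errors stay attached to the context that produced them.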
柏园猫 1c363e6d5f
Fix encoding issue when loading prompt data (#58)
* Fix encoding issue when loading prompt data

* Update chat_with_bot.py

Fix code style

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>
2023-05-13 21:53:54 +05:00
Alex a3178b20ea
Various improvements (#52)
* Update ggml

* Add link to pre-quantized models in README

* Enable W4 for MSVC

* Fix warnings, clean up code

* Fix LoRA merge script
2023-05-08 14:28:54 +05:00
Alex 5eb8f09c14
Various improvements (#47)
* Update ggml

* Pack only rwkv.dll for Windows releases

Test executables would not be packed anymore.

* Move test code into a separate file

* Remove redundant zeroing

* Refactor chat script
2023-04-30 20:27:14 +05:00
Jarrett Ye 3621172428
punish repetitions & break if END_OF_TEXT & decouple prompts from chat script (#37)
* punish repetitions & break if END_OF_TEXT

* decouple prompts from chat_with_bot.py

* improve code style

* Update rwkv/chat_with_bot.py

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* Update rwkv/chat_with_bot.py

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* add types

* JSON prompt

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>
2023-04-30 18:50:05 +05:00
Alex 06dac0f80d
Use main ggml repo (#45) 2023-04-29 21:35:36 +05:00
Alex 1198892888
Add support for Q5_0, Q5_1 and Q8_0 formats; remove Q4_1_O format (#44)
* Remove Q4_3 support

* Add Q5_0, Q5_1, Q8_0 support

* Add more clear message when loading Q4_3 model

* Remove Q4_1_O format

* Fix indentation in .gitmodules

* Simplify sanitizer matrix
2023-04-29 17:39:11 +05:00
Alex c736ef5411
Improve chat_with_bot.py script (#39) 2023-04-22 20:33:58 +05:00
Alex 3587ff9e58
Sync ggml with upstream (#38)
* Sync ggml with upstream

* Remove file filters from Actions triggers

* Update ggml

* Add Q4_2 and Q4_3 support

* Improve output of perplexity measuring script

* Add tests for new formats

* Add token limit argument to perplexity measuring script

* Update README

* Update README

* Update ggml

* Use master branch of ggml
2023-04-22 20:25:29 +05:00
Jarrett Ye ac663631e1
Improve the prompt & fix Chinese display issue & support commands (#34)
* update the prompt

* Fix/chinese display issue

* remove debug code

* support commands (#1)

+reset +gen +i +qq +qa +++ ++ +

* run_rnn before decode

* remove debug code

* deep copy logits

* remove extra print()

* print newline if reach max_tokens_per_generation

* fix typo in init prompt

* Update rwkv/chat_with_bot.py

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* Update rwkv/chat_with_bot.py

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* Update rwkv/chat_with_bot.py

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* Update rwkv/chat_with_bot.py

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* refine code & type annotation

* add comments for commands

* support change temp & top_p during chat.

* set default language & prompt

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>
2023-04-22 12:48:44 +05:00
Alex 1be9fda248
Add robust automatic testing (#33) 2023-04-20 11:00:35 +05:00
saharNooby 7b28076243 Fix Q4_1_O optimization 2023-04-18 16:46:27 +04:00
saharNooby 2ef7ee0fac Optimize Q4_1_O by moving outlier multiplication out of the dequantize+dot loop 2023-04-18 09:47:20 +04:00
Alex 0a8157d1ee
Merge pull request #28 from saharNooby/ggml-to-submodule
Move ggml to submodule
2023-04-17 20:18:02 +05:00
saharNooby 82e2faa190 Update data type info 2023-04-17 19:17:47 +04:00
saharNooby 05825d2370 Fix GitHub Actions 2023-04-17 19:04:55 +04:00
saharNooby e29da07731 Fix warnings 2023-04-17 18:57:38 +04:00
saharNooby 38eea116b8 Restore Q4_1_O support 2023-04-17 18:53:48 +04:00
saharNooby 28e354c183 Delete Makefile and make workflows 2023-04-17 17:37:09 +04:00
saharNooby b2bdeb1d95 Use ggml as a submodule 2023-04-17 17:35:58 +04:00
saharNooby a96ec01b1a Revert "Replace ggml_1_minus_x with ggml_sub"
This reverts commit 189ad78a0d.
2023-04-17 16:47:11 +04:00
saharNooby 189ad78a0d Replace ggml_1_minus_x with ggml_sub 2023-04-17 16:46:55 +04:00
saharNooby 2f37c6b019 Fix FP16 lookup table 2023-04-17 16:39:43 +04:00
saharNooby 678f5233a5 Add LoRA loading support 2023-04-15 20:46:30 +04:00
saharNooby e4268a36c8 Update file format documentation 2023-04-14 18:59:16 +04:00
Alex e84c446d95
Merge pull request #20 from BrutalCoding/patch-1
fix: Mention of incorrect filename for MacOS cmake build artifact
2023-04-10 09:48:31 +05:00
Daniel Breedeveld 70f7eece06
fix: Mention of incorrect filename for MacOS cmake build artifact
Executing the cmake build produces "librwkv.dylib" on MacOS (tested on Ventura 13.3.1)
2023-04-10 02:01:28 +08:00
saharNooby 4f315441ba Merge remote-tracking branch 'origin/master' 2023-04-08 19:39:47 +04:00
saharNooby 7437e1d860 Clarify that we now have binaries for Linux/MacOS 2023-04-08 19:39:31 +04:00
Alex 5d99741eab
Merge pull request #18 from yorkzero831/master
Update github action to support linux and macos asset uploading
2023-04-08 20:37:01 +05:00
YorkZero 5662bf4b4f chore: make the asset file at the root of the zip file 2023-04-09 00:32:32 +09:00
YorkZero a3fe1c63d8 chore: align asset file name 2023-04-09 00:21:30 +09:00
YorkZero 37f890ff3e chore: update github action
2023-04-08 23:18:31 +09:00
Alex 84e0698f2b
Merge pull request #16 from saharNooby/outliers-preserving-quantization-PR
Add Q4_1_O quantization format that preserves outliers in weights and does dot in FP32
2023-04-08 16:51:47 +05:00
saharNooby 874826cb20 Update README.md 2023-04-08 10:45:42 +04:00
saharNooby 85db23c7de Add script that measures perplexity 2023-04-08 10:41:16 +04:00
saharNooby e04baa032c Remove reference impl comparison test 2023-04-08 10:01:29 +04:00
saharNooby edd57a186c Update README.md 2023-04-07 10:16:12 +04:00
saharNooby e26b408ea7 Add Q4_1_O test 2023-04-07 10:12:19 +04:00
saharNooby 18bf02fea4 Use ggml function for parameter size calculation 2023-04-07 10:01:04 +04:00
saharNooby c40941d9d0 Add Q4_1_O format 2023-04-07 09:55:39 +04:00
saharNooby ec99bc1765 Do not quantize head 2023-04-06 20:30:32 +04:00
saharNooby 058b5cd1e6 Show file compression ratio 2023-04-06 20:29:58 +04:00
saharNooby fa9ad13a39 Free ggml context when model is garbage collected 2023-04-06 20:27:33 +04:00