rwkv.cpp

Commit Graph

Author	SHA1	Message	Date
Alex	dea929f8ca	Various improvements & upgrade ggml (#75) * Use types from typing for better compatibility with older Python versions * Split last double end of line token as per BlinkDL's suggestion * Fix MSVC warnings * Drop Q4_2 support * Update ggml * Bump file format version for quantization changes * Apply suggestions	2023-05-27 16:02:24 +05:00
LoganDark	3ca9c7f785	Move graph building into its own function (#69) step towards #50 and loading models from memory among other things	2023-05-26 17:30:07 +05:00
LoganDark	b61d94aef0	Flush output every token in generate_completions.py (#73)	2023-05-26 17:23:58 +05:00
LoganDark	83983bbb84	last second move things over in the error enum (#71) I realized I didn't give enough space for additional failure modes to be added in the future, and I should do this as soon as possible to prevent things from being made that depend on the old constants	2023-05-26 17:22:32 +05:00
LoganDark	d26791b5bc	Silence PyTorch warnings by using untyped storage (#72)	2023-05-26 17:21:18 +05:00
LoganDark	7cbfbc55c8	Switch to fstat64 (#70)	2023-05-26 17:20:51 +05:00
LoganDark	9e2a0de843	Add rwkv_set_print_errors and rwkv_get_last_error (#68) * Add rwkv_set_print_errors and rwkv_get_last_error Fixes #63 This allows retrieving errors from the library without having to pipe stderr. Also it was annoying that rwkv.cpp assumed control of the caller process by doing things like calling abort() when it shouldn't, so I also fixed that. The basic way this works is: 1. by default, not much is different, except more errors are caught, and rwkv.cpp should never abort the process or throw a C++ exception. 2. the difference comes when you call rwkv_set_print_errors (working title): 1. errors will no longer be printed to stderr automatically 2. errors will be assigned to a thread-local variable (during init/quantization) or a context-local variable (during eval) 3. the last error can be retrieved using rwkv_get_last_error I also overhauled the assert macros so more error cases are handled: - the file is now closed if rwkv_init_from_file exits early - the ggml context is freed if rwkv_init_from_file exits early - if parameters cannot be found an error will be set about it I also made some optimizations: - just use fstat instead of opening the file twice - deduplicated some code / removed edge cases that do not exist - switched to ggml inplace operations where they exist test_tiny_rwkv.c seems to run perfectly fine. The Python scripts also. The built DLL is perfectly backwards compatible with existing API consumers like the python library, because it does not remove or change any functions, only adds some optional ones. The sad thing is that this will break every PR because the error handling in this library was terrible and needed to be totally redone. But I think it is worth it. 
* Fix typo Co-authored-by: Alex <saharNooby@users.noreply.github.com> * Visual Studio lied and _fileno is incorrect * Fix trailing comma in assert macros This was an accident left over from something that didn't pan out, some compilers do not like when function arguments have a trailing comma. * Include header file for fstat * Remove uses of std::make_unique * Fix width of format string argument on all platforms * Use C free for smart pointers * Revert "Use C free for smart pointers" and try nothrow * Initialize cgraph to zero * Fix ggml_cgraph initialization * Zero-initialize allocations --------- Co-authored-by: Alex <saharNooby@users.noreply.github.com>	2023-05-24 16:06:52 +05:00
柏园猫	1c363e6d5f	Fix encoding issue when loading prompt data (#58) * Fix encoding issue when loading prompt data * Update chat_with_bot.py Fix code style --------- Co-authored-by: Alex <saharNooby@users.noreply.github.com>	2023-05-13 21:53:54 +05:00
Alex	a3178b20ea	Various improvements (#52) * Update ggml * Add link to pre-quantized models in README * Enable W4 for MSVC * Fix warnings, clean up code * Fix LoRA merge script	2023-05-08 14:28:54 +05:00
Alex	5eb8f09c14	Various improvements (#47) * Update ggml * Pack only rwkv.dll for Windows releases Test executables would not be packed anymore. * Move test code into a separate file * Remove redundant zeroing * Refactor chat script	2023-04-30 20:27:14 +05:00
Jarrett Ye	3621172428	punish repetitions & break if END_OF_TEXT & decouple prompts from chat script (#37) * punish repetitions & break if END_OF_TEXT * decouple prompts from chat_with_bot.py * improve code style * Update rwkv/chat_with_bot.py Co-authored-by: Alex <saharNooby@users.noreply.github.com> * Update rwkv/chat_with_bot.py Co-authored-by: Alex <saharNooby@users.noreply.github.com> * add types * JSON prompt --------- Co-authored-by: Alex <saharNooby@users.noreply.github.com>	2023-04-30 18:50:05 +05:00
Alex	06dac0f80d	Use main ggml repo (#45)	2023-04-29 21:35:36 +05:00
Alex	1198892888	Add support for Q5_0, Q5_1 and Q8_0 formats; remove Q4_1_O format (#44) * Remove Q4_3 support * Add Q5_0, Q5_1, Q8_0 support * Add more clear message when loading Q4_3 model * Remove Q4_1_O format * Fix indentation in .gitmodules * Simplify sanitizer matrix	2023-04-29 17:39:11 +05:00
Alex	c736ef5411	Improve chat_with_bot.py script (#39)	2023-04-22 20:33:58 +05:00
Alex	3587ff9e58	Sync ggml with upstream (#38) * Sync ggml with upstream * Remove file filters from Actions triggers * Update ggml * Add Q4_2 and Q4_3 support * Improve output of perplexity measuring script * Add tests for new formats * Add token limit argument to perplexity measuring script * Update README * Update README * Update ggml * Use master branch of ggml	2023-04-22 20:25:29 +05:00
Jarrett Ye	ac663631e1	Improve the prompt & fix Chinese display issue & support commands (#34) * update the prompt * Fix/Chinese display issue * remove debug code * support commands (#1) +reset +gen +i +qq +qa +++ ++ + * run_rnn before decode * remove debug code * deep copy logits * remove extra print() * print newline if reach max_tokens_per_generation * fix typo in init prompt * Update rwkv/chat_with_bot.py Co-authored-by: Alex <saharNooby@users.noreply.github.com> * Update rwkv/chat_with_bot.py Co-authored-by: Alex <saharNooby@users.noreply.github.com> * Update rwkv/chat_with_bot.py Co-authored-by: Alex <saharNooby@users.noreply.github.com> * Update rwkv/chat_with_bot.py Co-authored-by: Alex <saharNooby@users.noreply.github.com> * refine code & type annotation * add comments for commands * support change temp & top_p during chat. * set default language & prompt --------- Co-authored-by: Alex <saharNooby@users.noreply.github.com>	2023-04-22 12:48:44 +05:00
Alex	1be9fda248	Add robust automatic testing (#33)	2023-04-20 11:00:35 +05:00
saharNooby	7b28076243	Fix Q4_1_O optimization	2023-04-18 16:46:27 +04:00
saharNooby	2ef7ee0fac	Optimize Q4_1_O by moving outlier multiplication out of the dequantize+dot loop	2023-04-18 09:47:20 +04:00
Alex	0a8157d1ee	Merge pull request #28 from saharNooby/ggml-to-submodule Move ggml to submodule	2023-04-17 20:18:02 +05:00
saharNooby	82e2faa190	Update data type info	2023-04-17 19:17:47 +04:00
saharNooby	05825d2370	Fix GitHub Actions	2023-04-17 19:04:55 +04:00
saharNooby	e29da07731	Fix warnings	2023-04-17 18:57:38 +04:00
saharNooby	38eea116b8	Restore Q4_1_O support	2023-04-17 18:53:48 +04:00
saharNooby	28e354c183	Delete Makefile and make workflows	2023-04-17 17:37:09 +04:00
saharNooby	b2bdeb1d95	Use ggml as a submodule	2023-04-17 17:35:58 +04:00
saharNooby	a96ec01b1a	Revert "Replace ggml_1_minus_x with ggml_sub" This reverts commit `189ad78a0d`.	2023-04-17 16:47:11 +04:00
saharNooby	189ad78a0d	Replace ggml_1_minus_x with ggml_sub	2023-04-17 16:46:55 +04:00
saharNooby	2f37c6b019	Fix FP16 lookup table	2023-04-17 16:39:43 +04:00
saharNooby	678f5233a5	Add LoRA loading support	2023-04-15 20:46:30 +04:00
saharNooby	e4268a36c8	Update file format documentation	2023-04-14 18:59:16 +04:00
Alex	e84c446d95	Merge pull request #20 from BrutalCoding/patch-1 fix: Mention of incorrect filename for MacOS cmake build artifact	2023-04-10 09:48:31 +05:00
Daniel Breedeveld	70f7eece06	fix: Mention of incorrect filename for MacOS cmake build artifact Executing the cmake build produces "librwkv.dylib" on MacOS (tested on Ventura 13.3.1)	2023-04-10 02:01:28 +08:00
saharNooby	4f315441ba	Merge remote-tracking branch 'origin/master'	2023-04-08 19:39:47 +04:00
saharNooby	7437e1d860	Clarify that we now have binaries for Linux/MacOS	2023-04-08 19:39:31 +04:00
Alex	5d99741eab	Merge pull request #18 from yorkzero831/master Update github action to support linux and macos asset uploading	2023-04-08 20:37:01 +05:00
YorkZero	5662bf4b4f	chore: make the asset file at the root of the zip file	2023-04-09 00:32:32 +09:00
YorkZero	a3fe1c63d8	chore: align asset file name	2023-04-09 00:21:30 +09:00
YorkZero	37f890ff3e	chore: update github action	2023-04-08 23:18:31 +09:00
Alex	84e0698f2b	Merge pull request #16 from saharNooby/outliers-preserving-quantization-PR Add Q4_1_O quantization format that preserves outliers in weights and does dot in FP32	2023-04-08 16:51:47 +05:00
saharNooby	874826cb20	Update README.md	2023-04-08 10:45:42 +04:00
saharNooby	85db23c7de	Add script that measures perplexity	2023-04-08 10:41:16 +04:00
saharNooby	e04baa032c	Remove reference impl comparison test	2023-04-08 10:01:29 +04:00
saharNooby	edd57a186c	Update README.md	2023-04-07 10:16:12 +04:00
saharNooby	e26b408ea7	Add Q4_1_O test	2023-04-07 10:12:19 +04:00
saharNooby	18bf02fea4	Use ggml function for parameter size calculation	2023-04-07 10:01:04 +04:00
saharNooby	c40941d9d0	Add Q4_1_O format	2023-04-07 09:55:39 +04:00
saharNooby	ec99bc1765	Do not quantize head	2023-04-06 20:30:32 +04:00
saharNooby	058b5cd1e6	Show file compression ratio	2023-04-06 20:29:58 +04:00
saharNooby	fa9ad13a39	Free ggml context when model is garbage collected	2023-04-06 20:27:33 +04:00

351 Commits