Commit Graph

337 Commits

Author SHA1 Message Date
Alex 3587ff9e58
Sync ggml with upstream (#38)
* Sync ggml with upstream

* Remove file filters from Actions triggers

* Update ggml

* Add Q4_2 and Q4_3 support

* Improve output of perplexity measuring script

* Add tests for new formats

* Add token limit argument to perplexity measuring script

* Update README

* Update README

* Update ggml

* Use master branch of ggml
2023-04-22 20:25:29 +05:00
Jarrett Ye ac663631e1
Improve the prompt & fix Chinese display issue & support commands (#34)
* update the prompt

* Fix/Chinese display issue

* remove debug code

* support commands (#1)

+reset +gen +i +qq +qa +++ ++ +

* run_rnn before decode

* remove debug code

* deep copy logits

* remove extra print()

* print newline if reach max_tokens_per_generation

* fix typo in init prompt

* Update rwkv/chat_with_bot.py

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* Update rwkv/chat_with_bot.py

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* Update rwkv/chat_with_bot.py

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* Update rwkv/chat_with_bot.py

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* refine code & type annotation

* add comments for commands

* support change temp & top_p during chat.

* set default language & prompt

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>
2023-04-22 12:48:44 +05:00
Alex 1be9fda248
Add robust automatic testing (#33) 2023-04-20 11:00:35 +05:00
saharNooby 7b28076243 Fix Q4_1_O optimization 2023-04-18 16:46:27 +04:00
saharNooby 2ef7ee0fac Optimize Q4_1_O by moving outlier multiplication out of the dequantize+dot loop 2023-04-18 09:47:20 +04:00
Alex 0a8157d1ee
Merge pull request #28 from saharNooby/ggml-to-submodule
Move ggml to submodule
2023-04-17 20:18:02 +05:00
saharNooby 82e2faa190 Update data type info 2023-04-17 19:17:47 +04:00
saharNooby 05825d2370 Fix GitHub Actions 2023-04-17 19:04:55 +04:00
saharNooby e29da07731 Fix warnings 2023-04-17 18:57:38 +04:00
saharNooby 38eea116b8 Restore Q4_1_O support 2023-04-17 18:53:48 +04:00
saharNooby 28e354c183 Delete Makefile and make workflows 2023-04-17 17:37:09 +04:00
saharNooby b2bdeb1d95 Use ggml as a submodule 2023-04-17 17:35:58 +04:00
saharNooby a96ec01b1a Revert "Replace ggml_1_minus_x with ggml_sub"
This reverts commit 189ad78a0d.
2023-04-17 16:47:11 +04:00
saharNooby 189ad78a0d Replace ggml_1_minus_x with ggml_sub 2023-04-17 16:46:55 +04:00
saharNooby 2f37c6b019 Fix FP16 lookup table 2023-04-17 16:39:43 +04:00
saharNooby 678f5233a5 Add LoRA loading support 2023-04-15 20:46:30 +04:00
saharNooby e4268a36c8 Update file format documentation 2023-04-14 18:59:16 +04:00
Alex e84c446d95
Merge pull request #20 from BrutalCoding/patch-1
fix: Mention of incorrect filename for MacOS cmake build artifact
2023-04-10 09:48:31 +05:00
Daniel Breedeveld 70f7eece06
fix: Mention of incorrect filename for MacOS cmake build artifact
Executing the cmake build produces "librwkv.dylib" on MacOS (tested on Ventura 13.3.1)
2023-04-10 02:01:28 +08:00
saharNooby 4f315441ba Merge remote-tracking branch 'origin/master' 2023-04-08 19:39:47 +04:00
saharNooby 7437e1d860 Clarify that we now have binaries for Linux/MacOS 2023-04-08 19:39:31 +04:00
Alex 5d99741eab
Merge pull request #18 from yorkzero831/master
Update github action to support linux and macos asset uploading
2023-04-08 20:37:01 +05:00
YorkZero 5662bf4b4f chore: make the asset file at the root of the zip file 2023-04-09 00:32:32 +09:00
YorkZero a3fe1c63d8 chore: align asset file name 2023-04-09 00:21:30 +09:00
YorkZero 37f890ff3e chore: update github action
chore: update github action

chore: update github action
2023-04-08 23:18:31 +09:00
Alex 84e0698f2b
Merge pull request #16 from saharNooby/outliers-preserving-quantization-PR
Add Q4_1_O quantization format that preserves outliers in weights and does dot in FP32
2023-04-08 16:51:47 +05:00
saharNooby 874826cb20 Update README.md 2023-04-08 10:45:42 +04:00
saharNooby 85db23c7de Add script that measures perplexity 2023-04-08 10:41:16 +04:00
saharNooby e04baa032c Remove reference impl comparison test 2023-04-08 10:01:29 +04:00
saharNooby edd57a186c Update README.md 2023-04-07 10:16:12 +04:00
saharNooby e26b408ea7 Add Q4_1_O test 2023-04-07 10:12:19 +04:00
saharNooby 18bf02fea4 Use ggml function for parameter size calculation 2023-04-07 10:01:04 +04:00
saharNooby c40941d9d0 Add Q4_1_O format 2023-04-07 09:55:39 +04:00
saharNooby ec99bc1765 Do not quantize head 2023-04-06 20:30:32 +04:00
saharNooby 058b5cd1e6 Show file compression ratio 2023-04-06 20:29:58 +04:00
saharNooby fa9ad13a39 Free ggml context when model is garbage collected 2023-04-06 20:27:33 +04:00
saharNooby ad3a4ebc57 Add missing labels and symbols for new operators 2023-04-06 20:26:31 +04:00
saharNooby d12088e164 Minor formatting changes 2023-04-05 15:31:23 +04:00
Alexander dc679bf971
Merge pull request #14 from hypnopump/update_macos
Update macOS, better instructions, streaming output
2023-04-04 21:42:45 +05:00
hypnopump d3801340f3
streaming output 2023-04-04 18:27:14 +02:00
hypnopump a9cb9adfd6
streaming output 2023-04-04 18:27:04 +02:00
hypnopump c320573b5e
verify instructions can be followed 2023-04-04 17:45:55 +02:00
hypnopump f5feb7470b
verify instructions can be followed 2023-04-04 17:45:06 +02:00
hypnopump b75a805563
working on macos. no point in fp32 if all weights distributed in fp16 2023-04-04 17:39:21 +02:00
Alexander 77e19980e9
Merge pull request #13 from pixelkaiser/rwkv-macos
we actually build a dylib on macos
2023-04-04 14:24:21 +05:00
PXLKSR 977efba905 we actually build a dylib on macos 2023-04-04 10:19:06 +02:00
saharNooby aacc8b6872 Minor formatting changes 2023-04-03 10:39:28 +04:00
Alexander 4f1df7c89e
Merge pull request #9 from hypnopump/more_instructions_works_linux
Adds instructions and works on linux as well
2023-04-03 11:35:38 +05:00
hypnopump fa74b016c6
more details for macos/linux 2023-04-03 08:33:57 +02:00
Eric Alcaide bea02c4b4c
Merge branch 'master' into more_instructions_works_linux 2023-04-03 08:29:55 +02:00