Alex
1198892888
Add support for Q5_0, Q5_1 and Q8_0 formats; remove Q4_1_O format (#44)
* Remove Q4_3 support
* Add Q5_0, Q5_1, Q8_0 support
* Add more clear message when loading Q4_3 model
* Remove Q4_1_O format
* Fix indentation in .gitmodules
* Simplify sanitizer matrix
2023-04-29 17:39:11 +05:00
Alex
c736ef5411
Improve chat_with_bot.py script (#39)
2023-04-22 20:33:58 +05:00
Alex
3587ff9e58
Sync ggml with upstream (#38)
* Sync ggml with upstream
* Remove file filters from Actions triggers
* Update ggml
* Add Q4_2 and Q4_3 support
* Improve output of perplexity measuring script
* Add tests for new formats
* Add token limit argument to perplexity measuring script
* Update README
* Update README
* Update ggml
* Use master branch of ggml
2023-04-22 20:25:29 +05:00
Jarrett Ye
ac663631e1
Improve the prompt & fix Chinese display issue & support commands (#34)
* update the prompt
* Fix/chinese display issue
* remove debug code
* support commands (#1)
+reset +gen +i +qq +qa +++ ++ +
* run_rnn before decode
* remove debug code
* deep copy logits
* remove extra print()
* print newline if max_tokens_per_generation is reached
* fix typo in init prompt
* Update rwkv/chat_with_bot.py
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Update rwkv/chat_with_bot.py
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Update rwkv/chat_with_bot.py
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Update rwkv/chat_with_bot.py
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* refine code & type annotation
* add comments for commands
* support changing temp & top_p during chat
* set default language & prompt
---------
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
2023-04-22 12:48:44 +05:00
Alex
1be9fda248
Add robust automatic testing (#33)
2023-04-20 11:00:35 +05:00
saharNooby
7b28076243
Fix Q4_1_O optimization
2023-04-18 16:46:27 +04:00
saharNooby
2ef7ee0fac
Optimize Q4_1_O by moving outlier multiplication out of the dequantize+dot loop
2023-04-18 09:47:20 +04:00
Alex
0a8157d1ee
Merge pull request #28 from saharNooby/ggml-to-submodule
Move ggml to submodule
2023-04-17 20:18:02 +05:00
saharNooby
82e2faa190
Update data type info
2023-04-17 19:17:47 +04:00
saharNooby
05825d2370
Fix GitHub Actions
2023-04-17 19:04:55 +04:00
saharNooby
e29da07731
Fix warnings
2023-04-17 18:57:38 +04:00
saharNooby
38eea116b8
Restore Q4_1_O support
2023-04-17 18:53:48 +04:00
saharNooby
28e354c183
Delete Makefile and make workflows
2023-04-17 17:37:09 +04:00
saharNooby
b2bdeb1d95
Use ggml as a submodule
2023-04-17 17:35:58 +04:00
saharNooby
a96ec01b1a
Revert "Replace ggml_1_minus_x with ggml_sub"
This reverts commit 189ad78a0d.
2023-04-17 16:47:11 +04:00
saharNooby
189ad78a0d
Replace ggml_1_minus_x with ggml_sub
2023-04-17 16:46:55 +04:00
saharNooby
2f37c6b019
Fix FP16 lookup table
2023-04-17 16:39:43 +04:00
saharNooby
678f5233a5
Add LoRA loading support
2023-04-15 20:46:30 +04:00
saharNooby
e4268a36c8
Update file format documentation
2023-04-14 18:59:16 +04:00
Alex
e84c446d95
Merge pull request #20 from BrutalCoding/patch-1
fix: Mention of incorrect filename for MacOS cmake build artifact
2023-04-10 09:48:31 +05:00
Daniel Breedeveld
70f7eece06
fix: Mention of incorrect filename for MacOS cmake build artifact
Executing the cmake build produces "librwkv.dylib" on MacOS (tested on Ventura 13.3.1)
2023-04-10 02:01:28 +08:00
saharNooby
4f315441ba
Merge remote-tracking branch 'origin/master'
2023-04-08 19:39:47 +04:00
saharNooby
7437e1d860
Clarify that we now have binaries for Linux/MacOS
2023-04-08 19:39:31 +04:00
Alex
5d99741eab
Merge pull request #18 from yorkzero831/master
Update github action to support linux and macos asset uploading
2023-04-08 20:37:01 +05:00
YorkZero
5662bf4b4f
chore: make the asset file at the root of the zip file
2023-04-09 00:32:32 +09:00
YorkZero
a3fe1c63d8
chore: align asset file name
2023-04-09 00:21:30 +09:00
YorkZero
37f890ff3e
chore: update github action
chore: update github action
chore: update github action
2023-04-08 23:18:31 +09:00
Alex
84e0698f2b
Merge pull request #16 from saharNooby/outliers-preserving-quantization-PR
Add Q4_1_O quantization format that preserves outliers in weights and does dot in FP32
2023-04-08 16:51:47 +05:00
saharNooby
874826cb20
Update README.md
2023-04-08 10:45:42 +04:00
saharNooby
85db23c7de
Add script that measures perplexity
2023-04-08 10:41:16 +04:00
saharNooby
e04baa032c
Remove reference impl comparison test
2023-04-08 10:01:29 +04:00
saharNooby
edd57a186c
Update README.md
2023-04-07 10:16:12 +04:00
saharNooby
e26b408ea7
Add Q4_1_O test
2023-04-07 10:12:19 +04:00
saharNooby
18bf02fea4
Use ggml function for parameter size calculation
2023-04-07 10:01:04 +04:00
saharNooby
c40941d9d0
Add Q4_1_O format
2023-04-07 09:55:39 +04:00
saharNooby
ec99bc1765
Do not quantize head
2023-04-06 20:30:32 +04:00
saharNooby
058b5cd1e6
Show file compression ratio
2023-04-06 20:29:58 +04:00
saharNooby
fa9ad13a39
Free ggml context when model is garbage collected
2023-04-06 20:27:33 +04:00
saharNooby
ad3a4ebc57
Add missing labels and symbols for new operators
2023-04-06 20:26:31 +04:00
saharNooby
d12088e164
Minor formatting changes
2023-04-05 15:31:23 +04:00
Alexander
dc679bf971
Merge pull request #14 from hypnopump/update_macos
Update macOS, better instructions, streaming output
2023-04-04 21:42:45 +05:00
hypnopump
d3801340f3
streaming output
2023-04-04 18:27:14 +02:00
hypnopump
a9cb9adfd6
streaming output
2023-04-04 18:27:04 +02:00
hypnopump
c320573b5e
verify instructions can be followed
2023-04-04 17:45:55 +02:00
hypnopump
f5feb7470b
verify instructions can be followed
2023-04-04 17:45:06 +02:00
hypnopump
b75a805563
working on macos. no point in fp32 if all weights are distributed in fp16
2023-04-04 17:39:21 +02:00
Alexander
77e19980e9
Merge pull request #13 from pixelkaiser/rwkv-macos
we actually build a dylib on macos
2023-04-04 14:24:21 +05:00
PXLKSR
977efba905
we actually build a dylib on macos
2023-04-04 10:19:06 +02:00
saharNooby
aacc8b6872
Minor formatting changes
2023-04-03 10:39:28 +04:00
Alexander
4f1df7c89e
Merge pull request #9 from hypnopump/more_instructions_works_linux
Adds instructions and works on linux as well
2023-04-03 11:35:38 +05:00