柏园猫
1c363e6d5f
Fix encoding issue when loading prompt data (#58)
* Fix encoding issue when loading prompt data
* Update chat_with_bot.py
Fix code style
---------
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
2023-05-13 21:53:54 +05:00
Alex
a3178b20ea
Various improvements (#52)
* Update ggml
* Add link to pre-quantized models in README
* Enable W4 for MSVC
* Fix warnings, clean up code
* Fix LoRA merge script
2023-05-08 14:28:54 +05:00
Alex
5eb8f09c14
Various improvements (#47)
* Update ggml
* Pack only rwkv.dll for Windows releases
Test executables are no longer packed.
* Move test code into a separate file
* Remove redundant zeroing
* Refactor chat script
2023-04-30 20:27:14 +05:00
Jarrett Ye
3621172428
Punish repetitions, break on END_OF_TEXT, and decouple prompts from chat script (#37)
* Punish repetitions & break on END_OF_TEXT
* Decouple prompts from chat_with_bot.py
* Improve code style
* Update rwkv/chat_with_bot.py
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Update rwkv/chat_with_bot.py
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* add types
* JSON prompt
---------
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
2023-04-30 18:50:05 +05:00
Alex
1198892888
Add support for Q5_0, Q5_1 and Q8_0 formats; remove Q4_1_O format (#44)
* Remove Q4_3 support
* Add Q5_0, Q5_1, Q8_0 support
* Add a clearer message when loading a Q4_3 model
* Remove Q4_1_O format
* Fix indentation in .gitmodules
* Simplify sanitizer matrix
2023-04-29 17:39:11 +05:00
Alex
c736ef5411
Improve chat_with_bot.py script (#39)
2023-04-22 20:33:58 +05:00
Alex
3587ff9e58
Sync ggml with upstream (#38)
* Sync ggml with upstream
* Remove file filters from Actions triggers
* Update ggml
* Add Q4_2 and Q4_3 support
* Improve output of perplexity measuring script
* Add tests for new formats
* Add token limit argument to perplexity measuring script
* Update README
* Update README
* Update ggml
* Use master branch of ggml
2023-04-22 20:25:29 +05:00
Jarrett Ye
ac663631e1
Improve the prompt, fix Chinese display issue, and support commands (#34)
* Update the prompt
* Fix Chinese display issue
* Remove debug code
* Support commands (#1)
+reset +gen +i +qq +qa +++ ++ +
* Call run_rnn before decoding
* Remove debug code
* Deep-copy logits
* Remove extra print()
* Print a newline when max_tokens_per_generation is reached
* Fix typo in init prompt
* Update rwkv/chat_with_bot.py
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Update rwkv/chat_with_bot.py
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Update rwkv/chat_with_bot.py
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Update rwkv/chat_with_bot.py
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Refine code & type annotations
* Add comments for commands
* Support changing temp & top_p during chat
* Set default language & prompt
---------
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
2023-04-22 12:48:44 +05:00
saharNooby
678f5233a5
Add LoRA loading support
2023-04-15 20:46:30 +04:00
saharNooby
e4268a36c8
Update file format documentation
2023-04-14 18:59:16 +04:00
saharNooby
85db23c7de
Add script that measures perplexity
2023-04-08 10:41:16 +04:00
saharNooby
e04baa032c
Remove reference impl comparison test
2023-04-08 10:01:29 +04:00
saharNooby
c40941d9d0
Add Q4_1_O format
2023-04-07 09:55:39 +04:00
saharNooby
fa9ad13a39
Free ggml context when model is garbage collected
2023-04-06 20:27:33 +04:00
hypnopump
a9cb9adfd6
Streaming output
2023-04-04 18:27:04 +02:00
PXLKSR
977efba905
We actually build a dylib on macOS
2023-04-04 10:19:06 +02:00
hypnopump
0a0cabc4c7
For consistency
2023-04-03 08:27:00 +02:00
hypnopump
6f3fb01913
Suggestions
2023-04-03 08:25:54 +02:00
hypnopump
a64aaa81ec
Initial addition
2023-04-03 00:52:26 +02:00
saharNooby
e0684e8104
Add text generation and chat scripts
2023-04-02 15:03:31 +04:00
saharNooby
935d16f5db
Move library wrapper to separate file, refactor code
2023-04-02 12:24:40 +04:00
saharNooby
972e28d48d
Implement INT4 conversion and inference
2023-04-01 19:22:01 +04:00
saharNooby
a1e1d34c93
Add Python wrapper for C library
2023-04-01 16:02:22 +04:00
saharNooby
7130a89d1f
[FILE FORMAT CHANGED] Reverse dimensions in ggml file (makes it more similar to llama.cpp format)
2023-04-01 14:41:30 +04:00
saharNooby
f6d45baec0
Support FP16 inference
2023-04-01 11:53:49 +04:00
saharNooby
fe98c94a63
[FILE FORMAT CHANGED] Use ggml_get_rows to get embedding
2023-04-01 11:28:32 +04:00
saharNooby
16ec7a5c18
Add fail-fast version of the test
2023-04-01 11:15:15 +04:00
saharNooby
0fcb7c64c6
Remove reference implementation code and test against pre-created logits
2023-04-01 11:09:24 +04:00
saharNooby
6fe9486cee
Finally, FP32 inference
2023-04-01 10:06:39 +04:00
saharNooby
61c6b1a4e0
Add comparison against reference implementation script, implement state & logits saving
2023-03-31 20:23:42 +04:00
saharNooby
d00f28581a
Add reference implementation of RWKV RNN
2023-03-31 19:57:16 +04:00
saharNooby
fe272dc3d3
Minor changes
2023-03-31 10:24:12 +04:00
saharNooby
873cb954d0
Make ln0 work correctly
2023-03-30 20:01:26 +04:00
saharNooby
2f51451561
Initial commit
2023-03-30 17:55:30 +04:00