Commit Graph

304 Commits

Author SHA1 Message Date
saharNooby ec99bc1765 Do not quantize head 2023-04-06 20:30:32 +04:00
saharNooby 058b5cd1e6 Show file compression ratio 2023-04-06 20:29:58 +04:00
saharNooby fa9ad13a39 Free ggml context when model is garbage collected 2023-04-06 20:27:33 +04:00
saharNooby ad3a4ebc57 Add missing labels and symbols for new operators 2023-04-06 20:26:31 +04:00
saharNooby d12088e164 Minor formatting changes 2023-04-05 15:31:23 +04:00
Alexander dc679bf971 Merge pull request #14 from hypnopump/update_macos: Update macOS, better instructions, streaming output 2023-04-04 21:42:45 +05:00
hypnopump d3801340f3 streaming output 2023-04-04 18:27:14 +02:00
hypnopump a9cb9adfd6 streaming output 2023-04-04 18:27:04 +02:00
hypnopump c320573b5e verify instructions can be followed 2023-04-04 17:45:55 +02:00
hypnopump f5feb7470b verify instructions can be followed 2023-04-04 17:45:06 +02:00
hypnopump b75a805563 working on macos. no point in fp32 if all weights distributed in fp16 2023-04-04 17:39:21 +02:00
Alexander 77e19980e9 Merge pull request #13 from pixelkaiser/rwkv-macos: we actually build a dylib on macos 2023-04-04 14:24:21 +05:00
PXLKSR 977efba905 we actually build a dylib on macos 2023-04-04 10:19:06 +02:00
saharNooby aacc8b6872 Minor formatting changes 2023-04-03 10:39:28 +04:00
Alexander 4f1df7c89e Merge pull request #9 from hypnopump/more_instructions_works_linux: Adds instructions and works on linux as well 2023-04-03 11:35:38 +05:00
hypnopump fa74b016c6 more details for macos/linux 2023-04-03 08:33:57 +02:00
Eric Alcaide bea02c4b4c Merge branch 'master' into more_instructions_works_linux 2023-04-03 08:29:55 +02:00
hypnopump 0a0cabc4c7 for consistency 2023-04-03 08:27:00 +02:00
hypnopump 6f3fb01913 suggestions 2023-04-03 08:25:54 +02:00
saharNooby 3535476987 Update README.md: include info about pre-compiled library 2023-04-03 09:48:53 +04:00
saharNooby 5b2830ed30 Increase memory for overhead from 32 MB to 256 MB 2023-04-03 09:32:58 +04:00
hypnopump a64aaa81ec initial addition 2023-04-03 00:52:26 +02:00
saharNooby d62a050144 Remove hardcoded memory requirements table 2023-04-02 18:37:45 +04:00
saharNooby 1262ad0456 Fix build errors and warnings 2023-04-02 17:23:39 +04:00
saharNooby f2b1dad22b Add GitHub workflows file 2023-04-02 16:56:04 +04:00
saharNooby 6b4ebc328a Update README.md 2023-04-02 15:28:34 +04:00
saharNooby e0684e8104 Add text generation and chat scripts 2023-04-02 15:03:31 +04:00
saharNooby ee46ad208e Add quantization test back, run ggml tests on first context init 2023-04-02 13:05:17 +04:00
saharNooby 1ecbad3a65 Remove unused files 2023-04-02 12:53:41 +04:00
saharNooby 935d16f5db Move library wrapper to separate file, refactor code 2023-04-02 12:24:40 +04:00
saharNooby 38f9d02d52 Fix quantization from FP16 2023-04-01 20:01:06 +04:00
saharNooby 972e28d48d Implement INT4 conversion and inference 2023-04-01 19:22:01 +04:00
saharNooby b164bf4e27 Allocate memory as needed for specific configuration of model 2023-04-01 17:15:23 +04:00
saharNooby a1e1d34c93 Add Python wrapper for C library 2023-04-01 16:02:22 +04:00
saharNooby 7130a89d1f [FILE FORMAT CHANGED] Reverse dimensions in ggml file (makes it more similar to llama.cpp format) 2023-04-01 14:41:30 +04:00
saharNooby ac03019fcf Move model to separate C library file 2023-04-01 14:38:50 +04:00
saharNooby f6d45baec0 Support FP16 inference 2023-04-01 11:53:49 +04:00
saharNooby fe98c94a63 [FILE FORMAT CHANGED] Use ggml_get_rows to get embedding 2023-04-01 11:28:32 +04:00
saharNooby 16ec7a5c18 Add fail-fast version of the test 2023-04-01 11:15:15 +04:00
saharNooby 0fcb7c64c6 Remove reference implementation code and test against pre-created logits 2023-04-01 11:09:24 +04:00
saharNooby bf88e8a246 Update README.md 2023-04-01 10:12:10 +04:00
saharNooby 6fe9486cee Finally, FP32 inference 2023-04-01 10:06:39 +04:00
saharNooby 61c6b1a4e0 Add comparison against reference implementation script, implement state & logits saving 2023-03-31 20:23:42 +04:00
saharNooby d00f28581a Add reference implementation of RWKV RNN 2023-03-31 19:57:16 +04:00
saharNooby 02c9946b57 Update README.md 2023-03-31 19:06:31 +04:00
saharNooby 01d667f066 Implement exp, max, 1_minus_x, sigmoid operators in ggml 2023-03-31 19:04:35 +04:00
saharNooby fe272dc3d3 Minor changes 2023-03-31 10:24:12 +04:00
saharNooby 93c8dcae75 Update README.md 2023-03-30 20:37:09 +04:00
saharNooby 56bf4fc856 Implement time mixing, fix matrix shape mismatch 2023-03-30 20:29:41 +04:00
saharNooby 873cb954d0 Make ln0 work correctly 2023-03-30 20:01:26 +04:00