Commit Graph

  • e9ccfc44fd flask server added master ed barz 2023-07-18 22:27:09 +0200
  • 70e5f07d5f README update ed barz 2023-06-12 08:07:14 +0200
  • 099392db01 ??? ed barz 2023-06-12 08:00:48 +0200
  • 446a5ecf8c rm ggml ed barz 2023-06-12 08:00:15 +0200
  • 3f267fe1b9 ref fix ed barz 2023-06-12 07:58:11 +0200
  • d67fbad269 remove ref to ggml repo ed barz 2023-06-12 07:55:38 +0200
  • b88ae59604 Fix bug in world tokenizer (#93) Mathmagician8191 2023-06-11 18:46:54 +1200
  • 82c4ac78f4 Add support for the world tokenizer (#86) Mathmagician8191 2023-06-08 23:37:18 +1200
  • 09ec3145b3 Fix visual bug in quantization (#92) LoganDark 2023-06-07 04:45:21 -0700
  • 5b41cd7e5d Add capability for extra binaries to be built with rwkv.cpp (#87) LoganDark 2023-06-03 03:44:50 -0700
  • fb6708b555 Fix pytorch storage warnings, fixes #80 (#88) LoganDark 2023-06-03 03:09:51 -0700
  • 3f8bb2c080 Allow creating multiple contexts per model (#83) LoganDark 2023-06-03 03:06:24 -0700
  • 363dfb1a06 File parsing and memory usage optimization (#74) LoganDark 2023-05-31 04:31:19 -0700
  • 241350fde6 Feature add cublas support (#65) YorkZero 2023-05-29 21:10:19 +0900
  • dea929f8ca Various improvements & upgrade ggml (#75) Alex 2023-05-27 16:02:24 +0500
  • 3ca9c7f785 Move graph building into its own function (#69) LoganDark 2023-05-26 05:30:07 -0700
  • b61d94aef0 Flush output every token in generate_completions.py (#73) LoganDark 2023-05-26 05:23:58 -0700
  • 83983bbb84 last second move things over in the error enum (#71) LoganDark 2023-05-26 05:22:32 -0700
  • d26791b5bc Silence PyTorch warnings by using untyped storage (#72) LoganDark 2023-05-26 05:21:18 -0700
  • 7cbfbc55c8 Switch to fstat64 (#70) LoganDark 2023-05-26 05:20:51 -0700
  • 9e2a0de843 Add rwkv_set_print_errors and rwkv_get_last_error (#68) LoganDark 2023-05-24 04:06:52 -0700
  • 1c363e6d5f Fix encoding issue when loading prompt data (#58) 柏园猫 2023-05-14 00:53:54 +0800
  • a3178b20ea Various improvements (#52) Alex 2023-05-08 14:28:54 +0500
  • 5eb8f09c14 Various improvements (#47) Alex 2023-04-30 20:27:14 +0500
  • 3621172428 punish repetitions & break if END_OF_TEXT & decouple prompts from chat script (#37) Jarrett Ye 2023-04-30 21:50:05 +0800
  • 06dac0f80d Use main ggml repo (#45) Alex 2023-04-29 21:35:36 +0500
  • 1198892888 Add support for Q5_0, Q5_1 and Q8_0 formats; remove Q4_1_O format (#44) Alex 2023-04-29 17:39:11 +0500
  • c736ef5411 Improve chat_with_bot.py script (#39) Alex 2023-04-22 20:33:58 +0500
  • 3587ff9e58 Sync ggml with upstream (#38) Alex 2023-04-22 20:25:29 +0500
  • ac663631e1 Improve the prompt & fix chinese display issue & support commands (#34) Jarrett Ye 2023-04-22 15:48:44 +0800
  • 1be9fda248 Add robust automatic testing (#33) Alex 2023-04-20 11:00:35 +0500
  • 7b28076243 Fix Q4_1_O optimization saharNooby 2023-04-18 16:46:27 +0400
  • 2ef7ee0fac Optimize Q4_1_O by moving outlier multiplication out of the dequantize+dot loop saharNooby 2023-04-18 09:47:20 +0400
  • 0a8157d1ee Merge pull request #28 from saharNooby/ggml-to-submodule Alex 2023-04-17 20:18:02 +0500
  • 82e2faa190 Update data type info saharNooby 2023-04-17 19:17:47 +0400
  • 05825d2370 Fix GitHub Actions saharNooby 2023-04-17 19:04:55 +0400
  • e29da07731 Fix warnings saharNooby 2023-04-17 18:57:38 +0400
  • 38eea116b8 Restore Q4_1_O support saharNooby 2023-04-17 18:53:48 +0400
  • 28e354c183 Delete Makefile and make workflows saharNooby 2023-04-17 17:37:09 +0400
  • b2bdeb1d95 Use ggml as a submodule saharNooby 2023-04-17 17:35:58 +0400
  • a96ec01b1a Revert "Replace ggml_1_minus_x with ggml_sub" saharNooby 2023-04-17 16:47:11 +0400
  • 189ad78a0d Replace ggml_1_minus_x with ggml_sub saharNooby 2023-04-17 16:46:55 +0400
  • 2f37c6b019 Fix FP16 lookup table saharNooby 2023-04-17 16:39:43 +0400
  • 678f5233a5 Add LoRA loading support saharNooby 2023-04-15 20:46:30 +0400
  • e4268a36c8 Update file format documentation saharNooby 2023-04-14 18:59:16 +0400
  • e84c446d95 Merge pull request #20 from BrutalCoding/patch-1 Alex 2023-04-10 09:48:31 +0500
  • 70f7eece06 fix: Mention of incorrect filename for MacOS cmake build artifact Daniel Breedeveld 2023-04-10 02:01:28 +0800
  • 4f315441ba Merge remote-tracking branch 'origin/master' saharNooby 2023-04-08 19:39:47 +0400
  • 7437e1d860 Clarify that we now have binaries for Linux/MacOS saharNooby 2023-04-08 19:39:31 +0400
  • 5d99741eab Merge pull request #18 from yorkzero831/master Alex 2023-04-08 20:37:01 +0500
  • 5662bf4b4f chore: make the asset file at the root of the zip file YorkZero 2023-04-09 00:32:32 +0900
  • a3fe1c63d8 chore: align asset file name YorkZero 2023-04-09 00:21:30 +0900
  • 37f890ff3e chore: update github action YorkZero 2023-04-08 23:00:31 +0900
  • 84e0698f2b Merge pull request #16 from saharNooby/outliers-preserving-quantization-PR Alex 2023-04-08 16:51:47 +0500
  • 874826cb20 Update README.md saharNooby 2023-04-08 10:45:42 +0400
  • 85db23c7de Add script that measures perplexity saharNooby 2023-04-08 10:41:16 +0400
  • e04baa032c Remove reference impl comparison test saharNooby 2023-04-08 10:01:29 +0400
  • edd57a186c Update README.md saharNooby 2023-04-07 10:16:12 +0400
  • e26b408ea7 Add Q4_1_O test saharNooby 2023-04-07 10:12:19 +0400
  • 18bf02fea4 Use ggml function for parameter size calculation saharNooby 2023-04-07 10:01:04 +0400
  • c40941d9d0 Add Q4_1_O format saharNooby 2023-04-07 09:55:39 +0400
  • ec99bc1765 Do not quantize head saharNooby 2023-04-06 16:26:18 +0400
  • 058b5cd1e6 Show file compression ratio saharNooby 2023-04-04 20:20:34 +0400
  • fa9ad13a39 Free ggml context when model is garbage collected saharNooby 2023-04-05 15:55:47 +0400
  • ad3a4ebc57 Add missing labels and symbols for new operators saharNooby 2023-04-06 20:26:31 +0400
  • d12088e164 Minor formatting changes saharNooby 2023-04-05 15:31:23 +0400
  • dc679bf971 Merge pull request #14 from hypnopump/update_macos Alexander 2023-04-04 21:42:45 +0500
  • d3801340f3 streaming output hypnopump 2023-04-04 18:27:14 +0200
  • a9cb9adfd6 streaming output hypnopump 2023-04-04 18:27:04 +0200
  • c320573b5e verify instructions can be followed hypnopump 2023-04-04 17:45:55 +0200
  • f5feb7470b verify instructions can be followed hypnopump 2023-04-04 17:45:06 +0200
  • b75a805563 working on macos. no point in fp32 if all weights distributed in fp16 hypnopump 2023-04-04 17:39:21 +0200
  • 77e19980e9 Merge pull request #13 from pixelkaiser/rwkv-macos Alexander 2023-04-04 14:24:21 +0500
  • 977efba905 we actually build a dylib on macos PXLKSR 2023-04-04 10:19:06 +0200
  • aacc8b6872 Minor formatting changes saharNooby 2023-04-03 10:39:28 +0400
  • 4f1df7c89e Merge pull request #9 from hypnopump/more_instructions_works_linux Alexander 2023-04-03 11:35:38 +0500
  • fa74b016c6 more details for macos/linux hypnopump 2023-04-03 08:33:57 +0200
  • bea02c4b4c Merge branch 'master' into more_instructions_works_linux Eric Alcaide 2023-04-03 08:29:55 +0200
  • 0a0cabc4c7 for consistency hypnopump 2023-04-03 08:27:00 +0200
  • 6f3fb01913 suggestions hypnopump 2023-04-03 08:25:54 +0200
  • 3535476987 Update README.md: include info about pre-compiled library saharNooby 2023-04-03 09:48:53 +0400
  • 5b2830ed30 Increase memory for overhead from 32 MB to 256 MB saharNooby 2023-04-03 09:32:58 +0400
  • a64aaa81ec initial addition hypnopump 2023-04-03 00:52:26 +0200
  • d62a050144 Remove hardcoded memory requirements table saharNooby 2023-04-02 18:37:45 +0400
  • 1262ad0456 Fix build errors and warnings saharNooby 2023-04-02 17:23:39 +0400
  • f2b1dad22b Add GitHub workflows file saharNooby 2023-04-02 16:56:04 +0400
  • 6b4ebc328a Update README.md saharNooby 2023-04-02 15:28:34 +0400
  • e0684e8104 Add text generation and chat scripts saharNooby 2023-04-02 15:03:31 +0400
  • ee46ad208e Add quantization test back, run ggml tests on first context init saharNooby 2023-04-02 13:05:17 +0400
  • 1ecbad3a65 Remove unused files saharNooby 2023-04-02 12:53:41 +0400
  • 935d16f5db Move library wrapper to separate file, refactor code saharNooby 2023-04-02 12:24:40 +0400
  • 38f9d02d52 Fix quantization from FP16 saharNooby 2023-04-01 20:01:06 +0400
  • 972e28d48d Implement INT4 conversion and inference saharNooby 2023-04-01 19:22:01 +0400
  • b164bf4e27 Allocate memory as needed for specific configuration of model saharNooby 2023-04-01 17:15:23 +0400
  • a1e1d34c93 Add Python wrapper for C library saharNooby 2023-04-01 16:02:22 +0400
  • 7130a89d1f [FILE FORMAT CHANGED] Reverse dimensions in ggml file (makes it more similar to llama.cpp format) saharNooby 2023-04-01 14:41:30 +0400
  • ac03019fcf Move model to separate C library file saharNooby 2023-04-01 14:38:50 +0400
  • f6d45baec0 Support FP16 inference saharNooby 2023-04-01 11:53:49 +0400
  • fe98c94a63 [FILE FORMAT CHANGED] Use ggml_get_rows to get embedding saharNooby 2023-04-01 11:28:32 +0400
  • 16ec7a5c18 Add fail-fast version of the test saharNooby 2023-04-01 11:15:15 +0400