Commit Graph

276 Commits

Author SHA1 Message Date
saharNooby 1ecbad3a65 Remove unused files 2023-04-02 12:53:41 +04:00
saharNooby 935d16f5db Move library wrapper to separate file, refactor code 2023-04-02 12:24:40 +04:00
saharNooby 38f9d02d52 Fix quantization from FP16 2023-04-01 20:01:06 +04:00
saharNooby 972e28d48d Implement INT4 conversion and inference 2023-04-01 19:22:01 +04:00
saharNooby b164bf4e27 Allocate memory as needed for specific configuration of model 2023-04-01 17:15:23 +04:00
saharNooby a1e1d34c93 Add Python wrapper for C library 2023-04-01 16:02:22 +04:00
saharNooby 7130a89d1f [FILE FORMAT CHANGED] Reverse dimensions in ggml file (makes it more similar to llama.cpp format) 2023-04-01 14:41:30 +04:00
saharNooby ac03019fcf Move model to separate C library file 2023-04-01 14:38:50 +04:00
saharNooby f6d45baec0 Support FP16 inference 2023-04-01 11:53:49 +04:00
saharNooby fe98c94a63 [FILE FORMAT CHANGED] Use ggml_get_rows to get embedding 2023-04-01 11:28:32 +04:00
saharNooby 16ec7a5c18 Add fail-fast version of the test 2023-04-01 11:15:15 +04:00
saharNooby 0fcb7c64c6 Remove reference implementation code and test against pre-created logits 2023-04-01 11:09:24 +04:00
saharNooby bf88e8a246 Update README.md 2023-04-01 10:12:10 +04:00
saharNooby 6fe9486cee Finally, FP32 inference 2023-04-01 10:06:39 +04:00
saharNooby 61c6b1a4e0 Add comparison against reference implementation script, implement state & logits saving 2023-03-31 20:23:42 +04:00
saharNooby d00f28581a Add reference implementation of RWKV RNN 2023-03-31 19:57:16 +04:00
saharNooby 02c9946b57 Update README.md 2023-03-31 19:06:31 +04:00
saharNooby 01d667f066 Implement exp, max, 1_minus_x, sigmoid operators in ggml 2023-03-31 19:04:35 +04:00
saharNooby fe272dc3d3 Minor changes 2023-03-31 10:24:12 +04:00
saharNooby 93c8dcae75 Update README.md 2023-03-30 20:37:09 +04:00
saharNooby 56bf4fc856 Implement time mixing, fix matrix shape mismatch 2023-03-30 20:29:41 +04:00
saharNooby 873cb954d0 Make ln0 work correctly 2023-03-30 20:01:26 +04:00
saharNooby 2f51451561 Initial commit 2023-03-30 17:55:30 +04:00
slaren ed3c680bcd Fix GGML_F32Cx8_STORE in AVX without F16C path (#619) 2023-03-30 11:16:30 +02:00
anzz1 9cbc404ba6 ci : re-enable AVX512 testing (Windows-MSVC) (#584)
* CI: Re-enable AVX512 testing (Windows-MSVC)

Now with 100% less base64 encoding

* plain __cpuid is enough here
2023-03-29 23:44:39 +03:00
Georgi Gerganov b51c717d5c ggml : init time on first ggml_init() call 2023-03-29 22:15:34 +03:00
Georgi Gerganov 0ba76c1e73 llama : fix compile warnings when reading the vocab 2023-03-29 22:13:12 +03:00
Georgi Gerganov cea1c85948 ggml : add ARM_NEON dequantize_row_q4_1() 2023-03-29 22:10:01 +03:00
Georgi Gerganov f202ada131 ggml : add ARM_NEON quantize_row_q4_1() 2023-03-29 22:03:07 +03:00
Georgi Gerganov 3b44d30d9b ggml : add ARM_NEON ggml_vec_dot_q4_1() 2023-03-29 22:03:07 +03:00
Pavol Rusnak 61cbfff5c9 rename convert_ggml_to_pth.py -> convert-ggml-to-pth.py (#600)
to match filenames of other converters
2023-03-29 20:09:25 +02:00
Thérence d9ad104440 Create chat-13B.bat (#592)
* Create chat-13B.bat

Same script as chat-13B.sh, but for Windows users.
Tested and working on Windows 10/11 version 22H2.

* Apply suggestions from code review

---------

Co-authored-by: anzz1 <anzz1@live.com>
2023-03-29 20:21:09 +03:00
Georgi Gerganov b467702b87 readme : fix typos 2023-03-29 19:38:31 +03:00
Georgi Gerganov 516d88e75c readme : add GPT4All instructions (close #588) 2023-03-29 19:37:20 +03:00
Georgi Gerganov 53635c081c py : add GPT4All conversion script
For now: copy-paste.
Deduplicating the Python code would take too much time.
2023-03-29 19:29:52 +03:00
Maël Kerbiriou 41318d708e llama : use the same threshold for OpenBLAS and ggml thread limiting (#577) 2023-03-29 19:10:07 +03:00
Tobias Lütke a6956b25a1 add example of re-act pattern (#583)
* add example of re-act pattern

* spelling...

* fixed whitespace in reverse prompt issue
2023-03-29 10:10:24 -05:00
anzz1 83df5639eb Fix GCC warning about binary literal (#595)
0b10101010 -> 0xAA /* 0b10101010 */
2023-03-29 13:20:07 +00:00
anzz1 a5c42c4b13 Fix typo in llama.h (#593) 2023-03-29 13:19:29 +00:00
anzz1 5a5f8b1501 Enable Fused-Multiply-Add (FMA) and F16C/CVT16 vector extensions on MSVC (#375)
* Enable Fused-Multiply-Add (FMA) instructions on MSVC

__FMA__ macro does not exist in MSVC

* Enable F16C/CVT16 vector extensions on MSVC

__F16C__ macro does not exist in MSVC, but is implied with AVX2/AVX512

* MSVC cvt intrinsics

* Add __SSE3__ macro for MSVC too because why not

even though it's not currently used for anything when AVX is defined
2023-03-28 22:44:29 +03:00
anzz1 f1217055ea CI: fix subdirectory path globbing (#546)
- Changes in subdirectories will now be detected properly
- (Windows-MSVC) AVX512 tests temporarily disabled
2023-03-28 22:43:25 +03:00
anzz1 7f4c5c6651 llama : fix linkage with mingw (#551)
* Revert 7e53955 (#542)

Still needs to be fixed properly

* Fix linking on mingw32
2023-03-28 21:23:09 +03:00
slaren 2a98bc18ea ggml : add AVX2 implementation of quantize_row_q4_1 (#515)
* Add AVX2 implementation of quantize_row_q4_1

* Actually use AVX2

* Make quantize_row_q4_1 static

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 21:06:03 +03:00
thement d0aaff571c py : add temporary script to convert old ggml files to newer version (#539)
Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
2023-03-28 20:55:42 +03:00
Tai Duc Nguyen d0330fd783 py : add capability to convert from ggml back to torch or hf format for further consumption/training/finetuning (#403) 2023-03-28 20:51:29 +03:00
Stephan Walter 99c5b27654 ggml : refactor quantized processing functions (#509)
* Refactor quantized processing functions

* ggml : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 20:13:01 +03:00
DooWoong Lee (David) 692ce3164e py : remove unused `model` variable; verified the code still works correctly with the `vocab_only` setting, and with the reduced memory usage after the deletion (#547) 2023-03-28 20:02:34 +03:00
Georgi Gerganov 96f9c0506f ci : make ctest verbose, hopefully we see what is wrong with the sanitizer 2023-03-28 20:01:09 +03:00
Georgi Gerganov d502bc7c9d tests : free llama context at the end of the test 2023-03-28 19:51:55 +03:00
Stephan Walter 436e561931 all : be more strict about converting float to double (#458)
* Be more strict about converting float to double

* Test equivalence of round, SILU implementations

Test module is commented out in CMakeLists.txt because the tests may
take a long time, depending on how much the compiler optimizes.

* Fix softmax in perplexity.cpp

* all : prefer float over double where appropriate

* perplexity : add <cmath>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 19:48:20 +03:00