0fcb7c64c6
Remove reference implementation code and test against pre-created logits
saharNooby
2023-04-01 11:09:24 +0400
bf88e8a246
Update README.md
saharNooby
2023-04-01 10:12:10 +0400
6fe9486cee
Finally, FP32 inference
saharNooby
2023-04-01 10:06:39 +0400
61c6b1a4e0
Add comparison against reference implementation script, implement state & logits saving
saharNooby
2023-03-31 20:23:42 +0400
d00f28581a
Add reference implementation of RWKV RNN
saharNooby
2023-03-31 19:57:16 +0400
02c9946b57
Update README.md
saharNooby
2023-03-31 19:06:31 +0400
01d667f066
Implement exp, max, 1_minus_x, sigmoid operators in ggml
saharNooby
2023-03-31 19:04:35 +0400
fe272dc3d3
Minor changes
saharNooby
2023-03-31 10:24:12 +0400
93c8dcae75
Update README.md
saharNooby
2023-03-30 20:37:09 +0400
56bf4fc856
Implement time mixing, fix matrix shape mismatch
saharNooby
2023-03-30 20:29:41 +0400
873cb954d0
Make ln0 work correctly
saharNooby
2023-03-30 20:01:26 +0400
2f51451561
Initial commit
saharNooby
2023-03-30 17:55:30 +0400
ed3c680bcd
Fix GGML_F32Cx8_STORE in AVX without F16C path (#619 )
slaren
2023-03-30 11:16:30 +0200
9cbc404ba6
ci : re-enable AVX512 testing (Windows-MSVC) (#584 )
anzz1
2023-03-29 23:44:39 +0300
b51c717d5c
ggml : init time on first ggml_init() call
Georgi Gerganov
2023-03-29 22:15:34 +0300
0ba76c1e73
llama : fix compile warnings when reading the vocab
Georgi Gerganov
2023-03-29 22:13:12 +0300
cea1c85948
ggml : add ARM_NEON dequantize_row_q4_1()
Georgi Gerganov
2023-03-29 22:10:01 +0300
f202ada131
ggml : add ARM_NEON quantize_row_q4_1()
Georgi Gerganov
2023-03-29 22:03:02 +0300
3b44d30d9b
ggml : add ARM_NEON ggml_vec_dot_q4_1()
Georgi Gerganov
2023-03-29 21:47:33 +0300
61cbfff5c9
rename convert_ggml_to_pth.py -> convert-ggml-to-pth.py (#600 )
Pavol Rusnak
2023-03-29 20:09:25 +0200
d9ad104440
Create chat-13B.bat (#592 )
Thérence
2023-03-29 19:21:09 +0200
b467702b87
readme : fix typos
Georgi Gerganov
2023-03-29 19:38:31 +0300
516d88e75c
readme : add GPT4All instructions (close #588 )
Georgi Gerganov
2023-03-29 19:37:20 +0300
53635c081c
py : add GPT4All conversion script
Georgi Gerganov
2023-03-29 19:29:26 +0300
41318d708e
llama : use the same threshold for OpenBLAS and ggml thread limiting (#577 )
Maël Kerbiriou
2023-03-29 18:10:07 +0200
a6956b25a1
add example of re-act pattern (#583 )
Tobias Lütke
2023-03-29 17:10:24 +0200
83df5639eb
Fix GCC warning about binary literal (#595 )
anzz1
2023-03-29 16:20:07 +0300
a5c42c4b13
Fix typo in llama.h (#593 )
anzz1
2023-03-29 16:19:29 +0300
5a5f8b1501
Enable Fused-Multiply-Add (FMA) and F16C/CVT16 vector extensions on MSVC (#375 )
anzz1
2023-03-28 22:44:29 +0300
f1217055ea
CI: fix subdirectory path globbing (#546 )
anzz1
2023-03-28 22:43:25 +0300
7f4c5c6651
llama : fix linkage with mingw (#551 )
anzz1
2023-03-28 21:23:09 +0300
2a98bc18ea
ggml : add AVX2 implementation of quantize_row_q4_1 (#515 )
slaren
2023-03-28 20:06:03 +0200
d0aaff571c
py : add temporary script to convert old ggml files to newer version (#539 )
thement
2023-03-28 19:55:42 +0200
d0330fd783
py : add capability to convert from ggml back to torch or hf format for further consumption/training/finetuning (#403 )
Tai Duc Nguyen
2023-03-28 13:51:29 -0400
99c5b27654
ggml : refactor quantized processing functions (#509 )
Stephan Walter
2023-03-28 17:13:01 +0000
692ce3164e
py : removed unused `model` variable and verified that the code functions correctly with `vocab_only` setting. Also confirmed that the code works as expected after running with reduced memory usage due to deletion of no-longer-needed variable. (#547 )
DooWoong Lee (David)
2023-03-29 02:02:34 +0900
96f9c0506f
ci : make ctest verbose, hopefully we see what is wrong with the sanitizer
Georgi Gerganov
2023-03-28 20:01:09 +0300
d502bc7c9d
tests : free llama context at the end of the test
Georgi Gerganov
2023-03-28 19:51:55 +0300
436e561931
all : be more strict about converting float to double (#458 )
Stephan Walter
2023-03-28 16:48:20 +0000
20e1e84884
deploy : add a Package.swift for SwiftPM support (#393 )
Jed Fox
2023-03-28 11:39:01 -0500
c1f885067c
ggml : introduce structs for the q4 data blocks (#356 )
Stephan Walter
2023-03-28 15:56:03 +0000
e0670260fb
gitignore : add "embedding"
Georgi Gerganov
2023-03-28 18:34:35 +0300
28ba975aea
Check the existence of f16_model_path_base in quantize.py (#574 )
dotpy314
2023-03-28 23:06:28 +0800
a6bdc47cba
Fix usage of F16C intrinsics in AVX code (#563 )
slaren
2023-03-28 16:26:55 +0200
7b8dbcb78b
main.cpp fixes, refactoring (#571 )
anzz1
2023-03-28 17:09:55 +0300
4b8efff0e3
Add embedding example to Makefile (#540 )
RJ Adriaansen
2023-03-28 08:11:09 +0200
7e5395575a
Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542 )
Marco Matthies
2023-03-27 06:55:26 +0200
34c1072e49
ci: add debug build to sanitizer build matrix (#527 )
Erik Scholz
2023-03-26 17:48:40 +0200
939ad2d3a5
Fix undefined variables in debug build, remove unused variables (#531 )
Stephan Walter
2023-03-26 15:34:02 +0000
8c2ec5e21d
Add support for linux/arm64 platform during Docker Builds (#514 )
Juan Calderon-Perez
2023-03-26 10:48:42 -0400
b391579db9
Update README and comments for standalone perplexity tool (#525 )
Stephan Walter
2023-03-26 13:14:01 +0000
7a87d31f4f
[main] fix infinite generation (-n == -1) (#523 )
anzz1
2023-03-26 16:06:10 +0300
348d6926ee
Add logo to README.md
Georgi Gerganov
2023-03-26 10:20:49 +0300
33e35b8fe8
Exit from interactive mode if input stream is bad (#491 )
Harald Fernengel
2023-03-26 07:25:46 +0200
19726169b3
CI: Run other sanitizer builds even if one fails (#511 )
anzz1
2023-03-26 00:13:28 +0200
f732695cd5
Clarify console output in convert-pth-to-ggml.py (#512 )
jp-x-g
2023-03-25 14:53:55 -0700
2f7bf7dd7c
CMake / CI additions (#497 )
anzz1
2023-03-25 23:38:11 +0200
34ab526843
(Windows) Set console to UTF-8 on init (#420 )
anzz1
2023-03-25 22:29:22 +0200
c2b25b6912
Fix colors enabling on WIN32
Georgi Gerganov
2023-03-25 21:53:39 +0200
79b2b266db
If n_predict == -1, generate forever
Georgi Gerganov
2023-03-25 21:51:41 +0200
e2d490dafd
Infinite generation via context swapping (#71 )
Georgi Gerganov
2023-03-25 21:36:22 +0200
03f7e33560
Cleanup STL headers + fix embedding examples + minor stuff
Georgi Gerganov
2023-03-25 20:51:14 +0200
55ad42af84
Move chat scripts into "./examples"
Georgi Gerganov
2023-03-25 20:36:52 +0200
459e93cce0
Add AVX2 implementation of dequantize_row_q4_1 (#505 )
slaren
2023-03-25 19:31:48 +0100
a316a425d0
Overhaul the examples structure
Georgi Gerganov
2023-03-25 20:26:40 +0200
ecbe466a36
Retire the ggml_mul_mat() branch for transposed src0 (#500 )
Georgi Gerganov
2023-03-25 19:47:21 +0200
502a400192
Disable prompt verbosity by default and add option to enable (#480 )
Georgi Gerganov
2023-03-25 17:16:50 +0200
09aecbf628
Add AVX2 implementation of dequantize_row_q4_0 (#467 )
slaren
2023-03-25 16:06:49 +0100
4640eff23d
Don't interfere with BLAS for large prompts by running only 1 thread
Georgi Gerganov
2023-03-25 17:03:10 +0200
ab77d76312
Add longer DAN prompt for testing big batch numbers
Georgi Gerganov
2023-03-25 16:47:59 +0200
29b7baab67
Add timings for the prompt evaluation (#478 )
slaren
2023-03-25 15:34:23 +0100
4a7129acd2
Remove obsolete information from README
Georgi Gerganov
2023-03-25 16:30:32 +0200
6b6dbc8910
Remove obsolete assert and fix compiler warning
Georgi Gerganov
2023-03-25 16:22:05 +0200
2a2e63ce05
Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS
Georgi Gerganov
2023-03-25 16:09:54 +0200
e899bf54b2
bounds checking for input prefix (#492 )
anzz1
2023-03-25 14:42:09 +0200
fbd4d38c64
feat: '--in-prefix STRING' option (#426 )
anzz1
2023-03-25 14:03:19 +0200
58e6c9f36f
Add support for file load progress reporting callbacks (#434 )
Jed Fox
2023-03-25 01:26:28 -0400
36d07532ef
Add missing struct annotation (#483 )
Doomsdayrs
2023-03-25 01:21:24 -0400
6f1ee4b640
Fix crash for 65B model with pre-allocated memory (#485 )
Chris Kuehl
2023-03-24 23:38:14 -0500
8520fc310e
Disable BLAS altogether - the bug is not just for quantized mat mul
Georgi Gerganov
2023-03-24 23:47:06 +0200
b3f460e941
Disable BLAS branch in mul_mat - seems there is a bug
Georgi Gerganov
2023-03-24 23:39:17 +0200
04c6f5ed6f
Immediately start processing the prompt before user input has been provided (#476 )
Georgi Gerganov
2023-03-24 23:17:58 +0200
7a9b6c3a8b
Reduce memory usage and allocate enough memory for largest context (#473 )
Georgi Gerganov
2023-03-24 23:17:37 +0200
31572d9665
Temporarily bump the memory buffer size - hopefully fix issues from 483bab2e
Georgi Gerganov
2023-03-24 18:23:56 +0200
f4f5362edb
Update README.md (#444 )
Gary Mulder
2023-03-24 15:23:09 +0000
863f65e2e3
fix instruct mode (#445 )
rabidcopy
2023-03-24 10:22:39 -0500
afd220d9c6
Properly free llama_context on failure
Georgi Gerganov
2023-03-24 17:21:01 +0200
481044d50c
additional optimizations for POWER9 (#454 )
Cameron Kaiser
2023-03-24 08:19:26 -0700
563cdc391d
Support calling mlock() on loaded model data on Linux and macOS (#453 )
comex
2023-03-24 08:19:05 -0700
8d4a855c24
Add embedding mode with arg flag. Currently working (#282 )
Luciano
2023-03-24 08:05:13 -0700
b6b268d441
Add link to Roadmap discussion
Georgi Gerganov
2023-03-24 09:13:35 +0200
3cd8dde0d1
Revert "Fix memory allocation issues and seg faults"
Georgi Gerganov
2023-03-24 06:22:28 +0200
4870e455b3
Fix memory allocation issues and seg faults
Georgi Gerganov
2023-03-24 00:11:53 +0200
483bab2e3d
Avoid the transposed X branch in the Z = X * Y matrix multiplication (#439 )
Georgi Gerganov
2023-03-23 23:22:01 +0200
404e1da38e
Fix quantize script not finding models in parent directory (#428 )
Jed Fox
2023-03-23 16:42:52 -0400
4cc053b6d5
Remove obsolete command from Docker script
Georgi Gerganov
2023-03-23 22:39:44 +0200
0ba5a3a9a5
Obsolete
Georgi Gerganov
2023-03-23 22:32:02 +0200
2e17dfd80a
Replace EOS with newline to prevent context/memory being flushed by EOS in interactive mode (#333 )
rabidcopy
2023-03-23 15:22:47 -0500
20a1a4e09c
Fix GPTQ converter (#423 )
Timmy Knight
2023-03-23 10:18:13 -1000
ad072fc5ad
Generate library with CMake (#430 )
nusu-github
2023-03-24 05:16:48 +0900