rwkv.cpp

Commit Graph

Author	SHA1	Message	Date
Alex	1198892888	Add support for Q5_0, Q5_1 and Q8_0 formats; remove Q4_1_O format (#44 ) * Remove Q4_3 support * Add Q5_0, Q5_1, Q8_0 support * Add more clear message when loading Q4_3 model * Remove Q4_1_O format * Fix indentation in .gitmodules * Simplify sanitizer matrix	2023-04-29 17:39:11 +05:00
Alex	3587ff9e58	Sync ggml with upstream (#38 ) * Sync ggml with upstream * Remove file filters from Actions triggers * Update ggml * Add Q4_2 and Q4_3 support * Improve output of perplexity measuring script * Add tests for new formats * Add token limit argument to perplexity measuring script * Update README * Update README * Update ggml * Use master branch of ggml	2023-04-22 20:25:29 +05:00
Alex	1be9fda248	Add robust automatic testing (#33 )	2023-04-20 11:00:35 +05:00
saharNooby	1ecbad3a65	Remove unused files	2023-04-02 12:53:41 +04:00
Georgi Gerganov	d502bc7c9d	tests : free llama context at the end of the test	2023-03-28 19:51:55 +03:00
Stephan Walter	436e561931	all : be more strict about converting float to double (#458 ) * Be more strict about converting float to double * Test equivalence of round, SILU implementations Test module is commented out in CMakeLists.txt because the tests may take a long time, depending on how much the compiler optimizes. * Fix softmax in perplexity.cpp * all : prefer float over double where appropriate * perplexity : add <cmath> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-28 19:48:20 +03:00
Stephan Walter	c1f885067c	ggml : introduce structs for the q4 data blocks (#356 ) * Introduce structs for the q4 data blocks * ggml : rename quant struct variables + fix ARM_NEON --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-28 18:56:03 +03:00
Georgi Gerganov	a316a425d0	Overhaul the examples structure - main -> examples - utils -> examples (renamed to "common") - quantize -> examples - separate tools for "perplexity" and "embedding" Hope I didn't break something !	2023-03-25 20:26:40 +02:00
Stephan Walter	69c92298a9	Deduplicate q4 quantization functions (#383 ) * Deduplicate q4 quantization functions * Use const; add basic test * Re-enable quantization test * Disable AVX2 flags in CI --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-22 19:29:06 +02:00
Georgi Gerganov	f5a77a629b	Introduce C-style API (#370 ) * Major refactoring - introduce C-style API * Clean up * Add <cassert> * Add <iterator> * Add <algorithm> .... * Fix timing reporting and accumulation * Measure eval time only for single-token calls * Change llama_tokenize return meaning	2023-03-22 07:32:36 +02:00
Georgi Gerganov	eb34620aec	Add tokenizer test + revert to C++11 (#355 ) * Add test-tokenizer-0 to do a few tokenizations - feel free to expand * Added option to convert-pth-to-ggml.py script to dump just the vocabulary * Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests) * Added utility to load vocabulary file from previous point (temporary implementation) * Avoid using std::string_view and drop back to C++11 (hope I didn't break something) * Rename gpt_vocab -> llama_vocab * All CMake binaries go into ./bin/ now	2023-03-21 17:29:41 +02:00

11 Commits