rwkv.cpp

Commit Graph

Author	SHA1	Message	Date
YorkZero	241350fde6	Feature add cublas support (#65 ) * chore: add ggml import in the head of rwkv.h * chore: add ggml import in the head of rwkv.h * feat: add cublas support * feat: update rwkv.cpp * feat: remove unused change * chore: fix linux build issue * chore: sync ggml and offload tensor to gpu * chore: comment out tensors which occurs error on GPU * chore: update comment and readme * chore: update ggml to recent * chore: add more performance test results * chore: add more performance test results * chore: fix problem of reading file more than 2 gb * chore: merge master * chore: remove unused comment * chore: fix for comments * Update README.md * Update rwkv.cpp --------- Co-authored-by: Alex <saharNooby@users.noreply.github.com>	2023-05-29 17:10:19 +05:00
Alex	dea929f8ca	Various improvements & upgrade ggml (#75 ) * Use types from typing for better compatibility with older Python versions * Split last double end of line token as per BlinkDL's suggestion * Fix MSVC warnings * Drop Q4_2 support * Update ggml * Bump file format version for quantization changes * Apply suggestions	2023-05-27 16:02:24 +05:00
LoganDark	9e2a0de843	Add rwkv_set_print_errors and rwkv_get_last_error (#68 ) * Add rwkv_set_print_errors and rwkv_get_last_error Fixes #63 This allows retrieving errors from the library without having to pipe stderr. Also it was annoying that rwkv.cpp assumed control of the caller process by doing things like calling abort() when it shouldn't, so I also fixed that. The basic way this works is: 1. by default, not much is different, except more errors are caught, and rwkv.cpp should never abort the process or throw a C++ exception. 2. the difference comes when you call rwkv_set_print_errors (working title): 1. errors will no longer be printed to stderr automatically 2. errors will be assigned to a thread-local variable (during init/quantization) or a context-local variable (during eval) 3. the last error can be retrieved using rwkv_get_last_error I also overhauled the assert macros so more error cases are handled: - the file is now closed if rwkv_init_from_file exits early - the ggml context is freed if rwkv_init_from_file exits early - if parameters cannot be found an error will be set about it I also made some optimizations: - just use fstat instead of opening the file twice - deduplicated some code / removed edge cases that do not exist - switched to ggml inplace operations where they exist test_tiny_rwkv.c seems to run perfectly fine. The Python scripts also. The built DLL is perfectly backwards compatible with existing API consumers like the python library, because it does not remove or change any functions, only adds some optional ones. The sad thing is that this will break every PR because the error handling in this library was terrible and needed to be totally redone. But I think it is worth it. * Fix typo Co-authored-by: Alex <saharNooby@users.noreply.github.com> * Visual Studio lied and _fileno is incorrect * Fix trailing comma in assert macros This was an accident left over from something that didn't pan out, some compilers do not like when function arguments have a trailing comma. * Include header file for fstat * Remove uses of std::make_unique * Fix width of format string argument on all platforms * Use C free for smart pointers * Revert "Use C free for smart pointers" and try nothrow * Initialize cgraph to zero * Fix ggml_cgraph initialization * Zero-initialize allocations --------- Co-authored-by: Alex <saharNooby@users.noreply.github.com>	2023-05-24 16:06:52 +05:00
Alex	a3178b20ea	Various improvements (#52 ) * Update ggml * Add link to pre-quantized models in README * Enable W4 for MSVC * Fix warnings, clean up code * Fix LoRA merge script	2023-05-08 14:28:54 +05:00
Alex	1198892888	Add support for Q5_0, Q5_1 and Q8_0 formats; remove Q4_1_O format (#44 ) * Remove Q4_3 support * Add Q5_0, Q5_1, Q8_0 support * Add more clear message when loading Q4_3 model * Remove Q4_1_O format * Fix indentation in .gitmodules * Simplify sanitizer matrix	2023-04-29 17:39:11 +05:00
Alex	3587ff9e58	Sync ggml with upstream (#38 ) * Sync ggml with upstream * Remove file filters from Actions triggers * Update ggml * Add Q4_2 and Q4_3 support * Improve output of perplexity measuring script * Add tests for new formats * Add token limit argument to perplexity measuring script * Update README * Update README * Update ggml * Use master branch of ggml	2023-04-22 20:25:29 +05:00
Alex	1be9fda248	Add robust automatic testing (#33 )	2023-04-20 11:00:35 +05:00
saharNooby	1ecbad3a65	Remove unused files	2023-04-02 12:53:41 +04:00
Georgi Gerganov	d502bc7c9d	tests : free llama context at the end of the test	2023-03-28 19:51:55 +03:00
Stephan Walter	436e561931	all : be more strict about converting float to double (#458 ) * Be more strict about converting float to double * Test equivalence of round, SILU implementations Test module is commented out in CMakeLists.txt because the tests may take a long time, depending on how much the compiler optimizes. * Fix softmax in perplexity.cpp * all : prefer float over double where appropriate * perplexity : add <cmath> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-28 19:48:20 +03:00
Stephan Walter	c1f885067c	ggml : introduce structs for the q4 data blocks (#356 ) * Introduce structs for the q4 data blocks * ggml : rename quant struct variables + fix ARM_NEON --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-28 18:56:03 +03:00
Georgi Gerganov	a316a425d0	Overhaul the examples structure - main -> examples - utils -> examples (renamed to "common") - quantize -> examples - separate tools for "perplexity" and "embedding" Hope I didn't break something !	2023-03-25 20:26:40 +02:00
Stephan Walter	69c92298a9	Deduplicate q4 quantization functions (#383 ) * Deduplicate q4 quantization functions * Use const; add basic test * Re-enable quantization test * Disable AVX2 flags in CI --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-22 19:29:06 +02:00
Georgi Gerganov	f5a77a629b	Introduce C-style API (#370 ) * Major refactoring - introduce C-style API * Clean up * Add <cassert> * Add <iterator> * Add <algorithm> .... * Fix timing reporting and accumulation * Measure eval time only for single-token calls * Change llama_tokenize return meaning	2023-03-22 07:32:36 +02:00
Georgi Gerganov	eb34620aec	Add tokenizer test + revert to C++11 (#355 ) * Add test-tokenizer-0 to do a few tokenizations - feel free to expand * Added option to convert-pth-to-ggml.py script to dump just the vocabulary * Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests) * Added utility to load vocabulary file from previous point (temporary implementation) * Avoid using std::string_view and drop back to C++11 (hope I didn't break something) * Rename gpt_vocab -> llama_vocab * All CMake binaries go into ./bin/ now	2023-03-21 17:29:41 +02:00

15 Commits