* Rework the entire file parsing system
prepare for future changes
* Estimate memory usage perfectly
Removes whatever issue with small models that used to exist
* Fix file stream ops on macOS
for me this compiles on Windows 11, Ubuntu 20.04, and macOS 10.14
* Fix rwkv.cpp for non-WIN32 MSVC invocations like bindgen-rs
* Implement Q8_1 quantization
...and disable the type, because GGML doesn't support the ops
required to run inference with it.
It's not worth any nasty hacks or workarounds right now; Q8_0 is
very similar if one wants 8-bit quantization.
* Completely remove Q8_1 type
This type isn't meant to be user-facing in any way so I may as well
get rid of it now since it will probably never exist as a data
format.
* Switch from std::vector to unique array for model layers
These don't ever need to be resized
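A minimal sketch of the idea, not the actual rwkv.cpp code: a fixed-size layer array held in a std::unique_ptr<T[]> and allocated once. The names demo_layer, demo_model and demo_model_init are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical stand-in for a per-layer tensor bundle (not the real rwkv.cpp type).
struct demo_layer {
    int index = 0;
};

// Hypothetical model: the layer count is fixed after loading, so no resizing is needed.
struct demo_model {
    size_t layer_count = 0;
    std::unique_ptr<demo_layer[]> layers;
};

// Allocates the layer array exactly once.
// Uses nothrow new so that allocation failure is reported as false, not as an exception.
bool demo_model_init(demo_model & model, size_t layer_count) {
    model.layers.reset(new (std::nothrow) demo_layer[layer_count]());
    if (!model.layers) {
        return false;
    }
    model.layer_count = layer_count;
    for (size_t i = 0; i < layer_count; i++) {
        model.layers[i].index = (int) i;
    }
    return true;
}
```

Compared with std::vector, this drops the capacity bookkeeping and makes it explicit that the array is never resized; the trade-off is that the length has to be stored separately.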
* Factor ffn.key.weight height into memory estimate
some models have this set weirdly, in various different ways.
just give up and record the actual size of it and use that
* Make a few more operations inplace
ggml doesn't currently expose most of the stuff it supports, so
force some things. not 100% sure about this, I don't think the
memory savings are that worth it
* attempt a perfect upper bound size for the scratch space
This should be the largest work_size seen in any model, since it
is always larger than any of the other parameters except vocab
(which does not participate in the graph work size).
* Revert "Make a few more operations inplace"
This reverts commit f94d6eb216040ae0ad23d2b9c87fae8349882f89.
* Make fewer calls to fread
micro-optimization
* Fix memory size estimation for smaller models
ggml works with some larger formats internally
* print location in all assert macros
* remove trailing whitespace
* add type_to_string entry for unknown
* Simplify quantization a bit
* fix cuBLAS compatibility
adding n_gpu_layers to rwkv_init_from_file won't work.
add an extra function instead
* fix quantize
* quantize: don't create output file if opening input fails
* Rename gpu offload layers
might want to avoid branding it with cublas in case we add something
like clblast support in the future
* Remove old read_int32 and write_int32 functions
It's all uints now
* Remove static from things
* Only call gpu_offload_layers if gpu_layer_count > 0
* Add rwkv_ prefix to all structures
* Braces
* Functions naming convention
* Remove blank line after comment
* Capitalize comments
* Re-add quantize explanatory comment
* Re-add histogram comment
* Convert all error messages to uppercase
* Make type conversions extern
for ffi bindings from other langs
* Name the state parts
The code in rwkv_eval to initialize the state (when state_in is
NULL) was getting very confusing so I just put everything in a
struct to name it.
* Fnvalid
* chore: add ggml import in the head of rwkv.h
* feat: add cublas support
* feat: update rwkv.cpp
* feat: remove unused change
* chore: fix linux build issue
* chore: sync ggml and offload tensor to gpu
* chore: comment out tensors which cause errors on GPU
* chore: update comment and readme
* chore: update ggml to recent
* chore: add more performance test results
* chore: fix problem reading files larger than 2 GB
* chore: merge master
* chore: remove unused comment
* chore: fix for comments
* Update README.md
* Update rwkv.cpp
---------
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Use types from typing for better compatibility with older Python versions
* Split last double end of line token as per BlinkDL's suggestion
* Fix MSVC warnings
* Drop Q4_2 support
* Update ggml
* Bump file format version for quantization changes
* Apply suggestions
* Add rwkv_set_print_errors and rwkv_get_last_error
Fixes #63
This allows retrieving errors from the library without having to
pipe stderr. Also it was annoying that rwkv.cpp assumed control of
the caller process by doing things like calling abort() when it
shouldn't, so I also fixed that.
The basic way this works is:
1. by default, not much is different, except more errors are caught,
   and rwkv.cpp should never abort the process or throw a C++
   exception.
2. the difference comes when you call rwkv_set_print_errors
   (working title):
   1. errors will no longer be printed to stderr automatically
   2. errors will be assigned to a thread-local variable (during
      init/quantization) or a context-local variable (during eval)
   3. the last error can be retrieved using rwkv_get_last_error
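The flow above can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: every demo_ name is hypothetical, the real rwkv_set_print_errors and rwkv_get_last_error may take different parameters, and this sketch only covers the thread-local case and always records the error.

```cpp
#include <cstdio>

// Hypothetical error flags; the real library defines its own set.
enum demo_error {
    DEMO_ERROR_NONE = 0,
    DEMO_ERROR_FILE_OPEN = 1
};

// Thread-local state, used when no context exists yet (for example during init).
static thread_local bool demo_print_errors = true;
static thread_local demo_error demo_last_error = DEMO_ERROR_NONE;

// Turns automatic printing to stderr on or off for the calling thread.
void demo_set_print_errors(bool print_errors) {
    demo_print_errors = print_errors;
}

// Returns the last recorded error for the calling thread and clears it.
demo_error demo_get_last_error() {
    demo_error error = demo_last_error;
    demo_last_error = DEMO_ERROR_NONE;
    return error;
}

// Example operation: reports failure through the return value and the
// thread-local error, and never aborts the process or throws.
bool demo_open(const char * path) {
    FILE * file = std::fopen(path, "rb");
    if (file == nullptr) {
        demo_last_error = DEMO_ERROR_FILE_OPEN;
        if (demo_print_errors) {
            std::fprintf(stderr, "Failed to open %s\n", path);
        }
        return false;
    }
    std::fclose(file);
    return true;
}
```

A caller that wants to handle errors itself would call demo_set_print_errors(false) once, then check demo_get_last_error() after any call that returns false.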
I also overhauled the assert macros so more error cases are
handled:
- the file is now closed if rwkv_init_from_file exits early
- the ggml context is freed if rwkv_init_from_file exits early
- if parameters cannot be found an error will be set about it
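One way to get this kind of cleanup on early exit is to pair a returning assert macro with a small RAII guard. The sketch below assumes that approach; DEMO_ASSERT, demo_file_guard and demo_check_magic are hypothetical names and the actual macros in rwkv.cpp may look different.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical macro: on failure, print the source location and a message,
// then return `ret` from the enclosing function instead of aborting.
#define DEMO_ASSERT(ret, cond, msg) \
    do { \
        if (!(cond)) { \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, msg); \
            return ret; \
        } \
    } while (0)

// Closes the wrapped FILE when it goes out of scope, including on early return.
struct demo_file_guard {
    FILE * file;
    ~demo_file_guard() {
        if (file != nullptr) {
            std::fclose(file);
        }
    }
};

// Returns true only if the file can be opened and starts with the 4 expected bytes.
bool demo_check_magic(const char * path, const char * magic) {
    demo_file_guard guard { std::fopen(path, "rb") };
    DEMO_ASSERT(false, guard.file != nullptr, "failed to open file");

    char buffer[4];
    DEMO_ASSERT(false, std::fread(buffer, 1, 4, guard.file) == 4, "failed to read magic");
    DEMO_ASSERT(false, std::memcmp(buffer, magic, 4) == 0, "unexpected magic");
    return true;
}
```

Because the guard's destructor runs on every return path, none of the failing asserts can leak the file handle.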
I also made some optimizations:
- just use fstat instead of opening the file twice
- deduplicated some code / removed edge cases that do not exist
- switched to ggml inplace operations where they exist
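As an illustration of the fstat point, the size of an already open file can be queried without opening it a second time. This is a sketch for POSIX systems only; demo_file_size is a hypothetical helper, and Windows builds would need the platform's own equivalents.

```cpp
#include <cstdint>
#include <cstdio>
#include <sys/stat.h>

// Returns the size in bytes of an already open file, or -1 on failure.
// Assumes a POSIX system where fileno and fstat are available.
int64_t demo_file_size(FILE * file) {
    struct stat file_stat;
    if (fstat(fileno(file), &file_stat) != 0) {
        return -1;
    }
    return (int64_t) file_stat.st_size;
}
```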
test_tiny_rwkv.c seems to run perfectly fine. The Python scripts
do as well.
The built DLL is perfectly backwards compatible with existing API
consumers like the python library, because it does not remove or
change any functions, only adds some optional ones.
The sad thing is that this will break every PR because the error
handling in this library was terrible and needed to be totally
redone. But I think it is worth it.
* Fix typo
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Visual Studio lied and _fileno is incorrect
* Fix trailing comma in assert macros
This was an accident left over from something that didn't pan out;
some compilers do not like when function arguments have a trailing
comma.
* Include header file for fstat
* Remove uses of std::make_unique
* Fix width of format string argument on all platforms
* Use C free for smart pointers
* Revert "Use C free for smart pointers" and try nothrow
* Initialize cgraph to zero
* Fix ggml_cgraph initialization
* Zero-initialize allocations
---------
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* Remove Q4_3 support
* Add Q5_0, Q5_1, Q8_0 support
* Add clearer message when loading Q4_3 model
* Remove Q4_1_O format
* Fix indentation in .gitmodules
* Simplify sanitizer matrix