rwkv.cpp/rwkv
LoganDark 363dfb1a06
File parsing and memory usage optimization (#74)
* Rework the entire file parsing system

Prepare for future changes

* Estimate memory usage perfectly

Removes whatever issue used to exist with small models

* Fix file stream ops on macOS

For me this compiles on Windows 11, Ubuntu 20.04, and macOS 10.14

* Fix rwkv.cpp for non-WIN32 MSVC invocations like bindgen-rs

* Implement Q8_1 quantization

...and disable the type, because GGML doesn't support the ops
required to run inference with it.

It's not worth any nasty hacks or workarounds right now; Q8_0 is
very similar if one wants 8-bit quantization.
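
For context, a minimal sketch of the Q8_0-style scheme the commit falls back to: each block of 32 floats is stored as one scale plus 32 int8 values, while Q8_1 mainly differs by carrying an extra per-block term. This is illustrative, not the exact GGML memory layout.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Q8_0-style block: one scale, 32 quantized values, no offset term.
constexpr int QK8_0 = 32;

struct block_q8_0 {
    float  d;          // per-block scale
    int8_t qs[QK8_0];  // quantized values
};

block_q8_0 quantize_block_q8_0(const float * x) {
    // Scale so the largest magnitude maps to +/-127.
    float amax = 0.0f;
    for (int i = 0; i < QK8_0; ++i) amax = std::max(amax, std::fabs(x[i]));
    block_q8_0 out;
    out.d = amax / 127.0f;
    const float id = out.d != 0.0f ? 1.0f / out.d : 0.0f;
    for (int i = 0; i < QK8_0; ++i) out.qs[i] = (int8_t) std::lround(x[i] * id);
    return out;
}

float dequantize_q8_0(const block_q8_0 & b, int i) { return b.d * b.qs[i]; }
```

Because the offset term only refines the reconstruction slightly, the dequantized values of the two formats come out nearly identical, which is the commit's point.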

* Completely remove Q8_1 type

This type isn't meant to be user-facing in any way so I may as well
get rid of it now since it will probably never exist as a data
format.

* Switch from std::vector to unique array for model layers

These don't ever need to be resized
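
A sketch of the switch, with a hypothetical stand-in for the per-layer weights struct: since the layer count is fixed at load time, a `unique_ptr<T[]>` avoids `std::vector`'s resize machinery and capacity bookkeeping.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the real per-layer weights struct.
struct rwkv_layer {
    float att_time_mix_k;
    float ffn_time_mix_r;
};

// Fixed-size allocation: the layer count never changes after loading.
struct rwkv_model {
    size_t n_layer;
    std::unique_ptr<rwkv_layer[]> layers;

    explicit rwkv_model(size_t n)
        : n_layer(n), layers(new rwkv_layer[n]()) {}
};

size_t model_layer_count(const rwkv_model & m) { return m.n_layer; }
```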

* Factor ffn.key.weight height into memory estimate

Some models have this set in various unusual ways. Just give up,
record its actual size, and use that.

* Make a few more operations inplace

ggml doesn't currently expose most of the inplace operations it
supports, so force some of them. Not 100% sure about this; the
memory savings may not be worth it.

* Attempt a perfect upper bound size for the scratch space

This should be the largest work_size seen in any model, since it
is always larger than any of the other parameters except vocab
(which does not participate in the graph work size).
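
The idea can be sketched as follows: each op in the graph needs some temporary work buffer, and since that buffer is reused between ops, the scratch space only has to hold the single largest requirement, not the sum.

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative sketch: the scratch upper bound is the max work_size
// over all ops, because work buffers are reused rather than stacked.
size_t scratch_upper_bound(const size_t * op_work_sizes, size_t n_ops) {
    size_t bound = 0;
    for (size_t i = 0; i < n_ops; ++i)
        bound = std::max(bound, op_work_sizes[i]);
    return bound;
}
```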

* Revert "Make a few more operations inplace"

This reverts commit f94d6eb216040ae0ad23d2b9c87fae8349882f89.

* Make fewer calls to fread

micro-optimization
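
The usual shape of this micro-optimization: read a whole fixed-size header in one fread instead of one call per field. The field names and values below are illustrative, not the actual rwkv.cpp file layout.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical fixed-size file header; all-uint32 fields, so no padding.
struct file_header {
    uint32_t magic;
    uint32_t version;
    uint32_t n_vocab;
    uint32_t n_embed;
    uint32_t n_layer;
    uint32_t data_type;
};

bool read_header(FILE * f, file_header * out) {
    // One fread for six fields instead of six separate calls.
    return fread(out, sizeof(file_header), 1, f) == 1;
}
```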

* Fix memory size estimation for smaller models

ggml works with some larger formats internally

* Print location in all assert macros
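
A sketch of the pattern (modeled on the idea, not the exact rwkv.cpp macros): on failure, print file and line before bailing out of the enclosing function.

```cpp
#include <cstdio>

// Assert macro that reports its location, then returns RET on failure.
#define RWKV_ASSERT_FALSE(RET, x) \
    do { \
        if (!(x)) { \
            fprintf(stderr, "%s:%d: assertion failed: %s\n", \
                    __FILE__, __LINE__, #x); \
            return RET; \
        } \
    } while (0)

bool check_positive(int v) {
    RWKV_ASSERT_FALSE(false, v > 0);
    return true;
}
```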

* Remove trailing whitespace

* Add type_to_string entry for unknown

* Simplify quantization a bit

* Fix cuBLAS compatibility

Adding n_gpu_layers to rwkv_init_from_file won't work; add an
extra function instead.
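
The motivation for the extra function: adding a parameter to rwkv_init_from_file would break every existing caller and binding, so offloading is exposed as a separate call made after init. The struct and body below are illustrative stand-ins, not the real implementation.

```cpp
#include <cstdint>

// Hypothetical minimal context holding only what this sketch needs.
struct rwkv_context {
    uint32_t n_layer;
    uint32_t gpu_layers;
};

// Separate offload call, exposed with C linkage for bindings.
extern "C" bool rwkv_gpu_offload_layers(rwkv_context * ctx, uint32_t n_layers) {
    // Clamp to the model's actual layer count.
    if (n_layers > ctx->n_layer) n_layers = ctx->n_layer;
    if (n_layers == 0) return false;
    ctx->gpu_layers = n_layers;
    return true;
}
```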

* Fix quantize

* quantize: don't create output file if opening input fails

* Rename gpu offload layers

Might want to avoid branding it with cuBLAS in case we add something
like CLBlast support in the future.

* Remove old read_int32 and write_int32 functions

It's all uints now

* Remove static from things

* Only call gpu_offload_layers if gpu_layer_count > 0

* Add rwkv_ prefix to all structures

* Braces

* Functions naming convention

* Remove blank line after comment

* Capitalize comments

* Re-add quantize explanatory comment

* Re-add histogram comment

* Convert all error messages to uppercase

* Make type conversions extern

For FFI bindings from other languages
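
The reason for extern here: an inline or static C++ helper has no stable exported symbol for FFI tools like bindgen-rs to bind against, so the conversions are given C linkage. The enum values below are illustrative, not the real rwkv.cpp table.

```cpp
#include <cstdint>

// Exported with C linkage so foreign bindings can resolve the symbol.
extern "C" const char * rwkv_type_to_string(uint32_t type) {
    switch (type) {
        case 0:  return "FP32";
        case 1:  return "FP16";
        case 2:  return "Q4_0";
        default: return "unknown";  // entry for unrecognized types
    }
}
```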

* Name the state parts

The code in rwkv_eval to initialize the state (when state_in is
NULL) was getting very confusing, so I just put everything in a
struct to name it.
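
A sketch of the naming: instead of computing raw offsets into one float buffer, a struct names each per-layer slice. The part names follow the RWKV recurrence (ffn_xx, att_xx, att_aa, att_bb, att_pp) but the layout here is illustrative.

```cpp
#include <cstddef>

// Named views into one contiguous state buffer, five parts per layer.
struct rwkv_layer_state {
    float * ffn_xx; // previous token's embedding as seen by the FFN
    float * att_xx; // previous token's embedding as seen by attention
    float * att_aa; // attention numerator accumulator
    float * att_bb; // attention denominator accumulator
    float * att_pp; // running max exponent, for numerical stability
};

// Carve the buffer into named per-layer parts instead of raw offsets.
rwkv_layer_state state_part(float * buffer, size_t n_embed, size_t layer) {
    float * base = buffer + layer * 5 * n_embed;
    return { base, base + n_embed, base + 2 * n_embed,
             base + 3 * n_embed, base + 4 * n_embed };
}
```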

* Fnvalid
2023-05-31 16:31:19 +05:00
prompt punish repetitions & break if END_OF_TEXT & decouple prompts from chat script (#37) 2023-04-30 18:50:05 +05:00
20B_tokenizer.json Add text generation and chat scripts 2023-04-02 15:03:31 +04:00
chat_with_bot.py Feature add cublas support (#65) 2023-05-29 17:10:19 +05:00
convert_pytorch_to_ggml.py Various improvements & upgrade ggml (#75) 2023-05-27 16:02:24 +05:00
convert_pytorch_to_ggml.test.py Various improvements & upgrade ggml (#75) 2023-05-27 16:02:24 +05:00
generate_completions.py Flush output every token in generate_completions.py (#73) 2023-05-26 17:23:58 +05:00
measure_pexplexity.py Sync ggml with upstream (#38) 2023-04-22 20:25:29 +05:00
merge_lora_into_ggml.py Various improvements & upgrade ggml (#75) 2023-05-27 16:02:24 +05:00
quantize.py Various improvements & upgrade ggml (#75) 2023-05-27 16:02:24 +05:00
requirements.txt Add text generation and chat scripts 2023-04-02 15:03:31 +04:00
rwkv_cpp_model.py File parsing and memory usage optimization (#74) 2023-05-31 16:31:19 +05:00
rwkv_cpp_shared_library.py File parsing and memory usage optimization (#74) 2023-05-31 16:31:19 +05:00
sampling.py Add text generation and chat scripts 2023-04-02 15:03:31 +04:00