69 lines
1.9 KiB
Markdown
69 lines
1.9 KiB
Markdown
# rwkv.cpp file format
|
|
|
|
This format is used by `rwkv.cpp` to store RWKV model checkpoints.
|
|
|
|
Preferred file extension: `.bin`
|
|
|
|
Specification in C-like pseudocode:
|
|
|
|
```
|
|
RWKVModelFile {
|
|
// All ints and floats are in machine byte order.
|
|
// Magic is "ggml" string bytes.
|
|
int32 magic = 0x67676d66;
|
|
// Can be either 100 or 101. See "File versions" section below for details.
|
|
int32 version = 101;
|
|
int32 n_vocab;
|
|
int32 n_embed;
|
|
int32 n_layer;
|
|
// Data type of most of the parameters. See "Data types" below for possible values.
|
|
int32 data_type;
|
|
// Read until EOF.
|
|
Parameter[] parameters;
|
|
}
|
|
|
|
Parameter {
|
|
int32 dim_count;
|
|
int32 key_length;
|
|
// Data type of the parameter. See "Data types" below for possible values.
|
|
int32 data_type;
|
|
// Compared to PyTorch's parameter.shape, dimension order is reversed here!
|
|
int32[dim_count] shape;
|
|
// Keys are like "emb.weight", "block.0.ln1.weight".
|
|
uint8[key_length] key_utf8;
|
|
// Length of the data array depends on parameter data type:
|
|
// - FP32: 4 * element_count
|
|
// - FP16: 2 * element_count
|
|
// - QX_Y (quantized): element_count / QKX_Y * sizeof(block_qx_y)
|
|
// See ggml.c for values of QK and block sizes of specific formats.
|
|
byte[] data;
|
|
}
|
|
```
|
|
|
|
## File versions
|
|
|
|
### `100`
|
|
|
|
Original version number, chosen to not interfere with `llama.cpp` file version number of `1`.
|
|
|
|
### `101`
|
|
|
|
Introduced on 2023-05-27, as `ggml` was updated to commit [00b49ec](https://github.com/ggerganov/ggml/commit/00b49ec707d73df0176e21630a6e23c2aa0e938c).
|
|
|
|
All quantized formats (`QX_Y`) were changed in a backwards-incompatible way: new version of `ggml` can not handle loading version `100` quantized models.
|
|
|
|
`FP32` and `FP16` remain the same.
|
|
|
|
## Data types
|
|
|
|
- 0: `FP32`
|
|
- 1: `FP16`
|
|
- 2: `Q4_0`
|
|
- 3: `Q4_1`
|
|
- 4: *unused*
|
|
- 5: *unused*
|
|
- 6: *unused*
|
|
- 7: `Q5_0`
|
|
- 8: `Q5_1`
|
|
- 9: `Q8_0`
|