Support DraftRetriever datastore read/write for large vocab sizes (i.e. llama3+) and REST inference for llama3 (#24)
* Support large vocab sizes (e.g. llama3) for the DraftRetriever datastore (see the sketch below)
* Update comments to explain the implementation changes
* Modify modeling_llama_kv for llama3 compatibility
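For context, a minimal sketch of why u16 storage breaks for llama3: its tokenizer has roughly 128,256 entries (a figure taken from the public Llama 3 tokenizer, not stated in this PR), which overflows u16, while i32 still covers the ID range and leaves room for negative sentinels such as the -2 pad value used in the diff below.

```rust
fn main() {
    // Llama 3 tokenizer size (assumed here; not stated in the PR itself).
    let llama3_vocab_size: u32 = 128_256;

    // u16 tops out at 65,535, so Llama 3 token IDs no longer fit.
    assert!(llama3_vocab_size > u16::MAX as u32);

    // i32 covers the ID range and still allows negative sentinels,
    // e.g. a -2 pad token as in pad_path(path, max_length, -2).
    let pad_token: i32 = -2;
    assert!(llama3_vocab_size <= i32::MAX as u32);
    assert!(pad_token < 0);

    println!("u16 overflows for Llama 3 token IDs; i32 does not.");
}
```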
self.index_file.write_u32::<LittleEndian>((self.buffer.len() * 4) as u32)?;
// self.buffer.len() is the length of the buffer (in # of integers). This is variable because we sometimes dump_data() early; it is not always self.buffer.capacity().
// * 4 because this value tells us how much space is needed for this buffer in the file, and we store each integer as 4 bytes.
// For larger vocabularies (i.e. > 65,535 tokens), we write the integers as i32 instead of u16.
// Keeping i32 instead of u32 so negative values can be used as pad tokens (i.e. pad_path(path, max_length, -2)).
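To make the on-disk layout concrete, here is a minimal sketch of the write/read path these comments describe, assuming a byteorder-based writer; dump_buffer and load_buffer are illustrative names rather than DraftRetriever's actual API.

```rust
use std::fs::File;

use byteorder::{LittleEndian, ReadBytesExt, WriteBytesExt};

/// Write one buffer of token IDs to the data file and record its byte length
/// in the index file: a u32 length header (number of integers * 4 bytes each)
/// followed by the token values stored as i32.
fn dump_buffer(
    data_file: &mut File,
    index_file: &mut File,
    buffer: &[i32],
) -> std::io::Result<()> {
    // Length in bytes, not in integers: each token occupies 4 bytes.
    index_file.write_u32::<LittleEndian>((buffer.len() * 4) as u32)?;
    for &token in buffer {
        // i32 rather than u16 so vocabularies > 65,535 fit, and rather than
        // u32 so negative sentinels (e.g. a -2 pad token) stay representable.
        data_file.write_i32::<LittleEndian>(token)?;
    }
    Ok(())
}

/// Read one buffer back, given the byte length recorded in the index.
fn load_buffer(data_file: &mut File, byte_len: u32) -> std::io::Result<Vec<i32>> {
    let num_tokens = (byte_len / 4) as usize;
    let mut tokens = Vec::with_capacity(num_tokens);
    for _ in 0..num_tokens {
        tokens.push(data_file.read_i32::<LittleEndian>()?);
    }
    Ok(tokens)
}
```

Recording the header as a byte count rather than an element count matches the "* 4" comment above; a reader divides by 4 to recover the number of tokens in the buffer.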