nncf.data
Functions
- nncf.data.generate_text_data(model, tokenizer, seq_len=32, dataset_size=128, unique_tokens_lower_limit=5)
Generates text dataset based on the model output.
The model must be an instance of PreTrainedModel and the tokenizer an instance of PreTrainedTokenizerBase, so the transformers and torch packages must be installed in the environment to call this function.
- Parameters:
model (TModel) – Model instance.
tokenizer (TTokenizer) – Tokenizer instance.
seq_len (int) – Sequence length for generation.
dataset_size (int) – Number of text samples in the generated dataset.
unique_tokens_lower_limit (int) –
- Returns:
List of the text data ready to use.
- Return type:
list[str]
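The following is a minimal usage sketch rather than an official example. It assumes a Hugging Face causal language model loaded through `AutoModelForCausalLM` and `AutoTokenizer`; the model ID in the trailing comment and the helper names `generate_calibration_texts` and `check_text_data` are hypothetical and not part of NNCF. Only the `generate_text_data` call and its keyword arguments come from the signature documented above.

```python
def generate_calibration_texts(model_id: str, seq_len: int = 32, dataset_size: int = 128) -> list[str]:
    """Load a Hugging Face model and tokenizer, then generate text with NNCF.

    Hypothetical helper: the name and the loading strategy are assumptions.
    """
    # Imported lazily because transformers, torch, and nncf are heavy,
    # optional dependencies that may not be installed everywhere.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from nncf.data import generate_text_data

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # unique_tokens_lower_limit is left at its documented default of 5.
    return generate_text_data(
        model,
        tokenizer,
        seq_len=seq_len,
        dataset_size=dataset_size,
    )


def check_text_data(texts) -> int:
    """Verify the documented return type (list[str]) and return the item count."""
    if not isinstance(texts, list):
        raise TypeError("expected a list")
    if not all(isinstance(t, str) for t in texts):
        raise TypeError("expected every item to be a str")
    return len(texts)


# Example call (downloads model weights, so it is left commented out;
# the model ID is an arbitrary illustrative choice):
# texts = generate_calibration_texts("facebook/opt-125m", seq_len=32, dataset_size=16)
# print(check_text_data(texts), "samples generated")
```

Keeping the heavy imports inside the helper lets the rest of a script import cleanly on machines without transformers or torch, which matches the dependency note above.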