This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Error loading saved tokenizer #1255

@imagineer258

Description

❓ Questions and Help

When I try to load a saved torchtext tokenizer from C++, I get the following error:

Loading model...
terminate called after throwing an instance of 'torch::jit::ErrorReport'
  what():  
Unknown type name '__torch__.torch.classes.torchtext.RegexTokenizer':
Serialized   File "code/__torch__/torchtext/experimental/transforms.py", line 6
  training : bool
  _is_full_backward_hook : Optional[bool]
  regex_tokenizer : __torch__.torch.classes.torchtext.RegexTokenizer
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  def forward(self: __torch__.torchtext.experimental.transforms.RegexTokenizer,
    line: str) -> List[str]:

Aborted (core dumped)
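The key part of the message is the type name. Custom C++ classes registered with TorchScript (for example through `TORCH_LIBRARY(ns, m) { m.class_<T>("Name"); }`) appear in serialized code as `__torch__.torch.classes.<ns>.<Name>`, and loading fails with "Unknown type name" when no library in the current process has registered that class. The helper below is hypothetical (not part of LibTorch): a minimal sketch that pulls the namespace and class name out of such a message, which points at the library that has to be loaded first.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: given a TorchScript load error, return the
// (namespace, class) pair of the first custom class it names, if any.
// TorchScript spells custom classes as "__torch__.torch.classes.<ns>.<Name>".
std::optional<std::pair<std::string, std::string>>
missing_custom_class(const std::string& error_text) {
    const std::string prefix = "__torch__.torch.classes.";
    const std::size_t start = error_text.find(prefix);
    if (start == std::string::npos) {
        return std::nullopt;
    }
    // Consume identifier characters and dots that follow the prefix.
    std::size_t end = start + prefix.size();
    while (end < error_text.size()) {
        const char c = error_text[end];
        if (!(std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '.')) {
            break;
        }
        ++end;
    }
    std::string qualified = error_text.substr(start + prefix.size(), end - start - prefix.size());
    // Drop trailing dots, e.g. when the name ends a sentence.
    while (!qualified.empty() && qualified.back() == '.') {
        qualified.pop_back();
    }
    const std::size_t dot = qualified.find('.');
    if (dot == std::string::npos || dot == 0 || dot + 1 == qualified.size()) {
        return std::nullopt;
    }
    return std::make_pair(qualified.substr(0, dot), qualified.substr(dot + 1));
}
```

For the message above, the helper yields the pair `("torchtext", "RegexTokenizer")`, i.e. the class is expected to be registered by torchtext's C++ library.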

I saved the tokenizer with the following code:

import torch
from torchtext.experimental.transforms import regex_tokenizer

# Build a regex tokenizer with no substitution patterns, script it, and save it.
tokenizer = regex_tokenizer([])
tokenizer_scripted = torch.jit.script(tokenizer)
tokenizer_scripted.save("tokenizer.pt")

and tried to load it back in C++ with:

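#include <torch/script.h>

#include <exception>
#include <iostream>
#include <string>
#include <vector>

int main(int argc, const char* argv[]) {
    if (argc != 2) {
        std::cerr << "usage: " << argv[0] << " <path-to-tokenizer.pt>\n";
        return -1;
    }

    std::cout << "Loading model...\n";

    torch::jit::script::Module module;
    try {
        module = torch::jit::load(argv[1]);
    } catch (const std::exception& e) {
        // torch::jit::ErrorReport (the exception behind the abort above) derives
        // from std::exception, not c10::Error, so catch the common base class.
        std::cerr << "error loading the model:\n" << e.what() << '\n';
        return -1;
    }

    torch::NoGradGuard no_grad; // ensures that autograd is off

    // Module::forward takes a std::vector<IValue>, so wrap the input string.
    std::vector<torch::jit::IValue> inputs{std::string("[email protected] 00000001")};
    torch::jit::IValue tokens_ivalue = module.forward(inputs);
    std::cout << "result " << tokens_ivalue << '\n';

    return 0;
}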

I assume I need to link the torchtext C++ library in my CMakeLists.txt, but I'm not sure how. I tried adding the following, but it didn't help:

add_library(TorchText SHARED IMPORTED)
set_target_properties(TorchText PROPERTIES IMPORTED_LOCATION <path_to_torchtext.so>)
target_link_libraries(project TorchText)
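Linking alone may not be enough: nothing in the program references a symbol from the torchtext library, so a linker using `--as-needed` can drop the dependency, and the custom class registration (which runs from the library's static initializers) never happens. One option is to load the library explicitly before `torch::jit::load`. The sketch below assumes a POSIX system with `dlopen`; `load_shared_library` is a hypothetical helper name, and whether a given torchtext build (whose extension module may also depend on Python symbols) can be loaded this way is an assumption, not something confirmed here.

```cpp
#include <dlfcn.h>

#include <string>

// Hypothetical helper (POSIX only): load a shared library so that its static
// initializers, including any TorchScript custom class registrations, run
// before torch::jit::load() is called.
// Returns an empty string on success, or the dlerror() message on failure.
std::string load_shared_library(const std::string& path) {
    // RTLD_NOW resolves all undefined symbols immediately, so a missing
    // dependency is reported here instead of at first use.
    // RTLD_GLOBAL makes the library's symbols available to libraries loaded later.
    void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        const char* err = dlerror();
        return err != nullptr ? std::string(err) : std::string("unknown dlopen error");
    }
    // The handle is deliberately never passed to dlclose(): the registered
    // classes must stay available for as long as loaded modules use them.
    return {};
}
```

In `main`, this would be called with the same `<path_to_torchtext.so>` before `torch::jit::load(argv[1])`, exiting if it returns a non-empty message. A link-time alternative that is often suggested is to pass `-Wl,--no-as-needed` ahead of the library in `target_link_libraries` so the linker keeps it as a dependency.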
