Errors when re-running your code
#1 opened by nosaty
Hello!
I ran into a small problem while trying to re-run your ONNX-conversion code and got an error.
I have cloned the jina-embeddings-v3 repo.
I changed the config.json file as follows:
"auto_map": {
"AutoConfig": "jinaai/xlm-roberta-flash-implementation-onnx--configuration_xlm_roberta.XLMRobertaFlashConfig",
"AutoModel": "jinaai/xlm-roberta-flash-implementation-onnx--modeling_lora.XLMRobertaLoRA",
"AutoModelForMaskedLM": "jinaai/xlm-roberta-flash-implementation-onnx--modeling_xlm_roberta.XLMRobertaForMaskedLM",
"AutoModelForPreTraining": "jinaai/xlm-roberta-flash-implementation-onnx--modeling_xlm_roberta.XLMRobertaForPreTraining"
},
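For context, this is roughly how I load the model after that change (a minimal sketch of my own setup; the local path and loading arguments are my assumptions, your conversion script may do this differently):

```python
# Minimal sketch of how I load the model before running the conversion.
# Assumptions: "./jina-embeddings-v3" is my local clone with the edited config,
# and trust_remote_code=True is needed so the auto_map above resolves the
# custom XLMRobertaLoRA classes.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "./jina-embeddings-v3",
    trust_remote_code=True,
)
model.eval()
```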
After this I ran your code and got an error that ends like this:
File ~/.cache/huggingface/modules/transformers_modules/jinaai/xlm-roberta-flash-implementation-onnx/4f519a4108bcdaf12aad3768cffc52f0c847e7e1/modeling_lora.py:196, in LoRAParametrization.add_to_layer.<locals>.new_forward(self, input, task_id, residual)
    193     else:
    194         weights = self.weight
--> 196     out = F.linear(input, weights, self.bias)
    198     if residual:
    199         return out, input

RuntimeError: mat1 and mat2 must have the same dtype, but got Float and BFloat16
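From the traceback I understand that the activations going into F.linear are float32 while the (LoRA) weights are bfloat16. The workaround I am considering looks roughly like this; it is only a sketch, the dummy inputs and export arguments are placeholders rather than your actual script, and whether this is the intended fix is essentially my question:

```python
# Sketch of the dtype workaround I'm considering: cast the whole model
# (including the bfloat16 LoRA weights) to float32 before exporting,
# so F.linear sees matching dtypes.
# The dummy inputs and export settings below are placeholders.
import torch

model = model.float()

dummy_input_ids = torch.ones(1, 16, dtype=torch.long)
dummy_attention_mask = torch.ones(1, 16, dtype=torch.long)

torch.onnx.export(
    model,
    (dummy_input_ids, dummy_attention_mask),
    "jina-embeddings-v3.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
    opset_version=17,
)
```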
So my question is: what am I doing wrong?
More generally, my bigger goal is to produce a model version quantized to int8.
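In case it helps to see what I am aiming for, the post-export step I have in mind looks roughly like this (a sketch using onnxruntime's dynamic quantization; the file names are placeholders):

```python
# Sketch of the int8 step I'm aiming for, once the ONNX export itself works.
# Assumption: dynamic (weight-only) quantization via onnxruntime is acceptable;
# the file names are placeholders.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="jina-embeddings-v3.onnx",        # the exported fp32 model
    model_output="jina-embeddings-v3-int8.onnx",  # int8-quantized result
    weight_type=QuantType.QInt8,
)
```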