Tom Aarsen committed on
Commit
008f257
·
2 Parent(s): c0c6d64 c5227da

Merge branch 'main' into pr/1, resolve merge conflict

Files changed (1)
  1. README.md +23 -0
README.md CHANGED
@@ -8983,6 +8983,25 @@ The core training code will be integrated into the rag-retrieval library(https:/
8983
 
8984
  This work was accomplished during my free time; please allow me a little time.
8985
8986
  ## Usage
8987
  ```python
8988
  import torch
@@ -9048,5 +9067,9 @@ if __name__ == "__main__":
9048
  # [0.3226, 0.3054, 0.7421, 0.5484]])
9049
  ```
9050
9051
  ## License
9052
  **This model should not be used for any commercial purpose!**
 
8983
 
8984
  This work was accomplished during my free time; please allow me a little time.
8985
 
8986
+
8987
+ Here's a short introduction to the training method:
8988
+
8989
+ The core idea of jasper and stella is distillation: **let the student model learn the teacher model's vectors.**
8990
+ The training process of jasper has 4 stages:
8991
+
8992
+ Stage 1 & 2: Distill from teacher vectors. For the jasper model, the teacher models are nvidia/NV-Embed-v2 and dunzhang/stella_en_1.5B_v5. (Stage 1 and Stage 2 freeze different parameters.)
8993
+
8994
+ Stage 3: MRL training. I made some modifications to MRL to enable training on unsupervised text.
8995
+
8996
+ Stage 4: Alignment between *jasper token embeddings from an image's detailed caption* and *vision embeddings from google/siglip-so400m-patch14-384*.
8997
+
8998
+ I use an AdaptiveAvgPool2d to adjust the number and dimensions of the vision tokens; this method needs no additional parameters.
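
The pooling step described for Stage 4 can be sketched as follows. This is a minimal illustration, not the model's actual code: the tensor sizes and the variable names (`vision_embeddings`, `target_tokens`, `target_dim`) are hypothetical toy values chosen only to show that `torch.nn.AdaptiveAvgPool2d` can change both the token count and the embedding width without any learnable weights.

```python
import torch
import torch.nn as nn

# Hypothetical toy sizes; a real vision encoder emits far more tokens and channels.
batch, num_vision_tokens, vision_dim = 2, 16, 12
target_tokens, target_dim = 4, 6

# Stand-in for the vision encoder output: (batch, tokens, dim).
vision_embeddings = torch.randn(batch, num_vision_tokens, vision_dim)

# A 3-D input is interpreted as (C, H, W), so every batch item is pooled
# independently over its (tokens, dim) grid.
pool = nn.AdaptiveAvgPool2d((target_tokens, target_dim))
pooled = pool(vision_embeddings)

# The layer has no learnable parameters.
num_params = sum(p.numel() for p in pool.parameters())
print(tuple(pooled.shape), num_params)
```

Because the layer only averages over windows of the input grid, the same module can be reused for any input token count or width; whether the output sizes used in jasper match a text-side hidden size is not stated in the README, so that is left open here.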
8999
+
9000
+ **The point of distillation is to achieve better results with smaller models, or to serve as a way of pre-training, not to hit the top of the leaderboards.**
9001
+ Actually, I have reached first place on MTEB (Chinese and English), but I will not release those two models; as I said before, it's meaningless.
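
The core idea of Stages 1 and 2, a student learning the teacher's vectors, can be sketched roughly as below. This is an assumption-laden toy, not the jasper training code: the README does not say which loss is used, so a plain cosine loss is swapped in for illustration, and the sizes, the `projection` layer, and the function name `cosine_distillation_loss` are all hypothetical.

```python
import torch
import torch.nn.functional as F


def cosine_distillation_loss(student_vecs: torch.Tensor, teacher_vecs: torch.Tensor) -> torch.Tensor:
    """Mean (1 - cosine similarity) between paired student and teacher vectors."""
    return (1.0 - F.cosine_similarity(student_vecs, teacher_vecs, dim=-1)).mean()


torch.manual_seed(0)
batch, hidden, teacher_dim = 4, 16, 32  # hypothetical toy sizes

# Stand-in for teacher embeddings computed ahead of time (no gradient needed).
teacher_vecs = torch.randn(batch, teacher_dim)
# Stand-in for the student's pooled hidden states, kept fixed in this toy.
student_hidden = torch.randn(batch, hidden)
# Hypothetical trainable head that maps the student into the teacher's vector space.
projection = torch.nn.Linear(hidden, teacher_dim)

optimizer = torch.optim.SGD(projection.parameters(), lr=0.1)
loss_before = cosine_distillation_loss(projection(student_hidden), teacher_vecs).item()
for _ in range(50):
    optimizer.zero_grad()
    loss = cosine_distillation_loss(projection(student_hidden), teacher_vecs)
    loss.backward()
    optimizer.step()
loss_after = cosine_distillation_loss(projection(student_hidden), teacher_vecs).item()
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Freezing different parameters per stage, as the README mentions, would amount to calling `requires_grad_(False)` on the modules that should stay fixed; which modules those are is not specified in the README.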
9002
+
9003
+
9004
+
9005
  ## Usage
9006
  ```python
9007
  import torch
 
9067
  # [0.3226, 0.3054, 0.7421, 0.5484]])
9068
  ```
9069
 
9070
+ ## Evaluation on MTEB
9071
+
9072
+ script: ./scripts/evaluate_en_mteb/run_evaluate_mteb.py
9073
+
9074
  ## License
9075
  **This model should not be used for any commercial purpose!**