Running SimCSE
Run results

- First, install torch and the other required libraries.
- Next, download the evaluation data. wget must be installed so that the bash download script can run:
  cd SentEval/data/downstream/
  bash download_dataset.sh
- After emailing the authors to ask how to run the code and tune its parameters, the basic evaluation command is:
  python evaluation.py --model_name_or_path princeton-nlp/sup-simcse-bert-base-uncased
- The run command with its settings spelled out:
  python evaluation.py \
      --model_name_or_path princeton-nlp/sup-simcse-bert-base-uncased \
      --pooler cls \
      --task_set sts \
      --mode test
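For scripted or repeated runs, the command above could also be assembled in Python. This is only a minimal sketch: the helper name `build_eval_command` is hypothetical, the flags and their values are the ones used in the command above, and the script is assumed to be launched from the SimCSE repository root.

```python
import shlex
from typing import List


def build_eval_command(model: str, pooler: str = "cls",
                       task_set: str = "sts", mode: str = "test") -> List[str]:
    """Return the argv list for evaluation.py (hypothetical helper)."""
    return [
        "python", "evaluation.py",
        "--model_name_or_path", model,
        "--pooler", pooler,
        "--task_set", task_set,
        "--mode", mode,
    ]


cmd = build_eval_command("princeton-nlp/sup-simcse-bert-base-uncased")
# Print the command as a single shell-safe line for inspection.
print(shlex.join(cmd))
# To actually run it (assumption: current directory is the repository root):
# import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids quoting problems when a local checkpoint path contains spaces.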
Evaluation
Arguments for the evaluation script are as follows,
--model_name_or_path: The name or path of a transformers-based pre-trained checkpoint. You can directly use the models in the above table, e.g., princeton-nlp/sup-simcse-bert-base-uncased.
--pooler: Pooling method. Now we support
  - cls (default): Use the representation of [CLS] token. A linear+activation layer is applied after the representation (it's in the standard BERT implementation). If you use SimCSE, you should use this option.
  - cls_before_pooler: Use the representation of [CLS] token without the extra linear+activation.
  - avg: Average embeddings of the last layer. If you use checkpoints of SBERT/SRoBERTa (paper), you should use this option.
  - avg_top2: Average embeddings of the last two layers.
  - avg_first_last: Average embeddings of the first and last layers. If you use vanilla BERT or RoBERTa, this works the best.
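The difference between the pooling options may be easier to see on toy data. The sketch below is not the SimCSE implementation: it uses plain Python lists instead of tensors, the function names are hypothetical, and `hidden_states` is assumed to be a list of per-layer outputs ordered from first to last layer for a single sentence. The `cls` option is left out because it needs the model's learned linear+activation layer.

```python
from typing import List

Vector = List[float]
Layer = List[Vector]  # one vector per token


def masked_mean(layer: Layer, mask: List[int]) -> Vector:
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, m in zip(layer, mask) if m == 1]
    dim = len(layer[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]


def layer_mean(a: Layer, b: Layer) -> Layer:
    """Element-wise average of two layers, token by token."""
    return [[(x + y) / 2.0 for x, y in zip(va, vb)] for va, vb in zip(a, b)]


def pool(hidden_states: List[Layer], mask: List[int], pooler: str) -> Vector:
    last = hidden_states[-1]
    if pooler == "cls_before_pooler":
        return last[0]  # raw [CLS] vector, assumed to sit at position 0
    if pooler == "avg":
        return masked_mean(last, mask)
    if pooler == "avg_top2":
        return masked_mean(layer_mean(hidden_states[-2], last), mask)
    if pooler == "avg_first_last":
        return masked_mean(layer_mean(hidden_states[0], last), mask)
    raise ValueError(f"unsupported pooler in this sketch: {pooler}")


# Three layers, three tokens (the last token is padding), dimension 2.
hidden_states = [
    [[1.0, 1.0], [3.0, 3.0], [9.0, 9.0]],  # first layer
    [[2.0, 0.0], [4.0, 2.0], [9.0, 9.0]],  # middle layer
    [[4.0, 2.0], [6.0, 4.0], [9.0, 9.0]],  # last layer
]
mask = [1, 1, 0]

for name in ("cls_before_pooler", "avg", "avg_top2", "avg_first_last"):
    print(name, pool(hidden_states, mask, name))
```

The padding token is excluded from every average, so its values never reach the sentence embedding.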
--mode: Evaluation mode
  - test (default): The default test mode. To faithfully reproduce our results, you should use this option.
  - dev: Report the development set results. Note that in STS tasks, only STS-B and SICK-R have development sets, so we only report their numbers. It also takes a fast mode for transfer tasks, so the running time is much shorter than the test mode (though numbers are slightly lower).
  - fasttest: It is the same as test, but with a fast mode so the running time is much shorter, but the reported numbers may be lower (only for transfer tasks).
--task_set: What set of tasks to evaluate on (if set, it will override --tasks)
  - sts (default): Evaluate on STS tasks, including STS 12~16, STS-B and SICK-R. This is the most commonly-used set of tasks to evaluate the quality of sentence embeddings.
  - transfer: Evaluate on transfer tasks.
  - full: Evaluate on both STS and transfer tasks.
  - na: Manually set tasks by --tasks.
--tasks: Specify which dataset(s) to evaluate on. Will be overridden if --task_set is not na. See the code for a full list of tasks.
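The override rule between --task_set and --tasks can be sketched with argparse as follows. This is an illustration, not the evaluation script's code: `resolve_tasks` and `parse_args` are hypothetical names, and the task identifiers are assumed SentEval-style names for the tasks listed above (the transfer list in particular is an assumption; the script itself holds the authoritative list).

```python
import argparse
from typing import List

# Assumed SentEval-style identifiers for STS 12~16, STS-B and SICK-R.
STS_TASKS = ["STS12", "STS13", "STS14", "STS15", "STS16",
             "STSBenchmark", "SICKRelatedness"]
# Assumed transfer task identifiers; check the script for the real list.
TRANSFER_TASKS = ["MR", "CR", "MPQA", "SUBJ", "SST2", "TREC", "MRPC"]


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--task_set", default="sts",
                        choices=["sts", "transfer", "full", "na"])
    parser.add_argument("--tasks", nargs="+", default=[])
    return parser.parse_args(argv)


def resolve_tasks(task_set: str, tasks: List[str]) -> List[str]:
    """Return the tasks to run; --tasks only matters when task_set is 'na'."""
    if task_set == "sts":
        return list(STS_TASKS)
    if task_set == "transfer":
        return list(TRANSFER_TASKS)
    if task_set == "full":
        return STS_TASKS + TRANSFER_TASKS
    return list(tasks)  # task_set == "na": use the manual list


# --tasks is ignored here because task_set falls back to its default, sts.
args = parse_args(["--tasks", "MR"])
print(resolve_tasks(args.task_set, args.tasks))

# With task_set na, the manual list is used as given.
args = parse_args(["--task_set", "na", "--tasks", "STSBenchmark"])
print(resolve_tasks(args.task_set, args.tasks))
```

In practice this means a custom --tasks list only takes effect when --task_set na is passed explicitly.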