--- pipeline_tag: token-classification datasets: - conll2003 metrics: - precision - recall - f1 - accuracy tags: - distilbert --- **task**: `token-classification` **Backend:** `sagemaker-training` **Backend args:** `{'instance_type': 'ml.g4dn.2xlarge', 'supported_instructions': None}` **Number of evaluation samples:** `All dataset` Fixed parameters: * **model_name_or_path**: `elastic/distilbert-base-uncased-finetuned-conll03-english` * **dataset**: * **path**: `conll2003` * **eval_split**: `validation` * **data_keys**: `{'primary': 'tokens'}` * **ref_keys**: `['ner_tags']` * **calibration_split**: `train` * **per_channel**: `False` * **calibration**: * **method**: `minmax` * **num_calibration_samples**: `100` * **framework**: `onnxruntime` * **framework_args**: * **opset**: `11` * **optimization_level**: `1` * **aware_training**: `False` Benchmarked parameters: * **quantization_approach**: `dynamic`, `static` * **operators_to_quantize**: `['Add']`, `['Add', 'MatMul']` * **node_exclusion**: `[]`, `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` # Evaluation ## Non-time metrics | quantization_approach | operators_to_quantize | node_exclusion | | precision (original) | precision (optimized) | | recall (original) | recall (optimized) | | f1 (original) | f1 (optimized) | | accuracy (original) | accuracy (optimized) | | :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :------------------: | :-------------------: | :-: | :---------------: | :----------------: | :-: | :-----------: | :------------: | :-: | :-----------------: | :------------------: | | `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 0.936 | 0.934 | \| | 0.944 | 0.942 | \| | 0.940 | 0.938 | \| | 0.988 | 0.988 | | `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 0.936 | 0.934 | \| | 0.944 | 0.942 | \| | 0.940 | 0.938 | \| | 0.988 | 0.988 | | `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 0.936 | 0.936 | \| | 0.944 | 0.944 | \| | 0.940 | 0.940 | \| | 0.988 | 0.988 | | `dynamic` | `['Add']` | `[]` | \| | 0.936 | 0.936 | \| | 0.944 | 0.944 | \| | 0.940 | 0.940 | \| | 0.988 | 0.988 | | `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 0.936 | 0.904 | \| | 0.944 | 0.921 | \| | 0.940 | 0.912 | \| | 0.988 | 0.984 | | `static` | `['Add', 'MatMul']` | `[]` | \| | 0.936 | 0.065 | \| | 0.944 | 0.243 | \| | 0.940 | 0.103 | \| | 0.988 | 0.357 | | `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 0.936 | 0.909 | \| | 0.944 | 0.930 | \| | 0.940 | 0.919 | \| | 0.988 | 0.986 | | `static` | `['Add']` | `[]` | \| | 0.936 | 0.050 | \| | 0.944 | 0.160 | \| | 0.940 | 0.076 | \| | 0.988 | 0.311 | ## Time metrics Time benchmarks were run for 15 seconds per config. Below, time metrics for batch size = 1, input length = 32. | quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) | | :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: | | `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 32.90 | 7.03 | \| | 30.40 | 142.20 | | `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 48.27 | 7.68 | \| | 20.73 | 130.33 | | `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 33.74 | 14.73 | \| | 29.67 | 67.93 | | `dynamic` | `['Add']` | `[]` | \| | 33.49 | 14.17 | \| | 29.87 | 70.60 | | `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 47.72 | 8.20 | \| | 21.00 | 121.93 | | `static` | `['Add', 'MatMul']` | `[]` | \| | 47.87 | 10.58 | \| | 20.93 | 94.60 | | `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 45.77 | 19.00 | \| | 21.87 | 52.67 | | `static` | `['Add']` | `[]` | \| | 44.67 | 18.77 | \| | 22.40 | 53.33 | Below, time metrics for batch size = 1, input length = 64. | quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) | | :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: | | `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 59.15 | 13.60 | \| | 16.93 | 73.53 | | `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 44.01 | 12.60 | \| | 22.73 | 79.40 | | `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 60.50 | 29.87 | \| | 16.53 | 33.53 | | `dynamic` | `['Add']` | `[]` | \| | 45.35 | 24.10 | \| | 22.07 | 41.53 | | `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 59.98 | 16.08 | \| | 16.73 | 62.20 | | `static` | `['Add', 'MatMul']` | `[]` | \| | 43.23 | 19.02 | \| | 23.20 | 52.60 | | `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 43.15 | 32.96 | \| | 23.20 | 30.40 | | `static` | `['Add']` | `[]` | \| | 44.01 | 31.68 | \| | 22.80 | 31.60 | Below, time metrics for batch size = 1, input length = 128. | quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) | | :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: | | `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 55.20 | 25.72 | \| | 18.13 | 38.93 | | `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 73.52 | 26.70 | \| | 13.67 | 37.47 | | `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 71.60 | 53.26 | \| | 14.00 | 18.80 | | `dynamic` | `['Add']` | `[]` | \| | 70.39 | 56.68 | \| | 14.27 | 17.67 | | `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 71.34 | 31.75 | \| | 14.07 | 31.53 | | `static` | `['Add', 'MatMul']` | `[]` | \| | 73.55 | 37.95 | \| | 13.60 | 26.40 | | `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 70.28 | 62.70 | \| | 14.27 | 16.00 | | `static` | `['Add']` | `[]` | \| | 63.86 | 61.64 | \| | 15.67 | 16.27 | Below, time metrics for batch size = 4, input length = 32. | quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) | | :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: | | `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 70.41 | 22.67 | \| | 14.27 | 44.13 | | `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 71.65 | 21.44 | \| | 14.00 | 46.67 | | `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 71.72 | 55.16 | \| | 14.00 | 18.13 | | `dynamic` | `['Add']` | `[]` | \| | 55.56 | 43.87 | \| | 18.00 | 22.80 | | `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 55.45 | 27.83 | \| | 18.07 | 36.00 | | `static` | `['Add', 'MatMul']` | `[]` | \| | 66.57 | 34.45 | \| | 15.07 | 29.07 | | `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 55.23 | 59.31 | \| | 18.13 | 16.87 | | `static` | `['Add']` | `[]` | \| | 58.80 | 66.03 | \| | 17.07 | 15.20 | Below, time metrics for batch size = 4, input length = 64. | quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) | | :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: | | `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 117.71 | 43.93 | \| | 8.53 | 22.80 | | `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 90.01 | 43.27 | \| | 11.13 | 23.13 | | `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 94.34 | 107.02 | \| | 10.60 | 9.40 | | `dynamic` | `['Add']` | `[]` | \| | 119.11 | 82.46 | \| | 8.40 | 12.13 | | `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 120.57 | 54.70 | \| | 8.33 | 18.33 | | `static` | `['Add', 'MatMul']` | `[]` | \| | 120.00 | 57.85 | \| | 8.40 | 17.33 | | `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 119.57 | 92.50 | \| | 8.40 | 10.87 | | `static` | `['Add']` | `[]` | \| | 117.35 | 102.09 | \| | 8.53 | 9.80 | Below, time metrics for batch size = 4, input length = 128. | quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) | | :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: | | `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 220.69 | 94.33 | \| | 4.53 | 10.67 | | `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 170.04 | 81.68 | \| | 5.93 | 12.27 | | `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 188.59 | 171.79 | \| | 5.33 | 5.87 | | `dynamic` | `['Add']` | `[]` | \| | 219.80 | 163.62 | \| | 4.60 | 6.13 | | `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 220.25 | 94.05 | \| | 4.60 | 10.67 | | `static` | `['Add', 'MatMul']` | `[]` | \| | 222.90 | 135.06 | \| | 4.53 | 7.47 | | `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 177.41 | 211.89 | \| | 5.67 | 4.73 | | `static` | `['Add']` | `[]` | \| | 168.30 | 201.88 | \| | 6.00 | 5.00 | Below, time metrics for batch size = 8, input length = 32. | quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) | | :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: | | `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 106.46 | 42.35 | \| | 9.47 | 23.67 | | `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 88.68 | 43.33 | \| | 11.33 | 23.13 | | `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 91.32 | 92.08 | \| | 11.00 | 10.87 | | `dynamic` | `['Add']` | `[]` | \| | 88.33 | 94.18 | \| | 11.33 | 10.67 | | `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 107.47 | 44.74 | \| | 9.33 | 22.40 | | `static` | `['Add', 'MatMul']` | `[]` | \| | 118.39 | 64.56 | \| | 8.47 | 15.53 | | `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 87.05 | 111.36 | \| | 11.53 | 9.00 | | `static` | `['Add']` | `[]` | \| | 116.96 | 98.82 | \| | 8.60 | 10.13 | Below, time metrics for batch size = 8, input length = 64. | quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) | | :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: | | `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 165.67 | 87.71 | \| | 6.07 | 11.47 | | `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 214.59 | 87.88 | \| | 4.67 | 11.40 | | `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 216.06 | 163.75 | \| | 4.67 | 6.13 | | `dynamic` | `['Add']` | `[]` | \| | 176.69 | 209.28 | \| | 5.67 | 4.80 | | `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 215.12 | 86.90 | \| | 4.67 | 11.53 | | `static` | `['Add', 'MatMul']` | `[]` | \| | 215.99 | 130.39 | \| | 4.67 | 7.73 | | `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 213.87 | 224.50 | \| | 4.73 | 4.47 | | `static` | `['Add']` | `[]` | \| | 211.16 | 193.01 | \| | 4.80 | 5.20 | Below, time metrics for batch size = 8, input length = 128. | quantization_approach | operators_to_quantize | node_exclusion | | latency_mean (original, ms) | latency_mean (optimized, ms) | | throughput (original, /s) | throughput (optimized, /s) | | :-------------------: | :-------------------: | :------------------------------------------------------: | :-: | :-------------------------: | :--------------------------: | :-: | :-----------------------: | :------------------------: | | `dynamic` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 391.16 | 183.35 | \| | 2.60 | 5.47 | | `dynamic` | `['Add', 'MatMul']` | `[]` | \| | 414.42 | 154.52 | \| | 2.47 | 6.53 | | `dynamic` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 314.12 | 323.94 | \| | 3.20 | 3.13 | | `dynamic` | `['Add']` | `[]` | \| | 408.15 | 325.03 | \| | 2.47 | 3.13 | | `static` | `['Add', 'MatMul']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 337.57 | 205.59 | \| | 3.00 | 4.87 | | `static` | `['Add', 'MatMul']` | `[]` | \| | 375.10 | 225.09 | \| | 2.67 | 4.47 | | `static` | `['Add']` | `['layernorm', 'gelu', 'residual', 'gather', 'softmax']` | \| | 409.68 | 493.00 | \| | 2.47 | 2.07 | | `static` | `['Add']` | `[]` | \| | 397.28 | 397.74 | \| | 2.53 | 2.53 |