# Performance tips

Below are some general recommendations to further improve performance. Many of them were applied in the WNGT 2020 efficiency task submission.
- Set the compute type to "auto" to automatically select the fastest execution path on the current system
- Reduce the beam size to the minimum value that meets your quality requirement
- When using a beam size of 1, keep `return_scores` disabled if you are not using prediction scores: the final softmax layer can be skipped
- Set `max_batch_size` and pass a larger batch to `*_batch` methods: the input sentences will be sorted by length and split into chunks of `max_batch_size` elements for improved efficiency
- Prefer the "tokens" `batch_type` to make the total number of elements in a batch more constant
- Consider using {ref}`translation:dynamic vocabulary reduction` for translation
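To see why `max_batch_size` and the "tokens" `batch_type` matter, here is a minimal sketch (not the library's internal code; `make_batches` is a hypothetical helper) of length-sorted batching: sorting groups sentences of similar length together so little padding is wasted, and the "tokens" mode caps the total element count per batch instead of the sentence count.

```python
def make_batches(sentences, max_batch_size, batch_type="examples"):
    """Sort tokenized sentences by length, then split into batches of at
    most max_batch_size examples ("examples") or tokens ("tokens")."""
    # Sort by length so sentences in the same batch need similar padding.
    order = sorted(range(len(sentences)), key=lambda i: len(sentences[i]))
    batches, current, current_tokens = [], [], 0
    for i in order:
        n_tokens = len(sentences[i])
        if batch_type == "examples":
            full = len(current) >= max_batch_size
        else:  # "tokens": cap the total token count per batch
            full = current and current_tokens + n_tokens > max_batch_size
        if full:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(sentences[i])
        current_tokens += n_tokens
    if current:
        batches.append(current)
    return batches

sentences = [["tok"] * n for n in (5, 2, 9, 3, 7, 1)]
# Batching by examples: 3 batches of 2 sentences each.
print([len(b) for b in make_batches(sentences, 2)])  # [2, 2, 2]
# Batching by tokens keeps the work per batch roughly constant instead.
print([len(b) for b in make_batches(sentences, 8, batch_type="tokens")])
```

With "tokens", a batch of many short sentences and a batch of a few long ones contain a similar amount of computation, which keeps latency per batch more predictable.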
## On CPU
- Use an Intel CPU supporting AVX512
- If you are processing a large volume of data, prefer increasing `inter_threads` over `intra_threads` and use stream methods (methods whose name ends with `_file` or `_iterable`)
- Keep the total number of threads `inter_threads * intra_threads` no larger than the number of physical cores
- For single core execution on Intel CPUs, consider enabling packed GEMM (set the environment variable `CT2_USE_EXPERIMENTAL_PACKED_GEMM=1`)
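The thread-count constraint above can be expressed as a small helper. This is a hypothetical utility, not part of the library: given the number of physical cores and a chosen `intra_threads`, it returns the largest `inter_threads` whose product stays within the core count.

```python
def pick_inter_threads(physical_cores, intra_threads):
    """Largest number of parallel workers (inter_threads) such that
    inter_threads * intra_threads <= physical_cores, with a floor of 1."""
    return max(1, physical_cores // intra_threads)

# e.g. on an 8-core machine with intra_threads=4, run 2 workers in parallel
print(pick_inter_threads(8, 4))  # 2
```

The resulting value would be passed as the `inter_threads` option when constructing the translator, alongside `intra_threads`.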
## On GPU
- Use a larger batch size
- Use an NVIDIA GPU with Tensor Cores (Compute Capability >= 7.0)
- Pass multiple GPU IDs to `device_index` to execute on multiple GPUs
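As a sketch of how these GPU tips combine in the Python API (the model path is a placeholder and a converted model is required to actually run this; the helper name is illustrative):

```python
def build_translator(model_path, gpu_ids):
    """Create a translator spread over several GPUs.

    gpu_ids is a list such as [0, 1]; device_index accepts a list of
    GPU IDs to execute on multiple GPUs.
    """
    import ctranslate2  # imported here so the sketch loads without the package

    return ctranslate2.Translator(
        model_path,
        device="cuda",
        device_index=gpu_ids,
        compute_type="auto",  # select the fastest supported execution path
    )

# Usage (requires a converted model directory):
# translator = build_translator("ende_ctranslate2/", [0, 1])
```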