What is an actorder group and what are the advantages of running this in vLLM?
#1 by nickandbro - opened
I would really like to know, as I am curious how this might benefit the Pixtral setup my team uses. Thanks!
Hi @nickandbro , you can see all the available activation ordering strategies here.
In short, compressing your model with "group" activation ordering leads to better accuracy recovery, but it can also lead to slightly higher latency. If latency is a concern for your application, consider compressing your model with "weight" activation ordering instead.
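To make the trade-off a bit more concrete, here is a rough conceptual sketch in plain Python. It is not llm-compressor or vLLM code. The function names (`group_ids`, `needs_runtime_lookup`) and the per-column `importance` scores are hypothetical, and the sketch assumes that "group" ordering ranks columns by activation importance before assigning them to quantization groups, while "weight" ordering keeps groups contiguous in the original column order.

```python
# Conceptual sketch only: hypothetical helpers, not the llm-compressor API.

def group_ids(importance, group_size, strategy):
    """Return the quantization-group index of each original weight column."""
    n = len(importance)
    if strategy == "weight":
        # Assumption: groups stay contiguous in the original column order.
        return [col // group_size for col in range(n)]
    if strategy == "group":
        # Assumption: columns are ranked by importance, then grouped by rank,
        # so columns with similar importance share a scale.
        order = sorted(range(n), key=lambda c: importance[c], reverse=True)
        ids = [0] * n
        for rank, col in enumerate(order):
            ids[col] = rank // group_size
        return ids
    raise ValueError(f"unknown strategy: {strategy!r}")


def needs_runtime_lookup(ids):
    """True if group indices are not in ascending order across columns,
    meaning a kernel would need a per-column index to find each scale."""
    return any(a > b for a, b in zip(ids, ids[1:]))


importance = [0.1, 0.9, 0.2, 0.8]  # made-up activation scores per column

weight_ids = group_ids(importance, group_size=2, strategy="weight")
group_ids_ = group_ids(importance, group_size=2, strategy="group")

print(weight_ids, needs_runtime_lookup(weight_ids))  # [0, 0, 1, 1] False
print(group_ids_, needs_runtime_lookup(group_ids_))  # [1, 0, 1, 0] True
```

In this toy picture, "group" ordering puts similarly important columns under the same scale (which is where the accuracy benefit would come from), but the resulting non-contiguous layout needs an extra per-column lookup at inference time, which is one plausible source of the added latency.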
For additional information on multimodal model compression using llm-compressor for vLLM, see the associated VLM support PR.
kylesayrs changed discussion status to closed
NOTE: We typically suggest using WEIGHT for the strategy
Thanks!