Controlling Language Model Generation with NVIDIA's LogitsProcessorZoo
•
28
timm
release, v 1.0.12, with a focus on optimizers. The optimizer factory has been refactored, there's now a timm.optim.list_optimizers()
and new way to register optimizers and their attributes. As always you can use an timm
optimizer like a torch
one, just replace torch.optim
with timm.optim
adfactorbv
adopt
/ adoptw
(decoupled decay)mars
laprop
c
as well as cadamw
, cnadamw
, csgdw
, clamb
, crmsproptf
top_k
arbitrarily discarding high-quality continuations? Or top_p
forgetting to exclude low-probability tokens, derailing your generation? Try out the new min_p
flag in generate
, fresh from a PR merged today! 🥬min_p
flag) and multiplies it by the probability of the most likely token in the distribution for the next token. All tokens less likely than the resulting value are filtered. What happens with this strategy?min_p
to a low value, between 0.05 and 0.1. It behaves particularly well for creative text generation when paired up with temperature > 1.