The sample code gives an output of reward = [-6.5]. What is the numerical significance of this value? What is the range of outputs for this model?