About the output of the sample code

#3
by Tomie0506 - opened

The sample code outputs reward = [-6.5]. What is the numerical significance of this value, and what is the range of outputs for this model?
