Training problem

#29
by DonGan13 - opened

Good afternoon. Does anyone here have experience with neural network training? I’m working on an architecture for audio completion based on DeepSeek V3 (deepseek-ai/DeepSeek-V3), and I’ve run into the following issue. In the modeling code, in the DeepSeekV3MoE class, the forward method contains a condition if not self.training: in eval mode the routed experts are applied through moe_infer, which runs under no_grad, but in training mode only a single shared expert is used.

Here’s my question: is this expected behavior? I’m concerned that the routed experts and the gate don’t seem to influence the output during training. What should I do?
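
To make the behavior I’m describing concrete, here is a minimal, self-contained sketch of that forward pattern. The class name, the toy top-1 routing, and the single-Linear “experts” are my own simplifications for illustration; this is not the actual DeepSeek V3 modeling code.

```python
import torch
import torch.nn as nn

class MoEForwardSketch(nn.Module):
    # Simplified stand-in for the pattern in question: routed experts only run
    # in eval mode (under no_grad), the shared expert always runs.
    def __init__(self, hidden_size=8, n_routed_experts=4):
        super().__init__()
        self.gate = nn.Linear(hidden_size, n_routed_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(n_routed_experts)]
        )
        self.shared_experts = nn.Linear(hidden_size, hidden_size)

    @torch.no_grad()  # the routed-expert path runs without gradients
    def moe_infer(self, x, topk_idx, topk_weight):
        # toy top-1 routing: send each token to its highest-scoring expert
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = (topk_idx == i).squeeze(-1)
            if mask.any():
                out[mask] = expert(x[mask]) * topk_weight[mask]
        return out

    def forward(self, hidden_states):
        identity = hidden_states
        topk_weight, topk_idx = self.gate(hidden_states).softmax(-1).max(-1, keepdim=True)
        y = torch.zeros_like(hidden_states)
        if not self.training:
            # gate scores and routed experts only contribute in eval mode
            y = self.moe_infer(hidden_states, topk_idx, topk_weight)
        # in training mode, only this shared expert contributes to the output
        y = y + self.shared_experts(identity)
        return y


moe = MoEForwardSketch()
x = torch.randn(2, 5, 8)
moe.train()
moe(x).sum().backward()
print(moe.gate.weight.grad)            # None: the gate never entered the graph
print(moe.shared_experts.weight.grad)  # populated: only the shared expert trains
```

The gradient check at the end is just to illustrate my concern: in train() mode only the shared expert ends up in the autograd graph.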

DonGan13 changed discussion title from Training model to Training problem

[Attached screenshots: Screenshot from 2025-01-02 13-03-01.png, Screenshot from 2025-01-02 13-03-06.png]

I was asking for an answer from a knowledgeable person, not from some machine.

We know now what you are asking.
