How has DeepSeek improved the Transformer architecture?

This Gradient Updates issue goes over the major changes that went into DeepSeek’s most recent model.