NVIDIA Nemotron Nano 2 and the Nemotron Pretraining Dataset v1

NVIDIA Nemotron Nano 2 is a new hybrid Mamba-Transformer reasoning model that matches or exceeds the accuracy of leading open models of comparable size while delivering up to 6x higher throughput. Nemotron Pretraining Dataset v1 is a 6.6-trillion-token collection of web crawl, math, code, SFT,…