fastGPT: Faster than PyTorch in 300 lines of Fortran

Yes, so is tanh(x). But the fast_tanh(x) in the code is a lot faster, even at full accuracy, and as @oscardssmith said, it looks like we might get away with a lower-accuracy version as well.
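
To give a feel for what a lower-accuracy version could look like, here is a rough sketch in Python. It is not the fast_tanh(x) from the repo. The name `approx_tanh`, the helper `max_abs_error`, and the choice of a clamped low-order rational approximation are all my own assumptions for illustration.

```python
import math


def approx_tanh(x: float) -> float:
    """Low-accuracy tanh via a clamped rational approximation.

    Illustrative only: this is NOT the fast_tanh from fastGPT.
    """
    # The rational form reaches exactly 1 at x = 3 and overshoots beyond it,
    # so clamp to the asymptotes outside [-3, 3].
    if x >= 3.0:
        return 1.0
    if x <= -3.0:
        return -1.0
    x2 = x * x
    # Odd rational function: behaves like x near 0 and flattens toward +/-1.
    return x * (27.0 + x2) / (27.0 + 9.0 * x2)


def max_abs_error(n: int = 2001, lo: float = -6.0, hi: float = 6.0) -> float:
    """Largest absolute deviation from math.tanh on an evenly spaced grid."""
    step = (hi - lo) / (n - 1)
    worst = 0.0
    for i in range(n):
        x = lo + i * step
        worst = max(worst, abs(approx_tanh(x) - math.tanh(x)))
    return worst


if __name__ == "__main__":
    print(f"max abs error on [-6, 6]: {max_abs_error():.4f}")
```

On that grid the worst-case absolute error stays below 0.03, so whether something this crude is usable depends entirely on how much error the model output tolerates. The point is only that there is a trade-off between the cost of the approximation and its accuracy.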