FMInference/FlexGen: Running large language models on a single GPU for throughput-oriented scenarios.