FMInference/FlexGen: Running large language models on a single GPU for throughput-oriented scenarios.
