GitHub - FMInference/FlexLLMGen: Running large language models on a single GPU for throughput-oriented scenarios.