backend : offload large batches to GPU by slaren · Pull Request #6083 · ggerganov/llama.cpp

Moves the logic for auto-offloading large batches to the GPU into ggml_backend_sched. Currently only the CUDA and Vulkan backends implement this themselves; moving it into the scheduler will allow any backend to support this feature. ...
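The core idea can be sketched as a simple policy: run a batch on the GPU only when it is large enough that the speedup outweighs the cost of transferring the weights. The sketch below is purely illustrative; the type, function, and threshold names (`pick_backend`, `OFFLOAD_MIN_BATCH`, etc.) are hypothetical and do not reflect the actual `ggml_backend_sched` API.

```c
#include <stdbool.h>

// Hypothetical threshold: below this batch size, copying weights to the
// GPU costs more than it saves (the real value is backend-specific).
#define OFFLOAD_MIN_BATCH 32

typedef enum { BACKEND_CPU, BACKEND_GPU } backend_t;

// Decide where to run a batch of n_tokens tokens. Large batches amortize
// the host-to-device weight copies, so prefer the GPU only for those.
static backend_t pick_backend(int n_tokens, bool gpu_available) {
    if (gpu_available && n_tokens >= OFFLOAD_MIN_BATCH) {
        return BACKEND_GPU;
    }
    return BACKEND_CPU;
}
```

Centralizing a decision like this in the scheduler means each backend only needs to report that it can accept offloaded work, rather than reimplementing the batch-size heuristic itself.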