Draft: Use the batch API to have ollama generate multiple embeddings in a single call (!19) · Merge requests · hive / hivesense

Instead of making one API call per embedding, batch the inputs into larger chunks (configurable, default 100). This should increase GPU utilization; in my not-particularly-well-controlled tests, it was about 8% faster.
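For illustration only, here is a minimal sketch of the batching idea, not the code in this merge request. It assumes Ollama's `/api/embed` endpoint, which accepts a list under `input` and returns one vector per item under `embeddings`. The helper names (`chunked`, `ollama_embed_batch`, `embed_all`), the model name, and the local URL are hypothetical choices made for the example.

```python
import json
import urllib.request
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def ollama_embed_batch(
    texts: Sequence[str],
    model: str = "nomic-embed-text",  # assumption: any installed embedding model
    url: str = "http://localhost:11434/api/embed",  # assumption: default local server
) -> List[Vector]:
    """Send one request carrying every text in `texts`; return one vector per text."""
    payload = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["embeddings"]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[Vector]] = ollama_embed_batch,
    batch_size: int = 100,  # mirrors the default described above
) -> List[Vector]:
    """Embed `texts` in batches, preserving input order."""
    vectors: List[Vector] = []
    for batch in chunked(texts, batch_size):
        result = embed_batch(batch)
        # Guard against a backend that returns the wrong number of vectors.
        if len(result) != len(batch):
            raise RuntimeError("backend returned a mismatched number of embeddings")
        vectors.extend(result)
    return vectors
```

With 250 inputs and the default batch size, this makes three requests (100, 100, 50) rather than 250. Passing `embed_batch` as a parameter keeps the chunking logic testable without a running server.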

