Pricing

There's no subscription required to use GLHF. Instead, we charge based on usage: if you don't use the product, you don't get charged.

On-demand pricing

GPU Type    Price
80GB        3 cents/min, per GPU
48GB        1.5 cents/min, per GPU
24GB        1.2 cents/min, per GPU

Our on-demand GPU rates are competitive: for example, an 80GB GPU costs roughly half as much on GLHF as on competing services like Replicate or Modal Labs.

We automatically calculate the type and number of GPUs a model repository requires. We don't quantize on-demand models: they're launched in whatever precision the underlying repo uses, typically BF16, with the exception of Jamba-based models, which are launched in FP8. Quantizing below FP8 can significantly harm model performance.

On-demand model context length is capped at 32k tokens.
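To estimate what an on-demand workload would cost, you can multiply the per-minute rate from the table above by GPU count and runtime. A minimal sketch (the rate table and function name here are illustrative, not part of any GLHF API):

```python
# Hypothetical on-demand cost calculator based on the published rates above.
# Rates are in US cents per minute, per GPU.
RATES_CENTS_PER_MIN = {"80GB": 3.0, "48GB": 1.5, "24GB": 1.2}

def on_demand_cost_usd(gpu_type: str, num_gpus: int, minutes: float) -> float:
    """Total cost in dollars for `num_gpus` GPUs of `gpu_type` running for `minutes`."""
    cents = RATES_CENTS_PER_MIN[gpu_type] * num_gpus * minutes
    return cents / 100

# Example: a model that needs two 80GB GPUs, kept up for an hour:
# 3 cents/min * 2 GPUs * 60 min = 360 cents = $3.60
print(on_demand_cost_usd("80GB", 2, 60))  # 3.6
```

Since billing is per minute of GPU time rather than per token, the cost of an on-demand model depends only on how long it stays up, not on how much you send it.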

Always-on pricing

Model                                          Provider     Price
deepseek-ai/DeepSeek-V3                        Together     $1.25/mtok
google/gemma-2-27b-it                          Together     $0.80/mtok
google/gemma-2-9b-it                           Together     $0.20/mtok
meta-llama/Llama-3.1-405B-Instruct             Fireworks    $3.00/mtok
meta-llama/Llama-3.1-70B-Instruct              Fireworks    $0.90/mtok
meta-llama/Llama-3.1-8B-Instruct               Fireworks    $0.20/mtok
meta-llama/Llama-3.2-11B-Vision-Instruct       Fireworks    $0.20/mtok
meta-llama/Llama-3.2-3B-Instruct               Fireworks    $0.10/mtok
meta-llama/Llama-3.2-90B-Vision-Instruct       Fireworks    $0.90/mtok
meta-llama/Llama-3.3-70B-Instruct              Fireworks    $0.90/mtok
mistralai/Mistral-7B-Instruct-v0.3             Together     $0.20/mtok
mistralai/Mixtral-8x22B-Instruct-v0.1          Together     $1.20/mtok
mistralai/Mixtral-8x7B-Instruct-v0.1           Together     $0.60/mtok
NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO    Together     $0.60/mtok
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF      Together     $0.90/mtok
Qwen/Qwen2.5-72B-Instruct                      Fireworks    $0.90/mtok
Qwen/Qwen2.5-7B-Instruct                       Together     $0.18/mtok
Qwen/Qwen2.5-Coder-32B-Instruct                Fireworks    $0.90/mtok
upstage/SOLAR-10.7B-Instruct-v1.0              Together     $0.30/mtok

Always-on models run in whatever precision the underlying API provider supports: typically, either BF16 or FP8. Always-on models support their full context length.
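Always-on models are billed per million tokens ("mtok") rather than per minute, so cost scales with usage. A minimal sketch of the arithmetic, using a couple of rates from the table above (the dictionary and function name are illustrative, not a GLHF API):

```python
# Hypothetical always-on cost estimate. Prices are quoted in dollars
# per million tokens (mtok), taken from the pricing table above.
PRICE_PER_MTOK_USD = {
    "meta-llama/Llama-3.1-8B-Instruct": 0.20,
    "meta-llama/Llama-3.1-70B-Instruct": 0.90,
}

def always_on_cost_usd(model: str, tokens: int) -> float:
    """Cost in dollars for processing `tokens` tokens with `model`."""
    return PRICE_PER_MTOK_USD[model] * tokens / 1_000_000

# Example: 2.5 million tokens through the 70B model:
# $0.90/mtok * 2.5 mtok = $2.25
print(always_on_cost_usd("meta-llama/Llama-3.1-70B-Instruct", 2_500_000))
```

This is the key trade-off between the two pricing modes: always-on models cost nothing while idle, while on-demand models bill for GPU time whether or not tokens are flowing.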