Supported LLMs
Our interface currently supports three major open-source model families (Llama 2, Mixtral, and Gemma); we will be adding more open-source models over time.
| Model | Current Speed |
| --- | --- |
| Llama 2 70B (4,096 context length) | ~300 tokens/s |
| Llama 2 7B (2,048 context length) | ~750 tokens/s |
| Mixtral 8x7B SMoE (32K context length) | ~480 tokens/s |
| Gemma 7B (8K context length) | ~820 tokens/s |
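
As a minimal sketch of how one of these models might be selected by name, the snippet below assumes an OpenAI-compatible chat completions endpoint; the base URL, API key environment variable, and model identifier are placeholders and may not match the actual interface.

```python
# Minimal sketch: calling a hosted model through an OpenAI-compatible client.
# The base_url, environment variable, and model identifier are placeholders.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",   # placeholder endpoint
    api_key=os.environ["EXAMPLE_API_KEY"],   # placeholder key variable
)

response = client.chat.completions.create(
    model="mixtral-8x7b-32768",              # hypothetical model identifier
    messages=[{"role": "user", "content": "Explain sparse mixture of experts in one sentence."}],
    max_tokens=256,
)

print(response.choices[0].message.content)
```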