Supported LLMs
Our interface currently supports the major open-source models listed below, and we will add more open-source models over time.
| Model | Context Length | Current Speed |
| --- | --- | --- |
| Llama 2 70B | 4096 | ~300 tokens/s |
| Llama 2 7B | 2048 | ~750 tokens/s |
| Mixtral 8x7B SMoE | 32K | ~480 tokens/s |
| Gemma 7B | 8K | ~820 tokens/s |
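
The speeds listed above can be turned into a rough estimate of how long a response takes to generate. The following is a minimal sketch, not part of any official client: the dictionary and the `estimated_seconds` helper are hypothetical names introduced here for illustration, the figures are the approximate values from the list above, and the estimate ignores network latency, queueing, and prompt processing time.

```python
# Approximate output speeds (tokens/s), copied from the figures listed above.
# These are rough, point-in-time values and may change.
SPEEDS_TOKENS_PER_S = {
    "Llama 2 70B": 300,
    "Llama 2 7B": 750,
    "Mixtral 8x7B SMoE": 480,
    "Gemma 7B": 820,
}


def estimated_seconds(model: str, output_tokens: int) -> float:
    """Return a rough generation time in seconds for `output_tokens` tokens.

    Hypothetical helper: divides the token count by the listed speed and
    ignores network latency, queueing, and prompt processing.
    """
    return output_tokens / SPEEDS_TOKENS_PER_S[model]


if __name__ == "__main__":
    # Compare how long a 500-token response might take on each model.
    for model in SPEEDS_TOKENS_PER_S:
        print(f"{model}: ~{estimated_seconds(model, 500):.2f} s for 500 tokens")
```

Because the listed speeds are approximate, the results are best read as order-of-magnitude guidance when choosing between models.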