Supported LLMs

Our interface currently supports the open-source models listed below. We plan to add more open-source models over time.

| Model | Current Speed |
| --- | --- |
| Llama 2 70B (4096 Context Length) | ~300 tokens/s |
| Llama 2 7B (2048 Context Length) | ~750 tokens/s |
| Mixtral, 8x7B SMoE (32K Context Length) | ~480 tokens/s |
| Gemma 7B (8K Context Length) | ~820 tokens/s |
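
To get a feel for what these speeds mean in practice, the sketch below estimates how long each model would take to produce a response of a given length. It is a back-of-the-envelope calculation only: the dictionary name `SPEEDS_TOKENS_PER_S` and the helper `estimate_seconds` are hypothetical names chosen for illustration, the figures are the approximate values listed above, and network latency, queueing, and prompt processing are ignored.

```python
# Approximate throughput figures copied from the speeds listed above
# (tokens per second). These are rough values, not guarantees.
SPEEDS_TOKENS_PER_S = {
    "Llama 2 70B": 300,
    "Llama 2 7B": 750,
    "Mixtral 8x7B SMoE": 480,
    "Gemma 7B": 820,
}


def estimate_seconds(model: str, output_tokens: int) -> float:
    """Rough generation time for `output_tokens` tokens.

    Assumes the model sustains its listed speed for the whole response
    and ignores network latency, queueing, and prompt processing.
    """
    return output_tokens / SPEEDS_TOKENS_PER_S[model]


if __name__ == "__main__":
    # Compare all listed models on a hypothetical 600-token response.
    for name in SPEEDS_TOKENS_PER_S:
        seconds = estimate_seconds(name, 600)
        print(f"{name}: ~{seconds:.2f} s for 600 tokens")
```

Because the listed speeds are approximate, treat the results as order-of-magnitude estimates rather than latency guarantees.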
