Model Details

Mistral 7B

The Mistral AI team is proud to release Mistral 7B, the most powerful language model for its size to date.


Mistral 7B in short


Mistral 7B is a 7.3B parameter model that:

  1. Outperforms Llama 2 13B on all benchmarks
  2. Outperforms Llama 1 34B on many benchmarks
  3. Approaches CodeLlama 7B performance on code, while remaining good at English tasks
  4. Uses Grouped-query attention (GQA) for faster inference
  5. Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost (a masking sketch follows this list)
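
To give an intuition for SWA, the sketch below builds the kind of banded causal mask it implies; the window size and sequence length are toy values chosen only for illustration, not the model's actual configuration.

import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where a query position may attend to a key position:
    # causal (key <= query) and within the last `window` positions.
    q = torch.arange(seq_len).unsqueeze(1)  # query positions (column)
    k = torch.arange(seq_len).unsqueeze(0)  # key positions (row)
    return (k <= q) & (k > q - window)

# Each row (query) attends to at most `window` of the most recent tokens, itself included.
print(sliding_window_causal_mask(seq_len=6, window=3).int())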

We’re releasing Mistral 7B under the Apache 2.0 license; it can be used without restrictions.

Mistral 7B is easy to fine-tune on any task. As a demonstration, we’re providing a model fine-tuned for chat, which outperforms Llama 2 13B chat.

Performance in detail


We compared Mistral 7B to the Llama 2 family, and re-ran all model evaluations ourselves for a fair comparison.

Performance of Mistral 7B and different Llama models on a wide range of benchmarks. For all metrics, all models were re-evaluated with our evaluation pipeline for accurate comparison. Mistral 7B significantly outperforms Llama 2 13B on all metrics, and is on par with Llama 34B (since Llama 2 34B was not released, we report results on Llama 34B). It is also vastly superior in code and reasoning benchmarks.

The benchmarks are categorized by their themes:

  1. Commonsense Reasoning: 0-shot average of Hellaswag, Winogrande, PIQA, SIQA, OpenbookQA, ARC-Easy, ARC-Challenge, and CommonsenseQA.
  2. World Knowledge: 5-shot average of NaturalQuestions and TriviaQA.
  3. Reading Comprehension: 0-shot average of BoolQ and QuAC.
  4. Math: Average of 8-shot GSM8K with maj@8 and 4-shot MATH with maj@4 (majority voting over sampled answers; see the sketch after this list).
  5. Code: Average of 0-shot Humaneval and 3-shot MBPP.
  6. Popular aggregated results: 5-shot MMLU, 3-shot BBH, and 3-5-shot AGI Eval (English multiple-choice questions only).
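
maj@k scores a problem as correct when the most frequent final answer across k sampled generations matches the reference. A minimal sketch of that vote, using hypothetical sampled answers:

from collections import Counter

def maj_at_k(sampled_answers, reference):
    # Majority vote: the most common answer across the k samples
    # is compared against the reference answer.
    majority_answer, _ = Counter(sampled_answers).most_common(1)[0]
    return majority_answer == reference

# Hypothetical maj@8 example: 8 sampled final answers for one GSM8K problem.
samples = ["42", "42", "41", "42", "40", "42", "42", "39"]
print(maj_at_k(samples, reference="42"))  # True: "42" wins the vote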


An interesting metric for comparing how models fare in the cost/performance plane is to compute “equivalent model sizes”. On reasoning, comprehension, and STEM reasoning (MMLU), Mistral 7B performs equivalently to a Llama 2 model that would be more than 3x its size; that difference translates directly into memory savings and throughput gains.
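
One way to read an “equivalent model size” off a scaling curve is to interpolate Llama 2's score-versus-size curve at Mistral 7B's score. The sketch below uses made-up scores purely to show the arithmetic; it is not the evaluation data behind the figures above.

import numpy as np

# Hypothetical (size in billions of parameters, benchmark score) points for a
# Llama 2 scaling curve; these numbers are illustrative, not measured results.
llama_sizes = np.array([7.0, 13.0, 70.0])
llama_scores = np.array([0.45, 0.55, 0.69])

mistral_score = 0.60  # hypothetical Mistral 7B score on the same benchmark

# Interpolate size as a function of score: the Llama 2 size that would match
# Mistral 7B's score is its "equivalent model size".
equivalent_size = np.interp(mistral_score, llama_scores, llama_sizes)
print(f"Equivalent Llama 2 size: {equivalent_size:.1f}B parameters")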

Results on MMLU, Commonsense Reasoning, World Knowledge and Reading comprehension for Mistral 7B and Llama 2 (7B/13B/70B). Mistral 7B largely outperforms Llama 2 13B on all evaluations, except on knowledge benchmarks, where it is on par (this is likely due to its limited parameter count, which restricts the amount of knowledge it can compress).

Note: Important differences between our evaluation and the LLaMA2 paper’s:

  1. For MBPP, we use the hand-verified subset
  2. For TriviaQA, we do not provide Wikipedia contexts

Source: https://mistral.ai/news/announcing-mistral-7b/



Example Use


Model Card for Mistral-7B-Instruct-v0.1


The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.1 generative text model, trained on a variety of publicly available conversation datasets.

For full details of this model, please read our release blog post.


Model Architecture

This instruction model is based on Mistral-7B-v0.1, a transformer model with the following architecture choices (a configuration sketch follows the list):

  1. Grouped-Query Attention
  2. Sliding-Window Attention
  3. Byte-fallback BPE tokenizer
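
One way to see GQA and SWA reflected in the released checkpoint is to inspect its Hugging Face configuration. A quick sketch, assuming the same local model path used in the example below and a transformers version whose Mistral configuration exposes these fields:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("/input/mistral/pytorch/7b-instruct-v0.1-hf/1")

# Grouped-query attention: fewer key/value heads than query heads.
print("attention heads:      ", config.num_attention_heads)
print("key/value heads (GQA):", config.num_key_value_heads)

# Sliding-window attention: each token attends to at most this many preceding tokens.
print("sliding window (SWA): ", config.sliding_window)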


Instruction format

In order to leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should begin with a begin-of-sentence id; subsequent instructions should not. The assistant generation will be terminated by the end-of-sentence token id.

E.g.


text = "<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"

This format is available as a chat template via the apply_chat_template() method:

!pip install -q -U transformers
!pip install -q -U accelerate
!pip install -q -U bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("/input/mistral/pytorch/7b-instruct-v0.1-hf/1")
tokenizer = AutoTokenizer.from_pretrained("/input/mistral/pytorch/7b-instruct-v0.1-hf/1")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)  # move the tokenized prompt to the GPU
model.to(device)                    # move the model weights to the GPU

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])


Limitations

The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.




