# out-2

This is a merge of pre-trained language models created using mergekit.

## Merge Details

### Merge Method

This model was merged using the Passthrough merge method.
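Passthrough builds the output model by stacking layer slices from the source model(s) verbatim, with no weight interpolation; overlapping slices simply duplicate layers. A toy sketch of the idea (illustrative only, not mergekit's actual internals):

```python
# Toy illustration of the passthrough merge method: the output layer stack
# is the concatenation of the requested [start, end) slices of the source.
def passthrough(layers, slices):
    """layers: list of layer objects; slices: list of (start, end) ranges."""
    merged = []
    for start, end in slices:
        merged.extend(layers[start:end])  # overlapping ranges duplicate layers
    return merged

base = [f"layer_{i}" for i in range(40)]  # the base model here has 40 layers
out = passthrough(base, [(0, 8), (8, 16), (8, 16), (32, 40)])
# layers 8-15 appear twice in the merged stack
```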

### Models Merged

The following models were included in the merge:

* mistralai/Mistral-Nemo-Base-2407

### Configuration

The following YAML configuration was used to produce this model:

dtype: bfloat16
merge_method: passthrough

slices:
  # untouched intro
  - sources:
      - layer_range: [0, 8]
        model: mistralai/Mistral-Nemo-Base-2407

  - sources:
      - layer_range: [8, 12]
        model: mistralai/Mistral-Nemo-Base-2407
  # 8–16 baseline
  - sources:
      - layer_range: [8, 16]
        model: mistralai/Mistral-Nemo-Base-2407
  # 8–16 duplicate with projections nulled
  - sources:
      - layer_range: [8, 16]
        model: mistralai/Mistral-Nemo-Base-2407
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0

  # 16–24 duplicate with projections nulled
  - sources:
      - layer_range: [16, 24]
        model: mistralai/Mistral-Nemo-Base-2407
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  # 16–24 baseline
  - sources:
      - layer_range: [16, 24]
        model: mistralai/Mistral-Nemo-Base-2407
  # 16–24 duplicate with projections nulled
  - sources:
      - layer_range: [16, 24]
        model: mistralai/Mistral-Nemo-Base-2407
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0

  # 24–32 baseline
  - sources:
      - layer_range: [24, 32]
        model: mistralai/Mistral-Nemo-Base-2407
  # 24–32 duplicate with projections nulled
  - sources:
      - layer_range: [24, 32]
        model: mistralai/Mistral-Nemo-Base-2407
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0

  # untouched tail
  - sources:
      - layer_range: [32, 40]
        model: mistralai/Mistral-Nemo-Base-2407
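Because several blocks are duplicated, the merged model is much deeper than the 40-layer base. Summing the slice spans from the configuration above gives the resulting depth (a quick sanity check, not part of mergekit):

```python
# Layer ranges from the YAML above, in order; each is [start, end).
slices = [(0, 8), (8, 12), (8, 16), (8, 16),
          (16, 24), (16, 24), (16, 24),
          (24, 32), (24, 32), (32, 40)]
total = sum(end - start for start, end in slices)
print(total)  # 76 layers in the merged model, vs 40 in the base
```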
Model size: 22B params · Tensor type: BF16