AuriStream - Speech Language Model

AuriStream is a speech language model by Greta Tuckute and Klemen Kotar.

This repository contains the shared model code for AuriStream models.

Overview

AuriStream is a GPT-like transformer model for cochlear token prediction with optional multi-token prediction (MTP) heads.

This model predicts cochlear tokens from a tokenizer such as WavCochCausalV8192.
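The multi-token prediction idea can be sketched as several independent output heads that all read the same final hidden state, each predicting a different future token. This is a minimal illustrative sketch with made-up shapes and names, not the actual AuriStream implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_embd, vocab_size, n_pred_steps = 8, 16, 4  # illustrative sizes only

# One independent linear head per future step; all share the same hidden state.
heads = [rng.standard_normal((n_embd, vocab_size)) for _ in range(n_pred_steps)]

hidden = rng.standard_normal(n_embd)            # final hidden state at one position
logits = np.stack([hidden @ W for W in heads])  # (n_pred_steps, vocab_size)
predicted = logits.argmax(axis=-1)              # one cochlear token id per future step
```

With n_pred_steps = 1 this reduces to ordinary next-token prediction.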

Usage

This repository is not meant to be used directly. Instead, load one of the checkpoint repositories that reference this base code.

To load a checkpoint:

from transformers import AutoModel, AutoConfig

model = AutoModel.from_pretrained(
    "TuKoResearch/AuriStream7B_40Pred_BigAudioDataset_500k",
    trust_remote_code=True,
)

Model Architecture

The AuriStream model includes:

  • RMSNorm in place of standard LayerNorm
  • Rotary position embeddings (RoPE)
  • SiLU activation in the MLP layers
  • Optional multi-token prediction (MTP) heads
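For readers unfamiliar with the first two components, here is a minimal NumPy sketch of RMSNorm and RoPE as they are commonly implemented; the actual code in modeling_auristream.py may differ in details:

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # RMSNorm: scale by the root-mean-square over the last axis (no mean
    # subtraction or bias, unlike LayerNorm), then apply a learned gain.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

def rope(x, base=10000.0):
    # Rotary position embeddings: rotate consecutive channel pairs by an
    # angle that grows with position, encoding relative offsets in attention.
    seq, dim = x.shape
    pos = np.arange(seq)[:, None]
    freqs = base ** (-np.arange(0, dim, 2) / dim)  # (dim/2,)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x[:, 0::2] * cos - x[:, 1::2] * sin
    out[:, 1::2] = x[:, 0::2] * sin + x[:, 1::2] * cos
    return out

x = np.array([[3.0, 4.0], [1.0, 0.0]])
normed = rms_norm(x, gain=np.ones(2))
rotated = rope(x)
```

Note that RoPE leaves position 0 unrotated and needs no learned parameters.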

Configuration Options

Parameter     Description                                  Default
vocab_size    Number of cochlear tokens                    8192
n_embd        Hidden (embedding) dimension                 768
n_layer       Number of transformer layers                 12
n_head        Number of attention heads                    12
n_pred_steps  Number of multi-token prediction (MTP) steps 1
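The fields above can be mirrored in a plain dataclass to show how they fit together. This is only a sketch of the defaults listed in the table; the real configuration class lives in configuration_auristream.py and is normally loaded via AutoConfig with trust_remote_code=True rather than constructed by hand:

```python
from dataclasses import dataclass

@dataclass
class AuriStreamConfigSketch:
    # Names and defaults mirror the table above; illustrative only.
    vocab_size: int = 8192    # number of cochlear tokens
    n_embd: int = 768         # hidden dimension
    n_layer: int = 12         # transformer layers
    n_head: int = 12          # attention heads
    n_pred_steps: int = 1     # multi-token prediction steps

cfg = AuriStreamConfigSketch(n_pred_steps=4)  # override one field, keep the rest
```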

Files

  • configuration_auristream.py - Configuration class
  • modeling_auristream.py - Model implementation

Tokenizer

This model uses cochlear tokens from WavCochCausalV8192.
