tfs-mt

This project implements the Transformer architecture from scratch, with machine translation as the use case. It is intended mainly as an educational resource, while still providing a complete, functional implementation of the architecture and the training/inference logic.

This repository hosts the weights of the trained small-size Transformer and the pretrained source/target tokenizers.

Quick Start

pip install tfs-mt

import torch

from tfs_mt.architecture import build_model
from tfs_mt.data_utils import WordTokenizer
from tfs_mt.decoding_utils import greedy_decoding

# Pretrained tokenizers, config, and weights are hosted on the Hugging Face Hub.
base_url = "https://huggingface.co/giovo17/tfs-mt/resolve/main/"
src_tokenizer = WordTokenizer.from_pretrained(base_url + "src_tokenizer_word.json")
tgt_tokenizer = WordTokenizer.from_pretrained(base_url + "tgt_tokenizer_word.json")

# Build the model from the locked training config and load the pretrained weights.
model = build_model(
    config=base_url + "config-lock.yaml",
    from_pretrained=True,
    model_path=base_url + "model.safetensors",
)

# Use a GPU if available and switch to inference mode.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# Tokenize the English source sentence and translate it into Italian with greedy decoding.
input_tokens, input_mask = src_tokenizer.encode("Hi, how are you?")

output = greedy_decoding(model, tgt_tokenizer, input_tokens, input_mask)[0]
print(output)
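
To translate more than one sentence, the objects created above can simply be reused in a loop. This is a usage sketch built only from the calls already shown (src_tokenizer.encode and greedy_decoding), not a separate batch API.

# Reuse the tokenizers and model from the Quick Start to translate several sentences.
sentences = [
    "Good morning, everyone.",
    "Where is the train station?",
]
for sentence in sentences:
    tokens, mask = src_tokenizer.encode(sentence)
    translation = greedy_decoding(model, tgt_tokenizer, tokens, mask)[0]
    print(f"{sentence} -> {translation}")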

Model Architecture

Model Size: small

  • Encoder Layers: 6
  • Decoder Layers: 6
  • Model Dimension: 100
  • Attention Heads: 6
  • FFN Dimension: 400
  • Normalization Type: postnorm
  • Dropout: 0.1
  • Pretrained Embeddings: GloVe
  • Positional Embeddings: sinusoidal (see the encoding sketch after this list)
  • GloVe Version: glove.2024.wikigiga.100d
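
For reference, "sinusoidal" positional embeddings are the fixed encoding from "Attention Is All You Need": sine on even dimensions, cosine on odd ones. The snippet below is a minimal sketch of that encoding for this model's d_model=100 and max_seq_len=131; it illustrates the formula only and is not the code in tfs_mt.architecture.

import math
import torch

def sinusoidal_positional_encoding(max_len: int = 131, d_model: int = 100) -> torch.Tensor:
    """Fixed encoding: PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)            # (max_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )                                                                             # (d_model / 2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # added to the token embeddings before the first encoder/decoder layer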

Tokenizer

  • Type: word (an illustrative encoding sketch follows this list)
  • Max Sequence Length: 131
  • Max Vocabulary Size: 70000
  • Minimum Frequency: 2
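
To make these settings concrete, here is an illustrative sketch of what word-level encoding with the special tokens from the config amounts to: split into words, map out-of-vocabulary words to <UNK>, wrap the sequence in <s>/</s>, and pad with <PAD> up to max_seq_len. The function and toy vocabulary below are hypothetical; the real WordTokenizer in tfs_mt.data_utils handles normalization and punctuation itself.

def word_encode(sentence, vocab, max_seq_len=131):
    # Illustrative word-level encoding; not the actual WordTokenizer implementation.
    words = sentence.lower().split()
    ids = [vocab.get(w, vocab["<UNK>"]) for w in words]
    ids = [vocab["<s>"]] + ids[: max_seq_len - 2] + [vocab["</s>"]]
    mask = [1] * len(ids) + [0] * (max_seq_len - len(ids))    # 1 = real token, 0 = padding
    ids = ids + [vocab["<PAD>"]] * (max_seq_len - len(ids))
    return ids, mask

toy_vocab = {"<PAD>": 0, "<UNK>": 1, "<s>": 2, "</s>": 3, "how": 4, "are": 5, "you": 6}
ids, mask = word_encode("How are you today", toy_vocab, max_seq_len=10)
# ids  -> [2, 4, 5, 6, 1, 3, 0, 0, 0, 0]   ("today" is out of vocabulary -> <UNK>)
# mask -> [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]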

Dataset

  • Task: machine-translation
  • Dataset ID: Helsinki-NLP/europarl (a loading sketch follows this list)
  • Dataset Name: en-it
  • Source Language: en
  • Target Language: it
  • Train Split: 0.95
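
The entries above point to the en-it configuration of Helsinki-NLP/europarl on the Hugging Face Hub. One way to inspect it is with the datasets library, as sketched below; the field names assume the standard translation-dataset layout on the Hub, and this is not necessarily how the training script reads or splits the data (the 0.95 train split listed above is applied by the project).

from datasets import load_dataset

# Load the English-Italian Europarl parallel corpus from the Hugging Face Hub.
europarl = load_dataset("Helsinki-NLP/europarl", "en-it")
example = europarl["train"][0]
print(example["translation"]["en"])   # English source sentence
print(example["translation"]["it"])   # Italian target sentence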

Full training configuration

Complete config-lock.yaml:
seed: 42
log_every_iters: 1000
save_every_iters: 10000
eval_every_iters: 10000
update_pbar_every_iters: 100
time_limit_sec: -1
checkpoints_retain_n: 5
model_base_name: tfs_mt
model_parameters:
  dropout: 0.1
model_configs:
  pretrained_word_embeddings: GloVe
  positional_embeddings: sinusoidal
  nano:
    num_encoder_layers: 4
    num_decoder_layers: 4
    d_model: 50
    num_heads: 4
    d_ff: 200
    norm_type: postnorm
    glove_version: glove.2024.wikigiga.50d
    glove_filename: wiki_giga_2024_50_MFT20_vectors_seed_123_alpha_0.75_eta_0.075_combined
  small:
    num_encoder_layers: 6
    num_decoder_layers: 6
    d_model: 100
    num_heads: 6
    d_ff: 400
    norm_type: postnorm
    glove_version: glove.2024.wikigiga.100d
    glove_filename: wiki_giga_2024_100_MFT20_vectors_seed_2024_alpha_0.75_eta_0.05_combined
  base:
    num_encoder_layers: 8
    num_decoder_layers: 8
    d_model: 300
    num_heads: 8
    d_ff: 800
    norm_type: postnorm
    glove_version: glove.2024.wikigiga.300d
    glove_filename: wiki_giga_2024_300_MFT20_vectors_seed_2024_alpha_0.75_eta_0.05_combined
  original:
    num_encoder_layers: 6
    num_decoder_layers: 6
    d_model: 512
    num_heads: 8
    d_ff: 2048
    norm_type: postnorm
training_hp:
  num_epochs: 5
  distributed_training: false
  use_amp: true
  amp_dtype: bfloat16
  torch_compile_mode: max-autotune
  loss:
    type: KLdiv-labelsmoothing
    label_smoothing: 0.1
  optimizer:
    type: AdamW
    weight_decay: 0.0001
    beta1: 0.9
    beta2: 0.999
    eps: 1.0e-08
  lr_scheduler:
    type: original
    min_lr: 0.0003
    max_lr: 0.001
    warmup_iters: 25000
    stable_iters_prop: 0.7
  max_gradient_norm: 5.0
  early_stopping:
    enabled: false
    patience: 40000
    min_delta: 1.0e-05
tokenizer:
  type: word
  sos_token: <s>
  eos_token: </s>
  pad_token: <PAD>
  unk_token: <UNK>
  max_seq_len: 131
  max_vocab_size: 70000
  vocab_min_freq: 2
dataset:
  dataset_task: machine-translation
  dataset_id: Helsinki-NLP/europarl
  dataset_name: en-it
  train_split: 0.95
  src_lang: en
  tgt_lang: it
  max_len: -1
train_dataloader:
  batch_size: 64
  num_workers: 8
  shuffle: true
  drop_last: true
  prefetch_factor: 2
  pad_all_to_max_len: true
test_dataloader:
  batch_size: 128
  num_workers: 8
  shuffle: false
  drop_last: false
  prefetch_factor: 2
  pad_all_to_max_len: true
backend: none
chosen_model_size: small
model_name: tfs_mt_small_251104-1748
exec_mode: dev
src_tokenizer_vocab_size: 70000
tgt_tokenizer_vocab_size: 70000
num_train_iters_per_epoch: 28889
num_test_iters_per_epoch: 761
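
The loss entry in the config above (type: KLdiv-labelsmoothing, label_smoothing: 0.1) corresponds to the label-smoothed KL-divergence objective popularized by the original Transformer. Below is a minimal PyTorch sketch of that objective; the exact target construction, padding index, and normalization in the project's training code may differ.

import torch
import torch.nn.functional as F

def label_smoothed_kldiv(logits, targets, vocab_size, smoothing=0.1, pad_idx=0):
    # KL divergence between predicted log-probabilities and a smoothed one-hot target.
    log_probs = F.log_softmax(logits, dim=-1)                        # (batch, seq, vocab)
    # Smoothed target: 1 - smoothing on the gold token, the rest spread uniformly
    # over the non-gold, non-padding classes.
    smooth = torch.full_like(log_probs, smoothing / (vocab_size - 2))
    smooth.scatter_(-1, targets.unsqueeze(-1), 1.0 - smoothing)
    smooth[..., pad_idx] = 0.0
    # Zero out positions whose target token is padding, then average per real token.
    token_mask = targets.ne(pad_idx).unsqueeze(-1)
    loss = F.kl_div(log_probs, smooth, reduction="none") * token_mask
    return loss.sum() / token_mask.sum()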

Citation

If you use tfs-mt in your research or project, please cite:

@software{Spadaro_tfs-mt,
  author = {Spadaro, Giovanni},
  license = {MIT},
  title = {{tfs-mt}},
  url = {https://github.com/Giovo17/tfs-mt}
}