2015 — 2025

Model training

From MATLAB, manual features, and classic MLPs through BERT- and T5-era NLP to Polish LLMs and audio baselines—what I have actually trained and shipped.

Deep Learning · NLP · Computer Vision · Audio · Open Source

Over the years I have trained and fine-tuned many models—not only quick experiments, but also long-running research and production releases. The sections below split that work by modality and task; the Hugging Face block is the public, named catalog of Polish NLP models we built at VoiceLab.

My earliest pipelines were in MATLAB: manual feature selection, classical descriptors, and plain MLPs, then CNN stacks for vision (ResNet-style backbones, dense blocks, whatever the benchmark called for) before today's huge transformers in every modality. NLP later moved through BERT-family encoders and T5-style text-to-text models, and on to LLMs at scale: same scientific habits, different parameter counts and tooling.

Frameworks

Day-to-day training is mostly PyTorch, with PyTorch Lightning for larger and longer-lived runs. When LLMs went mainstream I used whatever fit the problem—Hugging Face stacks, hosted APIs, and LangChain when orchestration and retrieval-style glue was the right trade-off—alongside the usual BERT and T5 fine-tunes where a smaller model was enough.
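As a flavor of that day-to-day setup, here is a minimal PyTorch Lightning sketch. The module, feature dimensions, and hyperparameters are illustrative placeholders, not a reproduction of any particular run.

```python
import torch
from torch import nn
import pytorch_lightning as pl

class TextClassifier(pl.LightningModule):
    """Minimal classification head over pooled features (illustrative only)."""

    def __init__(self, in_dim: int = 768, n_classes: int = 2, lr: float = 2e-5):
        super().__init__()
        self.save_hyperparameters()
        self.head = nn.Linear(in_dim, n_classes)
        self.loss = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        features, labels = batch          # pooled features, e.g. from a frozen encoder
        logits = self.head(features)
        loss = self.loss(logits, labels)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.hparams.lr)

# trainer = pl.Trainer(max_epochs=3, accelerator="auto")
# trainer.fit(TextClassifier(), train_dataloaders=loader)
```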

Computer vision

Image classification with CNN encoders (classic residual and efficient blocks, custom heads) for biomedical settings (dermoscopy, microscopy, related benchmarks) and for other vision problems in my research line—see the bibliography and context on biomedical imaging.
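A hedged sketch of that classification pattern: a torchvision residual backbone with a swapped-in custom head. The backbone choice and the seven-class head (as in common dermoscopy benchmarks) are assumptions for illustration, not the exact research configuration.

```python
import torch
from torch import nn
from torchvision import models

# Pretrained residual backbone; replace the ImageNet head with a task head.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Linear(backbone.fc.in_features, 7)  # e.g. 7 dermoscopy classes

x = torch.randn(1, 3, 224, 224)  # dummy image batch
logits = backbone(x)             # shape: (1, 7)
```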

Object detection and instance segmentation on litter and merged waste benchmarks—YOLO-family single-shot models, two-stage Faster R-CNN, Mask R-CNN, EfficientDet, DETR (transformer detector), PyTorch training loops, competition-style evaluation—covered on waste detection.
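For the two-stage detectors, the usual torchvision fine-tuning recipe looks like the sketch below. NUM_CLASSES is hypothetical (background plus waste categories) and not taken from the benchmarks themselves.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 8  # hypothetical: background + 7 waste categories

# Pretrained Faster R-CNN; swap the box predictor for the new label set.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)
```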

Generative image models in the GAN era: convolutional generator / discriminator pairs, e.g. for 64×64 pixel-character generation, and dataset work on Tiny Hero.
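A minimal DCGAN-style generator for 64×64 RGB outputs, in the spirit of that pixel-character work; the channel widths and latent size are illustrative, not the original architecture.

```python
import torch
from torch import nn

class Generator(nn.Module):
    """DCGAN-style generator: latent vector -> 64x64 RGB image."""

    def __init__(self, z_dim: int = 100, ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ch * 8, 4, 1, 0, bias=False),  # 4x4
            nn.BatchNorm2d(ch * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 8, ch * 4, 4, 2, 1, bias=False), # 8x8
            nn.BatchNorm2d(ch * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1, bias=False), # 16x16
            nn.BatchNorm2d(ch * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1, bias=False),     # 32x32
            nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1, bias=False),          # 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

fake = Generator()(torch.randn(16, 100, 1, 1))  # (16, 3, 64, 64)
```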

NLP and text

Along the way: HerBERT-style classification, T5-style keyword heads, and full decoder-only LLMs. These covered punctuation restoration, keyword-style text-to-text generation, and later large-scale Polish generative LMs (the TRURL family) with quantised variants. See PolEval punctuation, Reedy for the publisher-metadata thread, and the Hugging Face section on this page for public checkpoints; a token-classification sketch follows below.
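One way the punctuation-restoration thread can be framed is token classification over a Polish BERT-family encoder. The sketch below assumes the public allegro/herbert-base-cased checkpoint and an invented four-label punctuation set; it is an illustration of the pattern, not the competition system.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Checkpoint and label set are assumptions, not the original training setup.
MODEL_ID = "allegro/herbert-base-cased"
LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID, num_labels=len(LABELS)
)

enc = tokenizer("ala ma kota a kot ma ale", return_tensors="pt")
with torch.no_grad():
    pred = model(**enc).logits.argmax(-1)  # one punctuation label per token
```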

Audio

Environmental sound and species classification from spectrograms, with CNN front ends and review-driven baselines; see bird song classification.
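A rough sketch of the spectrogram-CNN pattern using torchaudio: a log-mel front end feeding a tiny convolutional classifier. The sample rate, mel bins, and ten-species head are placeholder values, not the baselines' settings.

```python
import torch
import torchaudio

# Log-mel front end; parameters are illustrative.
mel = torchaudio.transforms.MelSpectrogram(sample_rate=22050, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

waveform = torch.randn(1, 22050 * 5)      # 5 s of mono audio (dummy)
spec = to_db(mel(waveform)).unsqueeze(0)  # (batch=1, channels=1, 64, time)

cnn = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(16, 10),              # e.g. 10 bird species
)
logits = cnn(spec)
```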

Sign language and video as recognition stacks over pose and temporal models; see HearAI.
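A compact sketch of the pose-plus-temporal idea: per-frame keypoints flattened and fed to a GRU. The keypoint count, hidden size, and gloss vocabulary are assumptions, not the HearAI configuration.

```python
import torch
from torch import nn

N_KEYPOINTS, HIDDEN, N_GLOSSES = 33, 128, 50  # hypothetical sizes

class PoseGRU(nn.Module):
    """Pose-sequence classifier: (batch, frames, keypoints*2) -> gloss logits."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_KEYPOINTS * 2, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, N_GLOSSES)

    def forward(self, poses):
        _, h = self.rnn(poses)   # h: (1, batch, HIDDEN), final hidden state
        return self.head(h[-1])

logits = PoseGRU()(torch.randn(4, 60, N_KEYPOINTS * 2))  # 60-frame clips
```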

Public Hugging Face models

We trained and released Poland's first large-scale generative model, TRURL, in 7B and 13B variants, 8-bit quantizations, and an academic edition, alongside production Polish NLP models still in daily use: vlt5-base-keywords for keyword extraction (11k+ downloads) and herbert-base-cased-sentiment for sentiment (24k+ downloads). Everything is open on Hugging Face with accompanying datasets; a minimal loading sketch follows the model cards below.

Model cards: trurl-2-13b-academic · trurl-2-7b · trurl-2-13b-8bit · trurl-2-13b · trurl-2-7b-8bit.
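For completeness, a minimal sketch of loading the keyword model via transformers. The Voicelab/ repo prefix and the "Keywords: " task prefix are assumptions based on the public catalog, so consult the model card before relying on them.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

MODEL_ID = "Voicelab/vlt5-base-keywords"  # assumed repo id; verify on the model card

tokenizer = T5Tokenizer.from_pretrained(MODEL_ID)
model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)

# "Keywords: " task prefix is an assumption taken from the model's public docs.
text = "Keywords: " + "A short article about Polish NLP model training..."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```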
