- Attention Is All You Need
  Paper • 1706.03762 • Published • 109
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 25
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21
Taufiq Dwi Purnomo (taufiqdp)

AI & ML interests: SLM, VLM
Recent Activity
- Upvoted a paper about 4 hours ago: LTX-2: Efficient Joint Audio-Visual Foundation Model
- Upvoted a paper about 4 hours ago: GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
- Upvoted a paper 9 days ago: mHC: Manifold-Constrained Hyper-Connections