interesting architecture
FAN: Fourier Analysis Networks • Paper • 2410.02675 • Published • 29
Tensor Product Attention Is All You Need • Paper • 2501.06425 • Published • 90
Scalable-Softmax Is Superior for Attention • Paper • 2501.19399 • Published • 24
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling • Paper • 2502.09509 • Published • 8
YOLOv12: Attention-Centric Real-Time Object Detectors • Paper • 2502.12524 • Published • 12
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • Paper • 2502.14786 • Published • 156
Large Language Diffusion Models • Paper • 2502.09992 • Published • 123
ObjectMover: Generative Object Movement with Video Prior • Paper • 2503.08037 • Published • 5
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models • Paper • 2503.09573 • Published • 74
Transformers without Normalization • Paper • 2503.10622 • Published • 170
RWKV-7 "Goose" with Expressive Dynamic State Evolution • Paper • 2503.14456 • Published • 153
Scaling Vision Pre-Training to 4K Resolution • Paper • 2503.19903 • Published • 41
Paper • 2504.00927 • Published • 56
TransMamba: Flexibly Switching between Transformer and Mamba • Paper • 2503.24067 • Published • 21
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax • Paper • 2504.20966 • Published • 31
MMaDA: Multimodal Large Diffusion Language Models • Paper • 2505.15809 • Published • 97
MiniCPM4: Ultra-Efficient LLMs on End Devices • Paper • 2506.07900 • Published • 93
Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation • Paper • 2506.19852 • Published • 42
Representing Speech Through Autoregressive Prediction of Cochlear Tokens • Paper • 2508.11598 • Published • 17
Paper • 2508.10104 • Published • 291
2D Gaussian Splatting with Semantic Alignment for Image Inpainting • Paper • 2509.01964 • Published • 7
Sequential Diffusion Language Models • Paper • 2509.24007 • Published • 45
Paper • 2510.13998 • Published • 55
AnyUp: Universal Feature Upsampling • Paper • 2510.12764 • Published • 11
Latent Diffusion Model without Variational Autoencoder • Paper • 2510.15301 • Published • 49
Stronger Normalization-Free Transformers • Paper • 2512.10938 • Published • 19
Bolmo: Byteifying the Next Generation of Language Models • Paper • 2512.15586 • Published • 14