Vaswani et al. (Google Brain)
Proposed the Transformer architecture, dispensing with recurrence and convolutions entirely in favor of attention. This paper laid the groundwork for modern language models such as GPT (the basis of ChatGPT) and BERT.
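A minimal pure-Python sketch of scaled dot-product attention, the core operation of the Transformer: softmax(QK^T / sqrt(d_k)) V. The function names and list-of-lists representation here are illustrative, not the paper's implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention.
    Q, K, V are lists of vectors (lists of floats); returns one
    output vector per query, a weighted average of the rows of V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        w = softmax(scores)
        # Weighted sum of value vectors.
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

When a query aligns strongly with one key, the softmax weights concentrate on that key's value vector, which is the "soft lookup" intuition behind attention.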
Kaiming He et al.
Introduced ResNet, easing the training of very deep neural networks (and the vanishing-gradient problem that comes with depth) by using identity shortcut connections (skip connections).
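The residual idea reduces to one line: instead of learning a mapping H(x) directly, a block learns a residual F(x) and outputs F(x) + x, so the identity path always exists for signals and gradients. A minimal sketch with an illustrative `layer` callable standing in for the block's convolutions:

```python
def residual_block(x, layer):
    """y = F(x) + x.
    `layer` is any function mapping a vector to a vector of the same
    size (standing in for the block's conv/BN/ReLU stack). The skip
    connection means that even if `layer` contributes nothing, the
    block still passes x through unchanged."""
    return [fi + xi for fi, xi in zip(layer(x), x)]
```

If the residual branch outputs zeros, the block is exactly the identity, which is why stacking many such blocks does not degrade the signal.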
Ian Goodfellow et al.
Introduced generative adversarial networks (GANs), a framework for estimating generative models via an adversarial process in which two models are trained simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G.
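The adversarial process is formalized in the paper as a two-player minimax game over a value function V(D, G):

  min_G max_D V(D, G) = E_{x ~ p_data}[ log D(x) ] + E_{z ~ p_z}[ log(1 - D(G(z))) ]

D is trained to assign high probability to real samples x and low probability to generated samples G(z); G is trained to fool D. At the game's optimum, the generator's distribution matches the data distribution and D outputs 1/2 everywhere.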
Diederik P. Kingma, Jimmy Ba
Introduced Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. One of the most widely used optimizers today.
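A pure-Python sketch of one Adam update following Algorithm 1 of the paper: exponential moving averages of the gradient (first moment) and squared gradient (second moment), bias-corrected, then a per-parameter scaled step. Parameters are represented as flat lists for simplicity.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a list of parameters.
    t is the 1-based timestep; m and v are the running moment estimates."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = b1 * mi + (1 - b1) * g        # first-moment (mean) estimate
        vi = b2 * vi + (1 - b2) * g * g    # second-moment (uncentered var) estimate
        m_hat = mi / (1 - b1 ** t)         # bias correction for zero init
        v_hat = vi / (1 - b2 ** t)
        th = th - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_theta.append(th)
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v
```

Because the step is divided by sqrt(v_hat), its magnitude is roughly bounded by the learning rate regardless of the raw gradient scale, which is a large part of why Adam's defaults transfer across problems.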
Jacob Devlin et al.
Introduced BERT (Bidirectional Encoder Representations from Transformers), a language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context.
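Conditioning on both sides is made possible by the masked-language-modeling pre-training objective: some input tokens are hidden, and the model must recover them from the surrounding context in both directions. A simplified sketch of the masking step (illustrative names; BERT's actual scheme also leaves 10% of selected tokens unchanged and replaces 10% with random tokens):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Randomly mask tokens for a masked-LM objective.
    Returns the corrupted sequence and, per position, the original
    token to predict (None where no loss is applied)."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)    # model is trained to recover this token
        else:
            masked.append(tok)
            targets.append(None)   # no prediction loss at this position
    return masked, targets
```

Because the target token is hidden from the input, the model can safely attend to tokens on both its left and its right, unlike a left-to-right language model.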
Volodymyr Mnih et al. (DeepMind)
Introduced the Deep Q-Network (DQN), the first successful deep learning model to automatically learn control policies directly from high-dimensional sensory input (raw Atari pixels) using reinforcement learning.
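DQN trains a convolutional network to approximate the Q-learning target; the tabular form of that target is shown below as a minimal sketch (the dict-of-lists Q table and function name are illustrative, not the paper's network):

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    q maps each state to a list of action values."""
    best_next = max(q[next_state])            # greedy value of next state
    target = reward + gamma * best_next       # bootstrapped return estimate
    q[state][action] += alpha * (target - q[state][action])
    return q
```

DQN replaces the table with a network Q(s, a; theta) over raw pixels and stabilizes training with experience replay and mini-batch gradient steps toward this same target.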