🐰 Down the Rabbit Hole of Never-Ending Research
If you are looking for the most influential research papers on AI and machine learning, you're in luck!
Keeping up with the fast-moving advances in artificial intelligence can be a challenge. The last few years have brought a flood of new datasets and research, thanks to the Generative AI boom pushing more interest and resources into the field. To save you time, we've curated a list of the Top 20 Artificial Intelligence and Natural Language Processing (NLP) papers.
We feature papers with code, research papers by industry-leading researchers in AI, and papers hosted on arXiv.org, the free distribution service and open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, statistics, quantitative biology, quantitative finance, electrical engineering and systems science, economics, and more.
We highly recommend that anyone serious about AI research take a quick look at this paper:
NLLG Quarterly arXiv Report 06/23: What are the most influential current AI Papers?
The authors do a fantastic job reviewing the top 40 AI research papers and summarizing their key findings. It's a quick read, packed with links to more influential research on artificial intelligence to pique your curiosity.
Another trove of fundamental research papers, milestone work, and other learning resources related to Large Language Models (LLMs) and Natural Language Processing (NLP) can be found in the Awesome-LLM GitHub repository.
This repository includes quick links to milestone papers like:
1. Attention is All You Need
2. Google's BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
3. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
4. OpenAI's Improving Language Understanding by Generative Pre-training (GPT 1.0)
5. OPT: Open Pre-trained Transformer Language Models by Meta
They also share LLM leaderboards for comparing and benchmarking, a list of tools for deploying LLMs, and more.
One key takeaway from our review of trending research is that LLM foundation models are the hottest topic right now. Research on efficient model building, multimodal problem-solving, computer vision, and embodied agents that can interact with the real world also attracted a lot of attention, and there is strong interest in surveying and comparing current open-source models. We've seen the same trend in the growth of new communities built around AI, foundation models, and public datasets.
HuggingFace.co has become a leading open-source platform and repository where the machine learning community collaborates on models, datasets, and applications. Speaking of open-source collaboration, The AI ARMY is planning to launch a digital lab for the community on the Hugging Face platform, where members can join shared projects, learn directly from one another, and adapt to AI together. If you are interested in hands-on learning with peers, join the ranks for free resources and access!
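If you have never pulled a model from the Hub, the barrier to entry is low. Here is a minimal sketch using the transformers library; the gpt2 checkpoint is just a small, convenient example, not a recommendation.

```python
# A minimal sketch of pulling a model and tokenizer from the Hugging Face Hub.
# "gpt2" is just a small example checkpoint; swap in any causal LM you like.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The most influential AI paper of 2023 is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```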
Check out The List of Top 20 Influential Research Papers on AI in 2023 below:
| # | Title of Paper | Link | Subject(s) / Summary | Cite as | Authors |
| --- | --- | --- | --- | --- | --- |
| 1 | LLaMA: Open and Efficient Foundation Language Models | https://arxiv.org/abs/2302.13971v1 | Computation and Language | arXiv:2302.13971 | Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample |
| 2 | GPT-4 Technical Report | https://arxiv.org/abs/2303.08774v3 | Computation and Language | arXiv:2303.08774 | OpenAI |
| 3 | PaLM 2 Technical Report | https://arxiv.org/abs/2305.10403v1 | Computation and Language | arXiv:2305.10403 | Over 30 authors; see the arXiv page for the full list. |
| 4 | Sparks of Artificial General Intelligence: Early experiments with GPT-4 | https://arxiv.org/abs/2303.12712v5 | Computation and Language | arXiv:2303.12712 | Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang |
| 5 | PaLM-E: An Embodied Multimodal Language Model | https://arxiv.org/abs/2303.03378v1 | Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding. Our largest model, PaLM-E-562B with 562B parameters, in addition to being trained on robotics tasks, is a visual-language generalist with state-of-the-art performance on OK-VQA, and retains generalist language capabilities with increasing scale. | arXiv:2303.03378 | Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence |
| 6 | QLoRA: Efficient Finetuning of Quantized LLMs | https://arxiv.org/abs/2305.14314v1 | We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. (A toy setup sketch appears after this table.) | arXiv:2305.14314 | Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer |
| 7 | Segment Anything | https://arxiv.org/abs/2304.02643v1 | We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. | arXiv:2304.02643 | Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick |
| 8 | Judging LLM-as-a-judge with MT-Bench and Chatbot Arena | https://arxiv.org/abs/2306.05685v2 | Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement biases, as well as limited reasoning ability, and propose solutions to mitigate some of them. (A toy position-bias mitigation sketch appears after this table.) | arXiv:2306.05685 | Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica |
| 9 | A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity | https://arxiv.org/abs/2302.04023v2 | This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset. We find that ChatGPT outperforms LLMs with zero-shot learning on most tasks and even outperforms fine-tuned models on some tasks. | arXiv:2302.04023 | Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung |
| 10 | A Survey of Large Language Models | https://arxiv.org/abs/2303.18223v11 | Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. | arXiv:2303.18223 | Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen |
| 11 | Visual Instruction Tuning | https://arxiv.org/abs/2304.08485v1 | Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding. | arXiv:2304.08485 | Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee |
| 12 | Tree of Thoughts: Deliberate Problem Solving with Large Language Models | https://arxiv.org/abs/2305.10601v1 | Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. (A toy search-loop sketch appears after this table.) | arXiv:2305.10601 | Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan |
| 13 | Voyager: An Open-Ended Embodied Agent with Large Language Models | https://arxiv.org/abs/2305.16291v1 | We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. | arXiv:2305.16291 | Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar |
| 14 | Toolformer: Language Models Can Teach Themselves to Use Tools | https://arxiv.org/abs/2302.04761v1 | Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. | arXiv:2302.04761 | Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom |
| 15 | How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection | https://arxiv.org/abs/2301.07597v1 | The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities. ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness. On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from human experts. | arXiv:2301.07597 | Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu |
| 16 | Extracting Training Data from Diffusion Models | https://arxiv.org/abs/2301.13188v1 | Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. | arXiv:2301.13188 | Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace |
| 17 | Large Language Models are not Fair Evaluators | https://arxiv.org/abs/2305.17926v1 | We uncover a systematic bias in the evaluation paradigm of adopting large language models (LLMs), e.g., GPT-4, as a referee to score the quality of responses generated by candidate models. We find that the quality ranking of candidate responses can be easily hacked by simply altering their order of appearance in the context. This manipulation allows us to skew the evaluation result, making one model appear considerably superior to the other, e.g., Vicuna could beat ChatGPT on 66 over 80 tested queries. (The position-bias sketch after this table addresses exactly this issue.) | arXiv:2305.17926 | Peiyi Wang, Lei Li, Liang Chen, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, Zhifang Sui |
| 18 | HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face | https://arxiv.org/abs/2303.17580v3 | Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence. While there are abundant AI models available for different domains and modalities, they cannot handle complicated AI tasks. Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks and language could be a generic interface to empower this. | arXiv:2303.17580 | Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang |
| 19 | A Watermark for Large Language Models | https://arxiv.org/abs/2301.10226v3 | Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. (A toy detector sketch appears after this table.) | arXiv:2301.10226 | John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein |
| 20 | DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature | https://arxiv.org/abs/2301.11305v2 | The increasing fluency and widespread usage of large language models (LLMs) highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. (A toy scoring sketch appears after this table.) | arXiv:2301.11305 | Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn |
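To make a few of the ideas in the list concrete, the sketches below are minimal, hedged illustrations, not the authors' reference implementations. First, the QLoRA recipe from paper #6: load a frozen base model in 4-bit NF4 and train only small LoRA adapters on top. This sketch assumes the Hugging Face transformers, peft, and bitsandbytes stack; the checkpoint name and hyperparameters are placeholders, not the paper's exact configuration.

```python
# A minimal sketch of a QLoRA-style setup: 4-bit NF4 base model plus LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize frozen base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4, introduced in the QLoRA paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for the matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                  # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attach adapters to attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # base stays frozen; only adapters train
model.print_trainable_parameters()
```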
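Papers #8 and #17 both flag position bias in LLM-as-a-judge setups: the answer shown first tends to win. A simple mitigation both papers discuss is to judge each pair twice with the order swapped and only count consistent verdicts. In this sketch, `judge` is an assumed callable that returns "A" or "B" for whichever answer it prefers.

```python
# Position-debiased pairwise comparison: judge twice with the order swapped
# and only count a verdict as a win if it survives the swap.
def debiased_verdict(judge, question: str, answer1: str, answer2: str) -> str:
    first = judge(question, answer1, answer2)   # answer1 shown in position A
    second = judge(question, answer2, answer1)  # order swapped
    if first == "A" and second == "B":
        return "answer1 wins"
    if first == "B" and second == "A":
        return "answer2 wins"
    return "tie (inconsistent verdicts)"
```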
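Paper #12's Tree of Thoughts generalizes chain-of-thought prompting into a search over intermediate "thoughts". A toy breadth-first version of the search loop looks roughly like the following, with `propose` and `score` standing in for LLM-backed generation and evaluation calls; the real implementations also vary the search strategy and prompts.

```python
# A toy breadth-first Tree-of-Thoughts loop. `propose(problem, partial, n)`
# returns n candidate next thoughts; `score(problem, partial)` rates a partial
# solution. Both are assumed LLM-backed callables supplied by the caller.
def tree_of_thoughts(problem, propose, score, steps=3, breadth=5, keep=2):
    frontier = [""]  # partial chains of thought
    for _ in range(steps):
        # Expand each partial solution with several candidate next thoughts...
        candidates = [p + t for p in frontier for t in propose(problem, p, breadth)]
        # ...then keep only the most promising branches for the next round.
        frontier = sorted(candidates, key=lambda p: score(problem, p), reverse=True)[:keep]
    return frontier[0]
```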
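Paper #19's watermark pseudorandomly marks a fraction of the vocabulary "green" based on the previous token and softly boosts green tokens during generation; detection then checks whether a text contains statistically too many green tokens. The toy detector below shows only the statistical test, and hashing the (previous token, token) pair is our simplification of seeding an RNG on the previous token to split the vocabulary.

```python
# A toy watermark *detector*: count green-list hits and compute a z-score.
import hashlib
import math

GAMMA = 0.5  # expected fraction of the vocabulary in each green list

def is_green(prev_token: int, token: int) -> bool:
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA

def watermark_z_score(tokens: list[int]) -> float:
    # Under the null (unwatermarked text), green hits ~ Binomial(T, GAMMA);
    # a large positive z-score suggests the text is watermarked.
    t = len(tokens) - 1
    assert t > 0, "need at least two tokens"
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))
```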
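Finally, paper #20's DetectGPT criterion: machine-generated text tends to sit near a local maximum of the model's log-probability, so random perturbations should lower log p(x) more than they would for human-written text. In the sketch below, `log_prob` and `perturb` are assumed helpers you would supply (e.g., a causal LM scorer and a mask-and-refill paraphraser).

```python
# A sketch of DetectGPT's curvature score: compare log p(x) to the average
# log-probability of perturbed rewrites; higher scores suggest machine text.
from statistics import mean, pstdev

def detectgpt_score(text: str, log_prob, perturb, n_perturbations: int = 20) -> float:
    perturbed_lls = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    return (log_prob(text) - mean(perturbed_lls)) / (pstdev(perturbed_lls) or 1.0)
```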
That's it. Those are the Top 20 most cited and most influential AI research papers of the year.
💬 Join the Conversation: If you think we missed a great paper, please share it in the comments below.
The community would love to hear your opinions, and we encourage everyone to start talking more about AI.
It's a critical conversation right now and we all have a stake in what's next!