Everything Emerging Everywhere: The Future of Work & Generative AI

Top AI & Machine Learning Research Papers to Read Right Now

Written by Megan Anderson | Nov 27, 2023 1:48:41 AM

🐰 Down the Rabbit Hole of Never Ending Research

If you're looking for the most influential research papers on AI and machine learning, you're in luck!
Keeping up with the rapid advances in artificial intelligence can be a challenge. The last few years have brought a flood of new datasets and research, thanks to the generative AI boom pushing more interest and resources into the field. To save you time, we've curated a list of the Top 20 Artificial Intelligence and Natural Language Processing (NLP) papers.

We feature papers with code, research papers by industry-leading researchers in AI, and papers hosted on arXiv.org, the free distribution service and open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, statistics, quantitative biology, quantitative finance, electrical engineering and systems science, economics, and more.

We highly recommend that anyone serious about AI research start with a quick review of this paper:
NLLG Quarterly arXiv Report 06/23: What are the most influential current AI Papers?

The authors do a fantastic job reviewing the top 40 AI research papers and summarizing their key findings. It's a quick read, packed with links to more influential research papers on artificial intelligence to pique your curiosity.

Another trove of fundamental research papers, milestone works, and learning resources related to Large Language Models (LLMs) and Natural Language Processing (NLP) can be found here: the Awesome-LLM GitHub repository.

This repository includes quick links to milestone papers like:
1. Attention is All You Need
2. Google's BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
3. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism 
4. OpenAI's Improving Language Understanding by Generative Pre-Training (GPT 1.0)
5. OPT: Open Pre-trained Transformer Language Models by Meta

The repository also shares LLM leaderboards for comparison and benchmarking, a list of tools for deploying LLMs, and more.

One of the key takeaways from our review of trending research is that LLM foundation models are the hottest topic right now. Research on efficient model building, multimodal problem-solving, computer vision, and embodied agents that can interact with the real world has also attracted a lot of attention. There is strong interest in surveying and comparing current open-source models, a trend we've also seen in the growth of new communities around AI, foundation models, and public datasets.

HuggingFace.co has become a leading open-source platform and repository where the machine learning community collaborates on models, datasets, and applications. Speaking of open-source collaboration, The AI ARMY is planning to launch a digital lab for the community on the Hugging Face platform, where members can learn directly through shared projects and adapt to AI together. If you are interested in hands-on learning with peers, join the ranks for free resources and access!
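If you haven't worked with the platform before, here is a minimal sketch of what pulling a model from the Hugging Face Hub looks like in Python. It assumes the transformers and torch packages are installed; "gpt2" is only an example checkpoint, and any text-generation model hosted on the Hub could be substituted.

```python
# Minimal sketch: load a publicly hosted model from the Hugging Face Hub
# and run it locally. "gpt2" is just an example checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The most influential AI papers of 2023 include"
outputs = generator(prompt, max_new_tokens=40, num_return_sequences=1)

print(outputs[0]["generated_text"])
```

The same pattern (a model identifier plus a task pipeline) works for most models on the Hub, which is a big part of why it has become the default place to share and reuse open models.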

Check out The List of Top 20 Influential Research Papers on AI in 2023 below: 

1. LLaMA: Open and Efficient Foundation Language Models
Link: https://arxiv.org/abs/2302.13971v1 (cite as arXiv:2302.13971)
Subject(s): Computation and Language
Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample

2. GPT-4 Technical Report
Link: https://arxiv.org/abs/2303.08774v3 (cite as arXiv:2303.08774)
Subject(s): Computation and Language
Authors: OpenAI

3. PaLM 2 Technical Report
Link: https://arxiv.org/abs/2305.10403v1 (cite as arXiv:2305.10403)
Subject(s): Computation and Language
Authors: There are over 30 authors; see the arXiv page for the full list.

4. Sparks of Artificial General Intelligence: Early experiments with GPT-4
Link: https://arxiv.org/abs/2303.12712v5 (cite as arXiv:2303.12712)
Subject(s): Computation and Language
Authors: Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

5. PaLM-E: An Embodied Multimodal Language Model
Link: https://arxiv.org/abs/2303.03378v1 (cite as arXiv:2303.03378)
Abstract (excerpt): Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding. Our largest model, PaLM-E-562B with 562B parameters, in addition to being trained on robotics tasks, is a visual-language generalist with state-of-the-art performance on OK-VQA, and retains generalist language capabilities with increasing scale.
Authors: Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence

6. QLoRA: Efficient Finetuning of Quantized LLMs
Link: https://arxiv.org/abs/2305.14314v1 (cite as arXiv:2305.14314)
Abstract (excerpt): We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU.
Authors: Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer
Note: a minimal, illustrative code sketch of the QLoRA-style setup appears after this list.

7. Segment Anything
Link: https://arxiv.org/abs/2304.02643v1 (cite as arXiv:2304.02643)
Abstract (excerpt): We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks.
Authors: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick

8. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Link: https://arxiv.org/abs/2306.05685v2 (cite as arXiv:2306.05685)
Abstract (excerpt): Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement biases, as well as limited reasoning ability, and propose solutions to mitigate some of them.
Authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

9. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
Link: https://arxiv.org/abs/2302.04023v2 (cite as arXiv:2302.04023)
Abstract (excerpt): This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset. We find that ChatGPT outperforms LLMs with zero-shot learning on most tasks and even outperforms fine-tuned models on some tasks.
Authors: Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung

10. A Survey of Large Language Models
Link: https://arxiv.org/abs/2303.18223v11 (cite as arXiv:2303.18223)
Abstract (excerpt): Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks.
Authors: Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen

11. Visual Instruction Tuning
Link: https://arxiv.org/abs/2304.08485v1 (cite as arXiv:2304.08485)
Abstract (excerpt): Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding.
Authors: Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee

12. Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Link: https://arxiv.org/abs/2305.10601v1 (cite as arXiv:2305.10601)
Abstract (excerpt): Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving.
Authors: Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan

13. Voyager: An Open-Ended Embodied Agent with Large Language Models
Link: https://arxiv.org/abs/2305.16291v1 (cite as arXiv:2305.16291)
Abstract (excerpt): We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement.
Authors: Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar

14. Toolformer: Language Models Can Teach Themselves to Use Tools
Link: https://arxiv.org/abs/2302.04761v1 (cite as arXiv:2302.04761)
Abstract (excerpt): Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds.
Authors: Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom

15. How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
Link: https://arxiv.org/abs/2301.07597v1 (cite as arXiv:2301.07597)
Abstract (excerpt): The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities. ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness. On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from human experts.
Authors: Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu

16. Extracting Training Data from Diffusion Models
Link: https://arxiv.org/abs/2301.13188v1 (cite as arXiv:2301.13188)
Abstract (excerpt): Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos.
Authors: Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

17. Large Language Models are not Fair Evaluators
Link: https://arxiv.org/abs/2305.17926v1 (cite as arXiv:2305.17926)
Abstract (excerpt): We uncover a systematic bias in the evaluation paradigm of adopting large language models (LLMs), e.g., GPT-4, as a referee to score the quality of responses generated by candidate models. We find that the quality ranking of candidate responses can be easily hacked by simply altering their order of appearance in the context. This manipulation allows us to skew the evaluation result, making one model appear considerably superior to the other, e.g., vicuna could beat ChatGPT on 66 over 80 tested queries.
Authors: Peiyi Wang, Lei Li, Liang Chen, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, Zhifang Sui

18. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
Link: https://arxiv.org/abs/2303.17580v3 (cite as arXiv:2303.17580)
Abstract (excerpt): Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence. While there are abundant AI models available for different domains and modalities, they cannot handle complicated AI tasks. Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks and language could be a generic interface to empower this.
Authors: Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang

19. A Watermark for Large Language Models
Link: https://arxiv.org/abs/2301.10226v3 (cite as arXiv:2301.10226)
Abstract (excerpt): Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters.
Authors: John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein

20. DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Link: https://arxiv.org/abs/2301.11305v2 (cite as arXiv:2301.11305)
Abstract (excerpt): The increasing fluency and widespread usage of large language models (LLMs) highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM.
Authors: Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn
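Several of the papers above describe techniques you can try directly. As one example, below is a minimal, illustrative sketch of a QLoRA-style setup (paper #6): a pretrained causal language model loaded with 4-bit NF4 quantization and trainable LoRA adapters attached, using the Hugging Face transformers, peft, and bitsandbytes libraries. The model name and hyperparameters are placeholders for illustration, not the exact recipe from the paper.

```python
# Illustrative QLoRA-style setup (not the authors' exact recipe):
# a frozen 4-bit quantized base model with small trainable LoRA adapters.
# Assumes transformers, peft, bitsandbytes, and torch are installed;
# "facebook/opt-350m" is just a small example checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "facebook/opt-350m"  # placeholder; QLoRA targets much larger models

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store the frozen base weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type from the QLoRA paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 while weights stay 4-bit
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,            # rank of the low-rank adapter matrices
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the LoRA adapters are trainable
model.print_trainable_parameters()
```

From here, the quantized base model stays frozen and only the small adapter matrices are updated during finetuning, which is what keeps the memory footprint low enough to fit very large models on a single GPU.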

That's it. Those are the Top 20 most influential AI research papers of the year.

💬 Join the Conversation: If you think we missed a great paper, please share it in the comments below.
The community would love to hear your opinions, and we encourage everyone to keep the conversation about AI going.
It's a critical conversation right now and we all have a stake in what's next!