Everything Emerging Everywhere: The Future of Work & Generative AI

Top AI & Machine Learning Research Papers to Read Right Now

Written by Megan Anderson | Nov 27, 2023 1:48:41 AM

🐰 Down the Rabbit Hole of Never Ending Research

If you're looking for the most influential research papers on AI and machine learning, you're in luck!
Keeping up with the rapid advances in artificial intelligence can be a challenge. The last few years have brought a flood of new datasets and research, thanks to the generative AI boom pushing more interest and resources into the field. To save you time, we've curated a list of the Top 20 Artificial Intelligence and Natural Language Processing (NLP) papers.

We feature papers with code, research papers by industry-leading researchers in AI, and papers hosted on arXiv.org, the free distribution service and open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, statistics, quantitative biology, quantitative finance, electrical engineering and systems science, economics, and more.

We highly recommend that anyone serious about AI research start with a quick review of this paper:
NLLG Quarterly arXiv Report 06/23: What are the most influential current AI Papers?

The authors do a fantastic job reviewing the top 40 AI research papers and summarizing their key findings. It's a quick read, packed with links to more influential research papers on artificial intelligence to pique your curiosity.

Another trove of fundamental research papers, milestone works, and learning resources related to Large Language Models (LLMs) and Natural Language Processing (NLP) can be found here: the Awesome-LLM GitHub repository.

This repository includes quick links to milestone papers like:
1. Attention is All You Need
2. Google's BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
3. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism 
4. OpenAI's Improving Language Understanding by Generative Pre-Training (GPT 1.0)
5. OPT: Open Pre-trained Transformer Language Models by Meta

The repository also shares LLM leaderboards for comparison and benchmarking, a list of tools for deploying LLMs, and more.

One of the key takeaways from our review of trending research is that LLM foundation models are the hottest topic right now. Research on efficient model building, multimodal problem-solving, computer vision, and embodied agents that can interact with the real world has also attracted a lot of attention. There is strong interest in surveying and comparing current open-source models, a trend we've also seen in the growth of new communities around AI, foundation models, and public datasets.

HuggingFace.co has become a leading open-source platform and repository where the machine learning community collaborates on models, datasets, and applications. Speaking of open-source collaboration, The AI ARMY is planning to launch a digital lab for the community on the Hugging Face platform, where members can learn directly through shared projects and adapt to AI together. If you are interested in hands-on learning with peers, join the ranks for free resources and access!
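If you haven't worked with the platform before, here is a minimal sketch of what pulling a model from the Hugging Face Hub looks like in Python. It assumes the transformers and torch packages are installed; "gpt2" is only an example checkpoint, and any text-generation model hosted on the Hub could be substituted.

```python
# Minimal sketch: load a publicly hosted model from the Hugging Face Hub
# and run it locally. "gpt2" is just an example checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The most influential AI papers of 2023 include"
outputs = generator(prompt, max_new_tokens=40, num_return_sequences=1)

print(outputs[0]["generated_text"])
```

The same pattern (a model identifier plus a task pipeline) works for most models on the Hub, which is a big part of why it has become the default place to share and reuse open models.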

Check out The List of Top 20 Influential Research Papers on AI in 2023 below: 

1. LLaMA: Open and Efficient Foundation Language Models
Link: https://arxiv.org/abs/2302.13971v1 (cite as arXiv:2302.13971)
Subject(s): Computation and Language
Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample

2. GPT-4 Technical Report
Link: https://arxiv.org/abs/2303.08774v3 (cite as arXiv:2303.08774)
Subject(s): Computation and Language
Authors: OpenAI

3. PaLM 2 Technical Report
Link: https://arxiv.org/abs/2305.10403v1 (cite as arXiv:2305.10403)
Subject(s): Computation and Language
Authors: There are over 30 authors; see the arXiv page for the full list.

4. Sparks of Artificial General Intelligence: Early experiments with GPT-4
Link: https://arxiv.org/abs/2303.12712v5 (cite as arXiv:2303.12712)
Subject(s): Computation and Language
Authors: Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

5. PaLM-E: An Embodied Multimodal Language Model
Link: https://arxiv.org/abs/2303.03378v1 (cite as arXiv:2303.03378)
Abstract (excerpt): Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding. Our largest model, PaLM-E-562B with 562B parameters, in addition to being trained on robotics tasks, is a visual-language generalist with state-of-the-art performance on OK-VQA, and retains generalist language capabilities with increasing scale.
Authors: Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence

6. QLoRA: Efficient Finetuning of Quantized LLMs
Link: https://arxiv.org/abs/2305.14314v1 (cite as arXiv:2305.14314)
Abstract (excerpt): We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU.
Authors: Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer
Note: a minimal, illustrative code sketch of the QLoRA-style setup appears after this list.

7. Segment Anything
Link: https://arxiv.org/abs/2304.02643v1 (cite as arXiv:2304.02643)
Abstract (excerpt): We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks.
Authors: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick

8. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Link: https://arxiv.org/abs/2306.05685v2 (cite as arXiv:2306.05685)
Abstract (excerpt): Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement biases, as well as limited reasoning ability, and propose solutions to mitigate some of them.
Authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

9. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
Link: https://arxiv.org/abs/2302.04023v2 (cite as arXiv:2302.04023)
Abstract (excerpt): This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset. We find that ChatGPT outperforms LLMs with zero-shot learning on most tasks and even outperforms fine-tuned models on some tasks.
Authors: Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung

10. A Survey of Large Language Models
Link: https://arxiv.org/abs/2303.18223v11 (cite as arXiv:2303.18223)
Abstract (excerpt): Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks.
Authors: Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen

11. Visual Instruction Tuning
Link: https://arxiv.org/abs/2304.08485v1 (cite as arXiv:2304.08485)
Abstract (excerpt): Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding.
Authors: Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee

12. Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Link: https://arxiv.org/abs/2305.10601v1 (cite as arXiv:2305.10601)
Abstract (excerpt): Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving.
Authors: Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan

13. Voyager: An Open-Ended Embodied Agent with Large Language Models
Link: https://arxiv.org/abs/2305.16291v1 (cite as arXiv:2305.16291)
Abstract (excerpt): We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement.
Authors: Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar

14. Toolformer: Language Models Can Teach Themselves to Use Tools
Link: https://arxiv.org/abs/2302.04761v1 (cite as arXiv:2302.04761)
Abstract (excerpt): Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds.
Authors: Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom

15. How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
Link: https://arxiv.org/abs/2301.07597v1 (cite as arXiv:2301.07597)
Abstract (excerpt): The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities. ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness. On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from human experts.
Authors: Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu

16. Extracting Training Data from Diffusion Models
Link: https://arxiv.org/abs/2301.13188v1 (cite as arXiv:2301.13188)
Abstract (excerpt): Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos.
Authors: Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

17. Large Language Models are not Fair Evaluators
Link: https://arxiv.org/abs/2305.17926v1 (cite as arXiv:2305.17926)
Abstract (excerpt): We uncover a systematic bias in the evaluation paradigm of adopting large language models (LLMs), e.g., GPT-4, as a referee to score the quality of responses generated by candidate models. We find that the quality ranking of candidate responses can be easily hacked by simply altering their order of appearance in the context. This manipulation allows us to skew the evaluation result, making one model appear considerably superior to the other, e.g., vicuna could beat ChatGPT on 66 over 80 tested queries.
Authors: Peiyi Wang, Lei Li, Liang Chen, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, Zhifang Sui

18. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
Link: https://arxiv.org/abs/2303.17580v3 (cite as arXiv:2303.17580)
Abstract (excerpt): Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence. While there are abundant AI models available for different domains and modalities, they cannot handle complicated AI tasks. Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks and language could be a generic interface to empower this.
Authors: Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang

19. A Watermark for Large Language Models
Link: https://arxiv.org/abs/2301.10226v3 (cite as arXiv:2301.10226)
Abstract (excerpt): Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters.
Authors: John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein

20. DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Link: https://arxiv.org/abs/2301.11305v2 (cite as arXiv:2301.11305)
Abstract (excerpt): The increasing fluency and widespread usage of large language models (LLMs) highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM.
Authors: Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn
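Several of the papers above describe techniques you can try directly. As one example, below is a minimal, illustrative sketch of a QLoRA-style setup (paper #6): a pretrained causal language model loaded with 4-bit NF4 quantization and trainable LoRA adapters attached, using the Hugging Face transformers, peft, and bitsandbytes libraries. The model name and hyperparameters are placeholders for illustration, not the exact recipe from the paper.

```python
# Illustrative QLoRA-style setup (not the authors' exact recipe):
# a frozen 4-bit quantized base model with small trainable LoRA adapters.
# Assumes transformers, peft, bitsandbytes, and torch are installed;
# "facebook/opt-350m" is just a small example checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "facebook/opt-350m"  # placeholder; QLoRA targets much larger models

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store the frozen base weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type from the QLoRA paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 while weights stay 4-bit
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,            # rank of the low-rank adapter matrices
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the LoRA adapters are trainable
model.print_trainable_parameters()
```

From here, the quantized base model stays frozen and only the small adapter matrices are updated during finetuning, which is what keeps the memory footprint low enough to fit very large models on a single GPU.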

That's it. Those are the Top 20 most influential AI research papers of the year.

💬 Join the Conversation: If you think we missed a great paper, please share it in the comments below.
The community would love to hear your opinions, and we encourage everyone to keep the conversation about AI going.
It's a critical conversation right now and we all have a stake in what's next!