Large language models have been advancing so rapidly lately that it is hard to keep up; the pace feels like a Cambrian explosion, and the relationships between the various models can be bewildering. Recently, researchers compiled an evolutionary-tree diagram of the development of ChatGPT and other language models, making the relationships between LLMs clear at a glance.

Paper: https://arxiv.org/abs/2304.13712

GitHub (related resources): https://github.com/Mooler0410/LLMsPracticalGuide

The key figure is the evolutionary tree:

(Figure: the evolutionary tree of modern language models)

The evolutionary tree of modern language models traces the development of language models in recent years and highlights some of the best-known models. Models on the same branch are more closely related. Transformer-based models are shown in non-grey colors: decoder-only models on the blue branch, encoder-only models on the pink branch, and encoder-decoder models on the green branch. A model's vertical position on the timeline indicates its release date. Open-source models are drawn as solid squares, closed-source models as hollow squares. The stacked bar chart in the bottom-right corner shows the number of models from each company and institution.

This is followed by an animated version of the tree, evolving year by year; its content is essentially the same as the figure above.

Overview of the paper (Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond)

Paper: https://arxiv.org/abs/2304.13712

Trends

a) Decoder-only models have gradually come to dominate the development of language models. In the early stages, decoder-only models were less popular than encoder-only and encoder-decoder models. After 2021, however, with the introduction of the game-changing GPT-3, decoder-only models experienced a remarkable boom. Meanwhile, after the initial explosive growth brought by BERT, encoder-only models gradually began to fade.

b) OpenAI has consistently held the leading position in language models, both now and for the foreseeable future. Other companies and institutions are striving to catch up by developing models comparable to GPT-3 and the current GPT-4. This lead can be attributed to OpenAI's steadfast commitment to its technical path, even when that path was not widely recognized early on.

c) Meta has made major contributions to open-source language models and has advanced LLM research. When considering contributions to the open-source community, particularly those related to language models, Meta stands out as one of the most generous commercial companies, since all of the language models it has developed are open source.

d) Language models are trending toward being closed source. In the early stages of LLM development (before 2020), most models were open source. Since the introduction of GPT-3, however, companies have increasingly chosen to close-source their models, such as PaLM, LaMDA, and GPT-4. As a result, it has become harder for academic researchers to run their own LLM training experiments, and API-based research may become the mainstream approach in academia.

e) Encoder-decoder models remain promising, since this architecture is still being actively explored and most such models are open source. Google has contributed substantially to open-source encoder-decoder architectures. However, the flexibility and versatility of decoder-only models make Google's persistence in this direction look less promising.

In summary, decoder-only and open-source models have played a dominant role in recent years, with OpenAI and Meta making major contributions to LLM innovation and to open source, respectively. At the same time, encoder-decoder and closed-source models have also driven progress to some extent, and different companies and institutions face different prospects along their chosen technical paths.

Practical Guide for Models

A curated (and still actively updated) list of practical LLM resources, based on the survey paper "Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond". These resources are intended to help practitioners navigate large language models (LLMs) and their applications in natural language processing (NLP).

BERT-style language models: Encoder-Decoder or Encoder-only

  • BERT BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2018, Paper
  • RoBERTa RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, Paper
  • DistilBERT DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, Paper
  • ALBERT ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, Paper
  • UniLM Unified Language Model Pre-training for Natural Language Understanding and Generation, 2019 Paper
  • ELECTRA ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, 2020, Paper
  • T5 "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer", Colin Raffel et al. JMLR 2019. Paper
  • GLM "GLM-130B: An Open Bilingual Pre-trained Model". 2022. Paper
  • AlexaTM "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model", Saleh Soltan et al. arXiv 2022. Paper
  • ST-MoE ST-MoE: Designing Stable and Transferable Sparse Expert Models. 2022 Paper

GPT-style language models: Decoder-only

  • GPT Improving Language Understanding by Generative Pre-Training. 2018. Paper
  • GPT-2 Language Models are Unsupervised Multitask Learners. 2019. Paper
  • GPT-3 "Language Models are Few-Shot Learners". NeurIPS 2020. Paper
  • OPT "OPT: Open Pre-trained Transformer Language Models". 2022. Paper
  • PaLM "PaLM: Scaling Language Modeling with Pathways"Aakanksha Chowdhery et al. arXiv 2022. Paper
  • BLOOM "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". 2022. Paper
  • MT-NLG "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". 2021. Paper
  • GLaM "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". ICML 2022. Paper
  • Gopher "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". 2021. Paper
  • Chinchilla "Training Compute-Optimal Large Language Models". 2022. Paper
  • LaMDA "LaMDA: Language Models for Dialog Applications". 2021. Paper
  • LLaMA "LLaMA: Open and Efficient Foundation Language Models". 2023. Paper
  • GPT-4 "GPT-4 Technical Report". 2023. Paper
  • BloombergGPT BloombergGPT: A Large Language Model for Finance, 2023, Paper
  • GPT-NeoX-20B: "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". 2022. Paper

Practical Guide for Data

Pretraining data

  • RedPajama, 2023. Repo
  • The Pile: An 800GB Dataset of Diverse Text for Language Modeling, Arxiv 2020. Paper
  • How does the pre-training objective affect what large language models learn about linguistic properties?, ACL 2022. Paper
  • Scaling laws for neural language models, 2020. Paper
  • Data-centric artificial intelligence: A survey, 2023. Paper
  • How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources, 2022. Blog

Fine-tuning data

  • Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach, EMNLP 2019. Paper
  • Language Models are Few-Shot Learners, NeurIPS 2020. Paper
  • Does Synthetic Data Generation of LLMs Help Clinical Text Mining? Arxiv 2023 Paper

Test data / user data

  • Shortcut learning of large language models in natural language understanding: A survey, Arxiv 2023. Paper
  • On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective Arxiv, 2023. Paper
  • SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems Arxiv 2019. Paper

Practical Guide for NLP Tasks

The authors built a decision flow for choosing between an LLM and a fine-tuned model for a user's NLP application. The decision flow helps users assess whether their downstream NLP application meets certain conditions and, based on that assessment, determine whether an LLM or a fine-tuned model is the better fit.
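
To make the idea concrete, here is a minimal sketch of such a decision heuristic in Python. The conditions and their priority are illustrative assumptions loosely based on the factors the survey discusses (labeled data, distribution shift, knowledge requirements, deployment budget), not the paper's exact flowchart.

```python
# Minimal sketch of a heuristic for choosing between a zero-/few-shot LLM and a
# fine-tuned smaller model. Conditions and their ordering are illustrative
# assumptions, not the exact decision flow from the survey.
def choose_model(has_labeled_data: bool,
                 needs_broad_knowledge: bool,
                 out_of_distribution_inputs: bool,
                 strict_cost_or_latency_budget: bool) -> str:
    if strict_cost_or_latency_budget and has_labeled_data:
        # Plenty of in-domain labels plus tight serving constraints favor a
        # smaller fine-tuned model.
        return "fine-tuned model"
    if not has_labeled_data or out_of_distribution_inputs or needs_broad_knowledge:
        # Little labeled data, distribution shift, or open-world knowledge
        # requirements favor prompting a large LLM.
        return "LLM (zero-/few-shot prompting)"
    return "fine-tuned model"

print(choose_model(has_labeled_data=False,
                   needs_broad_knowledge=True,
                   out_of_distribution_inputs=True,
                   strict_cost_or_latency_budget=False))
# -> LLM (zero-/few-shot prompting)
```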

Traditional NLU tasks (natural language understanding)

  • A benchmark for toxic comment classification on civil comments dataset Arxiv 2023 Paper
  • Is chatgpt a general-purpose natural language processing task solver? Arxiv 2023 Paper
  • Benchmarking large language models for news summarization Arxiv 2022 Paper

Generation tasks

  • News summarization and evaluation in the era of gpt-3 Arxiv 2022 Paper
  • Is chatgpt a good translator? yes with gpt-4 as the engine Arxiv 2023 Paper
  • Multilingual machine translation systems from Microsoft for WMT21 shared task, WMT2021 Paper
  • Can ChatGPT understand too? a comparative study on chatgpt and fine-tuned bert, Arxiv 2023, Paper

Knowledge-intensive tasks

  • Measuring massive multitask language understanding, ICLR 2021 Paper
  • Beyond the imitation game: Quantifying and extrapolating the capabilities of language models, Arxiv 2022 Paper
  • Inverse scaling prize, 2022 Link
  • Atlas: Few-shot Learning with Retrieval Augmented Language Models, Arxiv 2022 Paper
  • Large Language Models Encode Clinical Knowledge, Arxiv 2022 Paper

Abilities with scaling

  • Training Compute-Optimal Large Language Models, NeurIPS 2022 Paper
  • Scaling Laws for Neural Language Models, Arxiv 2020 Paper
  • Solving math word problems with process- and outcome-based feedback, Arxiv 2022 Paper
  • Chain of thought prompting elicits reasoning in large language models, NeurIPS 2022 Paper
  • Emergent abilities of large language models, TMLR 2022 Paper
  • Inverse scaling can become U-shaped, Arxiv 2022 Paper
  • Towards Reasoning in Large Language Models: A Survey, Arxiv 2022 Paper

Specific tasks

  • Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, Arxiv 2022 Paper
  • PaLI: A Jointly-Scaled Multilingual Language-Image Model, Arxiv 2022 Paper
  • AugGPT: Leveraging ChatGPT for Text Data Augmentation, Arxiv 2023 Paper
  • Is gpt-3 a good data annotator?, Arxiv 2022 Paper
  • Want To Reduce Labeling Cost? GPT-3 Can Help, EMNLP findings 2021 Paper
  • GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation, EMNLP findings 2021 Paper
  • LLM for Patient-Trial Matching: Privacy-Aware Data Augmentation Towards Better Performance and Generalizability, Arxiv 2023 Paper
  • ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks, Arxiv 2023 Paper
  • G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment, Arxiv 2023 Paper
  • GPTScore: Evaluate as You Desire, Arxiv 2023 Paper
  • Large Language Models Are State-of-the-Art Evaluators of Translation Quality, Arxiv 2023 Paper
  • Is ChatGPT a Good NLG Evaluator? A Preliminary Study, Arxiv 2023 Paper

Real-world "tasks"

  • Sparks of Artificial General Intelligence: Early experiments with GPT-4, Arxiv 2023 Paper

Efficiency

1. Cost

  • Openai’s gpt-3 language model: A technical overview, 2020. Blog Post
  • Measuring the carbon intensity of ai in cloud instances, FAccT 2022. Paper
  • In AI, is bigger always better?, Nature Article 2023. Article
  • Language Models are Few-Shot Learners, NeurIPS 2020. Paper
  • Pricing, OpenAI. Blog Post

2. Latency

  • HELM: Holistic evaluation of language models, Arxiv 2022. Paper

3. Parameter-efficient fine-tuning

  • LoRA: Low-Rank Adaptation of Large Language Models, Arxiv 2021. Paper
  • Prefix-Tuning: Optimizing Continuous Prompts for Generation, ACL 2021. Paper
  • P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks, ACL 2022. Paper
  • P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks, Arxiv 2022. Paper
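
To illustrate the core idea behind parameter-efficient methods such as LoRA, here is a minimal from-scratch PyTorch sketch: the pretrained weight is frozen and only a low-rank update B·A is trained. This is a didactic sketch of the technique, not the official LoRA or PEFT implementation; the layer size and hyperparameters are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: y = base(x) + (alpha/r) * x A^T B^T, with the base weight frozen."""
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False                      # freeze pretrained weight
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))    # zero init: update starts at 0
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(768, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")  # only the low-rank A and B are trained
```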

4. Pretraining systems

  • ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, Arxiv 2019. Paper
  • Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, Arxiv 2019. Paper
  • Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM, Arxiv 2021. Paper
  • Reducing Activation Recomputation in Large Transformer Models, Arxiv 2022. Paper
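
One of the memory-saving ideas referenced above, activation recomputation (gradient/activation checkpointing), can be illustrated with PyTorch's built-in torch.utils.checkpoint. The block below is a minimal sketch of the general technique, not Megatron-LM's or ZeRO's actual implementation; the module sizes are arbitrary.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Minimal sketch of activation checkpointing: intermediate activations of the
# block are not stored during the forward pass and are recomputed during
# backward, trading extra compute for lower memory use.
block = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

x = torch.randn(8, 1024, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # recompute activations on backward
loss = y.sum()
loss.backward()
print(x.grad.shape)  # torch.Size([8, 1024])
```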

Trustworthiness

1. Robustness and calibration

  • Calibrate before use: Improving few-shot performance of language models, ICML 2021. Paper
  • SPeC: A Soft Prompt-Based Calibration on Mitigating Performance Variability in Clinical Notes Summarization, Arxiv 2023. Paper

2. Spurious biases

  • Shortcut learning of large language models in natural language understanding: A survey, 2023 Paper
  • Mitigating gender bias in captioning system, WWW 2020 Paper
  • Calibrate Before Use: Improving Few-Shot Performance of Language Models, ICML 2021 Paper
  • Shortcut Learning in Deep Neural Networks, Nature Machine Intelligence 2020 Paper
  • Do Prompt-Based Models Really Understand the Meaning of Their Prompts?, NAACL 2022 Paper

3. Safety issues

  • GPT-4 System Card, 2023 Paper
  • The science of detecting llm-generated texts, Arxiv 2023 Paper
  • How stereotypes are shared through language: a review and introduction of the social categories and stereotypes communication (SCSC) framework, Review of Communication Research, 2019 Paper
  • Gender shades: Intersectional accuracy disparities in commercial gender classification, FAccT 2018 Paper

Benchmark Instruction Tuning

  • FLAN: Finetuned Language Models Are Zero-Shot Learners, Arxiv 2021 Paper
  • T0: Multitask Prompted Training Enables Zero-Shot Task Generalization, Arxiv 2021 Paper
  • Cross-task generalization via natural language crowdsourcing instructions, ACL 2022 Paper
  • Tk-INSTRUCT: Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks, EMNLP 2022 Paper
  • FLAN-T5/PaLM: Scaling Instruction-Finetuned Language Models, Arxiv 2022 Paper
  • The Flan Collection: Designing Data and Methods for Effective Instruction Tuning, Arxiv 2023 Paper
  • OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization, Arxiv 2023 Paper
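
Instruction tuning (FLAN, T0, Super-NaturalInstructions, etc.) works by casting many labeled NLP tasks into natural-language instruction/response pairs and fine-tuning on that mixture. Below is a minimal sketch of the data-formatting step only; the template wording and field names are illustrative assumptions, not the exact FLAN templates.

```python
# Minimal sketch of turning labeled examples into instruction/response pairs
# for instruction tuning. The template is illustrative, not the exact FLAN one.
sentiment_examples = [
    {"text": "The film was a delight from start to finish.", "label": "positive"},
    {"text": "Two hours I will never get back.", "label": "negative"},
]

def to_instruction_pair(example: dict) -> dict:
    prompt = (
        "Classify the sentiment of the following review as positive or negative.\n"
        f"Review: {example['text']}\n"
        "Sentiment:"
    )
    return {"prompt": prompt, "response": example["label"]}

instruction_data = [to_instruction_pair(e) for e in sentiment_examples]
print(instruction_data[0]["prompt"])
print(instruction_data[0]["response"])
```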

Alignment

  • Deep Reinforcement Learning from Human Preferences, NIPS 2017 Paper
  • Learning to summarize from human feedback, Arxiv 2020 Paper
  • A General Language Assistant as a Laboratory for Alignment, Arxiv 2021 Paper
  • Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Arxiv 2022 Paper
  • Teaching language models to support answers with verified quotes, Arxiv 2022 Paper
  • InstructGPT: Training language models to follow instructions with human feedback, Arxiv 2022 Paper
  • Improving alignment of dialogue agents via targeted human judgements, Arxiv 2022 Paper
  • Scaling Laws for Reward Model Overoptimization, Arxiv 2022 Paper
  • Scalable Oversight: Measuring Progress on Scalable Oversight for Large Language Models, Arxiv 2022 Paper
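
Several of the papers above (e.g. InstructGPT and the Anthropic assistant work) rely on an RLHF pipeline whose first step trains a reward model on human preference pairs, commonly with a pairwise Bradley-Terry-style loss, -log σ(r_chosen − r_rejected). Here is a minimal sketch of that loss on dummy scores; the reward model itself and the subsequent RL step are omitted, and the numbers are placeholders.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the pairwise preference loss used to train reward models in
# RLHF: push the reward of the human-preferred response above the rejected one.
# The tensors below are dummy stand-ins for reward-model outputs.
reward_chosen = torch.tensor([1.3, 0.2, 0.9], requires_grad=True)
reward_rejected = torch.tensor([0.4, 0.6, -0.1], requires_grad=True)

loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
loss.backward()
print(loss.item())
```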

Safety alignment (harmless)

  • Red Teaming Language Models with Language Models, Arxiv 2022 Paper
  • Constitutional ai: Harmlessness from ai feedback, Arxiv 2022 Paper
  • The Capacity for Moral Self-Correction in Large Language Models, Arxiv 2023 Paper
  • OpenAI: Our approach to AI safety, 2023 Blog

Truthfulness alignment (honest)

  • Reinforcement Learning for Language Models, 2023 Blog

Practical Guide for Prompting (helpful)

  • OpenAI Cookbook Blog
  • Prompt Engineering Blog
  • ChatGPT Prompt Engineering for Developers! Course
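
As a small illustration of the kind of practice these resources cover, here is a minimal few-shot prompt template with a chain-of-thought cue. The example and wording are illustrative assumptions, not taken from the linked guides.

```python
# Minimal sketch of building a few-shot prompt with a chain-of-thought cue.
# The worked example and wording are illustrative, not from the linked guides.
few_shot_examples = [
    ("Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many balls does he have now?",
     "He starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11."),
]

def build_prompt(question: str) -> str:
    parts = [f"Q: {q}\nA: {a}" for q, a in few_shot_examples]
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

print(build_prompt("A baker makes 12 muffins per tray and fills 4 trays. How many muffins in total?"))
```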

Work from the open-source community

  • Self-Instruct: Aligning Language Model with Self Generated Instructions, Arxiv 2022 Paper
  • Alpaca Repo
  • Vicuna Repo
  • Dolly Blog
  • DeepSpeed-Chat Blog
  • GPT4All Repo
  • OpenAssistant Repo
  • ChatGLM Repo
  • MOSS Repo
  • Lamini Repo/Blog

If you found this helpful, feel free to like, follow, and share. ^-^
