通义千问模型使用text-generation-webui搭建webui页面

text-generation-webui 适用于大型语言模型的 Gradio Web UI。支持transformers、GPTQ、AWQ、EXL2、llama.cpp (GGUF)、Llama 模型。它的特点如下，3种界面模式：default (two columns), notebook, chat。

yygr

2854人浏览 · 2024-03-18 09:59:41

yygr · 2024-03-18 09:59:41 发布

https://blog.csdn.net/engchina/article/details/135241598

0. 背景

一直喜欢用 FastChat 本地部署大语言模型，今天试一试 text-generation-webui 这个项目。

在这里插入图片描述

1. text-generation-webui 介绍

text-generation-webui 适用于大型语言模型的 Gradio Web UI。支持transformers、GPTQ、AWQ、EXL2、llama.cpp (GGUF)、Llama 模型。

它的特点如下，

3种界面模式：default (two columns), notebook, chat
支持多个模型后端：Transformers、llama.cpp（通过 llama-cpp-python）、ExLlama、ExLlamaV2、AutoGPTQ、AutoAWQ、GPTQ-for-LLaMa、CTransformers、QuIP#。
下拉菜单可在不同模型之间快速切换。
大量扩展（内置和用户贡献），包括用于真实语音输出的 Coqui TTS、用于语音输入的 Whisper STT、翻译、多模式管道、向量数据库、Stable Diffusion集成等等。有关详细信息，请参阅 wiki 和扩展目录。
与自定义角色聊天。
适用于指令跟踪模型的精确聊天模板，包括 Llama-2-chat、Alpaca、Vicuna、Mistral。
LoRA：使用您自己的数据训练新的 LoRA，动态加载/卸载 LoRA 以进行生成。
Transformers 库集成：通过 bitsandbytes 以 4 位或 8 位精度加载模型，将 llama.cpp 与 Transformers 采样器（ llamacpp_HF 加载器）结合使用，使用 PyTorch 以 32 位精度进行 CPU 推理。
具有 OpenAI 兼容的 Chat 和 Completions API 服务器 - 请参阅示例。

2. 克隆代码

git clone https://github.com/oobabooga/text-generation-webui.git; 
cd text-generation-webui

3. 创建虚拟环境

(Optional)安装 Conda，

curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > "Miniconda3.sh"
bash Miniconda3.sh

创建虚拟环境，

conda create -n textgen python=3.11 -y
conda activate textgen

4. 安装 pytorch

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

确认 pytorch 是否安装成功，

python -c "import torch;print(torch.cuda.is_available()):

--- 安装成功输出应该为 True
True

5. 安装 CUDA 运行时库

conda install -y -c "nvidia/label/cuda-12.1.1" cuda-runtime

如果您需要 nvcc 手动编译某些库，请将上面的命令替换为，

conda install -y -c "nvidia/label/cuda-12.1.1" cuda

6. 安装依赖库

pip install -r requirements.txt
pip install transformers_stream_generator
pip install tiktoken

7. 下载通义千问语言模型

https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat/files

下载后的模型文件路径，比如我的是F:\llm\Qwen\Qwen1.5-0.5B-Chat，下一步会用到。

8. 启动 Web UI

--model-dir是模型文件的父文件夹路径
--model 是模型文件夹名称
--listen-port 是webui的访问端口

python server.py --model-dir F:\llm\Qwen --model Qwen1.5-4B-Chat --listen --listen-port 3304

9. 访问 Web UI

使用浏览器打开 http://localhost:3304

在这里插入图片描述

因为我下的模型是instruct, Chat页面的mode选择instruct, 不然选择chat会无法显示回复

在这里插入图片描述

10. OpenAI 兼容 API

pip install -r extensions/openai/requirements.txt

启动，

python server.py --trust-remote-code --api --api-port 8000 --listen

refer:https://github.com/oobabooga/text-generation-webui/wiki/12—OpenAI-API

完结！

开放原子开发者工作坊

开放原子开发者工作坊旨在鼓励更多人参与开源活动，与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动，如meetup、训练营等，主打技术交流，干货满满，真诚地邀请各位开发者共同参与！

更多推荐

【Spring Boot 】Spring Boot + HikariCP 连接池使用示例

文章目录示例工具版本HikariCP 依赖HikariCP 配置1. connectionTimeout2. minimumIdle3. maximumPoolSize4. idleTimeout5. maxLifetime6. autoCommitSpring Boot Data + HikariCP + MySQL示例测试应用程序1. 使用 Maven 命令2. 使用 Eclipse3. 使用