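Your private AI translation assistant is here! Replace Google Translate and translate without limits. Built with a Hugging Face LLM and Python, and simple to deploy

Introduction:

In today's globalized world, communicating with people from other countries and cultures matters more and more. For technical people, reading English material is a routine need, and the language barrier can slow learning down. Google Translate is a popular tool, but it has limits on longer texts, and the endless copy-and-paste gets tedious. Why not put one of today's popular AI assistants to work on the problem? With privacy and security in mind, we can use Hugging Face to build a personal translation app that keeps our data private and translates accurately. That way we stop repeating the same steps and get a better translation experience.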

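I. Install the dependencies

The packages we need cover three things: access to Hugging Face models, splitting long text into chunks, and the graphical interface.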

Step 1: create a new directory named AI_translator and set up a venv inside it:

mkdir AI_translator
cd AI_translator
python3 -m venv venv

Step 2: activate the virtual environment:

source venv/bin/activate   # for Mac
venv\Scripts\activate      # for Windows

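Step 3: with the venv active, install the following dependencies in order:

pip install mkl mkl-include   # needed for CPU use on Mac
pip install torch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0   # core
pip install transformers
pip install sentencepiece   # the opus-mt (Marian) tokenizer depends on it
pip install langchain==0.0.173
pip install streamlit
pip install streamlit-extras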

Step 4: if the model you use ships its weights in TensorFlow format, also install TensorFlow:

pip install tensorflow
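
Before moving on, it can help to confirm that the packages actually landed in the active venv. The following is a minimal sketch that uses only the Python standard library. The `REQUIRED` list simply mirrors the pip commands above, and the helper name `installed_versions` is hypothetical, not part of any of these libraries.

```python
from importlib import metadata

# Distribution names taken from the pip commands in this section.
REQUIRED = ["torch", "transformers", "langchain", "streamlit", "streamlit-extras"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:20s} {version or 'NOT INSTALLED'}")
```

Run it with the venv activated. Any line that reports NOT INSTALLED points at a pip command that needs to be repeated.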

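II. Download the language model

The core of the app is the translation model. We want to translate from English into Chinese, so the source language is English and the target language is Chinese. Each translation model is trained for one specific direction, in this case English to Chinese: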

https://huggingface.co/Helsinki-NLP/opus-mt-en-zh?text=My+name+is+Sarah+and+I+live+in+London

You can also pick a different translation model from the Helsinki-NLP collection on Hugging Face to match your own translation needs:

https://huggingface.co/Helsinki-NLP
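
The Helsinki-NLP models follow the naming pattern visible in the URL above, `opus-mt-<source>-<target>`. As a small illustration, the sketch below builds the repository id and model page URL for a language pair. Both function names are hypothetical helpers, and a well-formed id does not guarantee the model exists: check the Helsinki-NLP page before relying on a pair.

```python
def opus_mt_repo_id(source_lang: str, target_lang: str) -> str:
    """Build a Helsinki-NLP repo id such as 'Helsinki-NLP/opus-mt-en-zh'.

    This only formats the name; it does not check that the model exists.
    """
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"


def model_page_url(repo_id: str) -> str:
    """Return the Hugging Face model page URL for a repo id."""
    return f"https://huggingface.co/{repo_id}"


if __name__ == "__main__":
    repo_id = opus_mt_repo_id("en", "zh")
    print(repo_id)
    print(model_page_url(repo_id))
```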

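1. Create a subdirectory named model_zh

2. Open the "Files and versions" tab on the model page and download the following files into the model_zh directory

README.md
config.json
generation_config.json
pytorch_model.bin
source.spm
target.spm
tokenizer_config.json
vocab.json

Once everything has downloaded, the preparation is done.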
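
A missing file usually only shows up later as a loading error, so a quick check of the folder can save time. This is a minimal sketch using the standard library. `EXPECTED_FILES` repeats the list above without README.md (assumed here not to be needed for loading), and `missing_model_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names from the download list above, without README.md.
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "pytorch_model.bin",
    "source.spm",
    "target.spm",
    "tokenizer_config.json",
    "vocab.json",
]


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./model_zh")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model files are in place.")
```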

III. Test the model

1. Create a file named test_en2zh.py with the following test code:

import datetime

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import pipeline

# LOCAL MODEL EN-ZH
# ---------------------------------
# Helsinki-NLP/opus-mt-en-zh
Model_ZH = "./model_zh/"  # torch
# ---------------------------------

English = "Imagine a world where AI-driven technologies enable us to communicate more effectively, analyze enormous amounts of textual data, and make informed decisions in just seconds. A world where chatbots comprehend our intentions and respond with human-like clarity. This world is no longer a far-off dream, but an approaching reality, due to the remarkable advancements in AI technologies such as ChatGPT and LangChain. In this article, we will dive into the groundbreaking innovations of ChatGPT and LangChain, examine their potential applications, and uncover how they are transforming the AI landscape."

# Load the tokenizer from the local model folder
tokenizer_tt0zh = AutoTokenizer.from_pretrained(Model_ZH)
print("===>Initializing the AI language model...")
# If the weights are in TensorFlow format, pass from_tf=True when loading:
# repo_id = "Helsinki-NLP/opus-mt-en-zh"
# model_tt0zh = AutoModelForSeq2SeqLM.from_pretrained(repo_id, from_tf=True)
model_tt0zh = AutoModelForSeq2SeqLM.from_pretrained(Model_ZH)  # Helsinki-NLP/opus-mt-en-zh
print("===>pipeline")
TToZH = pipeline("translation", model=model_tt0zh, tokenizer=tokenizer_tt0zh)
print("===>Translation in progress")
# Time the translation itself
start = datetime.datetime.now()
finaltext = TToZH(English)
stop = datetime.datetime.now()
elapsed = stop - start
print(f"===>Translation finished in: {elapsed}...\n")
print(finaltext[0]["translation_text"])
print(f"\n===>The source text contains {len(English.split())} words")

2. Run the test code in a terminal:

python3 test_en2zh.py

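The script prints the elapsed time, the Chinese translation, and the word count of the source text.

At this point, the model test is complete.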

IV. Create the web UI

1. Create a file named translationer.py, which uses the Streamlit library to build the web interface

The code is as follows:

import datetime

import streamlit as st
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import pipeline
from langchain.text_splitter import CharacterTextSplitter

############# Page configuration #################
st.set_page_config(
    page_title="Private AI Translation Assistant",
    page_icon="♾️",
    layout="centered",  # or wide
    initial_sidebar_state="expanded",
    menu_items={
        "Get Help": "https://docs.streamlit.io/library/api-reference",
        "Report a bug": "https://www.extremelycoolapp.com/bug",
        "About": "An AI translation assistant that understands you",
    },
)

# LOCAL MODEL EN-ZH
# ---------------------------------
# Helsinki-NLP/opus-mt-en-zh
Model_ZH = "./model_zh/"  # torch
# ---------------------------------

### HEADER section
st.header("Private AI Translation Assistant: English to Chinese")
English = st.text_area("", height=240, key="original", placeholder="Type or paste English text here...")
col1, col2, col3 = st.columns([2, 5, 2])
btn_translate = col2.button("✅ Start translating", use_container_width=True, type="primary", key="start")

if btn_translate:
    if English:
        with st.spinner("Preparing the AI translation assistant..."):
            st.success("The AI translation assistant is starting")
            # Text splitter used to break the input into chunks
            text_splitter = CharacterTextSplitter(
                separator="\n\n",
                chunk_size=300,
                chunk_overlap=0,
                length_function=len,
            )
            # Split the document into chunks
            st.success("Splitting the text into chunks...")
            texts = text_splitter.create_documents([English])
            # Initialize the English-to-Chinese translation pipeline
            tokenizer_tt0zh = AutoTokenizer.from_pretrained(Model_ZH)
            st.success("Initializing the AI language model...")
            model_tt0zh = AutoModelForSeq2SeqLM.from_pretrained(Model_ZH)  # Helsinki-NLP/opus-mt-en-zh
            TToZH = pipeline("translation", model=model_tt0zh, tokenizer=tokenizer_tt0zh)
            # Translate chunk by chunk and join the results
            finaltext = ""
            start = datetime.datetime.now()
            print("Translation in progress...")
            for item in texts:
                line = TToZH(item.page_content)[0]["translation_text"]
                finaltext = finaltext + line + "\n"
            stop = datetime.datetime.now()
            elapsed = stop - start
        st.success(f"Translation finished in {elapsed}")
        print(f"Translation generated in {elapsed}...")
        st.text_area(label="Chinese translation:", value=finaltext, height=350)
        st.markdown(f"Translation finished in: **{elapsed}**")
        st.markdown(f"The source text contains {len(English.split())} words")
    else:
        st.warning("Please enter the text you want to translate!", icon="⚠️")
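
To make the chunking step in the app easier to follow, here is a rough pure-Python sketch of the idea: split the text on blank lines, pack the pieces into chunks of at most 300 characters, translate each chunk, and join the results. It approximates what the splitter is configured to do and is not LangChain's actual implementation. `split_into_chunks` and `translate_chunks` are hypothetical names, and the `translate` argument stands in for the real pipeline call.

```python
def split_into_chunks(text, separator="\n\n", chunk_size=300):
    """Greedily pack separator-delimited pieces into chunks of at most
    chunk_size characters. A single piece longer than chunk_size is kept whole."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            # The next piece does not fit: close the current chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_chunks(chunks, translate):
    """Apply a translate(str) -> str callable to each chunk and join with newlines."""
    return "\n".join(translate(chunk) for chunk in chunks)


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    # str.upper stands in for the translation model in this demo.
    print(translate_chunks(split_into_chunks(sample, chunk_size=20), str.upper))
```

One practical consequence of splitting only on blank lines: a long text with no blank lines stays in a single chunk, which may be more than the model can handle in one pass. Pasting text with paragraph breaks intact avoids this.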

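2. Run the following command in a terminal to start the app

streamlit run translationer.py

The console prints the URL where the app can be reached.

At this point, the app is up and running.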

V. Demo

Time to try it out (it also works without an internet connection)

Translation in progress:

Translation output:

If you found this interesting, remember to give it a like

I post fun AI tools regularly

➕ Follow so you don't miss them!