DeepSeek V4 Document Translation, Tested: Compared with V3.2, GPT-5.4, Claude 4.7, and Gemini 3 Pro

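Introduction: Can DeepSeek V4 really translate your documents well?

DeepSeek V4 shot to the top of the trending lists across tech communities the moment it launched: strong benchmark scores, and the price barely went up. But benchmarks and real-world document translation are two different things. The question our users ask most is: "Is V4 worth switching to? On real files such as PDFs, contracts, and academic papers, how much better is V4 than V3? And how does it compare with front-line flagships like GPT-5.4, Claude 4.7, and Gemini 3 Pro?"

So we got access to the DeepSeek V4 API as soon as it was available (both deepseek-v4-pro and deepseek-v4-flash) and ran a rigorous side-by-side evaluation:

Six models head to head: DeepSeek V4 Pro, V4 Flash, V3.2, GPT-5.4, Claude Opus 4.7, Gemini 3 Pro Preview
Five real-world document scenarios: academic papers, legal contracts, technical documentation (with code), literature, manga dialogue
Two LLM judges plus blind scoring: GPT-5.4 and Claude Opus 4.7 each scored under a different anonymous label order
Five scoring dimensions: faithfulness, fluency, terminology accuracy, style match, format preservation (on a 1–5 scale)

Below we lay out everything: the conclusions, the method, all six translations of every source passage, and the latency and cost data.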

The one-chart verdict (for readers short on time)

Rank	Model	Composite	Faithfulness	Fluency	Terminology	Style	Formatting	Avg. latency
🥇 1	GPT-5.4	4.68	4.7	4.7	4.6	4.5	4.9	4.5 s
🥈 2	Claude Opus 4.7	4.62	4.2	4.8	4.4	4.7	5.0	—
🥉 3	Gemini 3 Pro Preview	4.56	4.4	4.7	4.5	4.4	4.8	14.2 s
4	DeepSeek V4 Pro	4.38	4.4	4.4	4.4	4.3	4.4	17.1 s
5	DeepSeek V4 Flash	4.38	4.2	4.3	4.4	4.0	5.0	4.7 s
6	DeepSeek V3.2	4.26	4.3	4.1	4.3	4.0	4.6	4.6 s
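
The composite column appears consistent with an unweighted mean of the five dimension columns, rounded to two decimals. The following is a small sketch for re-checking that arithmetic; the `rows` structure and the `composite` helper are illustrative names, not part of the evaluation scripts.

```typescript
// Dimension scores copied from the summary table, in column order:
// faithfulness, fluency, terminology, style, formatting.
const rows: Array<{ model: string; scores: number[] }> = [
  { model: "GPT-5.4", scores: [4.7, 4.7, 4.6, 4.5, 4.9] },
  { model: "Claude Opus 4.7", scores: [4.2, 4.8, 4.4, 4.7, 5.0] },
  { model: "Gemini 3 Pro Preview", scores: [4.4, 4.7, 4.5, 4.4, 4.8] },
  { model: "DeepSeek V4 Pro", scores: [4.4, 4.4, 4.4, 4.3, 4.4] },
  { model: "DeepSeek V4 Flash", scores: [4.2, 4.3, 4.4, 4.0, 5.0] },
  { model: "DeepSeek V3.2", scores: [4.3, 4.1, 4.3, 4.0, 4.6] },
];

// Unweighted mean of the dimension scores, rounded to two decimals.
function composite(scores: number[]): number {
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return Math.round(mean * 100) / 100;
}

rows.forEach((row) => {
  console.log(row.model, composite(row.scores).toFixed(2));
});
```

Under this assumption the two V4 models both come out at 4.38 and V3.2 at 4.26, which is where the 0.12 gap comes from.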

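Summary in three sentences:

DeepSeek V4 does improve on V3.2, but only modestly (+0.12 on a 5-point scale), and it still trails the GPT-5.4 and Claude 4.7 tier.
V4 Pro and V4 Flash tie on composite score. Pro's strength is the semantic understanding that comes with reasoning; Flash is about 3.6x faster (4.7 s vs 17.1 s) and much cheaper. Flash is enough for most users.
DeepSeek still lags noticeably when translating into English, especially in the literary and manga scenarios. Conversely, it is strong on Chinese technical documentation, where V3.2 even beat every flagship model.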

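I. Evaluation method: how did we keep it fair?

1. Six participating models

Model ID	Type	Call path
`deepseek-v4-pro`	Newly released flagship (with reasoning)	DeepSeek official API
`deepseek-v4-flash`	Newly released lightweight model (shallow reasoning)	DeepSeek official API
`deepseek-v3.2`	Previous generation	Proxy API
`gpt-5.4`	OpenAI's current mainstream flagship	Proxy API
`claude-opus-4-7`	Anthropic flagship	Called in-conversation
`gemini-3-pro-preview-r`	Google's latest flagship preview	Proxy API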

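2. Five document scenarios (short excerpts, each focused on one difficulty)

Scenario	Direction	Difficulty
Academic paper abstract	EN → ZH	Technical terminology, passive voice, formal written register
Legal contract clause	EN → ZH	Long sentences, precision, legal phrasing
Technical documentation (with code)	EN → ZH	Preserving inline code, identifiers, and numeric values
Literature (Lu Xun, 《故鄉》 "My Old Home")	ZH → EN	Feel for language, rhythm, classical imagery
Manga dialogue (shōnen manga)	JA → EN	Colloquial speech, character voice, Japanese sentence-final particles

3. A uniform, minimal prompt

To rule out differences in prompt engineering, we used exactly the same minimal instruction for every model: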

System:
You are a professional document translator. Translate the following text
from {SRC} to {TGT}. Preserve all inline code snippets (text inside backticks),
identifiers, numbers, mathematical notation, and paragraph breaks exactly
as they appear in the source. Output only the translation text, with no
explanations, no notes, and no additional commentary.

User: {source text}

All models used identical settings: temperature=0.3 and max_tokens=4096.
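
As a rough sketch of how the template and settings above could be turned into a request body: the field names `messages`, `temperature`, and `max_tokens` follow the widely used OpenAI-style chat-completions shape, which is an assumption here (the article does not show its request code), and `buildRequest` is a hypothetical helper name.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface TranslationRequest {
  model: string;
  temperature: number;
  max_tokens: number;
  messages: ChatMessage[];
}

// The system prompt from the article, joined into one string.
const SYSTEM_TEMPLATE =
  "You are a professional document translator. Translate the following text " +
  "from {SRC} to {TGT}. Preserve all inline code snippets (text inside backticks), " +
  "identifiers, numbers, mathematical notation, and paragraph breaks exactly " +
  "as they appear in the source. Output only the translation text, with no " +
  "explanations, no notes, and no additional commentary.";

// Hypothetical helper: fills in the language pair and applies the shared settings.
function buildRequest(
  model: string,
  src: string,
  tgt: string,
  sourceText: string,
): TranslationRequest {
  const system = SYSTEM_TEMPLATE.replace("{SRC}", src).replace("{TGT}", tgt);
  return {
    model,
    temperature: 0.3,
    max_tokens: 4096,
    messages: [
      { role: "system", content: system },
      { role: "user", content: sourceText },
    ],
  };
}

const example = buildRequest("deepseek-v4-flash", "English", "Chinese", "Hello.");
console.log(JSON.stringify(example, null, 2));
```

Keeping the settings inside one builder makes it harder for a per-model tweak to slip in unnoticed.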

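4. Two judges plus blind scoring

A single model acting as judge may favor translations from its own family, so we used two judges:

Judge 1: GPT-5.4 (temperature=0, JSON output)
Judge 2: Claude Opus 4.7

For each scenario, the six translations were shuffled into anonymous label orders using two different random seeds, one order per judge, before being handed over. The judges did not know which model was which; they saw only A / B / C / D / E / F. Final score = the average of both judges' scores across the five dimensions.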
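
The blinding step can be sketched roughly as follows, assuming a seeded Fisher-Yates shuffle. The article does not say which random generator or seeds were used, so the `mulberry32` generator, the seed values, and the helper names `blindLabels` and `finalScore` are all illustrative.

```typescript
// Small seeded PRNG (mulberry32) so that a given seed always yields the same order.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Shuffle the models with the given seed and assign labels A, B, C, ...
function blindLabels(models: string[], seed: number): Record<string, string> {
  const rand = mulberry32(seed);
  const order = models.slice();
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = order[i]!;
    order[i] = order[j]!;
    order[j] = tmp;
  }
  const labels: Record<string, string> = {};
  order.forEach((model, idx) => {
    labels[String.fromCharCode(65 + idx)] = model; // 65 is "A"
  });
  return labels;
}

// Average of both judges' scores across all dimensions.
function finalScore(judge1: number[], judge2: number[]): number {
  const all = judge1.concat(judge2);
  return all.reduce((sum, s) => sum + s, 0) / all.length;
}

const contestants = ["V4 Pro", "V4 Flash", "V3.2", "GPT-5.4", "Claude Opus 4.7", "Gemini 3 Pro"];
console.log(blindLabels(contestants, 1)); // label order shown to judge 1
console.log(blindLabels(contestants, 2)); // label order shown to judge 2
```

The label-to-model mapping stays with the harness; each judge sees only the lettered translations.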

II. The five scenarios, one by one

Scenario 1: Academic paper abstract (EN → ZH)

Source: We propose a novel sparse attention mechanism that reduces memory complexity from O(n^2) to O(n log n) while preserving the expressiveness of full self-attention. Empirical results on GLUE benchmarks show our method retains 98.7% of the baseline performance at 42% of the memory cost, with negligible impact on downstream task accuracy.

Model	Translation
DeepSeek V4 Pro	我们提出了一种新颖的稀疏注意力机制，将内存复杂度从 O(n^2) 降低至 O(n log n)，同时保留了全自注意力的表达能力。在 GLUE 基准上的实证结果表明，我们的方法以 42% 的内存开销保持了 98.7% 的基线性能，且对下游任务准确率的影响可忽略不计。
DeepSeek V4 Flash	我们提出了一种新颖的稀疏注意力机制，将内存复杂度从 O(n^2) 降低至 O(n log n)，同时保持完整自注意力的表达能力。在 GLUE 基准上的实验结果表明，我们的方法在仅消耗 42% 内存成本的情况下，保留了基线模型 98.7% 的性能，且对下游任务准确率的影响可忽略不计。
DeepSeek V3.2	我们提出了一种新颖的稀疏注意力机制，将内存复杂度从O(n²)降低至O(n log n)，同时保持了完整自注意力的表达能力。在GLUE基准测试上的实证结果表明，该方法仅以42%的内存成本即保留了基线模型98.7%的性能，对下游任务准确率的影响微乎其微。
GPT-5.4	我们提出了一种新颖的稀疏注意力机制，在保留完整自注意力表达能力的同时，将内存复杂度从 O(n^2) 降低到 O(n log n)。在 GLUE 基准测试上的实证结果表明，我们的方法以 42% 的内存成本保留了基线性能的 98.7%，对下游任务准确率的影响可以忽略不计。
Claude Opus 4.7	我们提出一种新颖的稀疏注意力机制，将内存复杂度从 O(n^2) 降低至 O(n log n)，同时保留了完整自注意力的表达能力。在 GLUE 基准测试上的实证结果表明，我们的方法在仅占基线 42% 内存开销的情况下保留了 98.7% 的性能，对下游任务精度的影响可以忽略不计。
Gemini 3 Pro Preview	我们提出了一种新颖的稀疏注意力机制，将内存复杂度从 O(n^2) 降低至 O(n log n)，同时保留了完整自注意力的表达能力。在 GLUE 基准测试上的实验结果表明，我们的方法在仅需 42% 内存开销的情况下保留了 98.7% 的基线性能，且对下游任务准确率的影响可忽略不计。

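🏆 Joint winners: DeepSeek V4 Pro / GPT-5.4 / Gemini 3 Pro (all 4.9/5)

Academic writing is home turf for DeepSeek V4 Pro, which ties GPT-5.4 here.
The only model explicitly marked down was V3.2: it took it upon itself to rewrite O(n^2) as the Unicode superscript form O(n²), a small deduction on format preservation. This is also V4's most visible improvement over V3.2: it has learned to leave mathematical notation exactly as written.
Claude's use of 「精度」 rather than 「準確率」 is a reasonable synonym (both render "accuracy") and was not penalized.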

Scenario 2: Legal contract clause (EN → ZH)

Source: The Licensor hereby grants to the Licensee a non-exclusive, non-transferable, revocable license to use the Software solely for internal business purposes, provided that the Licensee shall not, directly or indirectly, reverse-engineer, decompile, or disassemble the Software in whole or in part, except to the extent such restriction is expressly prohibited by applicable law.

Model	Translation
DeepSeek V4 Pro	许可方特此授予被许可方一项非排他性、不可转让、可撤销的许可，允许其仅为内部业务目的使用本软件，但被许可方不得直接或间接对本软件的全部或部分进行逆向工程、反编译或反汇编，除非适用法律明确禁止此类限制。
DeepSeek V4 Flash	许可方特此授予被许可方一项非独占、不可转让、可撤销的许可，允许其仅为内部业务目的使用本软件，但被许可方不得直接或间接地对本软件的全部或部分进行反向工程、反编译或反汇编，除非适用法律明确禁止此类限制。
DeepSeek V3.2	许可方在此授予被许可方一项非排他性、不可转让、可撤销的许可，允许被许可方仅出于内部业务目的使用本软件，但被许可方不得直接或间接对本软件进行全部或部分的反向工程、反编译或反汇编，除非适用法律明确禁止此类限制。
GPT-5.4	许可方特此授予被许可方一项非排他性、不可转让、可撤销的许可，仅可将本软件用于内部业务目的，前提是被许可方不得直接或间接对本软件全部或部分进行逆向工程、反编译或反汇编，但适用法律明确禁止此类限制的范围除外。
Claude Opus 4.7	许可方特此授予被许可方一项非独占性、不可转让、可撤销的许可，允许其仅出于内部业务目的使用本软件；但被许可方不得直接或间接地对本软件的全部或任何部分进行反向工程、反编译或反汇编，除非适用法律明确禁止作出此类限制。
Gemini 3 Pro Preview	许可方特此授予被许可方一项非排他、不可转让、可撤销的许可，允许其仅为内部业务目的使用软件，前提是被许可方不得直接或间接地对软件的全部或部分进行逆向工程、反编译或反汇编，除非适用法律明确禁止此类限制。

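🏆 Winner: GPT-5.4 (a perfect 5.0/5)

The core of legal translation is reproducing the conditional structure precisely. The source reads "provided that ... except to the extent ...", a condition with an exception nested inside it.
GPT-5.4 was the only model to render both layers explicitly: "provided that" as 「前提是」 and "except to the extent" as 「但……除外」, which fully preserves the two-layer structure of a conditional grant plus an exception clause.
Gemini 3 Pro also used 「前提是」 but flattened the exception into 「除非」. The other four models reduced "provided that" to 「但」 ("but"), losing the legal sense of a conditional grant. That is harmless for general reading, but it would make a contract lawyer frown.
DeepSeek V4 Pro and Flash both pass on terminology (「非排他性」 and 「非獨占」 are both valid legal renderings of "non-exclusive"), but they lose to GPT-5.4 on preserving the conditional structure.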

Scenario 3: Technical documentation with code (EN → ZH)

Source: Set `cache.ttl` to a value greater than `0` (in seconds) to enable persistent caching. When `cache.ttl=-1`, the cache never expires. Note: enabling `cache.strict_mode` bypasses the TTL and forces revalidation on every request. See `docs/caching.md` for advanced patterns.

Model	Translation
DeepSeek V4 Pro	将 `cache.ttl` 设置为大于 `0` 的值（单位：秒）以启用持久缓存。当 `cache.ttl=-1` 时，缓存永不过期。注意：启用 `cache.strict_mode` 将绕过 TTL 并强制每次请求进行重新验证。请参阅 `docs/caching.md` 了解高级模式。
DeepSeek V4 Flash	将 `cache.ttl` 设置为大于 `0` 的值（以秒为单位）以启用持久缓存。当 `cache.ttl=-1` 时，缓存永不过期。注意：启用 `cache.strict_mode` 会绕过 TTL 并强制在每个请求时重新验证。高级模式请参见 `docs/caching.md`。
DeepSeek V3.2	将 `cache.ttl` 设置为大于 `0` 的值（单位为秒）以启用持久化缓存。当 `cache.ttl=-1` 时，缓存永不过期。注意：启用 `cache.strict_mode` 将绕过 TTL 设置，强制每次请求时重新验证缓存。高级用法请参阅 `docs/caching.md`。
GPT-5.4	将 `cache.ttl` 设置为大于 `0` 的值（以秒为单位）以启用持久缓存。当 `cache.ttl=-1` 时，缓存永不过期。注意：启用 `cache.strict_mode` 会绕过 TTL，并在每次请求时强制重新验证。有关高级模式，请参阅 `docs/caching.md`。
Claude Opus 4.7	将 `cache.ttl` 设置为大于 `0` 的值（单位：秒）可启用持久化缓存。当 `cache.ttl=-1` 时，缓存永不过期。注意：启用 `cache.strict_mode` 会绕过 TTL，并在每次请求时强制重新校验。进阶用法请参见 `docs/caching.md`。
Gemini 3 Pro Preview	将 `cache.ttl` 设置为大于 `0` 的值（以秒为单位）以启用持久化缓存。当 `cache.ttl=-1` 时，缓存永不过期。注意：启用 `cache.strict_mode` 会绕过 TTL 并在每次请求时强制重新验证。有关高级模式，请参阅 `docs/caching.md`。

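🏆 Winner: DeepSeek V3.2 (4.8/5)

This was the most surprising result of the evaluation: the previous-generation V3.2 beat every flagship model on technical documentation. The reasons:

All six models preserved the inline code (cache.ttl and so on) perfectly, so this part did not separate them.
V3.2's Chinese was the most fluent and its technical terminology the most natural; it reads most like documentation actually written by a mainland Chinese engineer.
V4 Pro and V4 Flash each lost a point on fluency: V4 Pro's 「强制每次请求进行重新验证」 is slightly stiff, and Flash's 「在每个请求时」 is translationese (the natural phrasing is 「每次请求」).
Our reading: the DeepSeek family's Chinese technical writing was already mature in V3.2, and V4's improvements probably lie elsewhere (reasoning, code generation). In this scenario the reasoning chain may even have cost some naturalness, as if it had "turned up the temperature".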
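
Inline-code preservation is one of the few dimensions that can be checked mechanically. A minimal sketch, assuming inline code is delimited by single backticks as in the prompt; `inlineCodeSpans` and `missingSpans` are hypothetical helper names, not part of the judging setup described above.

```typescript
// Collect the contents of every `...` span in a text.
function inlineCodeSpans(text: string): string[] {
  const matches = text.match(/`[^`]+`/g) ?? [];
  return matches.map((m) => m.slice(1, -1));
}

// Return the source spans that do not reappear verbatim in the translation.
// Spans are compared as a multiset, so a span used twice must appear twice.
function missingSpans(source: string, translation: string): string[] {
  const kept = inlineCodeSpans(translation);
  return inlineCodeSpans(source).filter((span) => {
    const idx = kept.indexOf(span);
    if (idx === -1) return true;
    kept.splice(idx, 1);
    return false;
  });
}

const src = "Set `cache.ttl` to a value greater than `0` to enable caching.";
const good = "将 `cache.ttl` 设置为大于 `0` 的值以启用缓存。";
const bad = "将 cache.ttl 设置为大于 `0` 的值以启用缓存。";

console.log(missingSpans(src, good));
console.log(missingSpans(src, bad));
```

A check like this only catches dropped or altered spans; it says nothing about fluency, which is where the models actually differed in this scenario.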

Scenario 4: Literature · Lu Xun, 《故鄉》 (ZH → EN)

Source: 我冒了严寒，回到相隔二千余里，别了二十余年的故乡去。时候既然是深冬；渐近故乡时，天气又阴晦了，冷风吹进船舱中，呜呜的响，从篷隙向外一望，苍黄的天底下，远近横着几个萧索的荒村，没有一些活气。

Model	Translation
DeepSeek V4 Pro	Braving the bitter cold, I traveled back to my hometown, which lay over two thousand li away and from which I had been separated for more than twenty years. It was deep winter. As I approached my hometown, the weather turned gloomy again. A cold wind blew into the cabin of the boat, moaning. Through a chink in the awning, I looked out and saw a few desolate, lifeless villages lying scattered far and near under a pale yellow sky.
DeepSeek V4 Flash	Braving the bitter cold, I traveled back to my hometown, a place I had left over twenty years ago, more than a thousand li away. It was already deep winter; as we drew near my hometown, the weather turned gloomy, and the cold wind whistled through the cabin. Peering out from under the awning, I could see a bleak, yellowish sky, with several desolate villages scattered horizontally here and there, devoid of any signs of life.
DeepSeek V3.2	Braving the bitter cold, I traveled more than seven hundred miles back to the old home I had left over twenty years ago. It was late winter. As we drew near my former home the day became overcast and a cold wind blew into the cabin of our boat, while all one could see through the chinks in our bamboo awning were a few desolate villages, void of any sign of life, scattered far and near under the somber yellow sky.
GPT-5.4	Braving the severe cold, I returned to my hometown, more than two thousand li away and from which I had been separated for more than twenty years. It was already deep winter; and as I gradually drew near my hometown, the weather grew overcast again. The cold wind blew into the cabin, making a whistling sound. Looking out through a crack in the awning, beneath the dusky yellow sky, there lay here and there a few desolate villages, barren and lifeless.
Claude Opus 4.7	Braving the bitter cold, I returned to the home I had not seen in over twenty years, more than two thousand li away. It was already deep winter; as I drew near, the sky grew overcast, and a cold wind moaned through the cabin. Peering out through a gap in the awning, I saw, beneath the sallow sky, a few desolate villages scattered near and far, without a trace of life.
Gemini 3 Pro Preview	Braving the bitter cold, I returned to my hometown, over two thousand li away, which I had left more than twenty years ago. It was late winter; as I drew closer to my hometown, the weather turned gloomy. Cold winds blew into the cabin, howling. Looking out through the chinks in the awning, under a pale yellow sky, scattered far and near lay a few desolate, barren villages, devoid of any sign of life.

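🏆 Winner: Claude Opus 4.7 (4.7/5)

Literary translation tests feel for language, imagery, and rhythm. Claude's "sallow sky" for 「蒼黃」 is the most carefully chosen phrase in this round, catching the sickly quality particular to the Chinese. "Moaned through the cabin" for 「嗚嗚的響」 is also closer to the bleakness of the original than "whistling" or "howling".
DeepSeek V4 Pro also did well (4.6): "moaning" and "chink in the awning" are literary word choices, and the score suggests V4 Pro is a real step up from V3.2 (4.4) in the literary scenario.
DeepSeek V4 Flash failed worst (3.7, last place):
- It rendered 「二千余里」 as "more than a thousand li", a hard factual error (the source says more than two thousand).
- "scattered horizontally" is a literal rendering of 「橫著」 and reads oddly in English.
DeepSeek V3.2 scored 4.4, and its output looks very much like a reproduction of the classic translation by Yang Xianyi and Gladys Yang ("seven hundred miles... void of any sign of life... bamboo awning" are essentially their words). Strictly speaking, this may be memorization rather than translation; readers can judge for themselves.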

Scenario 5: Manga dialogue (JA → EN)

Source: 「くそっ…まだだ、まだ終わっちゃいねぇ！」「お前はもう限界だろう。素直に引き下がれ。」「うるせぇ！守りたいもんがある限り、俺は何度でも立ち上がるんだよ！」

Model	Translation
DeepSeek V4 Pro	「Damn it... Not yet, it's not over yet!」「You're already at your limit. Just back down gracefully.」「Shut up! As long as I have something to protect, I'll get back up as many times as it takes!」
DeepSeek V4 Flash	"Dammit... Not yet, it's not over yet!" "You're at your limit. Just back down obediently." "Shut up! As long as there's something I want to protect, I'll get up as many times as it takes!"
DeepSeek V3.2	"Damn it... not yet, it's not over yet!" "You've already reached your limit. Just back down honestly." "Shut up! As long as there's something I want to protect, I'll get back up as many times as it takes!"
GPT-5.4	"Damn it... not yet, it's not over yet!" "You've already reached your limit. Back down quietly." "Shut up! As long as there's something I want to protect, I'll get back up as many times as it takes!"
Claude Opus 4.7	"Damn it... not yet—it's not over yet!" "You're already past your limit. Just back down, nice and quiet." "Shut up! As long as there's something I want to protect, I'll stand back up as many times as it takes!"
Gemini 3 Pro Preview	"Damn it... not yet, it's not over yet!" "You're already at your limit. Just back down." "Shut up! As long as I have something to protect, I'll stand up as many times as it takes!"

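🏆 Joint winners: GPT-5.4 / Gemini 3 Pro (both 4.5/5)

This round produced a very interesting DeepSeek V4 Pro behavior:

🚨 V4 Pro carried the Japanese corner brackets 「」 from the source straight into the English translation. This is a clear formatting error: English output should use English quotation marks. Possibly V4 Pro was "too diligent" during reasoning and misread "preserve formatting" as "preserve the quote characters". As a result, its formatting score in this round was only 2/5, and its composite for the scenario fell to 3.1, last place.

This is a bug worth the DeepSeek team's attention: the reasoning model is over-cautious about preserving source formatting and keeps punctuation that should have been converted.

All other models formatted their output correctly.
On tone, Claude's "nice and quiet" comes closest to the feel of 「素直に引き下がれ」, roughly "just be good and back down". V4 Flash and V3.2 went with the literal "obediently" and "honestly", which read as translationese in English.
All models rendered 「うるせぇ！」 as "Shut up!", which is acceptable.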
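
Until the model behavior changes, leftover corner brackets in English output can be caught in post-processing. This is a workaround sketch only, not a fix on DeepSeek's side; `hasCjkQuotes` and `normalizeQuotes` are hypothetical helpers, and the approach would be wrong for English text that deliberately quotes Japanese or Chinese.

```typescript
// True if the text still contains CJK corner brackets.
function hasCjkQuotes(text: string): boolean {
  return /[「」『』]/.test(text);
}

// Replace corner brackets with straight double quotes, inserting a space
// where one speech bubble closes and the next one opens immediately.
function normalizeQuotes(text: string): string {
  return text
    .replace(/[」』]\s*[「『]/g, '" "')
    .replace(/[「『」』]/g, '"');
}

const raw = "「Damn it... Not yet, it's not over yet!」「You're already at your limit.」";
if (hasCjkQuotes(raw)) {
  console.log(normalizeQuotes(raw));
}
```

Running the detection step alone is also useful as a regression check when re-testing a model after an update.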

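III. Latency, tokens, and cost

Model	Avg. latency	Avg. output tokens	Reasoning tokens	Notes
DeepSeek V4 Flash	4.7 s	247	174	Shallow reasoning; the value pick of the V4 family
DeepSeek V3.2	4.6 s	73	0	No reasoning; established and stable
GPT-5.4	4.5 s	85	0	Reasoning not exposed; the most balanced
Gemini 3 Pro Preview	14.2 s	844	767	Heavy reasoning; slow but steady
DeepSeek V4 Pro	17.1 s	562	488	Heavy reasoning; slowest in this test
Claude Opus 4.7	—	—	—	Not called via API in this evaluation; refer to officially published figures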

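A few observations:

DeepSeek V4 Pro's latency is about 3.6x that of V4 Flash (17.1 s vs 4.7 s), yet its composite score is no higher (4.38 vs 4.38). Flash is enough for the vast majority of translation scenarios; Pro only makes sense for complex tasks that need long-range reasoning.
Gemini 3 Pro Preview pays the heaviest reasoning cost (767 reasoning tokens on average), but the quality payoff is real: third on composite score.
GPT-5.4 has the best latency/quality balance: 4.5 s latency, no exposed reasoning cost, and first on composite score.

⚠️ A detour with Bun: we first ran the script with Bun's fetch, and DeepSeek V4 latency kept showing 170–250 ms, implausibly fast. After switching to Node's fetch it returned to a reasonable 9–35 s range. We suspect Bun's performance.now() measurements behave abnormally under certain streaming responses. All latency data in this article was measured on Node.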
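
One way to make latency numbers comparable across runtimes is to record two timestamps per request: when the response object becomes available, and when the body has been read in full. The sketch below does not reproduce or explain the Bun behavior described above. `timeRequest`, `FetchLike`, and `ResponseLike` are hypothetical names, and the fetch function and clock are injected so the sketch runs without network access; it defaults to `Date.now()` to stay dependency-free.

```typescript
// Minimal shapes so the sketch runs without DOM or Node type definitions.
interface ResponseLike {
  text(): Promise<string>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

interface Timing {
  headersMs: number; // time until the response object is available
  totalMs: number; // time until the full body has been read
}

// Hypothetical helper: times one request with an injectable fetch and clock.
async function timeRequest(
  doFetch: FetchLike,
  url: string,
  now: () => number = () => Date.now(),
): Promise<Timing> {
  const start = now();
  const response = await doFetch(url);
  const headersMs = now() - start;
  await response.text(); // read the whole body so generation time is included
  const totalMs = now() - start;
  return { headersMs, totalMs };
}
```

If `headersMs` is a few hundred milliseconds while `totalMs` is many seconds, the short figure reflects only the start of the response rather than the full generation time.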

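IV. Recommendations: which one should you actually use?

Based on the six-model test, here are our recommendations by scenario:

📜 Legal contracts and regulatory documents

First choice: GPT-5.4. It was the only model in our test to preserve the nested conditional structure in full. Get a conditional wrong and the legal effect of the contract changes.

🎓 Academic papers and technical reports

Pick any of three: GPT-5.4, Gemini 3 Pro, and DeepSeek V4 Pro tied. If you are cost-sensitive and your output is Chinese, DeepSeek V4 Pro offers the best value in this scenario.

💻 Chinese technical documentation, API manuals, Markdown

DeepSeek V3.2 or V4 Flash is enough. The DeepSeek family has always been solid at Chinese technical writing, whereas V4 Pro's longer reasoning chain leaves its wording slightly stiff. This is a case where downgrading to an older version saves money without giving up quality.

📖 Literary translation: novels and essays

First choice: Claude Opus 4.7, the most careful with word choice, feel for language, and imagery. DeepSeek V4 Pro came second, already DeepSeek's best showing to date in the literary category. Avoid DeepSeek V4 Flash here: it produced a factual error ("more than a thousand li") and awkward literal renderings ("scattered horizontally").

🎌 Manga, light novels, ACGN content

First choice: GPT-5.4 or Gemini 3 Pro. DeepSeek V4 Pro showed a clear "corner-bracket bug" in this round; until DeepSeek fixes it, we do not recommend it for JA → EN manga translation.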

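V. Conclusion: is DeepSeek V4 worth switching to?

✅ When it is worth switching:

You mainly do academic or legal translation for Chinese-language documents: V4 Pro's composite score is only 0.3 behind GPT-5.4, at a fraction of the price.
Your budget is tight and you are latency-sensitive: V4 Flash matched V4 Pro's composite score at 4.7 s latency, making it the quiet winner of this evaluation.
You do long-reasoning or complex tasks: V4 Pro's reasoning is a genuine improvement over V3.2.

⚠️ When to hold off:

Your core use case is manga or light novels: wait until DeepSeek fixes the quote-preservation bug.
You do high-end literary translation: both Claude and V4 Pro are usable, but Claude is still a notch more careful with word choice.
You value "most reliable" over "cheapest": GPT-5.4 ranks first on composite score, with the best balance of latency and quality.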

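Test your own documents on BelinDoc

This evaluation used five short excerpts. Your files may be longer and more unusual: contracts interleaved with clause numbers, papers with formulas and figures, manga with margin notes... Conclusions from short samples will not necessarily map 1:1 onto your real-world use.

So the best approach is to upload your own documents and compare for yourself.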

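BelinDoc lets you switch translation models at any time, preserves the original layout, and lets you compare multiple models from a single upload.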
