Qwen3 has been released. Here is a quick look at what is new:

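Training data: pretraining used 36T tokens this time, which is very large; DeepSeek V3 used 14.8T. The data covers more than 100 languages and includes math and code content generated by the previous-generation models. Another notable point is that it also includes text recognized from images.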

Pretraining: nothing especially distinctive stands out here so far. The MoE models select 8 of 128 experts and do not use a shared expert.
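To make "8 of 128 with no shared expert" concrete, here is a minimal sketch of top-k expert routing for a single token. It is not Qwen3's actual implementation: the function names, the toy scalar experts, and the choice to renormalise the softmax over only the selected experts are all assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 128  # experts per MoE layer, as stated in the post
TOP_K = 8          # experts activated per token, as stated in the post


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (expert_id, weight) pairs.

    Weights are a softmax over the selected logits only, so they sum to 1.
    This normalisation is an assumption of the sketch.
    """
    # Indices of the k largest router logits.
    top_ids = sorted(range(len(router_logits)),
                     key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the selected experts.
    max_logit = max(router_logits[i] for i in top_ids)
    exps = [math.exp(router_logits[i] - max_logit) for i in top_ids]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top_ids, exps)]


def moe_forward(x, experts, router_logits):
    """Weighted sum of the selected experts' outputs.

    There is no shared-expert term: only routed experts contribute.
    """
    return sum(w * experts[i](x) for i, w in route_token(router_logits))


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
# Toy experts (hypothetical): expert i scales its scalar input by i + 1.
experts = [lambda x, s=i: x * (s + 1) for i in range(NUM_EXPERTS)]

routing = route_token(logits)
output = moe_forward(1.0, experts, logits)
```

A design with a shared expert would add one always-active term to the sum in `moe_forward`; leaving it out means every contribution to the output depends on the router's choice.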

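Post-training: this is the key step that gives the model its hybrid thinking ability. After RL on CoT, another round of SFT is added so that the model answers general questions directly. Roughly speaking, once there is a model that reasons about everything the way DeepSeek R1 does, some extra guidance is added so that it does not think through every question.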

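Overall, this is a model with heavy investment in data and some local innovations. Judging from the architecture alone, though, it is unlikely to beat expectations; how it performs in actual use remains to be seen.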

https://qwenlm.github.io/blog/qwen3/
Ultra-deep thinking mode. Greater rigor, attention to detail, and multi-angle verification. Start by outlining the task and breaking down the problem into subtasks. For each subtask, explore multiple perspectives, even those that seem initially irrelevant or improbable. Purposefully attempt to disprove or challenge your own assumptions at every step. Triple-verify everything. Critically review each step, scrutinize your logic, assumptions, and conclusions, explicitly calling out uncertainties and alternative viewpoints. Independently verify your reasoning using alternative methodologies or tools, cross-checking every fact, inference, and conclusion against external data, calculation, or authoritative sources. Deliberately seek out and employ at least twice as many verification tools or methods as you typically would. Use mathematical validations, web searches, logic evaluation frameworks, and additional resources explicitly and liberally to cross-verify your claims. Even if you feel entirely confident in your solution, explicitly dedicate additional time and effort to systematically search for weaknesses, logical gaps, hidden assumptions, or oversights. Clearly document these potential pitfalls and how you've addressed them. Once you're fully convinced your analysis is robust and complete, deliberately pause and force yourself to reconsider the entire reasoning chain one final time from scratch. Explicitly detail this last reflective step.

<task>
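List today's (China time) A-share market recap data from 同花顺 (长桥\雪球\老虎\东方财富).
* Compile all of today's limit-up stocks
** Data such as stock code, stock name, latest price, sector, positive catalysts/policies, buy volume, percentage gain, and number of consecutive limit-up days
** Exclude ST* stocks
* Provide the complete data; do not take shortcuts or omit output
* Provide an Excel file for download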
</task>
Qwen3-235B-A22B (MoE, 235B total parameters, 22B activated)
Qwen3-30B-A3B (MoE, 30B total parameters, 3B activated)
Qwen3-14B
Qwen3-8B
Qwen3-4B
Qwen3-1.7B
Qwen3-0.6B
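For the two MoE entries in this list, the share of parameters that is active per token can be worked out directly from the listed sizes. This is a small sketch; the variable and function names are only illustrative.

```python
# (name, total parameters in billions, activated parameters in billions),
# taken from the two MoE entries in the list.
MOE_MODELS = [
    ("Qwen3-235B-A22B", 235, 22),
    ("Qwen3-30B-A3B", 30, 3),
]


def activated_fraction(total_b, active_b):
    """Fraction of the model's parameters used for a single token."""
    return active_b / total_b


fractions = {name: activated_fraction(t, a) for name, t, a in MOE_MODELS}
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active per token")
```

Both models activate roughly a tenth of their parameters per token (about 9.4% and 10.0%).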