๐Ÿ” Search

Open
🔥 DeepSeek unveils open math LLM rivaling Google’s Deep Think results


Alibaba wins best paper, Stanford's Open Research System, Anthropic fixes Claude, OpenAI's breach, Alibaba's Image Model
Stay updated with today's top AI news, papers, and repos.
Signup | Work With Us | Follow on X | Read on Web

Hey James,

Your daily briefing is ready. You can finally take a break from the AI firehose.

Our algos spent the night splitting signal from noise and pulled the top news, models, papers, and repos.

Here's the must-read:

Summary

Read time: 3 min 21 sec

Top News

▸ DeepSeek releases open-source math model achieving gold-level results in IMO 2025

Top Paper

▸ Alibaba's Qwen wins NeurIPS Best Paper for gated attention breakthrough

Top News

▸ Berkeley and Stanford researchers present a faster open DeepResearch system

Signals

▸ Anthropic adds new feature to stop Claude conversations from collapsing
▸ OpenAI confirms third-party Mixpanel breach leaking restricted API analytics fields
▸ Anthropic introduces a feature-driven workflow to keep agents consistent
▸ Alibaba's Tongyi presents an open-source photorealistic image model
▸ Perplexity rolls out cross-model memory that stores preferences and recent history
Top Repo
DeepSeek launches open-source weights for a 685B math model that solves formal competition-level problems
10,493 Likes

DeepSeekMath-V2 addresses a core limitation in mathematical LLMs: final-answer accuracy does not ensure correct reasoning. Many models guess well but fail to maintain valid intermediate steps. DeepSeek solves this with a two-part training system that forces the model to generate, check, and revise full proofs.

The architecture pairs a high-capacity generator with a dedicated verifier that scores each step of a derivation. The generator rewrites its proofs until the verifier accepts them, using verifier feedback as a reward.

DeepSeek scales verification compute to label difficult proofs, which strengthens the verifier as the generator improves. This pipeline enables the 685B-parameter model to score 118/120 on a Putnam-style benchmark and reach gold-level performance on IMO 2025 and CMO 2024.

Key features and results:

  • Verifier scores correctness across complete proof traces.

  • Generator optimizes proofs using verifier-based rewards.

  • Scaled verification compute labels hard proofs.

  • Achieves 118/120 Putnam and gold-level IMO/CMO results.

You can download the ~700 GB Apache 2.0 weights and run them on distributed multi-GPU setups.
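To make the training signal concrete, here is a minimal sketch of the generate-check-revise loop described above. The generate_proof and score_steps helpers are hypothetical stand-ins; this illustrates the idea, not DeepSeek's actual pipeline.

  # Sketch of the generator-verifier loop, assuming hypothetical helpers
  # generate_proof(problem, feedback) and score_steps(problem, proof).
  def refine_proof(problem, generate_proof, score_steps,
                   accept_threshold=0.95, max_rounds=8):
      """Rewrite a proof until the verifier accepts every step."""
      feedback = None
      best_proof, best_score = None, float("-inf")
      for _ in range(max_rounds):
          proof = generate_proof(problem, feedback)   # generator proposes a full proof
          step_scores = score_steps(problem, proof)   # verifier scores each derivation step
          reward = min(step_scores)                   # the weakest step bounds the reward
          if reward > best_score:
              best_proof, best_score = proof, reward
          if reward >= accept_threshold:              # verifier accepts the proof
              break
          # hand the lowest-scoring steps back to the generator as revision targets
          feedback = [i for i, s in enumerate(step_scores) if s < accept_threshold]
      return best_proof, best_score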

TRY NOW
Top Paper
Alibaba's Qwen earns NeurIPS Best Paper for showing sigmoid-gated SDPA improves scaling and stability
2,529 Likes

Alibaba's Qwen team arrives at NeurIPS 2025 with a focused question: can a small architectural change fix long-standing attention issues? Transformers depend on Scaled Dot-Product Attention, but it often produces activation spikes and "attention-sink" tokens that distort long-context behavior.

The insight arrives when a head-specific, query-dependent sigmoid gate added after attention starts outperforming every other variant. That single modification reshapes the attention output with a small nonlinear adjustment and introduces sparse, per-query scaling.

It stabilizes training, scales reliably from 1.7B to 15B parameters, and improves long-context handling.
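For intuition, here is a minimal PyTorch sketch of the gating idea: standard multi-head SDPA with a learned, query-dependent sigmoid gate applied per head to the attention output. Layer names and sizes are assumptions, not the Qwen implementation.

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class GatedAttention(nn.Module):
      # Multi-head SDPA with a head-specific, query-dependent sigmoid gate
      # applied to the attention output (a sketch, not Qwen's exact design).
      def __init__(self, d_model, n_heads):
          super().__init__()
          self.n_heads, self.d_head = n_heads, d_model // n_heads
          self.qkv = nn.Linear(d_model, 3 * d_model)
          self.gate = nn.Linear(d_model, d_model)   # gate values computed from the query token
          self.out = nn.Linear(d_model, d_model)

      def forward(self, x):
          B, T, D = x.shape
          q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
                     for t in self.qkv(x).chunk(3, dim=-1))
          attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
          g = torch.sigmoid(self.gate(x)).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
          attn = g * attn                           # sparse, per-query rescaling of each head
          return self.out(attn.transpose(1, 2).reshape(B, T, D))

The gate lets each head cheaply suppress its own output for a given query, which is the mechanism the paper credits for curbing activation spikes and attention-sink tokens.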

Key results:

  • Consistent performance gains, including a 0.2 perplexity improvement across benchmarks.

  • Reduced activation spikes and mitigation of attention-sink tokens.

  • Stable training under larger learning rates across dense and MoE models.

  • Verified performance across models trained on 3.5T tokens.

LEARN MORE
Top News
Berkeley and Stanford researchers introduce DeepScholar, a LOTUS-powered system that accelerates large-scale research synthesis
2,539 Likes

The researchers set out with a simple question: can open systems match the research-synthesis engines locked behind major labs? Real research workflows often need summaries built from hundreds of papers, and most tools slow down or lose structure when you push them that far.

That gap set the stage for DeepScholar, an openly accessible system that processes huge document sets and delivers structured long-form reports at 2× the speed of OpenAI's DeepResearch.

The key insight came from LOTUS, a semantic query engine that treats unstructured text the way databases treat rows and tables. By turning "read these 200 papers" into optimized operators, the team removed redundant LLM calls and kept the pipeline predictable.
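For a sense of what those operators look like, here is a hedged sketch in the LOTUS style using the open lotus-ai package. The dataframe contents, model choice, and prompts are assumptions, and the exact API may differ from DeepScholar's internal pipeline.

  import pandas as pd
  import lotus
  from lotus.models import LM

  # Assumption: any LiteLLM-compatible model can back the semantic operators.
  lotus.settings.configure(lm=LM(model="gpt-4o-mini"))

  papers = pd.DataFrame({
      "title": ["Paper A", "Paper B"],
      "abstract": [
          "Studies step-level verification of LLM-generated proofs.",
          "Benchmarks retrieval pipelines over long documents.",
      ],
  })

  # Declarative semantic operators stand in for ad-hoc per-document LLM calls,
  # so the engine can batch, reorder, and cache the underlying model calls.
  relevant = papers.sem_filter("{abstract} is about verifying LLM reasoning")
  report = relevant.sem_agg("Summarize the findings across each {abstract}")
  print(report)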

Features

  • Handles dozens to hundreds of documents with consistent throughput across steps.

  • Uses semantic operators to filter, cluster, and analyze text at scale.

  • Achieves 100×-400× pipeline speedups through LOTUS query rewrites.

  • Generates long-form cited summaries from live web retrieval.

  • Scores competitively on DeepScholar-bench across synthesis, retrieval, and verifiability.

TRY NOW
Signals
1 Anthropic updates Claude to auto-compact older messages and prevent context limit interruptions mid-chat 3,183 Likes
2 OpenAI clarifies the Mixpanel breach, noting only basic profile information was exposed 2,159 Likes
3 Anthropic presents an initializer-plus-coder harness that keeps Claude agents stable across many context windows 3,259 Likes
4 Alibaba's Tongyi unveils Z-Image, an open-source compact model matching larger generators on realism and text rendering 1,484 Likes
5 Perplexity introduces assistant memory that securely stores preferences and retrieves them for personalized answers 1,038 Likes
Looking to promote your company, product, service, or event to 250,000+ AI developers?
WORK WITH US
unsubscribe_me(): return True
{"AlphaSignal": "214 Barton Springs Rd, Austin, USA"}
