The Evolution of AI Training Paradigms: From Centralized Control to Decentralized Collaboration

In the full AI value chain, model training is the most resource-intensive and technically demanding stage, and it directly determines a model's capability ceiling and practical effectiveness. Compared with the lightweight calls of the inference stage, training requires continuous large-scale compute investment, complex data-processing pipelines, and intensive optimization algorithms, making it the true "heavy industry" of AI system construction. From an architectural standpoint, training approaches fall into four categories: centralized training, distributed training, federated learning, and the focus of this article, decentralized training.

Centralized training is the most common traditional approach: a single institution completes the entire training process on a local high-performance cluster, with a unified control system coordinating everything from hardware and underlying software to the cluster scheduler and training framework. This deeply coordinated architecture maximizes the efficiency of memory sharing, gradient synchronization, and fault tolerance, making it well suited to training large-scale models such as GPT and Gemini, with the advantages of high efficiency and controllable resources. However, it also faces problems of data monopoly, resource barriers, energy consumption, and single-point-of-failure risk.

Distributed training is the mainstream approach for training large models today. Its core idea is to decompose the training task and distribute it to many machines that execute it collaboratively, overcoming the compute and storage bottlenecks of a single machine. Although it is "distributed" in the physical sense, the whole process is still controlled, scheduled, and synchronized by a single centralized organization, typically running inside a high-speed local-area network over NVLink high-bandwidth interconnects, with a master node coordinating all sub-tasks. Mainstream approaches include:

  • Data parallelism: each node trains on different data while sharing parameters, so model weights must be kept synchronized;
  • Model parallelism: different parts of the model are deployed on different nodes, giving strong scalability;
  • Pipeline parallelism: stage-by-stage serial execution improves throughput;
  • Tensor parallelism: matrix computations are partitioned at fine granularity to increase parallelism.

Distributed training is thus a combination of "centralized control + distributed execution", analogous to a single boss remotely directing employees in several "offices" to complete a task together. Nearly all mainstream large models are currently trained this way.
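
To make the data-parallel case concrete, the minimal PyTorch sketch below averages gradients across workers after each backward pass; it assumes a torch.distributed process group has already been initialized and is a simplified stand-in for what wrappers such as DistributedDataParallel automate.

```python
import torch
import torch.distributed as dist

def data_parallel_step(model, batch, loss_fn, optimizer):
    """One data-parallel training step: every worker computes gradients on
    its own shard of data, then gradients are averaged across all workers
    so that the replicated model weights stay identical everywhere."""
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Sum gradients from all workers, then divide to get the mean.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

    optimizer.step()
    return loss.item()
```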

Decentralized training represents a more open and censorship-resistant path for the future. Its defining feature is that multiple mutually untrusting nodes (home computers, cloud GPUs, or edge devices) collaboratively complete the training task without any central coordinator, usually with a protocol driving task distribution and collaboration, and with cryptographic incentive mechanisms ensuring honest contributions. The main challenges of this model include:

  • Device heterogeneity and partitioning difficulty: coordinating heterogeneous devices is hard and task-partitioning efficiency is low;
  • Communication efficiency bottleneck: network communication is unstable and gradient synchronization becomes a pronounced bottleneck;
  • Lack of trusted execution: without a trusted execution environment, it is difficult to verify whether nodes are genuinely performing the computation;
  • Lack of unified coordination: with no central scheduler, task distribution and failure-rollback mechanisms are complex.

Decentralized training can be pictured as a group of volunteers around the world each contributing compute to train a model collaboratively. However, "genuinely workable large-scale decentralized training" remains a systems-engineering challenge spanning system architecture, communication protocols, cryptographic security, economic mechanisms, and model verification, and whether it can achieve "effective collaboration + honest incentives + correct results" is still at the early prototype stage.
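
As a toy illustration of the coordination problem, the sketch below shows one way an aggregator might tolerate stragglers and dropouts by rejecting overly stale updates and down-weighting the rest; the staleness rule and class names are purely illustrative, not any specific project's protocol.

```python
import numpy as np

class AsyncAggregator:
    """Illustrative asynchronous weight aggregator for unreliable nodes.

    Nodes submit weight deltas tagged with the global version they started
    from; updates that are too stale are rejected, and the rest are
    down-weighted in proportion to their staleness."""

    def __init__(self, init_weights, max_staleness=5):
        self.weights = np.asarray(init_weights, dtype=np.float64)
        self.version = 0
        self.max_staleness = max_staleness

    def submit(self, delta, base_version):
        staleness = self.version - base_version
        if staleness > self.max_staleness:
            return False  # based on weights that are too old; reject
        # Older updates get proportionally less influence.
        scale = 1.0 / (1.0 + staleness)
        self.weights += scale * np.asarray(delta, dtype=np.float64)
        self.version += 1
        return True

# Two nodes contribute asynchronously; the second starts from a stale version.
agg = AsyncAggregator(init_weights=np.zeros(4))
agg.submit(delta=[0.1, 0.1, 0.0, 0.0], base_version=0)  # accepted at full weight
agg.submit(delta=[0.0, 0.0, 0.2, 0.2], base_version=0)  # staleness 1, down-weighted
```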

Federated learning, a transitional form between distributed and decentralized training, keeps data local while aggregating model parameters centrally, which makes it suitable for privacy-sensitive scenarios such as healthcare and finance. It has the engineering structure of distributed training and local collaboration capability, together with the data-dispersion advantage of decentralized training, but it still relies on a trusted coordinator and is not fully open or censorship-resistant. It can be viewed as a "controlled decentralization" solution for privacy-compliant scenarios: its training tasks, trust structure, and communication mechanisms are all relatively mild, making it better suited as a transitional deployment architecture for industry.
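
The "data stays local, parameters are aggregated centrally" pattern is typically implemented with some variant of federated averaging (FedAvg). The sketch below, using hypothetical helper names, weights each client's locally trained parameters by its sample count; production deployments in healthcare or finance would add secure aggregation and differential privacy on top.

```python
import numpy as np

def federated_average(client_weights, client_sample_counts):
    """FedAvg-style aggregation: the server never sees raw client data,
    only locally trained parameter vectors, which it averages weighted
    by how many samples each client trained on."""
    n_total = sum(client_sample_counts)
    aggregate = np.zeros_like(client_weights[0], dtype=np.float64)
    for weights, n in zip(client_weights, client_sample_counts):
        aggregate += (n / n_total) * np.asarray(weights, dtype=np.float64)
    return aggregate

# One communication round with three clients holding unequal amounts of data.
global_params = federated_average(
    client_weights=[np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([0.8, 2.2])],
    client_sample_counts=[1000, 300, 50],
)
```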

Decentralized training: boundaries, opportunities, and realistic paths

From the perspective of training paradigms, decentralized training is not suitable for every type of task. In some scenarios, because the task structure is complex, resource demands are extremely high, or collaboration is hard, it is inherently ill-suited to being completed efficiently across heterogeneous, trustless nodes. For example, large-model training often depends on high memory, low latency, and high bandwidth, making it difficult to partition and synchronize effectively over an open network; tasks with strong data-privacy and sovereignty constraints (such as medical, financial, or classified data) are bound by legal compliance and ethics and cannot be shared openly; and tasks lacking a basis for collaborative incentives (such as enterprise closed-source models or internal prototype training) offer no motivation for external participation. Together, these boundaries constitute the current practical limits of decentralized training.

However, this does not mean decentralized training is a false proposition. In fact, for task types that are lightweight, easy to parallelize, and incentivizable, decentralized training shows clear application prospects, including but not limited to: LoRA fine-tuning, behavior-alignment post-training tasks (such as RLHF and DPO), data crowdsourcing training and labeling tasks, resource-controlled training of small foundation models, and collaborative training involving edge devices. These tasks generally feature high parallelism, low coupling, and tolerance for heterogeneous compute, making them well suited to collaborative training via P2P networks, swarm protocols, distributed optimizers, and similar approaches.
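
LoRA fine-tuning fits this profile because each worker only has to train and transmit two small low-rank matrices rather than full model weights. Below is a minimal PyTorch sketch of standard LoRA (generic, not any particular project's implementation).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update.

    Only the rank-r matrices A and B are trained and communicated, which
    is what makes LoRA-style tasks cheap enough to crowdsource."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(r, base.out_features))
        self.scaling = alpha / r

    def forward(self, x):
        # Base output plus the low-rank correction, scaled by alpha / r.
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scaling

# Wrap an existing layer; only about (in + out) * r parameters are trainable.
layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(4, 768))
```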

Analysis of representative decentralized training projects

At present, the representative blockchain projects at the frontier of decentralized training and federated learning include Prime Intellect, Pluralis.ai, Gensyn, Nous Research, and Flock.io. In terms of technical originality and engineering difficulty, Prime Intellect, Nous Research, and Pluralis.ai have proposed many original explorations in system architecture and algorithm design, representing the cutting edge of current theoretical research, while Gensyn and Flock.io follow relatively clear implementation paths and already show early engineering progress. This article analyzes in turn the core technologies and engineering architectures behind these five projects, and further discusses their differences and complementarities within a decentralized AI training system.

Prime Intellect: a pioneer of reinforcement-learning collaborative networks with verifiable training trajectories

Prime Intellect is committed to building a trustless AI training network in which anyone can participate in training and receive credible rewards for their computational contribution. Through three core modules, PRIME-RL + TOPLOC + SHARDCAST, it aims to create a decentralized AI training system that is verifiable, open, and fully incentivized.

In May 2025 Prime Intellect released INTELLECT-2, the world's first large reinforcement-learning model trained through asynchronous, trustless collaboration among decentralized nodes, with a parameter scale of 32B. INTELLECT-2 was trained collaboratively by more than 100 heterogeneous GPU nodes across three continents using a fully asynchronous architecture, with a training run of over 400 hours, demonstrating the feasibility and stability of asynchronous collaborative networks. The model is not only a performance breakthrough but also the first systematic realization of Prime Intellect's "training is consensus" paradigm. INTELLECT-2 integrates the core protocol modules PRIME-RL (asynchronous training structure), TOPLOC (training-behavior verification), and SHARDCAST (asynchronous weight aggregation), marking the first time a decentralized training network has achieved an open, verifiable, and economically incentivized closed loop over the training process.
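
PRIME-RL, TOPLOC, and SHARDCAST are Prime Intellect's own protocols; the sketch below is only a generic illustration of the closed loop they describe, in which an aggregator accepts a weight update only if an accompanying verification check passes and credits the contributing node. The class names and the verification rule are hypothetical placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class Contribution:
    node_id: str
    weight_delta: list   # model update proposed by the node
    proof: bytes         # evidence that real training work was performed

@dataclass
class TrainingLedger:
    """Toy verify-then-aggregate-then-reward loop. The verify() hook stands
    in for whatever training-behaviour check a real network would apply."""
    credits: dict = field(default_factory=dict)
    accepted_updates: list = field(default_factory=list)

    def verify(self, c: Contribution) -> bool:
        # Placeholder check; a real system would validate the training trace.
        return len(c.proof) > 0

    def submit(self, c: Contribution) -> bool:
        if not self.verify(c):
            return False  # unverifiable work earns nothing
        self.accepted_updates.append(c.weight_delta)
        self.credits[c.node_id] = self.credits.get(c.node_id, 0) + 1
        return True

ledger = TrainingLedger()
ledger.submit(Contribution("node-a", weight_delta=[0.01, -0.02], proof=b"trace"))
```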

In terms of performance, INTELLECT-2 was trained on top of QwQ-32B and received specialized RL training on code and mathematics, placing it at the forefront of current open-source RL fine-tuned models. Although it has not yet surpassed closed-source models such as GPT-4 or Gemini, its real significance is that it is the world's first fully trainable, reproducible, verifiable, and auditable decentralized model experiment. Prime Intellect open-sourced not only the model but, more importantly, the training process itself: the training data, policy-update trajectories, validation processes, and aggregation logic are all transparent and traceable, creating a prototype of a decentralized training network in which everyone can participate, collaborate with trust, and share in the proceeds.

Prime Intellect closed a $15 million seed round in February 2025, led by Founders Fund, with participation from industry figures including Menlo Ventures, Andrej Karpathy, Clem Delangue, Dylan Patel, Balaji Srinivasan, Emad Mostaque, and Sandeep Nailwal. Earlier, in April 2024, the project completed a $5.5 million early-stage round co-led by CoinFund and Distributed Global, with participation from institutions such as Compound VC, Collab + Currency, and Protocol Labs. To date, Prime Intellect has raised more than $20 million in total.

Prime Intellect was co-founded by Vincent Weisser and Johannes Hagemann, and its team draws on both AI and Web3 backgrounds. Core members come from Meta AI, Google Research, OpenAI, Flashbots, Stability AI, and the Ethereum Foundation, bringing deep capability in system architecture design and the practical implementation of distributed engineering. The team is one of the very few to have successfully completed the training of a truly decentralized large model.

Pluralis: an exploration of paradigms for asynchronous model parallelism and structural-compression collaborative training

Pluralis is a Web3 AI project focused on "trustworthy collaborative training networks." Its core goal is to promote a model-training paradigm that is decentralized, open to participation, and backed by long-term incentive mechanisms. Unlike today's mainstream centralized or closed training paths, Pluralis proposes a new concept called Protocol Learning: "protocolizing" the model-training process and building an open training system with an endogenous incentive loop through verifiable collaboration mechanisms and the mapping of model ownership.

Protocol Learning proposed by Pluralis includes three key pillars:

  • Unmaterializable Models: the model's weights are distributed as fragments across multiple nodes, so no single node can reconstruct the complete weights and the model remains closed-source. This design makes the model a native "protocol asset," enabling access-credential control, leak protection, and the binding of revenue attribution.
  • Model-parallel training over the Internet: through an asynchronous pipeline model-parallel mechanism (the SWARM architecture), each node holds only part of the weights and collaborates over low-bandwidth networks to complete training or inference, as sketched below.
  • Partial Ownership for Incentives: every participating node obtains partial ownership of the model in proportion to its training contribution, entitling it to a share of future revenue and to protocol governance rights.
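
To make the first two pillars concrete, the sketch below (illustrative only, not Pluralis's SWARM implementation) splits a small feed-forward model into pipeline stages so that each node instantiates only its own fragment of the weights and passes activations, never parameters, to the next node.

```python
import torch
import torch.nn as nn

def build_stages(layer_sizes, num_nodes):
    """Split a feed-forward model into `num_nodes` pipeline stages.
    Each node builds only its own stage, so no single node ever
    materializes the full set of weights."""
    layers = []
    for i in range(len(layer_sizes) - 1):
        layers.append(nn.Linear(layer_sizes[i], layer_sizes[i + 1]))
        layers.append(nn.ReLU())
    per_node = max(1, len(layers) // num_nodes)
    return [nn.Sequential(*layers[i:i + per_node])
            for i in range(0, len(layers), per_node)]

# Node k would run only stages[k]; here the stages are chained locally just
# to show the data flow: activations, not weights, travel between nodes.
stages = build_stages(layer_sizes=[64, 128, 128, 32], num_nodes=3)
x = torch.randn(8, 64)
for stage in stages:
    x = stage(x)
```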

Pluralis clearly defines "asynchronous model parallelism" as its core direction, emphasizing the following advantages over data parallelism:

  • Supports low-bandwidth networks and nodes without strict consistency guarantees;
  • Adapts to heterogeneous devices, allowing consumer-grade GPUs to participate;
  • Naturally offers elastic scheduling, supporting nodes that frequently go online and offline;
  • Its three major breakthroughs: structural compression + asynchronous updates + unextractable weights.
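
A standard trick that makes low-bandwidth collaboration viable at all is sparsifying updates before transmission, for example classic Top-K gradient compression, sketched below as background (Pluralis's "Beyond Top-K" work explores going past this baseline; the code is generic, not theirs).

```python
import torch

def topk_compress(grad: torch.Tensor, k_ratio: float = 0.01):
    """Keep only the largest-magnitude k% of gradient entries; the indices
    and values are all that needs to be sent over the network."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * k_ratio))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]

def topk_decompress(indices, values, shape):
    """Rebuild a dense (mostly zero) gradient tensor on the receiving side."""
    flat = torch.zeros(torch.Size(shape).numel(), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)

grad = torch.randn(256, 256)
idx, vals = topk_compress(grad, k_ratio=0.01)      # ~1% of entries transmitted
grad_hat = topk_decompress(idx, vals, grad.shape)
```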

At present, based on the six technical blog posts published on its official website, the logical structure can be organized into the following three main lines:

  • Philosophy and vision: "A Third Path: Protocol Learning", "Why Decentralized Training Matters"
  • Technical mechanism details: "SWARM Parallel", "Beyond Top-K", "Asynchronous Updates"
  • Institutional innovation exploration: "Unmaterializable Models", "Partial Ownership Protocols"

Pluralis has not yet launched a product, test network, or open-source code, because the technical path it has chosen is exceptionally challenging: system-level problems such as the underlying system architecture, communication protocols, and unextractable weights must be solved before its services can be packaged as products.

In a new paper published in June 2025, Pluralis Research extends its decentralized training framework to model pre-training.
