Abstract
Large Language Models (LLMs) have revolutionized natural language processing and artificial intelligence (AI) applications. Despite remarkable progress, scaling LLMs further is neither sustainable nor sufficient to meet the growing demands for quality, speed, and domain specificity. Generalist LLMs are built for breadth of knowledge and general intelligence, not for depth of knowledge within a specific domain.
This paper argues that the future of LLM development lies in combining domain-specific foundational models with cognitive search capabilities. Using robotics, biotechnology, and physics as case studies, it demonstrates how specialization and enhanced retrieval mechanisms can overcome the inherent limitations of generalist LLMs to make progress and unlock new revenue streams. These advances can ensure efficiency, relevance, and depth in addressing complex challenges across disciplines.
Introduction to ASI<Train/>
The advent of LLMs, such as OpenAI's GPT series, Meta's Llama, Anthropic's Claude, and Google's Gemini, has fundamentally changed how AI interacts with humans and has made AI assistants mainstream. By training on vast, diverse datasets, these models achieve general-purpose utility. However, this generality is also their limitation.
Scaling LLMs by increasing their parameter counts and training datasets is not the only way forward, and more sustainable options exist. The path forward lies in developing domain-specific foundational models that prioritize specialized knowledge, combined with cognitive search to enhance retrieval and synthesis. ASI<Train/> is a Web3 platform, and one of the first to target the training of domain-specific foundational models. Its aims are to solve monumental challenges within those domains and to democratize access to, and ownership of, the advanced AI capabilities these models provide.
Why is ASI<Train/> needed?
In our view, generalist LLMs will co-exist with domain-specific foundational models because generalist LLMs have the limitations described below.
LLM limitations
Generalist limitations for domain-specific challenges
Contextual Depth: They often fail to address domain-specific nuances, producing outputs that lack precision or are factually incorrect in specialized fields.
Relevance: Responses may be verbose or irrelevant due to overgeneralized training datasets.
Biases: Training on diverse datasets introduces biases, which can be detrimental in critical domains like biotechnology or physics.
Multimodality: While LLMs are expanding into multimodality, they struggle to scale across all possible modalities due to the vast diversity of data and challenges in aligning outputs consistently across different types.
Scaling limitations with diminishing returns
As model sizes grow, performance improvements on benchmarks like GLUE or SuperGLUE plateau. Studies such as Kaplan et al.’s work on scaling laws (2020) show that loss falls only as a power law in model size, data, and compute, so each further gain requires a multiplicative increase in resources. This diminishing return is especially evident in tasks requiring domain expertise, where generalist LLMs produce surface-level responses.
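To make the diminishing return concrete, the sketch below applies the parameter-count power law reported by Kaplan et al. (2020), L(N) = (N_c / N)^alpha_N with alpha_N of roughly 0.076. The function name is illustrative, and the calculation ignores data and compute limits, so it is back-of-the-envelope arithmetic rather than a forecast.

```python
ALPHA_N = 0.076  # parameter-scaling exponent reported by Kaplan et al. (2020)


def parameter_multiplier(loss_ratio: float, alpha: float = ALPHA_N) -> float:
    """Factor by which the parameter count must grow to scale loss by `loss_ratio`.

    From L(N) = (N_c / N) ** alpha it follows that
    L2 / L1 = (N1 / N2) ** alpha, so N2 / N1 = (L2 / L1) ** (-1 / alpha).
    """
    if not 0.0 < loss_ratio <= 1.0:
        raise ValueError("loss_ratio must be in (0, 1]")
    return loss_ratio ** (-1.0 / alpha)


# Print the growth factor needed for a few relative loss reductions.
for reduction in (0.05, 0.10, 0.20):
    factor = parameter_multiplier(1.0 - reduction)
    print(f"{reduction:.0%} lower loss needs about {factor:.1f}x the parameters")
```

Under this law alone, a 10% reduction in loss already calls for about four times as many parameters.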
Unsustainable resource requirements
Scaling LLMs is computationally expensive and environmentally unsustainable:
OpenAI’s GPT-3 reportedly consumed 1,287 MWh of energy during training, equivalent to the annual electricity consumption of 120 U.S. homes (Patterson et al., 2021).
Inference latency increases with model size, making real-time applications like robotics and biotech simulations impractical.
Case for domain-specific foundational models
Domain-specific foundational models are pre-trained and optimized for a particular discipline. Unlike generalist LLMs, these models rely on curated datasets, specialized architectures, and domain expertise to address the challenges unique to their field. The advantages of specialization fall into three areas:
1. Enhanced accuracy
Domain-specific foundational models show strong potential within their respective domains because they can deliver higher accuracy and precision in complex applications than generalist models. For example:
In robotics, models trained on robotic process datasets can produce accurate kinematics or path optimization predictions.
In biotech, foundational models can analyze protein structures or genomic sequences with unmatched precision.
2. Efficiency gains
Efficiency gains accrue from working with a much smaller subset of domain data rather than an entire corpus of internet data. Specifically:
Smaller, specialized models require fewer computational resources.
Real-time applications like robotics control systems benefit from reduced latency.
3. Tailored inputs and outputs
Specialized models can integrate with the regulatory, ethical, and practical requirements of the domains. For example,
Models related to robotics need to adhere to safety requirements in a human-robot hybrid environment.
Models related to drug discovery must conform to ethical and regulatory requirements from local governing bodies.
Models related to quantum mechanics should reflect fundamental aspects of quantum reality, including entanglement, superposition, measurement effects, quantum decoherence, as well as practical considerations like material limitations and the impact of observation.
What is ASI<Train/>?
An agent-based training framework for specialist foundation models
As industries increasingly adopt specialized AI models, traditional centralized inference serving systems face scalability, cost, and latency challenges. By leveraging Fetch.ai’s Autonomous Inference Model (AIM) Agents and FET tokenomics, ASI<Train/> can overcome these limitations and provide a decentralized, incentivized, and efficient framework for inference serving across diverse domains such as robotics, material science, molecular design, and physics simulations.
Framework components
AIM agents
AIMs are lightweight, autonomous software agents that host and manage trained ASI<Train/> models.
They are specialized for tasks in distinct domains, e.g., robotics, drug design, or physics simulations.
Inference nodes
Distributed network nodes where AIMs are deployed.
Inference nodes process user requests, ensuring low-latency inference serving.
Validation nodes
Validate inference outputs to ensure accuracy and quality.
Validators are rewarded in FET tokens for successful and fair validations.
FET token system
Facilitates transactions between users, AIMs, and validators.
Rewards are distributed based on computational resources, model quality, and inference accuracy.
Dynamic subnetworks
AIMs form dynamic subnetworks based on domain-specific requirements, optimizing resource allocation and communication.
An AIMs-based inference serving layered architecture
Application layer
Hosts user-facing interfaces for submitting inference requests.
Manages subscription plans and integrates payment systems for FET transactions.
Agent layer (AIMs)
Each AIM agent specializes in hosting domain-specific ASI<Train/> models (e.g., robotics, PINNs, molecular design).
AIMs handle model deployment, request processing, and result generation.
Resource management layer
Allocates computational resources across the network, optimizing workload distribution for inference nodes.
Implements demand-based scaling to match network capacity with request volume.
Validation layer
A decentralized layer of validators ensures inference accuracy and model performance.
Uses reputation systems to prioritize validators with a history of reliable assessments.
Blockchain layer
Secures all transactions, model updates, and validation records using Fetch.ai’s blockchain.
Ensures transparency and immutability of FET token flows and inference outcomes.
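The reputation system in the validation layer can be pictured as a reputation-weighted vote over validator verdicts. The sketch below is one possible reading, not the platform's specified mechanism: the `Verdict` type, the 0 to 1 reputation scale, and the 0.5 acceptance threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    validator_id: str
    reputation: float  # hypothetical score in [0, 1]
    approves: bool


def weighted_approval(verdicts: list[Verdict]) -> float:
    """Share of total reputation that approved the inference output."""
    total = sum(v.reputation for v in verdicts)
    if total <= 0:
        raise ValueError("verdicts carry no reputation")
    approving = sum(v.reputation for v in verdicts if v.approves)
    return approving / total


def is_accepted(verdicts: list[Verdict], threshold: float = 0.5) -> bool:
    """Accept the output when reputation-weighted approval exceeds the threshold."""
    return weighted_approval(verdicts) > threshold
```

With this weighting, one validator with a long record of reliable assessments can outweigh several low-reputation validators, which is the prioritization the validation layer describes.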
Workflow for inference serving
Inference request submission: Users submit requests through the Application Layer, specifying the desired model and parameters.
Request routing via AIM agent: The Resource Management Layer routes the request to the most suitable AIM agent based on performance metrics and computational availability.
Model execution: The AIM agent processes the request using the hosted ASI<Train/> model and generates the inference output.
Validation: Validation nodes assess the quality of the inference result, ensuring correctness and reliability.
Result delivery and reward distribution
Results are returned to the user.
FET tokens are distributed among AIMs, validators, and node operators based on contribution and performance.
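The workflow above can be sketched as a small routing, execution, and validation pipeline. This is a minimal illustration under stated assumptions: `AimAgent`, `route`, and `serve` are hypothetical names, the routing score (accuracy times free capacity) is one arbitrary way to combine "performance metrics and computational availability", and reward distribution is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AimAgent:
    name: str
    domain: str
    accuracy: float       # hypothetical performance metric in [0, 1]
    free_capacity: float  # hypothetical fraction of idle compute in [0, 1]
    model: Callable[[dict], dict]


def route(agents: list[AimAgent], domain: str) -> AimAgent:
    """Pick the agent in the requested domain with the best assumed score."""
    candidates = [a for a in agents if a.domain == domain and a.free_capacity > 0]
    if not candidates:
        raise LookupError(f"no available agent for domain {domain!r}")
    return max(candidates, key=lambda a: a.accuracy * a.free_capacity)


def serve(
    agents: list[AimAgent],
    domain: str,
    params: dict,
    validate: Callable[[dict], bool],
) -> tuple[str, dict]:
    """Route the request, run the hosted model, and validate the result."""
    agent = route(agents, domain)
    output = agent.model(params)
    if not validate(output):
        raise ValueError("validation rejected the inference output")
    return agent.name, output
```

In a real deployment the `validate` callback would stand in for the decentralized validation nodes, and the returned agent name would feed the reward step.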
Additional features of the framework
Interoperability across domains - Fetch.ai AIMs enable seamless integration of diverse domain-specific models, fostering collaboration across industries.
Real-time collaboration - Decentralized AIMs allow researchers and industries to collaborate and share models without relying on centralized infrastructures.
On-demand scalability - Dynamic subnetworks ensure the system scales efficiently based on real-time demand for domain-specific inference services.
Incentivization for specialization - FET rewards encourage the creation and maintenance of high-quality, specialized models, ensuring continuous innovation.
Enhanced tokenomics model
Multi-tier reward system
The tokenomics are structured to incentivize various stakeholders in the network, ensuring fairness and fostering collaboration.
Model providers (AIMs)
Revenue Model: Earn FET tokens based on the number and quality of inference requests served.
Performance Bonuses: Additional FET rewards for high accuracy, low latency, and model optimization.
Validators
Primary Rewards: Validators assess inference outputs for accuracy and reliability and are rewarded with FET for successful validations.
Staking Incentives: Validators stake FET tokens to ensure commitment and are penalized for invalid or biased validations.
Users
Pay-per-Inference: Users pay in FET tokens based on the complexity and resource intensity of their requests.
Subscription Model: Users can subscribe to domain-specific subnets for frequent access to inference services at a reduced cost.
Network operators
Infrastructure Incentives: Nodes providing computational resources to AIMs are compensated in FET for hosting and maintaining the network.
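One simple way to read the multi-tier reward system is a proportional split of each inference fee, combined with a stake that is slashed after an invalid validation. The sketch below assumes exactly that. The contribution weights, the 10% slash fraction, and both function names are illustrative and are not taken from the FET token design.

```python
def split_rewards(fee: float, contributions: dict[str, float]) -> dict[str, float]:
    """Divide a fee among stakeholders in proportion to their contribution weights."""
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("contributions must sum to a positive value")
    return {who: fee * weight / total for who, weight in contributions.items()}


def settle_stake(
    stake: float, valid: bool, reward: float, slash_fraction: float = 0.1
) -> float:
    """Return the validator's stake after one validation round.

    A valid assessment earns the reward; an invalid or biased one
    loses `slash_fraction` of the stake (an assumed penalty rule).
    """
    if valid:
        return stake + reward
    return stake * (1.0 - slash_fraction)
```

For example, `split_rewards(10.0, {"aim": 6, "validator": 3, "node": 1})` pays the model provider 6 FET, the validator 3 FET, and the node operator 1 FET.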
Dynamic pricing mechanism
Demand-based pricing: FET token costs for inference services fluctuate based on demand and resource availability within the network.
Complexity-based pricing: Inference requests with higher computational requirements are priced higher, ensuring equitable distribution of resources.
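The two pricing rules can be combined in a single formula: a base price per compute unit, scaled by request complexity and by a demand surge factor. The sketch below is an assumed formula for illustration only. The parameter names, the use of demand divided by capacity as the surge signal, and the surge cap of 3x are hypothetical.

```python
def inference_price(
    base_price: float,
    compute_units: float,
    demand: float,
    capacity: float,
    max_surge: float = 3.0,
) -> float:
    """Price in FET for one request under an assumed surge-pricing rule.

    Complexity-based: price grows linearly with `compute_units`.
    Demand-based: price is multiplied by demand / capacity once demand
    exceeds capacity, capped at `max_surge`.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = demand / capacity
    surge = min(max(1.0, utilization), max_surge)
    return base_price * compute_units * surge
```

With a base price of 0.01 FET per unit, a 10-unit request costs 0.1 FET when the network is under capacity and 0.2 FET when demand is twice the available capacity.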
Governance and staking
Governance tokens: A portion of staked FET tokens can be converted to governance tokens, enabling participants to vote on subnet parameters, model priorities, and token distribution policies.
Revenue redistribution: A percentage of FET fees is reinvested into the network for research, development, and community growth.
Sustainability funds
Green incentives: A sustainability fund ensures that a portion of FET tokens is allocated to optimizing energy efficiency and supporting environmentally friendly computation.
First candidate model for ASI<Train/>
Cortex: a brain-inspired robotics model
For decades, humanity has envisioned robots working alongside people, enabling humans to focus on solving complex problems. However, robots come in many forms, not just humanoid figures. From robotic arms used for item picking in warehouses and robotized assembly lines to advanced medical robots, the field of robotics is vast and diverse. The global robotics market is projected to reach $165 billion by 2029, and the robotics software market is projected to grow to $80 billion by 2032.
The rise of artificial intelligence (AI) has unlocked unprecedented opportunities in robotics. Traditional robotics relied on heuristic methods and preprogrammed instructions, but AI introduces adaptive and learning-based algorithms, enabling robots to better understand and interact with their environments. Despite this progress, creating AI systems capable of processing multiple information sources and navigating the physical world remains a significant challenge.
Recent advancements in Large Language Models (LLMs) have expanded the horizons of robotics. Multimodal LLMs, capable of processing both text and images, open up possibilities for robotics systems that combine environmental visual data with internal states. These advancements pave the way for robots that are not only reactive but also contextually aware and adaptive.
Building on these developments, we plan to take inspiration from existing models as practical starting points. Google has been a leader in this area, releasing RT-1 and RT-2, with the closed-source RT-2-X representing a significant leap in capability. The open-source community has also made strides, such as developing the OpenVLA model in an effort to replicate RT-2-X. The ASI team intends to start with OpenVLA as a foundation and innovate further, creating the next generation of robotic control models inspired by how the human brain operates.
Brain-inspired robotics holds immense potential to transform industries such as manufacturing, healthcare, and hospitality. Collectively, these industries stand to gain an annual financial impact of $1 billion to $3 billion from such advancements.
Future directions
Science domains being explored
Novel material discovery
Challenge: Accurately predicting material properties is a critical yet challenging task in the field of inorganic materials design. While quantum mechanical approaches like Density Functional Theory (DFT) provide a strong theoretical foundation for understanding material behavior at the atomic and subatomic levels, they are computationally intensive and highly dependent on specific functionals. This makes large-scale material screening impractical and creates a bottleneck in advancing high-growth industries such as batteries, semiconductors, and nanotechnology.
Solution: The ASI<Train/> initiative will leverage artificial intelligence (AI) to predict material properties with high accuracy while drastically reducing computational demands. By training AI models on large datasets of DFT-calculated and experimentally verified material properties, ASI<Train/> will accelerate high-throughput screening processes. This integration of AI and physics-based methods should complement traditional approaches, enabling efficient and scalable new material discovery.
Impact: By overcoming the computational limitations of traditional methods, initiatives like ASI<Train/> will drive breakthroughs in materials design, particularly in energy storage, electronics, and nanotechnology. These advancements are pivotal in addressing global challenges such as clean energy and sustainability while contributing to the rapid growth of markets like batteries (projected to reach $680 billion by 2033) and advanced materials (expected to grow to $115 billion by 2030).
Molecule design for drug discovery
Challenge: Accurately predicting effective drug molecules is a critical yet challenging task in pharmaceutical drug discovery. A key bottleneck is molecular docking—a computational method used to identify molecules that bind effectively to target proteins and influence disease progression. However, molecular docking is complicated by the intricate nature of biological interactions and the vast chemical space of potential drug candidates, which consists of billions of possible molecules.
Solution: Data-driven approaches, particularly AI-based models like DiffDock, offer a promising solution to these challenges. These machine learning models predict protein-ligand interactions more accurately by accounting for the flexibility of the molecules. By improving the efficiency and accuracy of molecular docking, models like DiffDock can significantly accelerate the identification of viable small-molecule candidates. The ASI team is committed to scaling up these AI models to enhance their robustness and versatility in discovering high-potential drug candidates. This involves expanding training datasets to cover a broader diversity of molecular structures and protein interactions, refining algorithms and model architectures, leveraging state-of-the-art computational resources, and improving scoring functions and conformational sampling techniques.
Impact: By accelerating the identification of viable small-molecule candidates, scaling up AI-based models like DiffDock can significantly reduce the time and cost associated with drug discovery. Currently, the cost of developing a single new drug ranges from $1 billion to $2 billion due to extensive research, clinical trials, and regulatory hurdles. Greater efficiency not only reduces this financial burden but also lets researchers allocate more resources toward exploring complementary therapeutic modalities and tackling diseases with diverse underlying causes. Furthermore, these advancements can help the pharmaceutical industry meet the growing demands of a market valued at $1.5 trillion, with an expected compound annual growth rate of over 6%. Ultimately, the ASI team's efforts aim to advance computational tools in medicine, bridge the gap between discovery and delivery, and bring life-saving treatments to patients faster and more cost-effectively.
Physics-Informed Neural Networks (PINNs)
Challenge: Solving complex partial differential equations (PDEs) in real-time for physics simulations, fluid dynamics, structural analysis, etc.
Solution: AIMs host PINNs trained for specific physics problems, enabling decentralized inference serving for industries like aerospace, automotive, and civil engineering.
Impact: Efficient simulations reduce computational costs and enable faster prototyping and testing.
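For reference, the training objective of a PINN in its standard formulation combines a data or boundary-condition term with the residual of the governing PDE. In the sketch below, u_θ is the network, N[·] is the differential operator of a PDE written as N[u] = 0, the (x_i, t_i, u_i) are N_d observed or boundary points, the (x_j, t_j) are N_r collocation points, and λ is a weighting hyperparameter. The symbols are chosen here for illustration and do not come from the ASI<Train/> specification.

```latex
\mathcal{L}(\theta)
  = \frac{1}{N_d} \sum_{i=1}^{N_d} \bigl| u_\theta(x_i, t_i) - u_i \bigr|^2
  + \lambda \, \frac{1}{N_r} \sum_{j=1}^{N_r} \bigl| \mathcal{N}[u_\theta](x_j, t_j) \bigr|^2
```

Because the second term is evaluated with automatic differentiation at arbitrary collocation points, a trained PINN can be queried at inference time without running a numerical solver, which is what makes it a candidate for low-latency serving.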
Environmental modeling
Challenge: Accurate climate modeling requires handling large-scale, dynamic datasets for phenomena like weather prediction or ocean simulations.
Solution: AIMs host domain-specific PINNs trained on climate datasets, providing real-time predictions and analyses.
Impact: Supports sustainable decision-making and disaster preparedness.
Personalized medicine
Challenge: Identifying personalized treatment options based on patient genomics and phenotypic data.
Solution: AIMs use ASI-trained models to analyze individual patient data and suggest tailored therapeutic options.
Impact: Enhances treatment efficacy and patient outcomes.
Models being explored
ChemBERTa
Description: A transformer-based model for molecular property prediction, inspired by BERT (Bidirectional Encoder Representations from Transformers).
Applications: Predicting bioactivity, toxicity, and other chemical properties of organic molecules.
Key Model Features:
Leverages SMILES (Simplified Molecular Input Line Entry System) for molecular representation.
Employs self-supervised pretraining on a curated dataset of 77M SMILES from PubChem.
Offers competitive performance on MoleculeNet benchmarks.
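To show what a SMILES string looks like as model input, the sketch below splits one into atom and bond tokens with a regular expression. It is an illustrative tokenizer written for this example and is not ChemBERTa's actual tokenizer. The pattern covers common SMILES syntax only (bracket atoms, the two-letter halogens Cl and Br, ring closures, bonds, and branches).

```python
import re

# Illustrative pattern: bracket atoms, two-letter halogens, two-digit
# ring-closure labels (%NN), then single-character atoms, digits, and symbols.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#$:\-+()/\\@.]")


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


# Aspirin: every character is its own token because it has no bracket
# atoms or two-letter elements.
print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

A transformer pretrained on such token sequences learns chemical regularities much as BERT learns word co-occurrence, which is the premise behind the self-supervised pretraining described above.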
DiffSBDD (Diffusion-Based Structure-Based Drug Design)
Description: An SE(3)-equivariant diffusion model for structure-based drug design (SBDD), enabling the generation of molecular structures conditioned on protein binding pockets.
Applications: Generating novel ligands for drug discovery, property optimization, explicit negative design, and partial molecular design with inpainting.
Key Model Features:
Formulates SBDD as a 3D-conditional generation task.
Leverages diffusion models to capture the statistical properties of natural ligands.
Allows for additional constraints to optimize generated candidates for specific computational metrics.
Supports a flexible framework for various drug design challenges without retraining.
AlphaFold 3
Description: AlphaFold 3 is an advanced AI model developed by Google DeepMind and Isomorphic Labs for predicting the 3D structures of protein complexes, including interactions with DNA, RNA, ligands, ions, and chemical modifications.
Applications: Facilitates the study of biomolecular interactions, aiding in drug discovery and molecular biology research.
Key Model Features:
Predicts structures of protein complexes with nucleic acids and small molecules.
Employs the Pairformer architecture for enhanced interaction modeling.
Incorporates a diffusion model to refine 3D structural predictions.
MatBERT
Description: A transformer-based language model framework for materials property prediction, combining human-readable descriptions of crystal structures with AI-powered insights.
Applications: Predicting material properties, explaining structure-property relationships, and facilitating materials design and education.
Key Model Features:
Utilizes transformer models pretrained on 2 million peer-reviewed articles.
Represents materials with text-based descriptions incorporating chemical composition, crystal symmetry, and site geometry.
Demonstrates high accuracy even in ultra-small data scenarios.
Provides transparent and interpretable predictions via local explainability techniques.
CGCNN (Crystal Graph Convolutional Neural Network)
Description: A graph-based neural network framework for predicting material properties directly from the atomic connections in crystalline structures.
Applications: Property prediction, discovering new inorganic materials, and deriving empirical design rules.
Key Model Features:
Utilizes crystal graphs to represent atomic connections, ensuring universality across crystal types.
Predicts diverse properties such as formation energy, elasticity, and electronic characteristics.
Trained on 10⁴ data points to achieve high accuracy for density functional theory (DFT) properties.
Provides interpretability by attributing global properties to local chemical environments.
Facilitates empirical insights for materials design, demonstrated with perovskite materials.
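The idea of a crystal graph can be illustrated by connecting atoms that lie within a cutoff distance of each other. The sketch below is a deliberate simplification: it ignores periodic boundary conditions, so it treats the atoms as an isolated cluster rather than a repeating crystal, and the function name and the input layout are assumptions made for this example.

```python
from itertools import combinations
from math import dist

Atom = tuple[str, tuple[float, float, float]]  # (element, position in angstroms)


def build_graph(atoms: list[Atom], cutoff: float) -> list[tuple[int, int, float]]:
    """Return edges (i, j, distance) for atom pairs no farther apart than `cutoff`.

    Nodes are atom indices; a graph network would attach element features
    to the nodes and distance features to the edges.
    """
    edges = []
    for (i, (_, pos_i)), (j, (_, pos_j)) in combinations(enumerate(atoms), 2):
        d = dist(pos_i, pos_j)
        if d <= cutoff:
            edges.append((i, j, d))
    return edges


# Three atoms on a line, 2.8 angstroms apart: only neighbors are connected.
chain = [("Na", (0.0, 0.0, 0.0)), ("Cl", (2.8, 0.0, 0.0)), ("Na", (5.6, 0.0, 0.0))]
print(build_graph(chain, cutoff=3.0))
```

Because the representation depends only on atoms and their connections, the same construction applies to any crystal type, which is the universality noted above.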
GeoCGNN (Geometric-information-enhanced crystal graph neural network)
Description: A geometric-information-enhanced graph neural network for predicting the properties of crystalline materials by incorporating full topological and spatial geometric structure information.
Applications: Accurate prediction of properties like formation energy and band gap for crystalline materials.
Key Model Features:
Encodes geometric information using distance vectors and mixed basis functions.
Learns both topological and spatial structural details for comprehensive material representation.
Outperforms existing GNN models (e.g., CGCNN, MEGNet, iCGCNN) in prediction accuracy across multiple datasets.
Achieves significant improvements: 25.6% for formation energy and 27.6% for band gap compared to CGCNN.
DeePMD (Deep Potential Molecular Dynamics)
Description: A deep neural network-based method for learning accurate many-body potential energy surfaces, designed for molecular dynamics simulations.
Applications: Simulating the dynamics of both organic and inorganic molecules in materials and chemical systems.
Key Model Features:
Trained with ab initio data to accurately model interatomic forces and potential energy surfaces.
Preserves natural symmetries of the molecular system, ensuring physical consistency.
Scales efficiently with system size, providing results indistinguishable from original data.
Applicable to a wide range of systems, from small molecules to bulk materials.
G-SchNet
Description: A generative neural network designed to create 3D molecular structures while preserving rotational invariance, enabling the generation of molecules with targeted quantum-chemical properties.
Applications: Molecular generation for quantum chemistry, including the design of molecules with specific electronic properties such as a small HOMO-LUMO gap for organic solar cells.
Key Model Features:
Captures 3D geometry and electronic property relationships.
Respects rotational invariance to distinguish spatial isomerism and non-bonded interactions.
Approximates the distribution of equilibrium molecular structures using spatial and chemoinformatics metrics.
Guides molecular generation toward desired properties by biasing the generator's output distribution.
DiffDock
Description: A diffusion generative model for molecular docking that predicts ligand binding structures on a non-Euclidean manifold of poses.
Applications: Drug discovery and design by accurately predicting small molecule-protein binding structures.
Key Model Features:
Frames docking as a generative task, modeling ligand poses in translational, rotational, and torsional degrees of freedom.
Achieves state-of-the-art performance with a 38% top-1 success rate (RMSD < 2 Å) on PDBBind, surpassing traditional (23%) and deep learning (20%) methods.
Maintains superior accuracy (21.7%) on computationally folded protein structures compared to prior methods (10.4%).
Offers fast inference times and confidence estimates with high selective accuracy.
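The success metric quoted above counts a docking prediction as correct when the root-mean-square deviation (RMSD) between predicted and reference ligand atom positions is below 2 Å. The sketch below computes that metric in its plainest form. It assumes atoms are already matched one to one and applies no symmetry correction, so it is a simplified stand-in for the evaluation used in docking benchmarks, and the function names are illustrative.

```python
from math import sqrt

Pose = list[tuple[float, float, float]]  # atom coordinates in angstroms


def rmsd(pred: Pose, ref: Pose) -> float:
    """Root-mean-square deviation between two poses with matched atom order."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and the same length")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(pred, ref)
        for p, r in zip(atom_p, atom_r)
    )
    return sqrt(squared / len(pred))


def top1_success_rate(pairs: list[tuple[Pose, Pose]], threshold: float = 2.0) -> float:
    """Fraction of (top-ranked prediction, reference) pairs with RMSD below threshold."""
    if not pairs:
        raise ValueError("no predictions to score")
    hits = sum(rmsd(pred, ref) < threshold for pred, ref in pairs)
    return hits / len(pairs)
```

A reported 38% top-1 success rate therefore means that, for 38% of test complexes, the single highest-ranked pose fell within 2 Å of the experimentally determined one.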
Impact vs training effort for the models being explored
Tokenomics and framework
Enhanced tokenomics for multi-tiered contributions
Introduce differentiated FET rewards for tasks like training, hosting, validating, and improving models.
Allow staking mechanisms to ensure long-term engagement and quality assurance.
Adaptive subnetworks
Implement AI-driven subnet formation for optimal resource allocation based on demand patterns and model complexity.
Integration with edge computing
Extend AIMs to edge devices, enabling localized and low-latency inference for applications like autonomous vehicles and IoT.
Ethical and regulatory compliance
Develop mechanisms to ensure hosted models meet ethical guidelines and domain-specific regulations, particularly in sensitive areas like drug discovery and climate modeling.
Conclusion
The enhanced ASI<Train/> x ASI framework combines Fetch.ai’s AIM-based architecture with domain-specific models to provide an efficient, scalable, and incentivized system for decentralized inference serving. By expanding use cases in novel material discovery, molecule design for drug discovery, physics-informed neural networks (PINNs), environmental modeling, and personalized medicine, this framework addresses critical challenges across industries, enabling innovation while maintaining accessibility and sustainability. This approach not only democratizes advanced AI capabilities but also fosters a collaborative, decentralized ecosystem poised to tackle global challenges effectively.