68 articles tagged Deep Learning
Research
📄 Towards Data Science

Neuro-Symbolic Fraud Detection: Catching Concept Drift Before F1 Drops (Label-Free)

A recent development in neuro-symbolic AI for fraud detection explores embedding symbolic rules within neural networks to monitor concept drift at inference time without labeled data. Specifically, the model encodes fraud detection rules, such as a V14 threshold indicating fraud, and the work investigates whether deviations in these rules can serve as early warning signals, acting as a "canary" that detects shifts in fraud patterns before model performance (e.g., F1 score) declines. This approach leverages hybrid architectures that combine domain knowledge with neural learning, enabling real-time, label-free model monitoring.
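As a minimal sketch of the label-free canary idea, assume a single hypothetical rule, "V14 below -5.0 signals fraud" (the summary names V14 but not the cutoff). The code compares how often the rule fires in a reference window and in a live window, and raises an alarm when the firing rate shifts by more than an assumed relative tolerance. The function names, threshold, and tolerance are illustrative, not the article's implementation.

```python
from typing import Sequence

# Hypothetical symbolic rule: the article mentions a V14 threshold, but the
# cutoff used here (-5.0) is an illustrative assumption.
V14_THRESHOLD = -5.0


def rule_fires(v14: float) -> bool:
    """Return True when the assumed fraud rule fires for one transaction."""
    return v14 < V14_THRESHOLD


def firing_rate(v14_values: Sequence[float]) -> float:
    """Fraction of transactions in a window for which the rule fires."""
    if not v14_values:
        return 0.0
    return sum(rule_fires(v) for v in v14_values) / len(v14_values)


def drift_alarm(reference: Sequence[float], live: Sequence[float],
                tolerance: float = 0.5) -> bool:
    """Label-free canary: alarm when the live firing rate differs from the
    reference firing rate by more than `tolerance` (relative change).
    Only the input feature is inspected; no fraud labels are needed."""
    ref_rate = firing_rate(reference)
    live_rate = firing_rate(live)
    if ref_rate == 0.0:
        return live_rate > 0.0
    return abs(live_rate - ref_rate) / ref_rate > tolerance


reference_window = [0.1, -6.0, 0.3, -0.2, 1.1, -7.5, 0.0, 0.4]  # rule fires 2/8
live_window = [-6.1, -0.1, -5.5, -0.4, -8.0, 0.5, -6.2, 0.1]    # rule fires 4/8
print(drift_alarm(reference_window, live_window))  # True
```

Because only input features are inspected, a check like this can run on every batch at inference time. Whether rule-rate shifts actually precede F1 drops is the empirical question the article investigates.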

Deep Learning
Research
📄 Towards Data Science

How a Neural Network Learned Its Own Fraud Rules: A Neuro-Symbolic AI Experiment

A novel neuro-symbolic AI approach enables neural networks to discover interpretable rules on their own rather than relying on human-crafted ones. By integrating a differentiable rule-learning module into a hybrid neural network, the system extracted IF-THEN fraud detection rules during training on the Kaggle Credit Card Fraud dataset, which has a 0.17% fraud rate. The result shows how neural networks can improve transparency and interpretability in complex tasks like fraud detection by deriving logical rules autonomously, reducing reliance on manual rule specification.

Deep Learning
Research
📄 Towards Data Science

Optimizing Token Generation in PyTorch Decoder Models

The article discusses a novel technique for optimizing GPU performance in deep learning workflows by hiding host-device synchronization delays through CUDA stream interleaving. This approach allows for more efficient token generation in PyTorch decoder models by overlapping data transfer and computation, thereby reducing latency and improving throughput in large-scale neural network training and inference.

NVIDIA Deep Learning
Research
📄 Towards Data Science

AI in Multiple GPUs: Gradient Accumulation & Data Parallelism

The article introduces methods to implement gradient accumulation and data parallelism in PyTorch from scratch, enabling efficient training across multiple GPUs. These techniques allow for larger batch sizes and improved resource utilization by aggregating gradients over multiple iterations and distributing computations, respectively, thereby enhancing the scalability and performance of deep learning models.
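The identity behind gradient accumulation can be checked in a few lines: with equal-sized micro-batches, averaging the micro-batch gradients gives the same gradient as one large batch. This sketch uses a hypothetical one-parameter linear model with mean squared error in plain Python, not the article's PyTorch code.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs


def grad_mse(w: float, batch: Batch) -> float:
    """Gradient of mean squared error for the 1-parameter model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: Batch, micro_batch_size: int) -> float:
    """Accumulate gradients over equal-sized micro-batches before one update.

    This mirrors the common PyTorch pattern of dividing each micro-batch loss
    by the number of accumulation steps and calling backward() on each.
    """
    assert len(data) % micro_batch_size == 0, "use equal-sized micro-batches"
    steps = len(data) // micro_batch_size
    total = 0.0
    for i in range(steps):
        micro = data[i * micro_batch_size:(i + 1) * micro_batch_size]
        total += grad_mse(w, micro) / steps  # scale each contribution by 1/steps
    return total


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5
full = grad_mse(w, data)
accum = accumulated_grad(w, data, micro_batch_size=2)
print(abs(full - accum) < 1e-12)  # True
```

Data parallelism relies on the same arithmetic: the micro-batches are processed on different GPUs at the same time, and an all-reduce averages their gradients before each optimizer step.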

Deep Learning
Technology
📄 MarkTechPost

The Statistical Cost of Zero Padding in Convolutional Neural Networks (CNNs)

Zero padding is a fundamental technique in convolutional neural networks (CNNs) that adds zero-valued pixels around the borders of an input image. It lets convolutional kernels process edge pixels and helps maintain the spatial dimensions of feature maps, preventing them from shrinking excessively after multiple convolutional layers. By controlling the amount of padding, researchers and engineers can preserve spatial information and build deeper, more complex architectures. Recent analyses highlight the trade-offs of zero padding, particularly its statistical cost and its impact on the computational efficiency of CNNs.
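A small 1D example makes both sides of the trade-off concrete. With stride 1, padding p zeros on each side of a length-n input gives an output of length n + 2p - k + 1 for a kernel of size k, so p = (k - 1)/2 preserves the length. However, the outputs near the border mix in the artificial zeros. The sketch assumes a constant input and a summing kernel chosen for illustration; it is not the article's statistical analysis.

```python
from typing import List


def conv1d_zero_pad(signal: List[float], kernel: List[float], pad: int) -> List[float]:
    """1D cross-correlation (what CNN 'convolution' layers compute), stride 1,
    after adding `pad` zeros to each side of the input."""
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    k = len(kernel)
    out_len = len(padded) - k + 1  # = n + 2*pad - k + 1 for stride 1
    return [sum(padded[i + j] * kernel[j] for j in range(k)) for i in range(out_len)]


signal = [1.0] * 5          # a perfectly constant input
kernel = [1.0, 1.0, 1.0]    # a 3-tap summing filter

valid = conv1d_zero_pad(signal, kernel, pad=0)  # no padding: output shrinks
same = conv1d_zero_pad(signal, kernel, pad=1)   # pad = (k - 1) // 2 keeps the length

print(valid)  # [3.0, 3.0, 3.0]
print(same)   # [2.0, 3.0, 3.0, 3.0, 2.0]
```

The constant input comes out as 2.0 at both edges instead of 3.0. This is a simple instance of the border distortion that zero padding introduces into feature statistics.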

Deep Learning
Research
📄 Towards Data Science

Teaching a Neural Network the Mandelbrot Set

Fourier features have emerged as a transformative technique in neural network architectures, significantly enhancing the ability of models to learn complex, high-frequency functions by mapping input data into a Fourier basis before processing. This approach addresses limitations in traditional neural networks related to spectral bias, enabling more accurate and efficient representations of intricate patterns such as fractals like the Mandelbrot set, and paving the way for advancements in tasks requiring detailed function approximation and signal processing.
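Below is a minimal sketch of a Fourier feature mapping. It assumes the random Gaussian variant γ(x) = [cos(2πBx), sin(2πBx)] commonly used for coordinate-based networks. The article's exact frequencies, the number of features, and the scale of 10.0 are assumptions. Each 2D point of the complex plane becomes a vector of sinusoids that an MLP can fit more easily than raw coordinates.

```python
import math
import random
from typing import List, Sequence


def make_frequencies(num_features: int, in_dim: int, scale: float,
                     seed: int = 0) -> List[List[float]]:
    """Sample a random Gaussian frequency matrix B (num_features x in_dim).
    `scale` controls how high-frequency the encoding is (a tuning assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(num_features)]


def fourier_features(x: Sequence[float], B: List[List[float]]) -> List[float]:
    """gamma(x) = [cos(2*pi*B x), sin(2*pi*B x)], fed to the MLP instead of raw x."""
    proj = [2.0 * math.pi * sum(b * xi for b, xi in zip(row, x)) for row in B]
    return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]


# A point c = -0.75 + 0.1i in the complex plane, given as (real, imaginary).
B = make_frequencies(num_features=8, in_dim=2, scale=10.0)
encoded = fourier_features([-0.75, 0.1], B)
print(len(encoded))  # 16
```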

Deep Learning
Research
📄 Towards Data Science

Breaking the Hardware Barrier: Software FP8 for Older GPUs

Feather introduces a software-based FP8 emulation technique that enables older RTX 30 and 20 series GPUs to overcome memory bandwidth limitations in deep learning workloads. By employing bitwise packing to emulate FP8 precision, this approach achieves nearly fourfold (3.3x measured) improvements in data transfer efficiency, effectively mitigating the memory bottleneck without requiring costly hardware upgrades. This development broadens access to efficient deep learning processing on existing GPU infrastructure, leveraging software solutions to extend hardware longevity and performance.
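The summary does not say which FP8 format Feather emulates, so this sketch assumes E5M2, which equals the top byte of an IEEE float16. It uses Python's `struct` half-precision format (`'e'`) to encode values. It then packs four 8-bit codes into one 32-bit word, the bit-packing idea that lets one 32-bit load carry four values instead of one FP32 value. Function names are hypothetical, and rounding is simplified to truncation.

```python
import struct
from typing import List


def to_fp8_e5m2(x: float) -> int:
    """Encode a float as an 8-bit E5M2 code: the top byte of its float16 bits
    (1 sign, 5 exponent, 2 mantissa bits). Truncation rounds toward zero."""
    half_bits = struct.unpack('<H', struct.pack('<e', x))[0]
    return half_bits >> 8


def from_fp8_e5m2(code: int) -> float:
    """Decode an E5M2 byte by widening it back to float16."""
    return struct.unpack('<e', struct.pack('<H', code << 8))[0]


def pack4(codes: List[int]) -> int:
    """Pack four 8-bit codes into one 32-bit word (lane 0 in the low byte)."""
    assert len(codes) == 4 and all(0 <= c < 256 for c in codes)
    return codes[0] | (codes[1] << 8) | (codes[2] << 16) | (codes[3] << 24)


def unpack4(word: int) -> List[int]:
    """Recover the four 8-bit codes from a packed 32-bit word."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]


values = [1.0, -2.0, 0.5, 3.0]
word = pack4([to_fp8_e5m2(v) for v in values])
decoded = [from_fp8_e5m2(c) for c in unpack4(word)]
print(decoded)  # [1.0, -2.0, 0.5, 3.0]
```

These four sample values are exactly representable in E5M2. Most values are not: 0.1, for instance, decodes to 0.09375, which illustrates the precision traded for bandwidth.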

NVIDIA Deep Learning
Research
📄 Towards Data Science

The Machine Learning Advent Calendar Day 23: CNN in Excel

A novel implementation of a one-dimensional convolutional neural network (1D CNN) for text analysis has been developed entirely within Microsoft Excel, providing full transparency of its internal components. This approach allows users to visualize and understand each filter, weight, and decision-making process step-by-step, making complex deep learning operations accessible without specialized software.

Microsoft Machine Learning +1
Research
📄 Towards Data Science

The Machine Learning Advent Calendar Day 17: Neural Network Regressor in Excel

A recent development demonstrates constructing a neural network regressor entirely within Excel, using only spreadsheet formulas to perform each step of the learning process explicitly, including forward propagation and backpropagation. Because the entire training process is visible, the approach demystifies neural network operations and illustrates how such models can approximate non-linear functions with a small number of parameters. As an educational tool, it offers a step-by-step view of neural network mechanics without specialized machine learning frameworks, clarifying core concepts such as parameter updates and non-linearity.

Machine Learning Deep Learning
Research
📄 Towards Data Science

Decentralized Computation: The Hidden Principle Behind Deep Learning

Recent insights reveal that the foundational principle underpinning advancements in deep learning, including large language models, is decentralization. Unlike traditional centralized systems, these models thrive because numerous simple units interact locally, enabling complex behaviors without a central controller. This shift towards decentralized computation emphasizes the importance of local interactions among neural network components, which has driven the scalability and effectiveness of modern AI architectures.

Deep Learning
Research
📄 Towards Data Science

Optimizing PyTorch Model Inference on CPU

The article highlights advancements in deploying PyTorch model inference efficiently on Intel Xeon CPUs, emphasizing optimized performance for AI workloads without relying on GPUs. By leveraging Intel's hardware capabilities and software optimizations such as oneDNN (oneAPI Deep Neural Network Library), developers can achieve high throughput and low latency for AI applications directly on CPU infrastructure, enabling scalable and cost-effective deployment in data centers.

Deep Learning
General
📄 MarkTechPost

How to Implement Functional Components of Transformer and Mini-GPT Model from Scratch Using Tinygrad to Understand Deep Learning Internals

A recent tutorial demonstrates how to construct neural networks from scratch using Tinygrad, a minimalist deep learning framework, by building components such as tensors, autograd, multi-head attention, transformer blocks, and a mini-GPT model. This hands-on approach emphasizes the internal workings of deep learning models, illustrating how Tinygrad's simplicity offers insight into training dynamics, kernel fusion, and optimization. By progressively assembling these components, the tutorial provides a clear technical pathway to understanding transformer architectures and language models without relying on high-level libraries, deepening comprehension of core AI mechanisms.

GPT Deep Learning +1
Research
📄 Towards Data Science

Learning Triton One Kernel at a Time: Softmax

A new softmax kernel developed using Triton offers a significant advancement in speed and readability, optimized for integration with PyTorch. This kernel enhances the efficiency of softmax computations, which are critical in neural network training and inference, by providing a streamlined, high-performance implementation that simplifies deployment and accelerates model performance.
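For reference, here is the arithmetic a row-wise softmax kernel performs, written in plain Python. This is not Triton code, but it is a convenient correctness check to compare a kernel's output against. It assumes the kernel computes the standard numerically stable form, which subtracts each row's maximum before exponentiating.

```python
import math
from typing import List


def softmax_row(row: List[float]) -> List[float]:
    """Numerically stable softmax of one row.

    A fused GPU kernel typically keeps the row in on-chip memory and performs
    these same steps (max, subtract, exp, sum, divide) without writing
    intermediates back to global memory."""
    m = max(row)                           # subtracting the max prevents exp() overflow
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax_row([1000.0, 1001.0, 1002.0])  # math.exp(1000.0) alone would overflow
print(probs)
```

Softmax is invariant to adding a constant to every logit, so the max-shift changes nothing mathematically while keeping every exponent at or below zero.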

Deep Learning
Ethics
📈 VentureBeat AI

OpenAI is ending API access to fan-favorite GPT-4o model in February 2026

OpenAI has announced that its GPT-4o model, a significant milestone in multimodal AI, will be retired from the API platform, with access ending on February 16, 2026. The decision reflects the model's status as a legacy system with relatively low API usage compared to newer models such as GPT-5.1, although it remains available to individual users in ChatGPT's consumer tiers. The retirement is part of a strategic shift as OpenAI phases out older models in favor of more advanced systems, while giving developers ample warning before deprecation.

GPT Deep Learning
Technology
📈 VentureBeat AI

Google's Nested Learning paradigm could solve AI's memory and continual learning problem

Researchers at Google have introduced a novel AI paradigm called Nested Learning, which addresses a key limitation of current large language models (LLMs): their inability to update or learn new information after training. This approach conceptualizes training as a system of multi-level optimization problems, enabling more expressive learning algorithms that enhance in-context learning and memory. To demonstrate its potential, the team developed a model named Hope, which has shown superior performance in language modeling, continual learning, and long-context reasoning tasks, a significant step toward adaptable AI systems capable of real-world learning.

Google AI Machine Learning +2
Research
📄 Towards Data Science

PyTorch Tutorial for Beginners: Build a Multiple Regression Model from Scratch

A recent tutorial demonstrates how to construct a three-layer neural network using PyTorch for multiple regression tasks, providing a practical, step-by-step approach for beginners. This development emphasizes the accessibility of deep learning frameworks like PyTorch for building custom models from scratch, enabling users to understand core concepts such as layer design, activation functions, and training procedures in a hands-on manner.

Deep Learning
Research
📄 Towards Data Science

How Deep Feature Embeddings and Euclidean Similarity Power Automatic Plant Leaf Recognition

Automatic plant leaf recognition uses computer vision and deep learning to identify plant species from leaf photographs. By extracting meaningful features and converting them into numerical embeddings, the approach classifies leaves by Euclidean similarity between embeddings, improving the precision and efficiency of botanical identification. It holds significant potential for agriculture, biodiversity monitoring, and environmental research by automating and streamlining plant recognition.
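The matching step can be sketched as nearest-neighbour search in embedding space. This sketch assumes that each known species is represented by a few reference embeddings and that a query leaf takes the label of its closest reference by Euclidean distance. The species names and 3-dimensional vectors are hypothetical stand-ins for embeddings produced by a pretrained network.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(query: Sequence[float], gallery: Dict[str, List[List[float]]]) -> str:
    """1-nearest-neighbour: return the label of the closest reference embedding."""
    best_label, best_dist = "", math.inf
    for label, embeddings in gallery.items():
        for emb in embeddings:
            d = euclidean(query, emb)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


# Toy 3-D embeddings; real ones would come from a CNN backbone and be much longer.
gallery = {
    "oak":   [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "maple": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
print(classify([0.85, 0.15, 0.05], gallery))  # oak
```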

Machine Learning Deep Learning +1
Research
📄 Towards Data Science

Understanding Convolutional Neural Networks (CNNs) Through Excel

A novel approach demonstrates how to construct a simplified Convolutional Neural Network (CNN) within Microsoft Excel, offering a transparent view of a learning process typically regarded as a "black box." By translating core CNN operations, such as convolution, pattern detection, and feature extraction, into Excel formulas and calculations, this method lets users observe each step of how images are analyzed and patterns are recognized, fostering a deeper understanding of deep learning fundamentals. The technique uses Excel's computational capabilities to demystify complex neural network processes, making the mechanics of shape and pattern detection accessible to a broader audience.

Microsoft Deep Learning
Research
📄 The Algorithmic Bridge

The Ghost of the Author

Recent advancements in AI have led to sophisticated virtual ghost simulations that use deep learning and computer vision to create highly realistic and immersive haunted house experiences. These systems analyze user reactions in real time, adapting the narrative and visual effects to heighten emotional engagement and fear responses, pushing the boundaries of interactive entertainment and psychological experimentation. Beyond entertainment, the technology offers new avenues for psychological research, therapy, and training by providing controlled environments in which to study fear and anxiety responses. The integration of AI-driven realism into virtual hauntings marks a significant step forward in immersive technology, blending cultural storytelling with cutting-edge AI.

Deep Learning Computer Vision
Research
📄 Towards Data Science

The Three Ages of Data Science: When to Use Traditional Machine Learning, Deep Learning, or a LLM (Explained with One Example)

The article explores the evolution of the data scientist role across three generations of machine learning: traditional machine learning, deep learning, and large language models (LLMs). It highlights how each era has shifted the focus of data scientists from feature engineering and classical algorithms to designing neural network architectures and fine-tuning massive pre-trained models, exemplified through a practical use case that demonstrates the appropriate application of each approach depending on the problem complexity and data availability.

Machine Learning Deep Learning
Technology
📄 MarkTechPost

A Coding Implementation to Build and Train Advanced Architectures with Residual Connections, Self-Attention, and Adaptive Optimization Using JAX, Flax, and Optax

A recent tutorial demonstrates how to construct and train sophisticated neural networks with JAX, Flax, and Optax, emphasizing modularity and efficiency. The core idea is integrating residual connections and self-attention within a deep architecture to strengthen feature learning, supported by optimization techniques such as learning rate scheduling, gradient clipping, and adaptive weight decay. By leveraging JAX transformations such as jit (compilation), grad (automatic differentiation), and vmap (vectorization), the approach accelerates computation and scales training efficiently, offering a robust framework for developing high-performance AI models. It underscores the growing importance of combining flexible neural network components.

Deep Learning Transformers
General
📄 MarkTechPost

A Coding Implementation to Build Neural Memory Agents with Differentiable Memory, Meta-Learning, and Experience Replay for Continual Adaptation in Dynamic Environments

A recent tutorial builds neural memory agents capable of continual learning without catastrophic forgetting. By integrating a Differentiable Neural Computer (DNC) with experience replay and meta-learning in PyTorch, it designs a memory-augmented neural network that can adapt rapidly to new tasks while preserving previously acquired knowledge. The approach uses content-based memory addressing and prioritized replay, enabling the model to maintain high performance across multiple learning environments. It addresses a longstanding challenge in neural network training, retaining past experiences amid ongoing learning, by improving memory management and task adaptation.
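Content-based addressing in a Differentiable Neural Computer scores every memory slot by cosine similarity to a key. It then turns those scores into soft weights with a softmax sharpened by a key strength β. The sketch below follows that published formulation in plain Python. The tutorial's PyTorch code is not shown in the summary, and the memory contents and β values here are illustrative.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against zero-norm vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def content_weights(memory: List[Sequence[float]], key: Sequence[float],
                    beta: float) -> List[float]:
    """Softmax over beta * cosine(key, slot): a larger key strength beta
    concentrates the weighting on the best-matching memory slot."""
    scores = [beta * cosine(key, slot) for slot in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]


memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
key = [0.9, 0.1, 0.0]
w = content_weights(memory, key, beta=5.0)
# Read vector: the weighted sum of memory slots under the soft address.
read_vector = [sum(w_i * slot[j] for w_i, slot in zip(w, memory)) for j in range(3)]
print([round(x, 3) for x in w])
```

The weights are a smooth function of the key, so gradients can flow through the read operation. This is what makes the memory differentiable and trainable end to end.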

Meta AI Deep Learning
Research
📈 VentureBeat AI

Large reasoning models almost certainly can think

Recent discourse surrounding large reasoning models (LRMs) has been fueled by Apple's paper "The Illusion of Thinking," which argues that LRMs are incapable of genuine thought and merely perform pattern-matching rather than reasoning. The author challenges this claim by observing that even humans who understand an algorithm such as the Tower of Hanoi often fail to solve complex instances of it, suggesting that an inability to carry out certain calculations does not equate to a lack of thinking. The author contends that such failures are not proof that LRMs cannot think.

Claude Deep Learning +2
Research
📄 MarkTechPost

Google AI Research Releases DeepSomatic: A New AI Model that Identifies Cancer Cell Genetic Variants

Google Research and UC Santa Cruz developed DeepSomatic, an AI model that accurately identifies somatic small genetic variants in cancer genomes across multiple sequencing platforms, including Illumina short reads, PacBio HiFi, and Oxford Nanopore long reads. Utilizing a convolutional neural network that processes image-like tensors encoding aligned read data, DeepSomatic distinguishes inherited from acquired variants and supports both tumor-normal and tumor-only workflows, demonstrating superior detection by uncovering previously missed variants in pediatric leukemia.

Google AI Deep Learning
Research
📄 Towards Data Science

How to Classify Lung Cancer Subtype from DNA Copy Numbers Using PyTorch

A recent development in cancer research involves utilizing PyTorch, a popular deep learning framework, to classify lung cancer subtypes based on DNA copy number variations. This approach leverages advanced machine learning techniques to analyze genomic data, enabling more precise differentiation of cancer subtypes, which is critical for personalized treatment strategies. The methodology exemplifies how data science and deep learning can enhance understanding of cancer genomics, potentially leading to improved diagnostic accuracy and targeted therapies.

Machine Learning Deep Learning
Research
📄 MarkTechPost

Ivy Framework-Agnostic Machine Learning: Build, Transpile, and Benchmark Across All Major Backends

Ivy introduces a framework for writing machine learning models that are entirely framework-agnostic, supporting execution across NumPy, PyTorch, TensorFlow, and JAX. It relies on code transpilation, unified APIs, and features such as Ivy Containers and graph tracing to enable portable, backend-independent deep learning workflows, simplifying model creation, optimization, and benchmarking without tying developers to a specific ecosystem. By providing a neural network implementation that runs uniformly across multiple backends, Ivy demonstrates how developers can write once and deploy everywhere, reducing complexity.

Machine Learning Deep Learning
Research
📈 VentureBeat AI

Nvidia researchers boost LLMs' reasoning skills by getting them to 'think' during pre-training

Researchers at Nvidia have introduced Reinforcement Learning Pre-training (RLP), a novel approach that incorporates reinforcement learning into the initial training phase of large language models (LLMs), encouraging models to develop independent reasoning capabilities early on. Unlike traditional methods that rely on sequential pre-training followed by fine-tuning with curated datasets, RLP enables models to learn complex reasoning directly from plain text, fostering more autonomous and adaptable AI systems. The technique treats reasoning as an action within the pre-training process, allowing models to "think for themselves" before predicting subsequent tokens, which significantly enhances their ability to perform complex reasoning tasks downstream.

GPT NVIDIA +3
Business
📈 VentureBeat AI

Samsung AI researcher's new, open reasoning model TRM outperforms models 10,000X larger on specific problems

Alexia Jolicoeur-Martineau of Samsung's Advanced Institute of Technology has developed the Tiny Recursive Model (TRM), a neural network with only 7 million parameters that rivals or outperforms much larger language models such as OpenAI's o3-mini and Google's Gemini 2.5 Pro on challenging reasoning benchmarks. This work demonstrates that highly effective AI models can be created affordably through recursive reasoning techniques, challenging the prevailing reliance on massive, resource-intensive foundation models and suggesting a new direction for efficient AI development.

GPT Google AI +3
Business
📄 AI News

Samsung's tiny AI model beats giant reasoning LLMs

A recent breakthrough from Samsung AI researchers introduces the Tiny Recursive Model (TRM), a 7-million-parameter neural network that outperforms much larger Large Language Models (LLMs) on complex reasoning tasks such as the ARC-AGI intelligence benchmark. Challenging the industry norm that larger models are inherently more capable, TRM demonstrates that parameter efficiency and innovative architecture can achieve state-of-the-art results, offering a more sustainable and scalable approach to AI development. It also addresses a key limitation of traditional LLMs, which often struggle with multi-step reasoning because of their token-by-token generation process.

Deep Learning
Research
📄 Towards Data Science

Visual Pollen Classification Using CNNs and Vision Transformers

Researchers have developed a novel machine learning framework that leverages convolutional neural networks (CNNs) and vision transformers to enhance pollen identification accuracy in ecological and biotechnological applications. This approach addresses the longstanding data scarcity challenge by improving classification performance through advanced deep learning architectures, enabling more precise monitoring of pollen diversity and distribution.

Machine Learning Deep Learning
Research
📄 MarkTechPost

AI and the Brain: How DINOv3 Models Reveal Insights into Human Visual Processing

Researchers at Meta AI and École Normale Supérieure have demonstrated that the self-supervised vision transformer DINOv3, trained on billions of natural images, exhibits internal activation patterns that closely mirror human brain responses to visual stimuli. By comparing DINOv3's neural activations with neuroimaging data from fMRI and MEG, the study reveals significant convergence, suggesting that the model's processing resembles that of the human visual system. The study further investigates how factors such as model size, training data volume, and image type influence this brain-model similarity.

Meta AI Deep Learning +2
Research
📄 Towards Data Science

What is Universality in LLMs? How to Find Universal Neurons

Research indicates that independently trained transformer models develop similar neuron activation patterns, suggesting the presence of universal neurons that underpin core linguistic and cognitive functions across different instances of large language models (LLMs). This discovery highlights a potential intrinsic structure within transformer architectures, where certain neurons consistently encode specific features or concepts, regardless of training variations, thereby advancing our understanding of model interpretability and the fundamental principles of neural network universality.
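One common way to look for universal neurons is to run two independently trained models on the same inputs. Each neuron's activation trace in one model is then correlated with every neuron in the other, and neurons with a near-perfect match become candidates. This sketch assumes that approach, with toy activation traces and an illustrative threshold of 0.9 rather than values from the article.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two activation traces over the same inputs."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def universal_candidates(model_a: List[List[float]], model_b: List[List[float]],
                         threshold: float = 0.9) -> List[int]:
    """Indices of neurons in model A whose best-matching neuron in model B
    correlates above `threshold` (an illustrative choice)."""
    return [i for i, trace_a in enumerate(model_a)
            if max(pearson(trace_a, trace_b) for trace_b in model_b) > threshold]


# Each inner list: one neuron's activations on the same 5 input tokens.
model_a = [[0.1, 0.9, 0.2, 0.8, 0.1], [0.5, 0.4, 0.6, 0.5, 0.4]]
model_b = [[0.0, 1.0, 0.1, 0.9, 0.0], [0.3, 0.1, 0.9, 0.2, 0.7]]
print(universal_candidates(model_a, model_b))  # [0]
```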

Deep Learning Transformers
Research
📄 MarkTechPost

Top 10 AI Blogs and News Websites for AI Developers and Engineers in 2025

The OpenAI Blog remains a pivotal resource for AI developers, offering detailed insights into the latest advancements in large language models, AI safety, and deployment strategies, thereby shaping the future trajectory of AI research and application. Complementing this, the NVIDIA Developer Blog emphasizes GPU-accelerated AI, providing technical guidance on optimizing deep learning workflows through CUDA programming, performance benchmarks, and hardware architecture analysis, which are crucial for maximizing computational efficiency. Together, these platforms highlight the ongoing focus on both innovative model development and hardware optimization, reflecting the industry's dual priorities of advancing AI capabilities while ensuring scalable, high-performance deployment.

GPT NVIDIA +1
Research
📄 Towards Data Science

Maximizing AI/ML Model Performance with PyTorch Compilation

Since its introduction in PyTorch 2.0 in March 2023, torch.compile has marked a significant advance in optimizing AI model performance by bringing just-in-time (JIT) graph compilation into the framework. It aims to improve execution speed and efficiency while keeping PyTorch's core strengths of ease of use and Pythonic design, addressing longstanding limitations of eager execution. The evolution of torch.compile reflects a strategic shift toward integrating JIT compilation seamlessly into PyTorch's dynamic environment, potentially changing how developers optimize deep learning models without sacrificing flexibility.

Machine Learning Deep Learning
Research
📄 Towards Data Science

Mechanistic View of Transformers: Patterns, Messages, Residual Stream and LSTMs

A recent development in transformer models proposes shifting from traditional concatenation-based attention mechanisms to a decomposition-based approach, offering a novel perspective on how attention operates within neural networks. This method emphasizes breaking down the attention process into more interpretable components, potentially enhancing the understanding of message passing and residual streams in models like Transformers and LSTMs. By decomposing attention, researchers aim to improve model interpretability and efficiency, paving the way for more transparent and potentially more effective deep learning architectures.

Deep Learning Transformers
Research
📄 MarkTechPost

MIT Researchers Develop Methods to Control Transformer Sensitivity with Provable Lipschitz Bounds and Muon

MIT researchers have developed an approach to stabilize the training of large-scale transformer models by enforcing provable Lipschitz bounds through spectral regulation of weights, eliminating the need for traditional stabilization techniques such as activation normalization or QK norm. The method directly addresses activation explosion and loss spikes caused by unconstrained weight and activation norms, keeping the model's sensitivity to input perturbations bounded and predictable. By mathematically constraining the Lipschitz constant, the approach improves the robustness, stability, and generalization of transformers, properties that are critical for applications requiring adversarial robustness.
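For a linear layer x -> Wx, the Lipschitz constant in the L2 norm is the largest singular value of W, which power iteration can estimate cheaply. The sketch below estimates it and rescales W so it does not exceed a target. This simple rescaling is a stand-in chosen for illustration; it is not the article's specific spectral method or its Muon-based training.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matvec(W: Matrix, v: List[float]) -> List[float]:
    """Compute W v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W: Matrix, u: List[float]) -> List[float]:
    """Compute W^T u without forming the transpose."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def l2(v: List[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(W: Matrix, iters: int = 100, seed: int = 0) -> float:
    """Estimate the largest singular value of W by power iteration on W^T W.
    For the linear map x -> W x this is its Lipschitz constant in the L2 norm."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    for _ in range(iters):
        v = matvec_t(W, matvec(W, v))
        n = l2(v)
        v = [x / n for x in v]
    return l2(matvec(W, v))


def cap_spectral_norm(W: Matrix, max_sigma: float = 1.0) -> Matrix:
    """Rescale W so its spectral norm is at most max_sigma
    (a simple stand-in for the spectral weight constraints in the article)."""
    sigma = spectral_norm(W)
    if sigma <= max_sigma:
        return W
    return [[w * max_sigma / sigma for w in row] for row in W]


W = [[3.0, 0.0], [0.0, 0.5]]
W_capped = cap_spectral_norm(W)
print(round(spectral_norm(W), 6), round(spectral_norm(W_capped), 6))  # 3.0 1.0
```

For a stack of such layers with 1-Lipschitz activations such as ReLU, the network's Lipschitz constant is bounded by the product of the per-layer bounds. This is why constraining each weight matrix yields a provable bound for the whole stack.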

Deep Learning Transformers
Research
📄 Towards Data Science

Physics-Informed Neural Networks for Inverse PDE Problems

Researchers have demonstrated the use of DeepXDE, a deep learning library, to solve the heat equation with physics-informed neural networks (PINNs). This approach leverages PINNs' ability to incorporate physical laws directly into training, enabling accurate solutions to inverse problems for partial differential equations (PDEs) such as the heat equation, in which unknown parameters are inferred from observed data. This has significant implications for scientific computing and engineering simulations.
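The inverse-problem idea can be sketched without a network. For the heat equation u_t = α u_xx, the physics residual u_t - α u_xx should vanish, so an unknown diffusivity α can be recovered by minimizing that residual over collocation points. This sketch assumes synthetic data from the known solution u = exp(-απ²t) sin(πx) with a hypothetical α = 0.4. It uses finite differences where a PINN would differentiate its network automatically. In a PINN, α would instead be a trainable parameter optimized together with the network weights.

```python
import math
from typing import Callable, List, Tuple

ALPHA_TRUE = 0.4  # hypothetical diffusivity, used only to generate synthetic data


def u_exact(x: float, t: float) -> float:
    """Exact solution of u_t = alpha * u_xx on [0, 1] with u(0, t) = u(1, t) = 0."""
    return math.exp(-ALPHA_TRUE * math.pi ** 2 * t) * math.sin(math.pi * x)


def derivatives(u: Callable[[float, float], float], x: float, t: float,
                h: float = 1e-3) -> Tuple[float, float]:
    """Central finite differences for u_t and u_xx
    (a PINN would obtain these by automatic differentiation of its network)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    return u_t, u_xx


def estimate_alpha(u: Callable[[float, float], float],
                   points: List[Tuple[float, float]]) -> float:
    """Closed-form least squares for alpha, minimising the physics residual
    sum over points of (u_t - alpha * u_xx)^2."""
    num = den = 0.0
    for x, t in points:
        u_t, u_xx = derivatives(u, x, t)
        num += u_t * u_xx
        den += u_xx * u_xx
    return num / den


collocation = [(i / 10, j / 10) for i in range(1, 10) for j in range(1, 5)]
print(round(estimate_alpha(u_exact, collocation), 3))  # 0.4
```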

Deep Learning
Research
📄 Towards Data Science

Automating Deep Learning: A Gentle Introduction to AutoKeras and Keras Tuner

AutoKeras and Keras Tuner are two accessible AutoML libraries designed to streamline the process of model development and hyperparameter tuning in deep learning. AutoKeras offers automated neural architecture search, enabling users to quickly identify optimal models without extensive manual experimentation, while Keras Tuner simplifies hyperparameter optimization through an intuitive interface, significantly reducing development time. These tools collectively empower data scientists and developers to enhance model performance efficiently, making advanced deep learning techniques more approachable for a broader audience.

Deep Learning
Research
📄 Towards Data Science

Grad-CAM from Scratch with PyTorch Hooks

The article explores the implementation of Grad-CAM (Gradient-weighted Class Activation Mapping) from scratch using PyTorch hooks, providing a practical approach to explainable AI (XAI). This technique enhances transparency by visualizing the regions of an input image that influence a CNN's decision, thereby improving interpretability and trust in deep learning models.
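In PyTorch, a forward hook (`register_forward_hook`) can store the last convolutional layer's activations, and a backward hook (`register_full_backward_hook`) can store the gradients of the class score. After that, Grad-CAM reduces to a few lines of arithmetic. This sketch shows only that step, on hypothetical 2-channel 2x2 feature maps in plain Python. Each channel is weighted by its average gradient, the weighted maps are summed, and a ReLU keeps only positive evidence.

```python
from typing import List

Map = List[List[float]]  # one H x W feature map


def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """Grad-CAM heatmap for one image.

    activations[k]: feature map A^k captured by a forward hook
    gradients[k]:   d(class score)/dA^k captured by a backward hook
    """
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pool each channel's gradient
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of feature maps, then ReLU keeps only positive evidence
    cam = [[max(0.0, sum(a * A[i][j] for a, A in zip(alphas, activations)))
            for j in range(w)] for i in range(h)]
    peak = max(max(row) for row in cam)
    # Normalise to [0, 1] for display (leave an all-zero map unchanged)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[0.5, 0.5], [0.5, 0.5]], [[-0.25, -0.25], [-0.25, -0.25]]]
print(grad_cam(acts, grads))  # [[0.5, 0.0], [0.0, 1.0]]
```

In practice, the coarse map is then upsampled to the input resolution and overlaid on the image.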

Deep Learning
Research
📄 arXiv cs.AI

T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers

The paper introduces T-TAME, a novel trainable attention mechanism compatible with Vision Transformers and convolutional neural networks, designed to generate high-quality explanation maps for image classification models efficiently in a single forward pass. Applied to architectures like VGG-16, ResNet-50, and ViT-B-16 on ImageNet, T-TAME outperforms existing explainability methods, enhancing interpretability without the computational cost of perturbation-based techniques.

Deep Learning Transformers
Research
📄 arXiv cs.AI

Utilizing AI for Aviation Post-Accident Analysis Classification

This paper explores how AI, particularly NLP and deep learning, can automate the analysis of aviation safety reports to improve accuracy and efficiency in identifying safety issues, such as damage levels and flight phases. It also investigates the use of Topic Modeling to uncover recurring themes, with findings indicating these methods can significantly enhance proactive safety management.

Deep Learning NLP
Research
📄 arXiv Machine Learning

DeepRTE: Pre-trained Attention-based Neural Network for Radiative Transfer

Researchers introduced DeepRTE, a neural network method utilizing pre-trained attention mechanisms to accurately and efficiently solve the steady-state Radiative Transfer Equation, which models radiation propagation in various scientific fields. Numerical experiments demonstrate the approach's high accuracy and computational benefits across applications like atmospheric transfer, heat transfer, and optical imaging.

Deep Learning Transformers
Research
📄 arXiv Machine Learning

Pseudo Multi-Source Domain Generalization: Bridging the Gap Between Single and Multi-Source Domain Generalization

A new framework called Pseudo Multi-source Domain Generalization (PMDG) is proposed to enable single-source domain generalization by generating synthetic pseudo-domains through style transfer and data augmentation, allowing existing multi-source domain generalization algorithms to be applied more practically. Extensive experiments demonstrate that PMDG can match or surpass the performance of actual multi-domain training, offering valuable insights for improving model robustness across varying data distributions.

Deep Learning