Page 99 of 130 • 1560 Total Articles

createLiveAI

Continue exploring the latest AI breakthroughs, technology insights, and industry analysis. Page 99 of our comprehensive AI news collection.

📰 Latest Intelligence

arXiv cs.AI

Research

Jun 4, 2025

Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs

A study analyzing three large language models (Llama-3-70B-instruct, Claude-3-Sonnet, and GPT-4o) found that, unlike humans, they are less sensitive to task difficulty and tend to exhibit stereotypical biases in confidence estimates based on personas such as race, gender, or expertise, despite consistent answer accuracy. To address overconfidence and improve interpretability, researchers propose Answer-Free Confidence Estimation (AFCE), a two-stage self-assessment method that separates

GPT Claude +1

arXiv cs.AI

Technology

Jun 4, 2025

Don't Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning

The article highlights the critical issue of data leakage in machine learning, which can lead to overly optimistic performance estimates and unreliable models, especially when practitioners with limited ML expertise use user-friendly tools. It emphasizes the importance of addressing data leakage across different ML frameworks, including transfer learning, to ensure more robust and trustworthy AI applications.

Machine Learning

arXiv cs.AI

Research

Jun 4, 2025

Evaluation of LLMs for mathematical problem solving

This study evaluates three large language models (GPT-4o, DeepSeek-V3, and Gemini-2.0) on diverse mathematical datasets, assessing their accuracy, reasoning steps, and problem comprehension using a Structured Chain-of-Thought framework. Results indicate GPT-4o's superior stability and performance on complex problems, while each model exhibits specific strengths and weaknesses in reasoning, explanation, and logical understanding.

GPT Google AI

arXiv cs.AI

Research

Jun 4, 2025

Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation

A new hybrid framework combines Retrieval-Augmented Generation (RAG) with intent-based canned responses to enhance enterprise conversational AI, improving efficiency and accuracy. This system dynamically routes queries, maintains coherence in multi-turn interactions, and achieves high accuracy (95%) with low latency (180ms), addressing key deployment challenges.

arXiv cs.AI

Business

Jun 4, 2025

LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought

The article introduces the Pedagogical Chain-of-Thought (PedCoT), a novel prompting strategy inspired by educational theory, designed to improve large language models' ability to accurately detect reasoning mistakes, especially in mathematical tasks. This approach significantly enhances mistake identification performance, laying the groundwork for more reliable self-correction and automated math grading in LLMs.

business llm

arXiv cs.AI

Research

Jun 4, 2025

MIR: Methodology Inspiration Retrieval for Scientific Research Problems

The paper introduces Methodology Inspiration Retrieval (MIR), a novel approach to retrieving prior research that can inspire solutions for new scientific problems by leveraging a Methodology Adjacency Graph (MAG) capturing citation-based methodological lineage. Their method significantly improves retrieval performance over strong baselines and, combined with LLM-based re-ranking, shows promise for enhancing automated scientific discovery through inspiration-driven literature retrieval.

arXiv cs.AI

Research

Jun 4, 2025

MIRROR: Cognitive Inner Monologue Between Conversational Turns for Persistent Reflection and Reasoning in Conversational LLMs

The MIRROR architecture enhances large language models by mimicking human inner monologue through modular reasoning and reflection, comprising a Thinker and Talker system that maintains an internal narrative for improved multi-turn dialogue. Evaluated on safety-critical and complex scenarios, models with MIRROR achieved up to 156% better performance, addressing key failure modes like sycophancy and inconsistency, and significantly outperforming baseline models.

GPT Claude +2

arXiv cs.AI

Research

Jun 4, 2025

Monitoring Robustness and Individual Fairness

Researchers propose runtime monitoring of black-box AI models to detect input-output robustness violations, such as adversarial or fairness issues, by observing sequences of model executions and raising alarms when similar inputs yield dissimilar outputs. They introduce the tool Clemont, which employs online FRNN algorithms and a novel binary decision diagram-based method, demonstrating effectiveness in real-time detection across standard benchmarks.
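
The monitoring idea in this summary (flag an execution when a similar earlier input produced a dissimilar output) can be illustrated with a minimal sketch. This is not Clemont and does not use its online FRNN or binary decision diagram methods; it is a brute-force scan over past executions. The class name, the `input_radius` and `output_tolerance` parameters, the Euclidean distance, and the scalar outputs are all assumptions made for illustration.

```python
from math import dist


class NaiveRobustnessMonitor:
    """Hypothetical brute-force monitor; a stand-in for illustration only."""

    def __init__(self, input_radius, output_tolerance):
        # Assumed thresholds: inputs closer than input_radius count as "similar",
        # outputs further apart than output_tolerance count as "dissimilar".
        self.input_radius = input_radius
        self.output_tolerance = output_tolerance
        self.history = []  # (input_vector, output_value) pairs seen so far

    def observe(self, x, y):
        """Record one model execution and return earlier executions that conflict with it."""
        violations = [
            (past_x, past_y)
            for past_x, past_y in self.history
            if dist(x, past_x) <= self.input_radius
            and abs(y - past_y) > self.output_tolerance
        ]
        self.history.append((tuple(x), y))
        return violations


monitor = NaiveRobustnessMonitor(input_radius=0.1, output_tolerance=0.5)
first = monitor.observe((0.50, 0.50), 0.0)   # nothing to compare against yet
second = monitor.observe((0.90, 0.10), 1.0)  # far from the first input
third = monitor.observe((0.52, 0.51), 1.0)   # close to the first input, different output
print(third)
```

Each call scans the whole history, so the cost grows linearly with the number of executions seen. That cost is the kind of bottleneck a dedicated near-neighbor index would be meant to avoid.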

arXiv cs.AI

Research

Jun 4, 2025

RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

The study introduces RiOSWorld, a comprehensive benchmark with 492 tasks designed to evaluate safety risks of multimodal large language models (MLLMs) acting as computer-use agents in real-world scenarios across various applications. Experiments reveal that current agents face significant safety challenges, underscoring the urgent need for improved safety alignment in deploying trustworthy computer manipulation agents.

Autonomous Systems

arXiv cs.AI

Research

Jun 4, 2025

Sleep Brain and Cardiac Activity Predict Cognitive Flexibility and Conceptual Reasoning Using Deep Learning

This study introduces CogPSGFormer, a multi-modal deep learning model that predicts individual cognitive performance, such as executive functions, from sleep microstructure using ECG and EEG data. Evaluated on 817 participants, the model achieved 80.3% accuracy in classifying cognitive performance levels, demonstrating the potential of sleep-derived signals for cognitive assessment.

Deep Learning Transformers

arXiv cs.AI

Research

Jun 4, 2025

SMELLNET: A Large-scale Dataset for Real-world Smell Recognition

Researchers have developed SmellNet, a large-scale database of approximately 180,000 samples capturing diverse natural smells, to advance AI's ability to identify substances through scent. Despite promising classification accuracy, the study highlights ongoing technical challenges in creating robust, real-time, on-edge smell recognition models capable of functioning reliably in real-world environments.

arXiv cs.AI

Research

Jun 4, 2025

T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers

The paper introduces T-TAME, a novel trainable attention mechanism compatible with Vision Transformers and convolutional neural networks, designed to generate high-quality explanation maps for image classification models efficiently in a single forward pass. Applied to architectures like VGG-16, ResNet-50, and ViT-B-16 on ImageNet, T-TAME outperforms existing explainability methods, enhancing interpretability without the computational cost of perturbation-based techniques.

Deep Learning Transformers

Page 99 of 130 • Showing articles 1177-1188 of 1560