Page 99 of 130 • 1560 Total Articles

createLiveAI


📰 Latest Intelligence


Research
📄 arXiv cs.AI

Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs

A study analyzing three large language models (Llama-3-70B-instruct, Claude-3-Sonnet, and GPT-4o) found that, unlike humans, they are less sensitive to task difficulty and tend to exhibit stereotypical biases in confidence estimates based on personas such as race, gender, or expertise, despite consistent answer accuracy. To address overconfidence and improve interpretability, researchers propose Answer-Free Confidence Estimation (AFCE), a two-stage self-assessment method that separates

GPT Claude +1
Research
📄 arXiv cs.AI

Evaluation of LLMs for mathematical problem solving

This study evaluates three large language models (GPT-4o, DeepSeek-V3, and Gemini-2.0) on diverse mathematical datasets, assessing their accuracy, reasoning steps, and problem comprehension using a Structured Chain-of-Thought framework. Results indicate GPT-4o's superior stability and performance on complex problems, while each model exhibits specific strengths and weaknesses in reasoning, explanation, and logical understanding.

GPT Google AI
Business
📄 arXiv cs.AI

LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought

The article introduces the Pedagogical Chain-of-Thought (PedCoT), a novel prompting strategy inspired by educational theory, designed to improve large language models' ability to accurately detect reasoning mistakes, especially in mathematical tasks. This approach significantly enhances mistake identification performance, laying the groundwork for more reliable self-correction and automated math grading in LLMs.

business llm
Research
📄 arXiv cs.AI

MIR: Methodology Inspiration Retrieval for Scientific Research Problems

The paper introduces Methodology Inspiration Retrieval (MIR), a novel approach to retrieving prior research that can inspire solutions for new scientific problems by leveraging a Methodology Adjacency Graph (MAG) capturing citation-based methodological lineage. Their method significantly improves retrieval performance over strong baselines and, combined with LLM-based re-ranking, shows promise for enhancing automated scientific discovery through inspiration-driven literature retrieval.

Research
📄 arXiv cs.AI

Monitoring Robustness and Individual Fairness

Researchers propose runtime monitoring of black-box AI models to detect input-output robustness violations, such as adversarial or fairness issues, by observing sequences of model executions and raising alarms when similar inputs yield dissimilar outputs. They introduce the tool Clemont, which employs online FRNN algorithms and a novel binary decision diagram-based method, demonstrating effectiveness in real-time detection across standard benchmarks.
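The monitoring idea in this summary (raise an alarm when similar inputs yield dissimilar outputs) can be illustrated with a minimal sketch. This is not Clemont's implementation: it uses a brute-force linear scan over past executions rather than the FRNN or binary decision diagram methods the summary mentions. The class name `NaiveMonitor`, the Euclidean distance, and the thresholds `eps_in` and `eps_out` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class NaiveMonitor:
    """Hypothetical brute-force monitor; not the Clemont tool itself."""
    eps_in: float   # assumed threshold: inputs at most this far apart are "similar"
    eps_out: float  # assumed threshold: outputs farther apart than this are "dissimilar"
    history: list = field(default_factory=list)

    def observe(self, x, y):
        """Record one (input, output) execution of the black-box model.

        Returns the indices of earlier executions whose input is similar
        to x but whose output is dissimilar to y. An empty list means no alarm.
        """
        alarms = [
            i for i, (px, py) in enumerate(self.history)
            if math.dist(x, px) <= self.eps_in and math.dist(y, py) > self.eps_out
        ]
        self.history.append((tuple(x), tuple(y)))
        return alarms


monitor = NaiveMonitor(eps_in=0.1, eps_out=0.5)
monitor.observe((0.0, 0.0), (1.0,))                # first execution, nothing to compare
conflicts = monitor.observe((0.05, 0.0), (0.0,))   # near-identical input, different output
print(conflicts)
```

Each call scans the whole history, so the cost grows linearly with the number of observed executions; a real-time monitor would need an indexed neighbor search, which is the role the summary attributes to the online FRNN algorithms.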

Research
📄 arXiv cs.AI

RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

The study introduces RiOSWorld, a comprehensive benchmark with 492 tasks designed to evaluate safety risks of multimodal large language models (MLLMs) acting as computer-use agents in real-world scenarios across various applications. Experiments reveal that current agents face significant safety challenges, underscoring the urgent need for improved safety alignment in deploying trustworthy computer manipulation agents.

Autonomous Systems
Research
📄 arXiv cs.AI

SMELLNET: A Large-scale Dataset for Real-world Smell Recognition

Researchers have developed SmellNet, a large-scale database of approximately 180,000 samples capturing diverse natural smells, to advance AI's ability to identify substances through scent. Despite promising classification accuracy, the study highlights ongoing technical challenges in creating robust, real-time, on-edge smell recognition models capable of functioning reliably in real-world environments.

Research
📄 arXiv cs.AI

T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers

The paper introduces T-TAME, a novel trainable attention mechanism compatible with Vision Transformers and convolutional neural networks, designed to generate high-quality explanation maps for image classification models efficiently in a single forward pass. Applied to architectures like VGG-16, ResNet-50, and ViT-B-16 on ImageNet, T-TAME outperforms existing explainability methods, enhancing interpretability without the computational cost of perturbation-based techniques.

Deep Learning Transformers
