=== Mathematics ===
Large language models, such as [[GPT-4]], [[Gemini (chatbot)|Gemini]], [[Claude (language model)|Claude]], [[Llama (language model)|Llama]] or [[Mistral AI|Mistral]], are increasingly used in mathematics. These probabilistic models are versatile, but can also produce wrong answers in the form of [[Hallucination (artificial intelligence)|hallucinations]]. They often require a large database of mathematical problems to learn from, as well as methods such as [[Supervised learning|supervised]] [[Fine-tuning (deep learning)|fine-tuning]]<ref>{{Cite journal |date=2024 |title=ReFT: Representation Finetuning for Language Models |journal=NeurIPS |arxiv=2404.03592 |last1=Wu |first1=Zhengxuan |last2=Arora |first2=Aryaman |last3=Wang |first3=Zheng |last4=Geiger |first4=Atticus |last5=Jurafsky |first5=Dan |last6=Manning |first6=Christopher D. |last7=Potts |first7=Christopher }}</ref> or [[Statistical classification|classifiers]] trained on human-annotated data, to improve answers on new problems and learn from corrections.<ref>{{Cite web |date=2023-05-31 |title=Improving mathematical reasoning with process supervision |url=https://openai.com/index/improving-mathematical-reasoning-with-process-supervision/ |access-date=2025-01-26 |website=OpenAI |language=en-US}}</ref> A February 2024 study found that some language models performed poorly on math problems not included in their training data, even on problems with only minor deviations from the training data.<ref>{{Cite arXiv |eprint=2402.19450 |class=cs.AI |first=Saurabh |last=Srivastava |title=Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap |date=2024-02-29}}</ref> One technique to improve their performance involves training the models to produce correct [[Automated reasoning|reasoning]] steps, rather than just the correct result.<ref>{{cite arXiv |eprint=2305.20050v1 |class=cs.LG |first1=Hunter |last1=Lightman |first2=Vineet |last2=Kosaraju |title=Let's Verify Step by Step |date=2023 |last3=Burda |first3=Yura |last4=Edwards |first4=Harri |last5=Baker |first5=Bowen |last6=Lee |first6=Teddy |last7=Leike |first7=Jan |last8=Schulman |first8=John |last9=Sutskever |first9=Ilya |last10=Cobbe |first10=Karl}}</ref>

The [[Alibaba Group]] developed a version of its ''[[Qwen]]'' models called ''Qwen2-Math'', which achieved state-of-the-art performance on several mathematical benchmarks, including 84% accuracy on the MATH dataset of competition mathematics problems.<ref name="VentureBeat 8 August 2024">{{cite web |last1=Franzen |first1=Carl |title=Alibaba claims no. 1 spot in AI math models with Qwen2-Math |url=https://venturebeat.com/ai/alibaba-claims-no-1-spot-in-ai-math-models-with-qwen2-math/ |website=VentureBeat |date=2024-08-08|access-date=2025-02-16}}</ref> In January 2025, Microsoft proposed the technique ''rStar-Math'', which leverages [[Monte Carlo tree search]] and step-by-step reasoning, enabling a relatively small language model such as ''Qwen-7B'' to solve 53% of the [[American Invitational Mathematics Examination|AIME]] 2024 problems and 90% of the MATH benchmark problems.<ref>{{Cite web |last=Franzen |first=Carl |date=2025-01-09 |title=Microsoft's new rStar-Math technique upgrades small models to outperform OpenAI's o1-preview at math problems |url=https://venturebeat.com/ai/microsofts-new-rstar-math-technique-upgrades-small-models-to-outperform-openais-o1-preview-at-math-problems/ |access-date=2025-01-26 |website=VentureBeat |language=en-US}}</ref>

Alternatively, dedicated models for mathematical problem solving with higher precision, including the proof of theorems, have been developed, such as ''AlphaTensor'', ''[[AlphaGeometry]]'' and ''AlphaProof'', all from [[Google DeepMind]],<ref>{{Cite web |last=Roberts |first=Siobhan |date=July 25, 2024 |title=AI achieves silver-medal standard solving International Mathematical Olympiad problems |url=https://www.nytimes.com/2024/07/25/science/ai-math-alphaproof-deepmind.html |access-date=2024-08-07 |website=[[The New York Times]] |archive-date=26 September 2024 |archive-url=https://web.archive.org/web/20240926131402/https://www.nytimes.com/2024/07/25/science/ai-math-alphaproof-deepmind.html |url-status=live }}</ref> ''Llemma'' from [[EleutherAI]],<ref>{{Cite web |last1=Azerbayev |first1=Zhangir |last2=Schoelkopf |first2=Hailey |last3=Paster |first3=Keiran |last4=Santos |first4=Marco Dos |last5=McAleer |first5=Stephen |last6=Jiang |first6=Albert Q. |last7=Deng |first7=Jia |last8=Biderman |first8=Stella |last9=Welleck |first9=Sean |date=2023-10-16 |title=Llemma: An Open Language Model For Mathematics |url=https://blog.eleuther.ai/llemma/ |access-date=2025-01-26 |website=EleutherAI Blog |language=en}}</ref> and ''Julius''.<ref>{{Cite web |title=Julius AI |url=https://julius.ai/home/ai-math |website=julius.ai |language=en}}</ref> When natural language is used to describe mathematical problems, converters can transform such prompts into a formal language such as [[Lean (proof assistant)|Lean]] to define mathematical tasks. Some models have been developed to solve challenging problems and reach good results in benchmark tests; others serve as educational tools in mathematics.<ref>{{Cite web |last=McFarland |first=Alex |date=2024-07-12 |title=8 Best AI for Math Tools (January 2025) |url=https://www.unite.ai/best-ai-for-math-tools/ |access-date=2025-01-26 |website=Unite.AI |language=en-US}}</ref>

[[Topological deep learning]] integrates various [[topology|topological]] approaches.
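As a hypothetical illustration of such a conversion (not taken from any particular system), the informal prompt "the sum of two even integers is even" could be rendered as the following formal statement in Lean 4, assuming the Mathlib library; the one-line proof term is included only to show what a machine-checkable answer looks like:

<syntaxhighlight lang="lean">
-- Informal prompt: "the sum of two even integers is even",
-- formalized as a Lean 4 statement over the integers (requires Mathlib).
import Mathlib.Algebra.Group.Even

example (a b : ℤ) (ha : Even a) (hb : Even b) : Even (a + b) :=
  ha.add hb  -- Mathlib's Even.add closes the goal
</syntaxhighlight>

Once a problem is stated in this form, the proof assistant can mechanically verify any candidate proof a model produces, which is what makes formal languages attractive as a check on model hallucinations.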