Interactive outline
👷 Seminar on Machine Learning, Information Retrieval, and Scientific Visualization
[Marek Kadlčík]: Teaching Models to Use a Calculator for Solving Math Word Problems 23. 11. 2023
Abstract
Large language models (LLMs) are commonly used for solving natural language tasks like question answering or generating text. However, their outputs can be outdated, factually incorrect, or untruthful. In particular, LLMs are notoriously bad at arithmetic computation. A promising way to mitigate this problem is to allow LLMs to interact with external tools, such as a calculator, a computer algebra system, or a code interpreter.
In this talk, we will cover the training of calculator-using models, compare their capability of solving math word problems to vanilla LLM baselines, and discuss possible improvements in the training workflow.
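The interaction loop behind calculator-using models can be sketched in a few lines: generation pauses whenever the model emits a calculator call, the host evaluates the expression, and the result is appended before decoding resumes. The sketch below is illustrative only; the `<gadget>`/`<output>` tag format follows the Calc-X convention, while `run_with_calculator` and `model_step` are hypothetical names standing in for a real decoding API.

```python
import re

def run_with_calculator(model_step, prompt, max_calls=8):
    """Alternate between model generation and calculator evaluation.

    `model_step` is a stand-in for an LLM decoding call that returns
    the next chunk of text; the <gadget>/<output> tags follow the
    Calc-X convention (illustrative, not a real library API)."""
    text = prompt
    for _ in range(max_calls):
        text += model_step(text)
        # A pending tool call is a <gadget> tag at the end of the text.
        call = re.search(r'<gadget id="calculator">([^<]+)</gadget>$', text)
        if not call:
            break  # no pending tool call: generation has finished
        # Evaluate the arithmetic expression (eval with empty builtins
        # as a minimal safeguard; a real system would use a proper parser).
        result = eval(call.group(1), {"__builtins__": {}}, {})
        text += f"<output>{result}</output>"
    return text
```

With a mock `model_step` that first emits `<gadget id="calculator">4*7</gadget>` and then a final answer, the loop interleaves `<output>28</output>` into the transcript exactly where the model requested the computation.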
Visual Abstract
Slides
Presentation recordings
Readings
- Parisi, A., Zhao, Y., & Fiedel, N. (2022). TALM: Tool Augmented Language Models. doi.org/10.48550/arXiv.2205.12255
- Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. doi.org/10.48550/arXiv.2302.04761
- Gao, L., et al. (2022). PAL: Program-aided Language Models.
- Kadlčík, M., et al. (2023). Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems.
- LangChain: https://www.langchain.com/
Catering
The talk itself was very foody.