Blue Hours Seattle. 2022
Google Scholar / Github / Twitter / LinkedIn / Blog / CV
Yao Fu 符尧. yao.fu@ed.ac.uk
I am a Ph.D. student at the University of Edinburgh (2020-) with Professor Mirella Lapata.
I finished my M.S. at Columbia University (2018-2020) with Professor John Cunningham and my B.S. at Peking University (2013-2018) with Professor Yansong Feng.
Before my Ph.D., I spent a great time visiting Professor Alexander Rush at Cornell Tech (2019-2020).
I study large-scale generative models for human language.
My research objective is to make large language models the next-generation computational platform, empower developers to build applications on top of them, and grow a new language-model-based ecosystem together with the community. My work studies how to inject strong abilities (such as complex reasoning) into language models from first principles. My article tracing language model abilities to their sources is now an important roadmap of large language model evolution. Before the LLM era, I studied generative models, latent variable models, variational inference, and structured prediction.
I expect to graduate in Dec 2023 and will be on the job market for industry positions building large language models.
Experiences
Academia
- 2020 - 2023. Ph.D. at University of Edinburgh
- 2018 - 2020. M.S. at Columbia University
- 2013 - 2018. B.S. at Peking University
Industry
- Sep - Dec 2023 (incoming). Google DeepMind. Student Researcher on Large Language Models
- Jun - Sep 2023 (incoming). MIT-IBM Watson AI Lab. Research Intern on Large Language Models
- Jul - Dec 2022. Allen Institute for AI. Research Intern on Language Model Reasoning
- Jan - Oct 2020. Alibaba DAMO Academy. Research Intern on Latent Variable Models
- May - Sep 2019. Tencent AI Lab. Research Intern on Structured Prediction
- Jan - Aug 2018. Bytedance AI Lab. Research Intern on Language Generation
Research in Large Language Models
Roadmap
- [Blog Post 2022] How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources [notion]
- Yao Fu, Hao Peng and Tushar Khot
- Analysing the sources of emergent abilities of large language models from first principles.
- Ranked top 3 trending on Hacker News.
Complex Reasoning
- [Blog Post 2023]. Towards Complex Reasoning: the Polaris of Large Language Models [notion]
- Yao Fu
- A roadmap towards building language models with strong reasoning capabilities. Covers the full development pipeline: pretraining, continued training, supervised finetuning, reinforcement learning, chain-of-thought prompting, and evaluation.
- [ICML 2023] Oral. Specializing Smaller Language Models towards Multi-Step Reasoning. [paper][code]
- Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, and Tushar Khot
- Trading a language model’s generic ability for specialized math chain-of-thought ability.
- [ICLR 2023] Complexity-Based Prompting for Multi-Step Reasoning. [paper][code]
- Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark and Tushar Khot
- State-of-the-art reasoning performance on math word problems by prompting GPT-3 with instances of complex reasoning chains (see the sketch after this list).
- [ICLR 2023] Decomposed Prompting: A Modular Approach for Solving Complex Tasks. [paper][code]
- Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark and Ashish Sabharwal
- Decomposing complex tasks into simpler sub-tasks, then solving each of them by prompting language models.
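Below is a minimal, self-contained sketch of the idea behind complexity-based prompting: among candidate chain-of-thought exemplars, prefer the ones with the most reasoning steps when assembling the few-shot prompt. The exemplars and the step-counting heuristic are hypothetical illustrations, not the prompts or data used in the paper.

```python
# Sketch of complexity-based exemplar selection (hypothetical example data).

def count_steps(chain_of_thought: str) -> int:
    """Approximate reasoning complexity by the number of non-empty lines."""
    return sum(1 for line in chain_of_thought.splitlines() if line.strip())

def build_prompt(exemplars, question, k=2):
    """Pick the k most complex exemplars and prepend them to the question."""
    top = sorted(exemplars, key=lambda ex: count_steps(ex["cot"]), reverse=True)[:k]
    shots = "\n\n".join(f"Q: {ex['q']}\nA: {ex['cot']}" for ex in top)
    return f"{shots}\n\nQ: {question}\nA:"

exemplars = [
    {"q": "2 + 2?", "cot": "2 + 2 = 4.\nThe answer is 4."},
    {"q": "A pen costs 3 dollars, a book 5. Total for 2 pens and 1 book?",
     "cot": "2 pens cost 2 * 3 = 6.\n1 book costs 5.\n6 + 5 = 11.\nThe answer is 11."},
]

print(build_prompt(exemplars, "What is 7 * 8 + 3?"))
```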
Multi-Agent Game
- [Arxiv 2023] Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback [code][paper]
- Yao Fu, Hao Peng, Tushar Khot, and Mirella Lapata
- Two language models negotiate with each other and continuously improve their negotiation strategies through multi-round game play and iterative in-context learning from AI feedback (see the sketch after this list).
- [Opensource] ChatArena: Multi-Agent Language Game Environments for Large Language Models [code]
- Yuxiang Wu, Zhengyao Jiang, Akbir Khan, Yao Fu, Laura Ruis, Edward Grefenstette, and Tim Rocktäschel
- A library that provides multi-agent language game environments and facilitates research on autonomous LLM agents and their social interactions.
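Below is a minimal sketch of the negotiation self-play loop: two language models play a buyer and a seller for several rounds, a third model acts as a critic, and the critic's feedback is folded back into the buyer's instructions before the next game. The `chat` stub and the role prompts are hypothetical placeholders for a real LLM API call, not the setup used in the paper.

```python
# Sketch of self-play negotiation with iterative in-context learning from AI feedback.

def chat(system: str, history: list[str]) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return f"[reply given system='{system[:30]}...' and {len(history)} prior turns]"

def play_one_game(buyer_system, seller_system, rounds=3):
    """Run one multi-round negotiation and return the transcript."""
    history = []
    for _ in range(rounds):
        history.append("Seller: " + chat(seller_system, history))
        history.append("Buyer: " + chat(buyer_system, history))
    return history

buyer_system = "You are a buyer. Negotiate the lowest price for a balloon."
seller_system = "You are a seller. Negotiate the highest price for a balloon."
critic_system = "You are a critic. Suggest how the buyer could negotiate better."

for game in range(2):
    transcript = play_one_game(buyer_system, seller_system)
    feedback = chat(critic_system, transcript)
    # Iterative in-context learning: fold the critic's feedback into the
    # buyer's instructions for the next game.
    buyer_system += f"\nPrevious feedback: {feedback}"
```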
Evaluation
- [Opensource] Chain-of-thought Hub: Measuring LLMs’ Reasoning Performance [website]
- Yao Fu, Litu Ou, Mingyu Chen and Yuhao Wan
- Benchmarking large language models’ complex reasoning performance with chain-of-thought prompting.
- [Arxiv 2023] C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models [website] [code] [paper]
- Yuzhen Huang, Yuzhuo Bai, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Jiayi Lei, Yao Fu, Maosong Sun, Junxian He
- An evaluation suite of 52 subjects spanning STEM, social science, humanities, and other areas, testing language models’ Chinese abilities (knowledge and reasoning).
Early research before large language models
- [ICML 2022] Scaling Structured Inference with Randomization. [paper][code]
- Yao Fu, John P. Cunningham and Mirella Lapata
- A family of randomized dynamic programming algorithms for scaling up classical structured prediction, covering different inference problems (partition, marginal, entropy, reparameterization) over different structures (chains, trees, and general sum-product).
- [EMNLP FigLang 2022] Just DREAM about it: Figurative Language Understanding with DREAM-FLUTE. [paper][code]
- The Third Workshop on Figurative Language Processing. In conjunction with EMNLP 2022
- Yuling Gu, Yao Fu, Valentina Pyatkin, Ian Magnusson, Bhavana Dalvi Mishra and Peter Clark
- Ranked first on the task leaderboard. A mental model utilizing scene elaboration for understanding figurative language.
- [Arxiv 2022] Latent Topology Induction for Understanding Contextualized Representations. [paper]
- Yao Fu and Mirella Lapata
- Discovering hidden geometric structures of pretrained language models by unsupervised induction of a latent network.
- [TACL 2022] Data-to-text Generation with Variational Sequential Planning. [paper][code]
- Ratish Puduppully, Yao Fu, Mirella Lapata
- A latent planning model for generating very long documents.
- [NAACL 2021] Noisy Labeled NER with Confidence Estimation. [paper][code]
- Kun Liu*, Yao Fu*, Chuanqi Tan, Mosha Chen, Ningyu Zhang, Songfang Huang and Sheng Gao. *Equal contribution.
- A confidence estimation method for estimating label noise in NER annotations and a training method based on partial marginalization according to estimated noise.
- [ICLR 2021] Probing BERT in Hyperbolic Spaces. [paper][code]
- Boli Chen*, Yao Fu*, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing. *Equal contribution.
- A Poincare probe for recovering hierarchical structures from contextualized representations. Applied to probing syntax and sentiment in BERT.
- [ICLR 2021] Prototypical Representation Learning for Relation Extraction. [paper][code]
- Ning Ding, Xiaobin Wang, Yao Fu, Guangwei Xu, Rui Wang, Pengjun Xie, Ying Shen, Fei Huang, Hai-Tao Zheng, Rui Zhang
- A representation learning method for embedding relation prototypes on hyperspheres. Applied to supervised, semi-supervised, and few-shot relational learning.
- [AAAI 2021] Nested Named Entity Recognition with Partially Observed TreeCRFs. [paper][code]
- Yao Fu*, Chuanqi Tan*, Mosha Chen, Songfang Huang, Fei Huang. *Equal contribution.
- A Masked Inside algorithm for efficient partial marginalization of TreeCRFs. Applied to Nested NER.
- [NeurIPS 2020] Latent Template Induction with Gumbel-CRFs. [paper][code]
- Yao Fu, Chuanqi Tan, Mosha Chen, Bin Bi, Yansong Feng and Alexander Rush.
- A Gumbel-FFBS algorithm for reparameterizing and relaxing CRFs. Applied to controllable text generation with latent templates.
- [NeurIPS 2019] Paraphrase Generation with Latent Bag of Words. [paper][code]
- Yao Fu, Yansong Feng and John Cunningham.
- A differentiable planning and realization model with latent bag of words by Gumbel-topK reparameterization. Applied to paraphrase generation.
- [INLG 2019] Rethinking Text Attribute Transfer: A Lexical Analysis. [paper][code]
- Yao Fu, Hao Zhou, Jiaze Chen and Lei Li.
- A series of text mining algorithms for discovering words with strong influence on classification. Applied to analysing text attribute transfer models.
- [NAACL 2018] Natural Answer Generation with Heterogeneous Memory. [paper]
- Yao Fu and Yansong Feng.
- An attention mechanism fusing information from different sources of knowledge. Applied to answer sentence generation.
Teaching
- University of Edinburgh. Natural Language Understanding. 2023 Spring.
- Teaching Assistant. Taught by Alexandra Birch, Frank Keller, and Mirella Lapata.
- Peking University. Empirical Methods for Natural Language Processing. 2022 Spring.
- Guest lecture on Text Generation. Taught by Yansong Feng.
- University of Edinburgh. Natural Language Understanding. 2022 Spring.
- Teaching Assistant. Taught by Alexandra Birch, Frank Keller, and Laura Perez.
- University of Edinburgh. Probabilistic Modeling and Reasoning. 2022 Spring.
- Teaching Assistant. Taught by Michael Gutmann.
- Peking University. Empirical Methods for Natural Language Processing. 2021 Spring.
- Guest lecture on Text Generation. Taught by Yansong Feng.
- Alibaba DAMO Academy. Advanced Probabilistic Machine Learning Seminar. 2020 Spring.
- Columbia University. COMS 4995 Applied Machine Learning, 2019 Spring.
- Course Assistant. Taught by Andreas Müller.