Hongyuan Mei | publications

* denotes equal contribution

2025

NAACL
CaseSumm: A Large-Scale Dataset for Long-Context Summarization from US Supreme Court Opinions

Mourad Heddaya, Kyle MacMillan, Anup Malani, Hongyuan Mei, and Chenhao Tan

In Findings of NAACL 2025

This paper introduces CaseSumm, a novel dataset for long-context summarization in the legal domain that addresses the need for longer and more complex datasets for summarization evaluation. We collect 25.6K U.S. Supreme Court (SCOTUS) opinions and their official summaries, known as "syllabuses." Our dataset is the largest open legal case summarization dataset, and is the first to include summaries of SCOTUS decisions dating back to 1815. We also present a comprehensive evaluation of LLM-generated summaries using both automatic metrics and expert human evaluation, revealing discrepancies between these assessment methods. Our evaluation shows Mistral 7b, a smaller open-source model, outperforms larger models on most automatic metrics and successfully generates syllabus-like summaries. In contrast, human expert annotators indicate that Mistral summaries contain hallucinations. The annotators consistently rank GPT-4 summaries as clearer and exhibiting greater sensitivity and specificity. Further, we find that LLM-based evaluations are not more correlated with human evaluations than traditional automatic metrics. Furthermore, our analysis identifies specific hallucinations in generated summaries, including precedent citation errors and misrepresentations of case facts. These findings demonstrate the limitations of current automatic evaluation methods for legal summarization and highlight the critical role of human evaluation in assessing summary quality, particularly in complex, high-stakes domains. CaseSumm is available at https://huggingface.co/datasets/ChicagoHAI/CaseSumm
@inproceedings{heddaya2025casesumm, abbr = {NAACL}, bibtex_show = {true}, title = {CaseSumm: A Large-Scale Dataset for Long-Context Summarization from US Supreme Court Opinions}, author = {Heddaya, Mourad and MacMillan, Kyle and Malani, Anup and Mei, Hongyuan and Tan, Chenhao}, booktitle = {Findings of NAACL}, year = {2025}, arxiv = {2501.00097} }

2024

arxiv
FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering

Siqiao Xue, Tingting Chen, Fan Zhou, Qingyang Dai, Zhixuan Chu, and Hongyuan Mei

arXiv preprint 2024

In this paper, we introduce FAMMA, an open-source benchmark for financial multilingual multimodal question answering (QA). Our benchmark aims to evaluate the abilities of multimodal large language models (MLLMs) in answering questions that require advanced financial knowledge and sophisticated reasoning. It includes 1,758 meticulously collected question-answer pairs from university textbooks and exams, spanning 8 major subfields in finance including corporate finance, asset management, and financial engineering. Some of the QA pairs are written in Chinese or French, while a majority of them are in English. These questions are presented in a mixed format combining text and heterogeneous image types, such as charts, tables, and diagrams. We evaluate a range of state-of-the-art MLLMs on our benchmark, and our analysis shows that FAMMA poses a significant challenge for these models. Even advanced systems like GPT-4o and Claude-3.5-Sonnet achieve only 42% accuracy. Additionally, the open-source Qwen2-VL lags notably behind its proprietary counterparts. Lastly, we explore GPT o1-style reasoning chains to enhance the models’ reasoning capabilities, which significantly improve error correction. Our FAMMA benchmark will facilitate future research to develop expert systems in financial QA. The leaderboard is available at https://famma-bench.github.io/famma/.
@article{xue2024famma, abbr = {arxiv}, bibtex_show = {true}, title = {FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering}, author = {Xue, Siqiao and Chen, Tingting and Zhou, Fan and Dai, Qingyang and Chu, Zhixuan and Mei, Hongyuan}, journal = {arXiv preprint}, year = {2024}, arxiv = {2410.04526} }
NLP4Sci
Hypothesis Generation with Large Language Models

Yangqiaoyu Zhou, Haokun Liu, Tejes Srivastava, Hongyuan Mei, and Chenhao Tan

In EMNLP Workshop of NLP4Science 2024

Effective generation of novel hypotheses is instrumental to scientific progress. So far, researchers have been the main powerhouse behind hypothesis generation by painstaking data analysis and thinking (also known as the Eureka moment). In this paper, we examine the potential of large language models (LLMs) to generate hypotheses. We focus on hypothesis generation based on data (i.e., labeled examples). To enable LLMs to handle arbitrarily long contexts, we generate initial hypotheses from a small number of examples and then update them iteratively to improve the quality of hypotheses. Inspired by multi-armed bandits, we design a reward function to inform the exploitation-exploration tradeoff in the update process. Our algorithm is able to generate hypotheses that enable much better predictive performance than few-shot prompting in classification tasks, improving accuracy by 31.7% on a synthetic dataset and by 13.9%, 3.3%, and 24.9% on three real-world datasets. We also outperform supervised learning by 12.8% and 11.2% on two challenging real-world datasets. Furthermore, we find that the generated hypotheses not only corroborate human-verified theories but also uncover new insights for the tasks.
@inproceedings{zhou2024hypothesis, abbr = {NLP4Sci}, bibtex_show = {true}, title = {Hypothesis Generation with Large Language Models}, author = {Zhou, Yangqiaoyu and Liu, Haokun and Srivastava, Tejes and Mei, Hongyuan and Tan, Chenhao}, booktitle = {EMNLP Workshop of NLP4Science}, year = {2024}, arxiv = {2404.04326} }
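
The bandit-inspired reward mentioned in the abstract above can be pictured with a small sketch. The abstract does not give the formula, so the UCB-style form below (accuracy plus an exploration bonus), the `alpha` weight, and the function names are all assumptions made for illustration, not the paper's actual reward.

```python
import math

def ucb_reward(correct: int, tried: int, step: int, alpha: float = 0.5) -> float:
    """Hypothetical UCB-style score: accuracy so far plus an exploration bonus."""
    if tried == 0:
        return float("inf")  # untested hypotheses are ranked first
    accuracy = correct / tried  # exploitation term
    bonus = alpha * math.sqrt(math.log(step) / tried)  # exploration term, requires step >= 1
    return accuracy + bonus

def top_k(hypotheses: dict, step: int, k: int = 2) -> list:
    """Rank hypotheses by reward; `hypotheses` maps text -> (correct, tried)."""
    ranked = sorted(
        hypotheses,
        key=lambda h: ucb_reward(*hypotheses[h], step),
        reverse=True,
    )
    return ranked[:k]
```

With this kind of score, a hypothesis that has been checked on only a few examples keeps a large bonus, so it is not discarded before it has had a fair chance.
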
COLM
MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models

Peng Ding, Jiading Fang, Peng Li, Kangrui Wang, Xiaochen Zhou, Mo Yu, Jing Li, Matthew R Walter, and Hongyuan Mei

In COLM 2024

Large language models such as ChatGPT and GPT-4 have recently achieved astonishing performance on a variety of natural language processing tasks. In this paper, we propose MANGO, a benchmark to evaluate their capabilities to perform text-based mapping and navigation. Our benchmark includes 53 mazes taken from a suite of textgames: each maze is paired with a walkthrough that visits every location but does not cover all possible paths. The task is question-answering: for each maze, a large language model reads the walkthrough and answers hundreds of mapping and navigation questions such as "How should you go to Attic from West of House?" and "Where are we if we go north and east from Cellar?". Although these questions are easy for humans, it turns out that even GPT-4, the best-to-date language model, performs poorly at answering them. Further, our experiments suggest that a strong mapping and navigation ability would benefit large language models in performing relevant downstream tasks, such as playing textgames. Our MANGO benchmark will facilitate future research on methods that improve the mapping and navigation capabilities of language models. We host our leaderboard, data, code, and evaluation program at https://oaklight.github.io/mgwb/ and https://github.com/oaklight/mango/.
@inproceedings{ding2024mango, abbr = {COLM}, bibtex_show = {true}, title = {MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models}, author = {Ding, Peng and Fang, Jiading and Li, Peng and Wang, Kangrui and Zhou, Xiaochen and Yu, Mo and Li, Jing and Walter, Matthew R and Mei, Hongyuan}, booktitle = {COLM}, year = {2024}, arxiv = {2403.19913} }
ICRA
Statler: State-Maintaining Language Models for Embodied Reasoning

Takuma Yoneda, Jiading Fang, Peng Li, Huanyu Zhang, Tianchong Jiang, Shengjie Lin, Ben Picker, David Yunis, Hongyuan Mei, and Matthew R. Walter

In ICRA 2024

Large language models (LLMs) provide a promising tool that enables robots to perform complex robot reasoning tasks. However, the limited context window of contemporary LLMs makes reasoning over long time horizons difficult. Embodied tasks such as those that one might expect a household robot to perform typically require that the planner consider information acquired a long time ago (e.g., properties of the many objects that the robot previously encountered in the environment). Attempts to capture the world state using an LLM’s implicit internal representation are complicated by the paucity of task- and environment-relevant information available in a robot’s action history, while methods that rely on the ability to convey information via the prompt to the LLM are subject to its limited context window. In this paper, we propose Statler, a framework that endows LLMs with an explicit representation of the world state as a form of “memory” that is maintained over time. Integral to Statler is its use of two instances of general LLMs – a world-model reader and a world-model writer – that interface with and maintain the world state. By providing access to this world state “memory”, Statler improves the ability of existing LLMs to reason over longer time horizons without the constraint of context length. We evaluate the effectiveness of our approach on three simulated table-top manipulation domains and a real robot domain, and show that it improves the state-of-the-art in LLM-based robot reasoning.
@inproceedings{yoneda2023llm, abbr = {ICRA}, bibtex_show = {true}, title = {Statler: State-Maintaining Language Models for Embodied Reasoning}, author = {Yoneda, Takuma and Fang, Jiading and Li, Peng and Zhang, Huanyu and Jiang, Tianchong and Lin, Shengjie and Picker, Ben and Yunis, David and Mei, Hongyuan and Walter, Matthew R.}, booktitle = {ICRA}, year = {2024}, arxiv = {2306.17840} }
ICLR
EasyTPP: Towards Open Benchmarking the Temporal Point Processes

Siqiao Xue, Xiaoming Shi, Zhixuan Chu, Yan Wang, Hongyan Hao, Fan Zhou, Caigao Jiang, Chen Pan, James Y Zhang, Qingsong Wen, Jun Zhou, and Hongyuan Mei

In ICLR 2024

Continuous-time event sequences play a vital role in real-world domains such as healthcare, finance, online shopping, social networks, and so on. To model such data, temporal point processes (TPPs) have emerged as the most natural and competitive models, making a significant impact in both academic and application communities. Despite the emergence of many powerful models in recent years, there hasn’t been a central benchmark for these models and future research endeavors. This lack of standardization impedes researchers and practitioners from comparing methods and reproducing results, potentially slowing down progress in this field. In this paper, we present EasyTPP, the first central repository of research assets (e.g., data, models, evaluation programs, documentation) in the area of event sequence modeling. Our EasyTPP makes several unique contributions to this area: a unified interface of using existing datasets and adding new datasets; a wide range of evaluation programs that are easy to use and extend as well as facilitate reproducible research; implementations of popular neural TPPs, together with a rich library of modules by composing which one could quickly build complex models. All the data and implementation can be found at https://github.com/ant-research/EasyTemporalPointProcess. We will actively maintain this benchmark and welcome contributions from other researchers and practitioners. Our benchmark will help promote reproducible research in this field, thus accelerating research progress as well as making more significant real-world impacts.
@inproceedings{zhao2023logic, abbr = {ICLR}, bibtex_show = {true}, title = {{EasyTPP}: Towards Open Benchmarking the Temporal Point Processes}, author = {Xue, Siqiao and Shi, Xiaoming and Chu, Zhixuan and Wang, Yan and Hao, Hongyan and Zhou, Fan and Jiang, Caigao and Pan, Chen and Zhang, James Y and Wen, Qingsong and Zhou, Jun and Mei, Hongyuan}, booktitle = {ICLR}, year = {2024}, arxiv = {2307.08097}, code = {https://github.com/ant-research/EasyTemporalPointProcess} }

2023

arxiv
Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions

Chen Feng Tsai, Xiaochen Zhou, Sierra S. Liu, Jing Li, Mo Yu, and Hongyuan Mei

arXiv preprint 2023

Large language models (LLMs) such as ChatGPT and GPT-4 have recently demonstrated their remarkable abilities of communicating with human users. In this technical report, we take an initiative to investigate their capacities of playing text games, in which a player has to understand the environment and respond to situations by having dialogues with the game world. Our experiments show that ChatGPT performs competitively compared to all the existing systems but still exhibits a low level of intelligence. Precisely, ChatGPT cannot construct the world model by playing the game or even reading the game manual; it may fail to leverage the world knowledge that it already has; it cannot infer the goal of each step as the game progresses. Our results open up new research questions at the intersection of artificial intelligence, machine learning, and natural language processing.
@article{tsai2023game, abbr = {arxiv}, bibtex_show = {true}, title = {Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions}, author = {Tsai, Chen Feng and Zhou, Xiaochen and Liu, Sierra S. and Li, Jing and Yu, Mo and Mei, Hongyuan}, journal = {arXiv preprint}, year = {2023}, arxiv = {2304.02868} }
arxiv
Autoregressive Modeling with Lookahead Attention

Li Du, Hongyuan Mei, and Jason Eisner

arXiv preprint 2023

To predict the next token, autoregressive models ordinarily examine the past. Could they also benefit from examining hypothetical futures? We consider a novel Transformer-based autoregressive architecture that estimates the next-token distribution by extrapolating multiple continuations of the past, according to some proposal distribution, and attending to these extended strings. This architecture draws insights from classical AI systems such as board game players: when making a local decision, a policy may benefit from exploring possible future trajectories and analyzing them. On multiple tasks including morphological inflection and Boolean satisfiability, our lookahead model is able to outperform the ordinary Transformer model of comparable size. However, on some tasks, it appears to be benefiting from the extra computation without actually using the lookahead information. We discuss possible variant architectures as well as future speedups.
@article{du2023lookahead, abbr = {arxiv}, bibtex_show = {true}, title = {Autoregressive Modeling with Lookahead Attention}, author = {Du, Li and Mei, Hongyuan and Eisner, Jason}, journal = {arXiv preprint}, year = {2023}, arxiv = {2305.12272} }
EMNLP
Explicit Planning Helps Language Models in Logical Reasoning

Hongyu Zhao, Kangrui Wang, Mo Yu, and Hongyuan Mei

In EMNLP 2023

Language models have been shown to perform remarkably well on a wide range of natural language processing tasks. In this paper, we propose LEAP, a novel system that uses language models to perform multi-step logical reasoning and incorporates explicit planning into the inference procedure. Explicit planning enables the system to make more informed reasoning decisions at each step by looking ahead into their future effects. Moreover, we propose a training strategy that safeguards the planning process from being led astray by spurious features. Our full system significantly outperforms other competing methods on multiple standard datasets. When using small T5 models as its core selection and deduction components, our system performs competitively compared to GPT-3 despite having only about 1B parameters (i.e., 175 times smaller than GPT-3). When using GPT-3.5, it significantly outperforms chain-of-thought prompting on the challenging PrOntoQA dataset. We have conducted extensive empirical studies to demonstrate that explicit planning plays a crucial role in the system’s performance.
@inproceedings{zhao2023logid, abbr = {EMNLP}, bibtex_show = {true}, title = {Explicit Planning Helps Language Models in Logical Reasoning}, author = {Zhao, Hongyu and Wang, Kangrui and Yu, Mo and Mei, Hongyuan}, booktitle = {EMNLP}, year = {2023}, arxiv = {2303.15714}, code = {https://github.com/cindermond/leap} }
NeurIPS
Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning

Xiaoming Shi, Siqiao Xue, Kangrui Wang, Fan Zhou, James Y. Zhang, Jun Zhou, Chenhao Tan, and Hongyuan Mei

In NeurIPS 2023

Large language models have shown astonishing performance on a wide range of reasoning tasks. In this paper, we investigate whether they could reason about real-world events and help improve the prediction performance of event sequence models. We design LAMP, a framework that integrates a large language model in event prediction. Particularly, the language model performs abductive reasoning to assist an event sequence model: the event model proposes predictions on future events given the past; instructed by a few expert-annotated demonstrations, the language model learns to suggest possible causes for each proposal; a search module finds out the previous events that match the causes; a scoring function learns to examine whether the retrieved events could actually cause the proposal. Through extensive experiments on several challenging real-world datasets, we demonstrate that our framework—thanks to the reasoning capabilities of large language models—could significantly outperform the state-of-the-art event sequence models.
@inproceedings{shi2023abductive, abbr = {NeurIPS}, bibtex_show = {true}, title = {Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning}, author = {Shi, Xiaoming and Xue, Siqiao and Wang, Kangrui and Zhou, Fan and Zhang, James Y. and Zhou, Jun and Tan, Chenhao and Mei, Hongyuan}, code = {https://github.com/iLampard/lamp}, booktitle = {NeurIPS}, year = {2023}, arxiv = {2305.16646} }
ACL
Robustness of Learning from Task Instructions

Jiasheng Gu, Hongyu Zhao, Hanzi Xu, Liangyu Nie, Hongyuan Mei, and Wenpeng Yin

In Findings of ACL 2023

Traditional supervised learning mostly works on individual tasks and requires training on a large set of task-specific examples. This paradigm seriously hinders the development of task generalization since preparing a task-specific example set is costly. To build a system that can quickly and easily generalize to new tasks, task instructions have been adopted as an emerging trend of supervision recently. These instructions give the model the definition of the task and allow the model to output the appropriate answer based on the instructions and inputs. However, task instructions are often expressed in different forms, which can be interpreted from two threads: first, some instructions are short sentences and are pretrained language model (PLM) oriented, such as prompts, while other instructions are paragraphs and are human-oriented, such as those in Amazon MTurk; second, different end-users very likely explain the same task with instructions of different textual expressions. A robust system for task generalization should be able to handle any new tasks regardless of the variability of instructions. However, the system robustness in dealing with instruction-driven task generalization is still unexplored. This work investigates the system robustness when the instructions of new tasks are (i) manipulated, (ii) paraphrased, or (iii) from different levels of conciseness. To our knowledge, this is the first work that systematically studies how robust a PLM is when it is supervised by instructions with different factors of variability.
@inproceedings{gu2023robust, abbr = {ACL}, bibtex_show = {true}, title = {Robustness of Learning from Task Instructions}, author = {Gu, Jiasheng and Zhao, Hongyu and Xu, Hanzi and Nie, Liangyu and Mei, Hongyuan and Yin, Wenpeng}, booktitle = {Findings of ACL}, year = {2023}, arxiv = {2212.03813} }
AISTATS
Continuous-Time Decision Transformer for Healthcare Applications

Zhiyue Zhang, Hongyuan Mei, and Yanxun Xu

In AISTATS 2023

Offline reinforcement learning is a promising approach for training intelligent medical agents to learn treatment policies and assist decision making in many healthcare applications, such as scheduling clinical visits and assigning dosages for patients with chronic conditions. In this paper, we investigate the potential usefulness of Decision Transformer—a new offline reinforcement learning paradigm—in medical domains where decision making in continuous time is desired. As Decision Transformer only handles discrete-time (or, turn-based) sequential decision making scenarios, we generalize it to Continuous-Time Decision Transformer that not only considers the past clinical measurements and treatments but also the timings of previous visits, and learns to suggest the timings of future visits as well as the treatment plan at each visit. Extensive experiments on synthetic datasets and simulators motivated by real-world medical applications demonstrate that Continuous-Time Decision Transformer is able to outperform competitors and has clinical utility in terms of improving patients’ health and prolonging their survival by learning high-performance policies from logged data generated using policies of different levels of quality.
@inproceedings{zhang2023health, abbr = {AISTATS}, bibtex_show = {true}, author = {Zhang, Zhiyue and Mei, Hongyuan and Xu, Yanxun}, title = {Continuous-Time Decision Transformer for Healthcare Applications}, booktitle = {AISTATS}, year = {2023}, code = {https://github.com/ZhiyueZ/CTDT} }
AAAI
Bellman Meets Hawkes: Model-Based Reinforcement Learning via Temporal Point Processes

Chao Qu, Xiaoyu Tan, Siqiao Xue, Xiaoming Shi, James Zhang, and Hongyuan Mei

In AAAI 2023

We consider a sequential decision-making problem in which the agent faces an environment characterized by stochastic discrete events and seeks an optimal intervention policy that maximizes its long-term reward. This problem is ubiquitous in social media, finance, and health informatics but is rarely investigated in conventional reinforcement learning research. To this end, we present a novel model-based reinforcement learning framework in which the agent’s actions and observations are asynchronous stochastic discrete events occurring in continuous time. We model the dynamics of the environment by a Hawkes process with an external intervention control term and develop an algorithm that embeds this process in the Bellman equation, which guides the direction of the value gradient. We demonstrate the superiority of our method on both a synthetic simulator and a real-world problem.
@inproceedings{qu2023bellman, abbr = {AAAI}, bibtex_show = {true}, title = {Bellman Meets {H}awkes: Model-Based Reinforcement Learning via Temporal Point Processes}, author = {Qu, Chao and Tan, Xiaoyu and Xue, Siqiao and Shi, Xiaoming and Zhang, James and Mei, Hongyuan}, booktitle = {AAAI}, year = {2023}, arxiv = {2201.12569}, code = {https://github.com/Event-Driven-rl/Event-Driven-RL} }
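
For readers unfamiliar with the Hawkes process named in the abstract above, here is a minimal sketch of the textbook univariate form with an exponential kernel, in which every past event temporarily raises the event rate. It omits the paper's external intervention control term, and the parameter values `mu`, `alpha`, and `beta` are arbitrary illustrative choices.

```python
import math

def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    """lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))."""
    excitation = sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)
    return mu + excitation  # base rate plus decaying influence of earlier events
```

The intensity jumps by `alpha` right after an event and decays back toward the base rate `mu` at speed `beta`, which is what makes events cluster in time.
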

2022

EMNLP
Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters

Hongyu Zhao, Hao Tan, and Hongyuan Mei

In EMNLP 2022

Adapter-tuning is a paradigm that transfers a pretrained language model to downstream tasks by adding and tuning a small number of new parameters. Previously proposed adapter architectures are all feed-forward neural networks. In this paper, we investigate the effectiveness of using tiny-attention – i.e., attention with extremely small per-head dimensionality – as adapters. Our tiny-attention adapter learns to modify the hidden states at each position directly conditioned on the hidden states at all the other positions, which is missed by the previously proposed adapters. Moreover, we view its multiple attention heads as a mixture of experts and propose to average their weights during deployment, which further reduces its inference computation cost. On the GLUE benchmark, our tiny-attention adapter outperforms the other parameter-efficient transfer learning methods as well as full fine-tuning while only updating 0.05% of the parameters. On the FewGLUE benchmark, its performance is comparable to that of GPT-3 and PET.
@inproceedings{zhao2022tiny, abbr = {EMNLP}, bibtex_show = {true}, author = {Zhao, Hongyu and Tan, Hao and Mei, Hongyuan}, title = {Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters}, booktitle = {EMNLP}, year = {2022}, arxiv = {2211.01979}, code = {https://github.com/cindermond/tiny-attn} }
EMNLP
Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning

Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, and Hongyuan Mei

In Findings of EMNLP 2022

While transferring a pretrained language model, common approaches conventionally attach their task-specific classifiers to the top layer and adapt all the pretrained layers. We investigate whether one could make a task-specific selection on which subset of the layers to adapt and where to place the classifier. The goal is to reduce the computation cost of transfer learning methods (e.g. fine-tuning or adapter-tuning) without sacrificing their performance. We propose to select layers based on the variability of their hidden states given a task-specific corpus. We say a layer is already "well-specialized" in a task if the within-class variability of its hidden states is low relative to the between-class variability. Our variability metric is cheap to compute and doesn’t need any training or hyperparameter tuning. It is robust to data imbalance and data scarcity. Extensive experiments on the GLUE benchmark demonstrate that selecting layers based on our metric can yield significantly stronger performance than using the same number of top layers and often match the performance of fine-tuning or adapter-tuning the entire language model.
@inproceedings{xie2022nc, abbr = {EMNLP}, bibtex_show = {true}, author = {Xie, Shuo and Qiu, Jiahao and Pasad, Ankita and Du, Li and Qu, Qing and Mei, Hongyuan}, title = {Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning}, booktitle = {Findings of EMNLP}, year = {2022}, arxiv = {2210.10041}, code = {https://github.com/shuox20/variability-efficient-tuning} }
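
The "within-class relative to between-class" idea in the abstract above can be illustrated with a simple scatter ratio. This is a simplified stand-in, not the paper's exact metric: the trace-style ratio, the function name, and the list-of-lists input format are assumptions made for this sketch, and it assumes the class means are not all identical (otherwise the denominator is zero).

```python
from collections import defaultdict

def variability_ratio(hidden_states, labels):
    """Within-class scatter divided by between-class scatter (lower = better separated)."""
    dim = len(hidden_states[0])

    def mean(vectors):
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Group the hidden-state vectors of one layer by class label.
    groups = defaultdict(list)
    for h, y in zip(hidden_states, labels):
        groups[y].append(h)

    global_mean = mean(hidden_states)
    within = between = 0.0
    for members in groups.values():
        class_mean = mean(members)
        within += sum(sq_dist(h, class_mean) for h in members)
        between += len(members) * sq_dist(class_mean, global_mean)
    return within / between
```

Computing such a ratio for each layer's hidden states needs only forward passes over a labeled corpus, which fits the abstract's point that no training or hyperparameter tuning is involved.
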
NeurIPS
HYPRO: A Hybridly Normalized Probabilistic Model for Long-Horizon Prediction of Event Sequences

Siqiao Xue, Xiaoming Shi, James Y Zhang, and Hongyuan Mei

In NeurIPS 2022

In this paper, we tackle the important yet under-investigated problem of making long-horizon prediction of event sequences. Existing state-of-the-art models do not perform well at this task due to their autoregressive structure. We propose HYPRO, a hybridly normalized probabilistic model that naturally fits this task: its first part is an autoregressive base model that learns to propose predictions; its second part is an energy function that learns to reweight the proposals such that more realistic predictions end up with higher probabilities. We also propose efficient training and inference algorithms for this model. Experiments on multiple real-world datasets demonstrate that our proposed HYPRO model can significantly outperform previous models at making long-horizon predictions of future events. We also conduct a range of ablation studies to investigate the effectiveness of each component of our proposed methods.
@inproceedings{xue2022hypro, abbr = {NeurIPS}, bibtex_show = {true}, author = {Xue, Siqiao and Shi, Xiaoming and Zhang, James Y and Mei, Hongyuan}, title = {HYPRO: A Hybridly Normalized Probabilistic Model for Long-Horizon Prediction of Event Sequences}, booktitle = {NeurIPS}, year = {2022}, arxiv = {2210.01753}, code = {https://github.com/iLampard/hypro_tpp} }
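
The propose-then-reweight structure described in the HYPRO abstract above can be sketched as follows. The abstract does not specify the functional form, so the softmax over negative energies, the sign convention (lower energy means more realistic), and the `reweight` and `energy` names are assumptions for illustration only.

```python
import math

def reweight(proposals, energy):
    """Return (proposal, weight) pairs with weight proportional to exp(-energy(proposal))."""
    energies = [energy(p) for p in proposals]
    shift = min(energies)  # subtract the minimum for numerical stability
    unnormalized = [math.exp(-(e - shift)) for e in energies]
    total = sum(unnormalized)
    return [(p, u / total) for p, u in zip(proposals, unnormalized)]
```

Here `proposals` would be candidate future sequences drawn from the autoregressive base model, and `energy` would be a learned scoring function; in this sketch both are supplied by the caller.
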
ICLR
Transformer Embeddings of Irregularly Spaced Events and Their Participants

Chenghao Yang, Hongyuan Mei, and Jason Eisner

In ICLR 2022

We propose an approach to modeling irregularly spaced sequences of discrete events. We begin with a continuous-time variant of the Transformer, which was originally formulated (Vaswani et al., 2017) for sequences without timestamps. We embed a possible event (or other boolean fact) at time t by using attention over the events that occurred at times < t (and the facts that were true when they occurred). We control this attention using pattern-matching logic rules that relate events and facts that share participants. These rules determine which previous events will be attended to, as well as how to transform the embeddings of the events and facts into the attentional queries, keys, and values. Other logic rules describe how to change the set of facts in response to events. Our approach closely follows Mei et al. (2020a), and adopts their Datalog Through Time formalism for logic rules. As in that work, a domain expert first writes a set of logic rules that establishes the set of possible events and other facts at each time t. Each possible event or other fact is embedded using a neural architecture that is derived from the rules that established it. Our only difference from Mei et al. (2020a) is that we derive a flatter, attention-based neural architecture whereas they used a more serial LSTM architecture. We find that our attention-based approach performs about equally well on the RoboCup dataset, where the logic rules play an important role in improving performance. We also compared these two methods with two previous attention-based methods (Zuo et al., 2020; Zhang et al., 2020a) on simpler synthetic and real domains without logic rules, and found our proposed approach to be at least as good, and sometimes better, than each of the other three methods.
@inproceedings{yang2022andtt, abbr = {ICLR}, bibtex_show = {true}, author = {Yang, Chenghao and Mei, Hongyuan and Eisner, Jason}, title = {Transformer Embeddings of Irregularly Spaced Events and Their Participants}, booktitle = {ICLR}, year = {2022}, arxiv = {2201.00044}, code = {https://github.com/yangalan123/anhp-andtt} }
BA
Personalized Dynamic Treatment Regimes in Continuous Time: A Bayesian Joint Model for Optimizing Clinical Decisions with Timing

William Hua, Hongyuan Mei, Sarah Zohar, Magali Giral, and Yanxun Xu

Bayesian Analysis 2022

Accurate models of clinical actions and their impacts on disease progression are critical for estimating personalized optimal dynamic treatment regimes (DTRs) in medical/health research, especially in managing chronic conditions. Traditional statistical methods for DTRs usually focus on estimating the optimal treatment or dosage at each given medical intervention, but overlook the important question of "when this intervention should happen." We fill this gap by developing a two-step Bayesian approach to optimize clinical decisions with timing. In the first step, we build a generative model for a sequence of medical interventions (which are discrete events in continuous time) with a marked temporal point process (MTPP) where the mark is the assigned treatment or dosage. Then this clinical action model is embedded into a Bayesian joint framework where the other components model clinical observations including longitudinal medical measurements and time-to-event data conditional on treatment histories. In the second step, we propose a policy gradient method to learn the personalized optimal clinical decision that maximizes the patient survival by interacting the MTPP with the model on clinical observations while accounting for uncertainties in clinical observations learned from the posterior inference of the Bayesian joint model in the first step. A signature application of the proposed approach is to schedule follow-up visitations and assign a dosage at each visitation for patients after kidney transplantation. We evaluate our approach with comparison to alternative methods on both simulated and real-world datasets. In our experiments, the personalized decisions made by the proposed method are clinically useful: they are interpretable and successfully help improve patient survival.
@article{hua2022personalized, abbr = {BA}, bibtex_show = {true}, title = {Personalized Dynamic Treatment Regimes in Continuous Time: A {B}ayesian Joint Model for Optimizing Clinical Decisions with Timing}, author = {Hua, William and Mei, Hongyuan and Zohar, Sarah and Giral, Magali and Xu, Yanxun}, journal = {Bayesian Analysis}, year = {2022}, arxiv = {2007.04155}, code = {https://github.com/YanxunXu/doct} }

2020

NeurIPS
Noise-Contrastive Estimation for Multivariate Point Processes

Hongyuan Mei, Tom Wan, and Jason Eisner

In NeurIPS 2020

Abs arXiv Bib Code Slides Talk

The log-likelihood of a generative model often involves both positive and negative terms. For a temporal multivariate point process, the negative term sums over all the possible event types at each time and also integrates over all the possible times. As a result, maximum likelihood estimation is expensive. We show how to instead apply a version of noise-contrastive estimation—a general parameter estimation method with a less expensive stochastic objective. Our specific instantiation of this general idea works out in an interestingly non-trivial way and has provable guarantees for its optimality, consistency and efficiency. On several synthetic and real-world datasets, our method shows benefits: for the model to achieve the same level of log-likelihood on held-out data, our method needs considerably fewer function evaluations and less wall-clock time.
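For reference, the positive and negative terms mentioned above can be written out in the standard log-likelihood of a multivariate temporal point process. The notation is assumed for illustration and does not come from the abstract: events $(t_i, k_i)$ for $i = 1, \dots, I$ observed on $[0, T]$, $K$ event types, and history-dependent intensities $\lambda_k(t)$.

```latex
\ell(\theta)
  = \underbrace{\sum_{i=1}^{I} \log \lambda_{k_i}(t_i)}_{\text{positive term: observed events}}
  \;-\;
  \underbrace{\int_{0}^{T} \sum_{k=1}^{K} \lambda_k(t) \, dt}_{\text{negative term: all types, all times}}
```

The second term is the costly one, since it sums over every event type and integrates over the whole observation window.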
@inproceedings{mei2020nce, abbr = {NeurIPS}, bibtex_show = {true}, author = {Mei, Hongyuan and Wan, Tom and Eisner, Jason}, title = {Noise-Contrastive Estimation for Multivariate Point Processes}, booktitle = {NeurIPS}, year = {2020}, arxiv = {2011.00717}, code = {https://github.com/HMEIatJHU/nce-mpp}, talk = {https://youtu.be/2GkZfl9NtO0}, slides = {mei+wan+eisner.neurips20.talk.pdf} }
ICML
Neural Datalog Through Time: Informed Temporal Modeling via Logical Specification

Hongyuan Mei, Guanghui Qin, Minjie Xu, and Jason Eisner

In ICML 2020

Abs arXiv Bib Code Slides Talk

Learning how to predict future events from patterns of past events is difficult when the set of possible event types is large. Training an unrestricted neural model might overfit to spurious patterns. To exploit domain-specific knowledge of how past events might affect an event’s present probability, we propose using a temporal deductive database to track structured facts over time. Rules serve to prove facts from other facts and from past events. Each fact has a time-varying state—a vector computed by a neural net whose topology is determined by the fact’s provenance, including its experience of past events. The possible event types at any time are given by special facts, whose probabilities are neurally modeled alongside their states. In both synthetic and real-world domains, we show that neural probabilistic models derived from concise Datalog programs improve prediction by encoding appropriate domain knowledge in their architecture.
@inproceedings{mei2020datalog, abbr = {ICML}, bibtex_show = {true}, author = {Mei, Hongyuan and Qin, Guanghui and Xu, Minjie and Eisner, Jason}, title = {Neural {D}atalog Through Time: {I}nformed Temporal Modeling via Logical Specification}, booktitle = {ICML}, year = {2020}, arxiv = {2006.16723}, code = {https://github.com/HMEIatJHU/neural-datalog-through-time}, talk = {https://youtu.be/mp79uOO5ZuA}, slides = {mei+qin+xu+eisner.icml20.talk.pdf} }

2019

ICML
Imputing Missing Events in Continuous-Time Event Streams

Hongyuan Mei, Guanghui Qin, and Jason Eisner

In ICML 2019

Abs arXiv Bib Code Poster Slides Talk

Events in the world may be caused by other, unobserved events. We consider sequences of events in continuous time. Given a probability model of complete sequences, we propose particle smoothing—a form of sequential importance sampling—to impute the missing events in an incomplete sequence. We develop a trainable family of proposal distributions based on a type of bidirectional continuous-time LSTM: Bidirectionality lets the proposals condition on future observations, not just on the past as in particle filtering. Our method can sample an ensemble of possible complete sequences (particles), from which we form a single consensus prediction that has low Bayes risk under our chosen loss metric. We experiment in multiple synthetic and real domains, using different missingness mechanisms, and modeling the complete sequences in each domain with a neural Hawkes process (Mei & Eisner 2017). On held-out incomplete sequences, our method is effective at inferring the ground-truth unobserved events, with particle smoothing consistently improving upon particle filtering.
@inproceedings{mei2019smoothing, abbr = {ICML}, bibtex_show = {true}, author = {Mei, Hongyuan and Qin, Guanghui and Eisner, Jason}, title = {Imputing Missing Events in Continuous-Time Event Streams}, booktitle = {ICML}, year = {2019}, arxiv = {1905.05570}, code = {https://github.com/HMEIatJHU/neural-hawkes-particle-smoothing}, poster = {mei+qin+eisner.icml19.poster.pdf}, slides = {mei+qin+eisner.icml19.talk.pdf} }
NAACL
On the Idiosyncrasies of the Mandarin Chinese Classifier System

Shijia Liu, Hongyuan Mei, Adina Williams, and Ryan Cotterell

In NAACL 2019

Abs arXiv Bib

While idiosyncrasies of the Chinese classifier system have been a richly studied topic among linguists (Adams and Conklin, 1973; Erbaugh, 1986; Lakoff, 1986), not much work has been done to quantify them with statistical methods. In this paper, we introduce an information-theoretic approach to measuring idiosyncrasy; we examine how much the uncertainty in Mandarin Chinese classifiers can be reduced by knowing semantic information about the nouns that the classifiers modify. Using the empirical distribution of classifiers from the parsed Chinese Gigaword corpus (Graff et al., 2005), we compute the mutual information (in bits) between the distribution over classifiers and distributions over other linguistic quantities. We investigate whether semantic classes of nouns and adjectives differ in how much they reduce uncertainty in classifier choice, and find that classifier choice is not fully idiosyncratic; while there are no obvious trends for the majority of semantic classes, shape nouns reduce uncertainty in classifier choice the most.
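As a rough illustration of the quantity described above, the sketch below computes a plug-in estimate of mutual information in bits from co-occurrence counts. It is not the paper's code: the function name `mutual_information_bits` and the toy (classifier, semantic class) pairs are invented for this example, and the paper's actual estimator and data pipeline may differ.

```python
import math
from collections import Counter


def mutual_information_bits(pairs):
    """Plug-in estimate of I(X; Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)                 # counts of each (x, y) pair
    count_x = Counter(x for x, _ in pairs)  # marginal counts of x
    count_y = Counter(y for _, y in pairs)  # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy data: each classifier always co-occurs with one noun class.
toy_pairs = [("tiao", "long"), ("tiao", "long"), ("zhang", "flat"), ("zhang", "flat")]
print(mutual_information_bits(toy_pairs))
```

In this toy case the noun class fully determines the classifier, so the estimate equals the full one bit of classifier entropy. A fully idiosyncratic system would give an estimate near zero.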
@inproceedings{mei2019classifier, abbr = {NAACL}, bibtex_show = {true}, author = {Liu, Shijia and Mei, Hongyuan and Williams, Adina and Cotterell, Ryan}, title = {On the Idiosyncrasies of the {M}andarin {C}hinese Classifier System}, booktitle = {NAACL}, year = {2019}, arxiv = {1902.10193} }

2018

*SEM
Halo: Learning Semantics-Aware Representations for Cross-Lingual Information Extraction

Hongyuan Mei*, Sheng Zhang*, Kevin Duh, and Benjamin Van Durme

In Joint Conference on Lexical and Computational Semantics 2018

Abs arXiv Bib

Cross-lingual information extraction (CLIE) is an important and challenging task, especially in low-resource scenarios. To tackle this challenge, we propose a training method, called Halo, which enforces the local region of each hidden state of a neural model to only generate target tokens with the same semantic structure tag. This simple but powerful technique enables a neural model to learn semantics-aware representations that are robust to noise, without introducing any extra parameters, thus yielding better generalization in both high- and low-resource settings.
@inproceedings{mei2018halo, abbr = {*SEM}, bibtex_show = {true}, author = {Mei*, Hongyuan and Zhang*, Sheng and Duh, Kevin and Van Durme, Benjamin}, title = {Halo: Learning Semantics-Aware Representations for Cross-Lingual Information Extraction}, booktitle = {Joint Conference on Lexical and Computational Semantics}, year = {2018}, arxiv = {1805.08271} }

2017

NeurIPS
The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process

Hongyuan Mei and Jason Eisner

In NeurIPS 2017

Abs arXiv Bib Code Poster Talk

Many events occur in the world. Some event types are stochastically excited or inhibited—in the sense of having their probabilities elevated or decreased—by patterns in the sequence of previous events. Discovering such patterns can help us predict which type of event will happen next and when. We model streams of discrete events in continuous time, by constructing a neurally self-modulating multivariate point process in which the intensities of multiple event types evolve according to a novel continuous-time LSTM. This generative model allows past events to influence the future in complex and realistic ways, by conditioning future event intensities on the hidden state of a recurrent neural network that has consumed the stream of past events. Our model has desirable qualitative properties. It achieves competitive likelihood and predictive accuracy on real and synthetic datasets, including under missing-data conditions.
@inproceedings{mei2017neuralhawkes, abbr = {NeurIPS}, bibtex_show = {true}, author = {Mei, Hongyuan and Eisner, Jason}, title = {The Neural {H}awkes Process: {A} Neurally Self-Modulating Multivariate Point Process}, booktitle = {NeurIPS}, year = {2017}, arxiv = {1612.09328}, code = {https://github.com/HMEIatJHU/neurawkes}, talk = {https://youtu.be/G7JfYnSlKUM}, poster = {mei+eisner.nips17.poster.pdf} }
AAAI
Coherent Dialogue with Attention-based Language Models

Hongyuan Mei, Mohit Bansal, and Matthew R. Walter

In AAAI 2017

Abs arXiv Bib

We model coherent conversation continuation via RNN-based dialogue models equipped with a dynamic attention mechanism. Our attention-RNN language model dynamically increases the scope of attention on the history as the conversation continues, as opposed to standard attention (or alignment) models with a fixed input scope in a sequence-to-sequence model. This allows each generated word to be associated with the most relevant words in its corresponding conversation history. We evaluate the model on two popular dialogue datasets, the open-domain MovieTriples dataset and the closed-domain Ubuntu Troubleshoot dataset, and achieve significant improvements over the state-of-the-art and baselines on several metrics, including complementary diversity-based metrics, human evaluation, and qualitative visualizations. We also show that a vanilla RNN with dynamic attention outperforms more complex memory models (e.g., LSTM and GRU) by allowing for flexible, long-distance memory. We promote further coherence via topic modeling-based reranking.
@inproceedings{mei2017coherent, abbr = {AAAI}, bibtex_show = {true}, title = {Coherent Dialogue with Attention-based Language Models}, author = {Mei, Hongyuan and Bansal, Mohit and Walter, Matthew R.}, booktitle = {AAAI}, year = {2017}, arxiv = {1611.06997} }

2016

NAACL
What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment

Hongyuan Mei, Mohit Bansal, and Matthew R. Walter

In NAACL 2016

Abs arXiv Bib Code Talk

We propose an end-to-end, domain-independent neural encoder-aligner-decoder model for selective generation, i.e., the joint task of content selection and surface realization. Our model first encodes a full set of over-determined database event records via an LSTM-based recurrent neural network, then utilizes a novel coarse-to-fine aligner to identify the small subset of salient records to talk about, and finally employs a decoder to generate free-form descriptions of the aligned, selected records. Our model achieves the best selection and generation results reported to date (with 59% relative improvement in generation) on the benchmark WeatherGov dataset, despite using no specialized features or linguistic resources. Using an improved k-nearest neighbor beam filter helps further. We also perform a series of ablations and visualizations to elucidate the contributions of our key model components. Lastly, we evaluate the generalizability of our model on the RoboCup dataset, and get results that are competitive with or better than the state-of-the-art, despite being severely data-starved.
@inproceedings{mei2016selective, abbr = {NAACL}, bibtex_show = {true}, title = {What to talk about and how? Selective Generation using {LSTMs} with Coarse-to-Fine Alignment}, author = {Mei, Hongyuan and Bansal, Mohit and Walter, Matthew R.}, booktitle = {NAACL}, year = {2016}, arxiv = {1509.00838}, code = {https://github.com/HMEIatJHU/SelectiveGeneration} }
AAAI
Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences

Hongyuan Mei, Mohit Bansal, and Matthew R. Walter

In AAAI 2016

Abs arXiv Bib Code Talk

We propose a neural sequence-to-sequence model for direction following, a task that is essential to realizing effective autonomous agents. Our alignment-based encoder-decoder model with long short-term memory recurrent neural networks (LSTM-RNN) translates natural language instructions to action sequences based upon a representation of the observable world state. We introduce a multi-level aligner that empowers our model to focus on sentence "regions" salient to the current world state by using multiple abstractions of the input sentence. In contrast to existing methods, our model uses no specialized linguistic resources (e.g., parsers) or task-specific annotations (e.g., seed lexicons). It is therefore generalizable, yet still achieves the best results reported to date on a benchmark single-sentence dataset and competitive results for the limited-training multi-sentence setting. We analyze our model through a series of ablations that elucidate the contributions of the primary components of our model.
@inproceedings{mei2016navigational, abbr = {AAAI}, bibtex_show = {true}, title = {Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences}, author = {Mei, Hongyuan and Bansal, Mohit and Walter, Matthew R.}, booktitle = {AAAI}, year = {2016}, arxiv = {1506.04089}, code = {https://github.com/HMEIatJHU/NeuralWalker} }