Past Talks

MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning

Abstract: Large language models (LLMs), despite their remarkable progress across various general domains, encounter significant barriers in medicine and healthcare. This field faces unique challenges such as domain-specific terminologies and reasoning over specialized knowledge. To address these issues, we propose a novel Multi-disciplinary Collaboration (MC) framework for the medical domain in which LLM-based agents, in a role-playing setting, participate in a collaborative multi-round discussion, thereby enhancing LLM proficiency and reasoning capabilities. This training-free framework encompasses five critical steps: gathering domain experts, proposing individual analyses, summarising these analyses into a report, iterating over discussions until a consensus is reached, and ultimately making a decision. Our work focuses on the zero-shot setting, which is applicable in real-world scenarios. Experimental results on nine datasets (MedQA, MedMCQA, PubMedQA, and six subtasks from MMLU) establish that our proposed MC framework excels at mining and harnessing the medical expertise within LLMs, as well as extending their reasoning abilities.
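The five steps can be read as a prompting loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `llm` callable, every prompt string, and the yes/no consensus check are hypothetical placeholders chosen for illustration.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical: any function mapping a prompt to a completion


def multidisciplinary_collaboration(question: str, llm: LLM, max_rounds: int = 3) -> str:
    # Step 1: gather domain experts (the model names relevant specialties).
    reply = llm(f"List relevant medical specialties, comma-separated, for: {question}")
    experts: List[str] = [e.strip() for e in reply.split(",") if e.strip()]

    # Step 2: each role-played expert proposes an individual analysis.
    analyses = [llm(f"As a {e} expert, analyse: {question}") for e in experts]

    # Step 3: summarise the analyses into a single report.
    report = llm("Summarise these analyses into a report:\n" + "\n".join(analyses))

    # Step 4: discuss until every expert agrees with the report, or rounds run out.
    for _ in range(max_rounds):
        votes = [llm(f"As a {e} expert, do you agree with this report? Answer yes or no.\n{report}")
                 for e in experts]
        if all(v.strip().lower().startswith("yes") for v in votes):
            break
        report = llm("Revise the report to address the experts' objections:\n" + report)

    # Step 5: make the final decision from the consensus report.
    return llm(f"Given the report:\n{report}\nAnswer the question: {question}")
```

Because the framework is training-free, all of the behaviour lives in the prompts; plugging in a real model only means passing a different `llm` function.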

OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

Tairan He | Carnegie Mellon University

9AM (UTC-7) @USA | 5PM (UTC+1) @UK | 20/07/2024

Abstract: We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands, including real-time teleoperation through a VR headset, verbal instruction, and an RGB camera. OmniH2O also enables full autonomy by learning from teleoperated demonstrations or integrating with frontier models such as GPT-4. OmniH2O demonstrates versatility and dexterity in various real-world whole-body tasks through teleoperation or autonomy, such as playing multiple sports, moving and manipulating objects, and interacting with humans. We develop an RL-based sim-to-real pipeline, which involves large-scale retargeting and augmentation of human motion datasets, learning a real-world deployable policy with sparse sensor input by imitating a privileged teacher policy, and reward designs to enhance robustness and stability. We release the first humanoid whole-body control dataset, OmniH2O-6, containing six everyday tasks, and demonstrate humanoid whole-body skill learning from teleoperated datasets.

GraphFM: A Comprehensive Benchmark for Graph Foundation Model

Yuhao Xu | Tongji University

8PM (UTC+8) @Beijing | 1PM (UTC+1) @London | 13 July 2024

Abstract: Foundation Models (FMs) serve as a general class for the development of artificial intelligence systems, offering broad potential for generalization across a spectrum of downstream tasks. Despite extensive research into self-supervised learning as the cornerstone of FMs, several outstanding issues persist in Graph Foundation Models that rely on graph self-supervised learning, namely: 1) Homogenization. The extent of generalization capability on downstream tasks remains unclear. 2) Scalability. It is unknown how effectively these models can scale to large datasets. 3) Efficiency. The training time and memory usage of these models require evaluation. 4) Training Stop Criteria. The optimal stopping strategy for pre-training across multiple tasks, one that maximizes performance on downstream tasks, remains to be determined. To address these questions, we have constructed a rigorous benchmark that thoroughly analyzes and studies the generalization and scalability of self-supervised Graph Neural Network (GNN) models. Regarding generalization, we have implemented and compared the performance of various self-supervised GNN models, trained to generate node representations, across tasks such as node classification, link prediction, and node clustering. For scalability, we have compared the performance of various models after training using full-batch and mini-batch strategies. Additionally, we have assessed the training efficiency of these models by conducting experiments to test their GPU memory usage and throughput. Through these experiments, we aim to provide insights to motivate future research.

In-context Learning of Large Language Model

Ruiqi Zhang | University of California, Berkeley

10AM (UTC-7) @Berkeley | 6PM (UTC+1) @London | 20th April 2024

Bio: I am Ruiqi Zhang, a second-year Ph.D. student at UC Berkeley, advised by Prof. Peter Bartlett. I mainly focus on theoretical deep learning, sequential decision making, and the theory and application of LLM alignment.

Abstract: Attention-based neural networks such as transformers have demonstrated a remarkable ability to exhibit in-context learning (ICL): given a short prompt sequence of tokens from an unseen task, they can formulate relevant per-token and next-token predictions without any parameter updates. By embedding a sequence of labeled training data and unlabeled test data as a prompt, transformers can behave like supervised learning algorithms. Indeed, recent work has shown that when transformer architectures are trained over random instances of linear regression problems, their predictions mimic those of ordinary least squares.

Towards understanding the mechanisms underlying this phenomenon, we investigate the dynamics of ICL in transformers with a single linear self-attention layer trained by gradient flow on linear regression tasks. We show that despite non-convexity, gradient flow with a suitable random initialization finds a global minimum of the objective function. At this global minimum, when given a test prompt of labeled examples from a new prediction task, the transformer achieves prediction error competitive with the best linear predictor over the test prompt distribution. We additionally characterize the robustness of the trained transformer to a variety of distribution shifts and show that although a number of shifts are tolerated, shifts in the covariate distribution of the prompts are not. Motivated by this, we consider a generalized ICL setting where the covariate distributions can vary across prompts. We show that although gradient flow succeeds at finding a global minimum in this setting, the trained transformer is still brittle under mild covariate shifts. We complement this finding with experiments on large, nonlinear transformer architectures which we show are more robust under covariate shifts.
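A one-dimensional toy makes the OLS comparison and the covariate-shift finding concrete. In this line of work, the prediction of a single linear self-attention layer on a query x_q can be written as ŷ = x_q · Γ · (1/N) Σ y_i x_i for a matrix Γ. In this sketch Γ is simply fixed to the inverse of the training covariate variance (an assumption; in the paper it is learned by gradient flow), the prompts are noiseless, and all function names are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares for y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def lsa_slope(xs, ys, train_var=1.0):
    """Linear-self-attention-style estimate Gamma * (1/N) * sum(y_i * x_i).

    Gamma is fixed to 1 / train_var here (an assumption, not a learned value)."""
    return (1.0 / train_var) * sum(x * y for x, y in zip(xs, ys)) / len(xs)


def make_prompt(rng, w, n, x_std):
    """Draw n in-context examples with Gaussian covariates and noiseless labels."""
    xs = [rng.gauss(0.0, x_std) for _ in range(n)]
    return xs, [w * x for x in xs]


rng = random.Random(0)
for x_std in (1.0, 2.0):  # 1.0 matches training; 2.0 is a covariate shift
    xs, ys = make_prompt(rng, w=2.0, n=2000, x_std=x_std)
    print(f"x_std={x_std}: OLS={ols_slope(xs, ys):.2f}  LSA={lsa_slope(xs, ys):.2f}")
```

At the training scale both slopes land near the true value of 2. When the covariate standard deviation doubles, OLS still recovers 2 while the fixed-Γ estimate inflates by roughly the variance ratio (about 4x), which mirrors the brittleness to covariate shift described above.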

Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

ICLR'24 paper code

Ming Jin | Monash University

9PM (GMT+11) @Melbourne | 11AM (GMT+1) @London | 6th April 2024 | Slide

Abstract: Time series forecasting holds significant importance in many real-world dynamic systems and has been extensively studied. Unlike natural language processing (NLP) and computer vision (CV), where a single large model can tackle multiple tasks, models for time series forecasting are often specialized, necessitating distinct designs for different tasks and applications. While pre-trained foundation models have made impressive strides in NLP and CV, their development in time series domains has been constrained by data sparsity. Recent studies have revealed that large language models (LLMs) possess robust pattern recognition and reasoning abilities over complex sequences of tokens. However, the challenge remains in effectively aligning the modalities of time series data and natural language to leverage these capabilities. In this work, we present Time-LLM, a reprogramming framework to repurpose LLMs for general time series forecasting with the backbone language models kept intact. We begin by reprogramming the input time series with text prototypes before feeding it into the frozen LLM to align the two modalities. To augment the LLM's ability to reason with time series data, we propose Prompt-as-Prefix (PaP), which enriches the input context and directs the transformation of reprogrammed input patches. The transformed time series patches from the LLM are finally projected to obtain the forecasts. Our comprehensive evaluations demonstrate that Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models. Moreover, Time-LLM excels in both few-shot and zero-shot learning scenarios.
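Time-LLM reprograms patches of the input series rather than individual time steps. A minimal sketch of that preprocessing, assuming per-series normalization followed by a sliding window; the `patch_len` and `stride` values are illustrative rather than the paper's settings, and the reprogramming against text prototypes and the Prompt-as-Prefix step are not shown.

```python
from statistics import mean, pstdev
from typing import List


def normalize(series: List[float], eps: float = 1e-5) -> List[float]:
    """Per-series (instance) normalization to zero mean and unit variance."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / (sigma + eps) for x in series]


def make_patches(series: List[float], patch_len: int, stride: int) -> List[List[float]]:
    """Cut the series into overlapping windows; an incomplete tail window is dropped."""
    return [series[i:i + patch_len] for i in range(0, len(series) - patch_len + 1, stride)]


history = [float(t) for t in range(10)]
patches = make_patches(normalize(history), patch_len=4, stride=2)
print(len(patches))  # 4 windows, starting at t = 0, 2, 4, 6
```

In a full pipeline each patch would next be embedded and reprogrammed before it reaches the frozen LLM; overlapping windows keep local temporal context inside every token.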

Large Language Models as Commonsense Knowledge for Large-Scale Task Planning

NeurIPS 2023 paper code page

Zirui Zhao | National University of Singapore

7PM (GMT+8) @Singapore | 11AM (GMT+0) @London | 30th March 2024

Abstract: Large-scale task planning is a major challenge. Recent work exploits large language models (LLMs) directly as a policy and shows surprisingly interesting results. This paper shows that LLMs provide a commonsense model of the world in addition to a policy that acts on it. The world model and the policy can be combined in a search algorithm, such as Monte Carlo Tree Search (MCTS), to scale up task planning. In our new LLM-MCTS algorithm, the LLM-induced world model provides a commonsense prior belief for MCTS to achieve effective reasoning; the LLM-induced policy acts as a heuristic to guide the search, vastly improving search efficiency. Experiments show that LLM-MCTS outperforms both MCTS alone and policies induced by LLMs (GPT2 and GPT3.5) by a wide margin on complex, novel tasks. Further experiments and analyses on multiple tasks (multiplication, multi-hop travel planning, and object rearrangement) suggest minimum description length (MDL) as a general guiding principle: if the description length of the world model is substantially smaller than that of the policy, using the LLM as a world model for model-based planning is likely better than using the LLM solely as a policy.
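One common way to let a policy prior steer MCTS is a PUCT-style selection rule, in which each action's score adds an exploration bonus proportional to its prior probability. The sketch below assumes the LLM-induced policy supplies that prior P(a|s) as a dictionary; the action names, `c_puct`, and the `+ 1` inside the square root are illustrative choices, and the LLM-induced world model (the commonsense belief over states) is not shown.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    prior: Dict[str, float]  # P(a | s); assumed to come from the LLM-induced policy
    visits: Dict[str, int] = field(default_factory=dict)
    value_sum: Dict[str, float] = field(default_factory=dict)


def puct_select(node: Node, c_puct: float = 1.0) -> Optional[str]:
    """Pick the action maximizing Q(s,a) + c_puct * P(a|s) * sqrt(N(s) + 1) / (1 + N(s,a))."""
    total = sum(node.visits.values())
    best_action, best_score = None, -math.inf
    for action, p in node.prior.items():
        n = node.visits.get(action, 0)
        q = node.value_sum.get(action, 0.0) / n if n else 0.0
        # The "+ 1" keeps the prior informative before any visits (a common variant).
        u = c_puct * p * math.sqrt(total + 1) / (1 + n)
        if q + u > best_score:
            best_action, best_score = action, q + u
    return best_action


def backup(node: Node, action: str, value: float) -> None:
    """Record the return of one simulation that took `action` from this node."""
    node.visits[action] = node.visits.get(action, 0) + 1
    node.value_sum[action] = node.value_sum.get(action, 0.0) + value


root = Node(prior={"open fridge": 0.7, "open microwave": 0.2, "go to bedroom": 0.1})
print(puct_select(root))  # open fridge (highest prior, no visits yet)
```

The high-prior action is tried first; as its visit count grows without reward, the bonus for less-favoured actions takes over, so the prior focuses the search without ruling anything out.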

Graph Language Models

GraphGPT HiGPT

Jiabin Tang | Data Intelligence Lab, The University of Hong Kong

8PM (GMT+8) @Hong Kong | 12PM (GMT+0) @London | 24th March 2024

Abstract: In the realm of graph-based research, understanding and leveraging graph structures has become increasingly important, given their wide range of applications in network analysis, bioinformatics and urban science. Graph Neural Networks (GNNs) and their heterogeneous counterparts (HGNNs) have emerged as powerful tools for capturing the intricate relationships within graph data. However, despite their advancements, these models often struggle with generalization in zero-shot learning scenarios and across diverse heterogeneous graph datasets, especially in the absence of abundant labeled data for fine-tuning. To address these challenges, we recently introduced two novel frameworks, “GraphGPT: Graph Instruction Tuning for Large Language Models” and “HiGPT: Heterogeneous Graph Language Model”, which are designed to enhance the adaptability and applicability of graph models in various contexts. GraphGPT takes a pioneering approach, integrating Large Language Models (LLMs) with graph structural knowledge through a graph instruction tuning paradigm. The model leverages a text-graph grounding component and a dual-stage instruction tuning process, incorporating self-supervised graph structural signals and task-specific instructions. This technique enables the model to comprehend complex graph structures and achieve remarkable generalization across different tasks without the need for downstream graph data. HiGPT, in turn, focuses on heterogeneous graph learning by introducing a heterogeneous graph instruction-tuning paradigm that eliminates the need for dataset-specific fine-tuning. It features an in-context heterogeneous graph tokenizer and employs a large corpus of heterogeneity-aware graph instructions, complemented by a Mixture-of-Thought (MoT) instruction augmentation strategy. This allows HiGPT to adeptly handle distribution shifts in node token sets and relation type heterogeneity, thereby significantly improving its generalization capabilities across various learning tasks.

Knowledge Editing for Large Language Models

Paper 1 Code and Paper 2 Code

Jun-Yu Ma | University of Science and Technology of China

10AM (GMT+8) @Beijing | Saturday, 16 March 2024

2AM (GMT+0) @London | Saturday, 16 March 2024

10PM (GMT-4) @New York | Friday, 15 March 2024

Bio: Jun-Yu Ma is currently a third-year Ph.D. student at the University of Science and Technology of China, supervised by Prof. Zhen-Hua Ling. His main research interests lie in deep learning for natural language processing, and he is particularly interested in multilinguality, information extraction and model editing.

Abstract: Large language models (LLMs) are prone to hallucinating unintended text due to false or outdated knowledge. Since retraining LLMs is resource-intensive, there has been growing interest in knowledge editing. Despite the emergence of benchmarks and approaches, existing work on unidirectional editing and evaluation has failed to explore the reversal curse. In this talk, we study bidirectional language model editing, aiming to provide a rigorous evaluation of whether edited LLMs can recall the edited knowledge bidirectionally. Surprisingly, we observe that current editing methods and LLMs, while effective at recalling edited facts in the direction of editing, suffer serious deficiencies when evaluated in the reverse direction. We also study whether injecting new knowledge into LLMs perturbs the neighboring knowledge encapsulated within them. A plug-and-play framework termed Appending via Preservation and Prevention (APP) is proposed to mitigate this neighboring perturbation by maintaining the integrity of the answer list.

Interactive AI Systems Specialized in Social Influence

Dr. Weiyan Shi | Stanford University NLP Group & Northeastern University

Bio:

Dr. Weiyan Shi is an incoming Assistant Professor at Northeastern University starting in 2024. She will spend 2023-2024 as a postdoc at Stanford NLP. Her research interests are in Natural Language Processing (NLP), especially social influence dialogue systems for persuasion, negotiation, and recommendation. She has also worked on privacy-preserving NLP applications. She was recognized as a Rising Star in Machine Learning by the University of Maryland. Her work on personalized persuasive dialogue systems was nominated for the ACL 2019 Best Paper Award. She was also a core team member behind a Science publication on Cicero, the first negotiation AI agent to achieve human-level performance in the game of Diplomacy. This work has been featured in The New York Times, The Washington Post, MIT Technology Review, Forbes, and other major media outlets.

Dr. Weiyan Shi is looking for Master's, Ph.D., and internship students to join her lab, CHATS Lab (Conversation, Human-AI Tech, Security). More information is here.

Abstract:

AI research has so far focused on modeling common human skills, such as building systems to see, read, or talk. As these systems gradually achieve human-level performance on standard benchmarks, it is increasingly important to develop next-generation interactive AI systems with more advanced human skills, so that they can function in realistic and critical applications such as providing personalized emotional support. In this talk, I will cover (1) how to build expert-like AI systems specialized in social influence that can persuade, negotiate, and cooperate with humans during conversations; (2) how humans perceive such specialized AI systems, a study that validates the necessity of Autobot Law and proposes guidance to regulate such systems; and (3) privacy: as these systems become more powerful, they are also more prone to leaking users' private information, so I will describe our newly proposed privacy notion, Selective Differential Privacy, and an algorithm to train privacy-preserving models with high utility. Finally, I will conclude with my long-term vision of building a natural interface between human intelligence and machine intelligence via dialogue, through a multi-angle approach that combines Artificial Intelligence, Human-Computer Interaction, and the social sciences, to develop expert AI systems for everyone.

Distilling ChatGPT for Explainable Automated Student Answer Assessment

Jiazheng Li | King's College London

12PM (GMT+0) @London | Date: Dec. 2, 2023

Abstract: Providing explainable and faithful feedback is crucial for automated student answer assessment. In this talk, we introduce a novel framework that explores using ChatGPT, a cutting-edge large language model, for the concurrent tasks of student answer scoring and rationale generation. We identify the appropriate instructions by prompting ChatGPT with different templates to collect the rationales, where inconsistent rationales are refined to align with marking standards. The refined ChatGPT outputs enable us to fine-tune a smaller language model that simultaneously assesses student answers and provides rationales. Extensive experiments on the benchmark dataset show that the proposed method improves the overall QWK score by 11% compared to ChatGPT. Furthermore, our thorough analysis and human evaluation demonstrate that the rationales generated by our proposed method are comparable to those of ChatGPT. Our approach provides a viable solution to achieve explainable automated assessment in education.
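QWK here is quadratic weighted kappa, an agreement measure widely used for automated scoring. For reference, a plain-Python version of the standard definition, κ = 1 - Σ w_ij O_ij / Σ w_ij E_ij with w_ij = (i - j)² / (k - 1)², is sketched below. How the experiments above computed it (for example, the score range) is not stated, so the rating bounds are parameters.

```python
from typing import Sequence


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int],
                             min_rating: int, max_rating: int) -> float:
    """Cohen's kappa with quadratic disagreement weights over integer ratings."""
    k = max_rating - min_rating + 1
    if k < 2 or len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need >= 2 rating levels and equal-length, non-empty inputs")
    # Observed matrix O[i][j]: how often the reference score is i and the predicted score is j.
    observed = [[0.0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1
    n = len(y_true)
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n  # E[i][j] under independent marginals
            num += w * observed[i][j]
            den += w * expected
    # Undefined (division by zero) if both raters use a single identical score throughout.
    return 1.0 - num / den


print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 0, 3))  # 1.0 (perfect agreement)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement; the quadratic weights penalize a prediction two marks off four times as much as one mark off.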

Emotional Intelligence of Large Language Models

Dr. Xuena Wang | Tsinghua Laboratory of Brain and Intelligence, Tsinghua University

12PM (GMT+0) @London | Date: Nov. 11, 2023

Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities across numerous disciplines, primarily assessed through tasks in language generation, knowledge utilization, and complex reasoning. However, their alignment with human emotions and values, which is critical for real-world applications, has not been systematically evaluated. Here, we assessed LLMs’ Emotional Intelligence (EI), encompassing emotion recognition, interpretation, and understanding, which is necessary for effective communication and social interactions. Specifically, we first developed a novel psychometric assessment focusing on Emotion Understanding (EU), a core component of EI. This test is an objective, performance-driven, and text-based evaluation, which requires evaluating complex emotions in realistic scenarios, providing a consistent assessment for both human and LLM capabilities. With a reference frame constructed from over 500 adults, we tested a variety of mainstream LLMs. Most achieved above-average EQ scores, with GPT-4 exceeding 89% of human participants with an EQ of 117. Interestingly, a multivariate pattern analysis revealed that some LLMs apparently did not rely on the human-like mechanism to achieve human-level performance, as their representational patterns were qualitatively distinct from humans. In addition, we discussed the impact of factors such as model size, training method, and architecture on LLMs’ EQ. In summary, our study presents one of the first psychometric evaluations of the human-like characteristics of LLMs, which may shed light on the future development of LLMs aiming for both high intellectual and emotional intelligence. Project website: https://emotional-intelligence.github.io/

Towards Interpretable Mental Health Analysis with Large Language Models

Kailai Yang | The University of Manchester

12AM (GMT) @London | Date: Nov. 4, 2023

Abstract: Mental health issues pose an increasing threat to public health worldwide. Many works have explored NLP techniques for mental health analysis in a discriminative manner, but they share a key limitation: a lack of interpretability, which matters especially in such a sensitive domain. The latest large language models (LLMs), such as ChatGPT and GPT-4, show strong promise for improving the performance of mental health analysis. In this talk, we report our comprehensive evaluation of different prompting strategies for LLMs’ mental health analysis ability, including few-shot learning, chain-of-thought prompting, and emotion supervision signals. We also explore LLMs for interpretable mental health analysis by instructing them to generate explanations for each of their decisions, and we conduct rigorous human evaluations to assess the quality of the generated explanations. Building on these investigations of existing LLMs, we formally model interpretable mental health analysis as text generation tasks and build the first multi-task and multi-source interpretable mental health instruction (IMHI) dataset. Based on the IMHI dataset and LLaMA2 foundation models, we train MentaLLaMA, the first open-source instruction-following LLM series for interpretable mental health analysis.

Explainability for Large Language Models

Haiyan Zhao | New Jersey Institute of Technology

10AM (GMT-4) @New Jersey | 3PM (BST) @London | 28th Oct 2023

Abstract: Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms remain unclear, and this lack of transparency poses unwanted risks for downstream applications. Understanding and explaining these models is therefore crucial for elucidating their behaviors, limitations, and social impacts. This talk will present a taxonomy of explainability techniques and a structured overview of methods for explaining transformer-based language models. The techniques are organized by the two training paradigms of LLMs: the traditional fine-tuning-based paradigm and the prompting-based paradigm. For each paradigm, a representative method from each category of techniques will be explained in detail. Lastly, the talk will share some views on key challenges and emerging opportunities for explanation techniques in the era of LLMs.

A visual–language foundation model for pathology image analysis using medical Twitter

Dr. Zhi Huang | Stanford University

9AM (GMT-7) @Stanford | 5PM (GMT+1) @London | 16 Sept. 2023

Abstract:

The lack of annotated publicly available medical images is a major barrier for computational research and education innovations. At the same time, many de-identified images and much knowledge are shared by clinicians on public forums such as medical Twitter. Here we harness these crowd platforms to curate OpenPath, a large dataset of 208,414 pathology images paired with natural language descriptions. We demonstrate the value of this resource by developing PLIP, a multimodal AI with both image and text understanding, which is trained on OpenPath. PLIP achieves state-of-the-art performance for classifying new pathology images across four external datasets: for zero-shot classification, PLIP achieves F1 scores of 0.565 to 0.832, compared to F1 scores of 0.030 to 0.481 for the previous contrastive language-image pre-trained model. Training a simple supervised classifier on top of PLIP embeddings also achieves a 2.5% improvement in F1 score compared to using other supervised model embeddings. Moreover, PLIP enables users to retrieve similar cases by either image or natural language search, greatly facilitating knowledge sharing. Our approach demonstrates that publicly shared medical information is a tremendous resource that can be harnessed to develop medical AI for enhancing diagnosis, knowledge sharing and education.
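Zero-shot classification with a contrastive image-text model works by embedding the image and one text prompt per class, then picking the class whose text embedding is most similar to the image embedding. A minimal sketch with toy vectors standing in for encoder outputs; the prompts and numbers are made up, and a real system would obtain both embeddings from the trained image and text encoders.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_classify(image_emb: List[float], label_embs: Dict[str, List[float]]) -> str:
    """Return the label whose text embedding has the highest cosine similarity to the image."""
    return max(label_embs, key=lambda label: cosine(image_emb, label_embs[label]))


# Toy 3-d embeddings (hypothetical): one text prompt per candidate tissue class.
text_embs = {
    "an H&E image of tumor tissue": [0.9, 0.1, 0.0],
    "an H&E image of normal tissue": [0.1, 0.9, 0.1],
}
print(zero_shot_classify([0.8, 0.2, 0.1], text_embs))  # an H&E image of tumor tissue
```

No classifier is trained: adding a class only means adding a prompt. The same similarity score, computed between a text query and a gallery of image embeddings, is what makes text-to-image case retrieval possible.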

PIXIU: A Large Language Model, Instruction Data and Benchmark for Finance

Dr. Qianqian Xie | Yale University

8AM@Yale | 9 September 2023

Abstract:

Despite the demonstrated effectiveness of large language models (LLMs) in natural language processing (NLP), their applications in finance have been limited, due to the lack of publicly available financial LLMs, instruction datasets for fine-tuning, and evaluation benchmarks. This talk will introduce our recent work PIXIU, a comprehensive framework including the first open-sourced financial LLM, the supportive instruction data with 136K data samples, and a comprehensive evaluation benchmark featuring 5 tasks and 9 datasets. We will delve into the development of the financial LLM, the assembly of the dataset, and the creation of the benchmark. We hope to foster continual progress in financial AI research by sharing our resources.

Quantum Chemistry Computing with Language Models

Prof. Honghui Shang | Chinese Academy of Sciences

8PM@Beijing | 12 August 2023

ChatArena: Multi-Agent Language Game Environments for Large Language Models

Yuxiang Wu | UCL NLP Group

1PM@London | 8th July 2023

Abstract:

Recent developments in Large Language Models (LLMs) have opened up the exciting and uncharted territory of multi-agent interactions between multiple LLMs. Our focus will be on how multiple LLMs can collaborate and compete within complex scenarios and games. To study this, we have built ChatArena, a library that facilitates the creation of multi-agent language game environments and encourages research on the autonomous behaviour and social interaction of LLM agents. The talk will explain the core features of ChatArena: a flexible framework, built on the Markov Decision Process, for defining multiple players and environments; a collection of language game environments for understanding and benchmarking LLM agents; and user-friendly interfaces, including a Web UI and CLI, for developing and engineering LLM agents.

You are welcome to learn more about the project at chatarena and to try the live demonstration at demo.
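The MDP framing means each game is an environment that exposes observations to players and advances its state when a player acts. A toy sketch of that loop follows; the class and method names are hypothetical and are not ChatArena's actual API, the rewards are placeholders, and each player is just a function from the visible message history to an action, where a real agent would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Player = Callable[[List[str]], str]  # hypothetical: visible history -> next message


@dataclass
class ConversationEnv:
    """Toy turn-based environment whose state is a shared message log."""
    players: List[str]
    max_turns: int = 4
    history: List[str] = field(default_factory=list)
    turn: int = 0

    def current_player(self) -> str:
        return self.players[self.turn % len(self.players)]

    def observe(self, player: str) -> List[str]:
        # Fully observable here; a game with hidden roles would filter this list per player.
        return list(self.history)

    def step(self, player: str, action: str) -> Tuple[Dict[str, float], bool]:
        """Apply one action; return per-player rewards (placeholders) and a done flag."""
        self.history.append(f"{player}: {action}")
        self.turn += 1
        rewards = {p: 0.0 for p in self.players}
        return rewards, self.turn >= self.max_turns


def run_episode(env: ConversationEnv, agents: Dict[str, Player]) -> List[str]:
    """Alternate turns until the environment signals the episode is over."""
    done = False
    while not done:
        name = env.current_player()
        action = agents[name](env.observe(name))
        _, done = env.step(name, action)
    return env.history


env = ConversationEnv(players=["Alice", "Bob"], max_turns=3)
print(run_episode(env, {"Alice": lambda h: "hi", "Bob": lambda h: f"seen {len(h)}"}))
```

Keeping the game logic inside the environment and the decision logic inside the players is what lets the same agents be benchmarked across different language games.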

A Dive into APP Development on Large Language Models

Moshi Wei | York University, Canada

8AM@Toronto | 1st July 2023
