Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities across numerous disciplines, primarily assessed through tasks in language generation, knowledge utilization, and complex reasoning. However, their alignment with human emotions and values, which is critical for real-world applications, has not been systematically evaluated. Here, we assessed LLMs' Emotional Intelligence (EI), encompassing emotion recognition, interpretation, and understanding, which is necessary for effective communication and social interaction. Specifically, we first developed a novel psychometric assessment focusing on Emotion Understanding (EU), a core component of EI. This test is an objective, performance-driven, text-based evaluation that requires test-takers to evaluate complex emotions in realistic scenarios, providing a consistent assessment of both human and LLM capabilities. With a reference frame constructed from over 500 adults, we tested a variety of mainstream LLMs. Most achieved above-average EQ scores, with GPT-4 exceeding 89% of human participants with an EQ of 117. Interestingly, a multivariate pattern analysis revealed that some LLMs apparently did not rely on human-like mechanisms to achieve human-level performance, as their representational patterns were qualitatively distinct from those of humans. In addition, we discuss the impact of factors such as model size, training method, and architecture on LLMs' EQ. In summary, our study presents one of the first psychometric evaluations of the human-like characteristics of LLMs, which may shed light on the future development of LLMs aiming for both high intellectual and emotional intelligence. Project website: https://emotional-intelligence.github.io/
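The pattern analysis described above can be illustrated with a minimal, hypothetical sketch: compare a model's per-item response pattern on the EU test against the human-average pattern via Pearson correlation (the abstract does not specify the exact analysis; the item scores below are invented for illustration).

```python
import math

def pearson(x, y):
    # Pearson correlation between two equal-length score vectors
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-item scores on the same EU test items
human_mean_pattern = [0.9, 0.4, 0.7, 0.2, 0.8]  # averaged over participants
model_pattern      = [0.8, 0.5, 0.6, 0.3, 0.9]  # one LLM's responses

similarity = pearson(human_mean_pattern, model_pattern)
```

A model can score high on every item (high EQ) while its item-by-item pattern correlates weakly with the human pattern, which is the kind of qualitative divergence the abstract reports.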
Abstract: Mental health-related issues are posing increasing threats to public health worldwide. Many works have explored NLP techniques to perform mental health analysis in a discriminative manner, but these approaches bear the key limitation of lacking interpretability, especially in such a sensitive domain. The latest large language models (LLMs), such as ChatGPT and GPT-4, show strong promise for improving the performance of mental health analysis. In this talk, we report our comprehensive evaluation of different prompting strategies for LLM-based mental health analysis, including few-shot learning, chain-of-thought prompting, and emotion supervision signals. We also explore LLMs for interpretable mental health analysis by instructing them to generate explanations for each of their decisions, and we conduct rigorous human evaluations to assess the quality of the generated explanations. Building on these investigations of existing LLMs, we formally model interpretable mental health analysis as a text generation task and build the first multi-task, multi-source interpretable mental health instruction (IMHI) dataset. Based on the IMHI dataset and LLaMA2 foundation models, we train MentaLLaMA, the first open-source instruction-following LLM series for interpretable mental health analysis.
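The prompting strategies compared in the talk can be sketched as prompt templates; this is a hedged illustration (the posts, labels, and wording are invented and do not come from the IMHI dataset), showing how few-shot examples and a chain-of-thought instruction compose into one prompt.

```python
# Invented few-shot examples for illustration only
FEW_SHOT_EXAMPLES = [
    ("I can't get out of bed and nothing feels worth doing.",
     "depression risk: yes"),
    ("Had a great hike with friends this weekend!",
     "depression risk: no"),
]

def build_prompt(post, strategy="zero-shot"):
    """Assemble a prompt under one of three strategies:
    'zero-shot', 'few-shot', or 'cot' (chain-of-thought)."""
    parts = ["Decide whether the post shows depression risk."]
    if strategy in ("few-shot", "cot"):
        # Few-shot: prepend labeled demonstrations
        for text, label in FEW_SHOT_EXAMPLES:
            parts.append(f"Post: {text}\nAnswer: {label}")
    parts.append(f"Post: {post}")
    if strategy == "cot":
        # CoT: elicit reasoning before the decision, which also
        # yields the interpretable explanation the talk evaluates
        parts.append("Let's think step by step, then answer.")
    else:
        parts.append("Answer:")
    return "\n\n".join(parts)

prompt = build_prompt("I feel empty lately.", strategy="cot")
```

The chain-of-thought variant is what makes the analysis interpretable: the model's intermediate reasoning doubles as the explanation that is later judged by human evaluators.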
Abstract: Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms remain unclear, and this lack of transparency poses unwanted risks for downstream applications. Understanding and explaining these models is therefore crucial for elucidating their behaviors, limitations, and social impacts. This talk presents a taxonomy of explainability techniques and a structured overview of methods for explaining Transformer-based language models. These techniques are organized by the training paradigms of LLMs: the traditional fine-tuning-based paradigm and the prompting-based paradigm. For each paradigm, a typical method from each category of techniques will be explained in detail. Lastly, viewpoints on key challenges and emerging opportunities for explanation techniques in the era of LLMs will be shared.
The lack of annotated, publicly available medical images is a major barrier to computational research and educational innovation. At the same time, many de-identified images and much knowledge are shared by clinicians on public forums such as medical Twitter. Here we harness these crowd platforms to curate OpenPath, a large dataset of 208,414 pathology images paired with natural language descriptions. We demonstrate the value of this resource by developing PLIP, a multimodal AI with both image and text understanding, trained on OpenPath. PLIP achieves state-of-the-art performance in classifying new pathology images across four external datasets: for zero-shot classification, PLIP achieves F1 scores of 0.565 to 0.832, compared to F1 scores of 0.030 to 0.481 for the previous contrastive language-image pre-trained model. Training a simple supervised classifier on top of PLIP embeddings also achieves a 2.5% improvement in F1 score compared to using embeddings from other supervised models. Moreover, PLIP enables users to retrieve similar cases by either image or natural language search, greatly facilitating knowledge sharing. Our approach demonstrates that publicly shared medical information is a tremendous resource that can be harnessed to develop medical AI for enhancing diagnosis, knowledge sharing, and education.
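The zero-shot classification recipe used by contrastive image–text models like PLIP can be sketched in a few lines: embed each candidate label as a text prompt, then pick the label whose embedding is most cosine-similar to the image embedding. The 3-d vectors and label names below are toy stand-ins for real PLIP encoder outputs, not its actual API.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """Pick the label whose text embedding is most cosine-similar
    to the image embedding (the CLIP/PLIP zero-shot recipe)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = txt @ img              # cosine similarities per label
    return labels[int(np.argmax(scores))]

# Toy 3-d embeddings standing in for PLIP encoder outputs
labels = ["benign tissue", "adenocarcinoma"]
text_embs = np.array([[1.0, 0.1, 0.0],
                      [0.0, 1.0, 0.2]])
image_emb = np.array([0.1, 0.9, 0.3])   # closest to the second prompt
pred = zero_shot_classify(image_emb, text_embs, labels)
```

Because classification reduces to nearest-neighbor search in the shared embedding space, the same machinery supports the image-or-text case retrieval the abstract describes.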
Despite the demonstrated effectiveness of large language models (LLMs) in natural language processing (NLP), their applications in finance have been limited due to the lack of publicly available financial LLMs, instruction datasets for fine-tuning, and evaluation benchmarks. This talk will introduce our recent work PIXIU, a comprehensive framework comprising the first open-source financial LLM, a supporting instruction dataset with 136K samples, and a comprehensive evaluation benchmark featuring 5 tasks and 9 datasets. We will delve into the development of the financial LLM, the assembly of the dataset, and the creation of the benchmark. By sharing our resources, we hope to foster continual progress in financial AI research.
Recent developments in Large Language Models (LLMs) have unveiled the exciting and largely uncharted territory of interactions among multiple LLM agents. Our focus will be on how multiple LLMs can collaborate and compete within complex scenarios and games. To study this, we have crafted ChatArena, an innovative library that fosters the creation of multi-agent language game environments and encourages research on the autonomous behaviour and social interaction of LLM agents. The talk will elucidate the core features of ChatArena: a flexible framework, built on the Markov Decision Process, for defining multiple players and environments; a collection of language game environments for understanding and benchmarking LLM agents; and user-friendly interfaces, including a Web UI and CLI, for developing and engineering LLM agents.
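The MDP framing above can be sketched as a minimal turn-based language environment, in the spirit of ChatArena but not its actual API: the shared message history is the state, each player's next utterance is its action, and each step returns the resulting observation. The stub policies stand in for real LLM agents.

```python
class Conversation:
    """Toy turn-based multi-agent language environment (MDP-style):
    state = message history, action = next utterance."""

    def __init__(self, players):
        self.players = players          # list of (name, policy) pairs
        self.history = []               # shared state: all messages so far
        self.turn = 0

    def step(self):
        # The player whose turn it is observes the history and acts
        name, policy = self.players[self.turn % len(self.players)]
        message = policy(self.history)  # action = produce an utterance
        self.history.append((name, message))
        self.turn += 1
        return self.history             # observation after the transition

# Two stub "LLM" policies standing in for real agents
alice = ("alice", lambda hist: f"alice says turn {len(hist)}")
bob = ("bob", lambda hist: f"bob says turn {len(hist)}")

env = Conversation([alice, bob])
for _ in range(4):
    env.step()
```

Swapping the lambda policies for API-backed LLM calls, and adding per-environment rules and rewards, is what turns a loop like this into a benchmarkable language game.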