The School of Computing and Data Science (https://www.cds.hku.hk/) was established by the University of Hong Kong on 1 July 2024, comprising the Department of Computer Science, the Department of Statistics and Actuarial Science, and the Department of AI and Data Science.

Past Seminars and Events
December 09, 2025
  • Title: Genetic and Epigenetic Landscape of Self-Identified Hispanics in All of Us

    Time: 11:30am 

    Venue: CB 308

    Speaker(s): Dr. Fritz Sedlazeck

    Remark(s): 

    Abstract

    Hispanic populations in the United States are highly admixed and genetically diverse, yet remain underrepresented in genomic studies. To address this, we present the first large-scale long-read sequencing analysis of 1,490 self-reported Hispanic individuals from the All of Us Research Program, capturing small variants, structural variants, tandem repeats (TRs), and CpG methylation. We characterize global and local ancestry across the cohort, enabling ancestry-aware analysis of genetic and epigenetic features. Over 10.3 million previously unknown autosomal variants are identified, including medically relevant alleles stratified by local ancestry and pathogenic risk, revealing 402 carriers with potential risk for subsequent generations. We discover 135 individuals with TR alleles exceeding established pathogenic ranges, and conduct the first genome-wide TR-mQTL analysis, identifying 3,329 TR alleles associated with methylation. Allele-specific methylation (ASM) is resolved at >12,000 loci per genome and 24 novel recurrent ASM loci are identified. This includes ancestry-specific regulatory activity such as activation of paralogous genes driven by ancestry-enriched variants and epigenetic markers. These findings establish a foundational resource for biomedical research and highlight the critical role of ancestry-aware analyses in understanding gene regulation, disease risk, and personalized medicine.

    About the speaker

    Dr. Fritz Sedlazeck is an Associate Professor at the Human Genome Sequencing Center at Baylor College of Medicine and an Adjunct Associate Professor at Rice University. His research focuses on algorithmic developments and high-performance computing for genomic and genetic applications. Specifically, he studies ways to improve the characterization of complex genomic alterations between individuals’ genomes based on large genomic sequencing data and as such improve our understanding of complex phenotypes such as human diseases.

December 08, 2025
  • Title: Fighting Noise with Noise: Causal Inference with Many Candidate Instruments

    Time: 04:00pm 

    Venue: Room 301, Run Run Shaw Building

    Speaker(s): Prof. Linbo Wang

    Remark(s): 

    Abstract

    Instrumental variable methods provide useful tools for inferring causal effects in the presence of unmeasured confounding. To apply these methods with large-scale data sets, a major challenge is to find valid instruments from a possibly large candidate set. In practice, most of the candidate instruments are often not relevant for studying a particular exposure of interest. Moreover, not all relevant candidate instruments are valid as they may directly influence the outcome of interest. In this article, we propose a data-driven method for causal inference with many candidate instruments that addresses these two challenges simultaneously. A key component of our proposal involves using pseudo variables, known to be irrelevant, to remove variables from the original set that exhibit spurious correlations with the exposure. Synthetic data analyses show that the proposed method performs favourably compared to existing methods. We apply our method to a Mendelian randomization study estimating the effect of obesity on health-related quality of life.

    About the speaker

    Linbo Wang is an associate professor at the University of Toronto, Canada, where he holds a joint appointment in the departments of statistics, mathematics, and computer science. His research interests are in causal inference and graphical models. He is currently a Canada Research Chair in Causal Machine Learning.

     

December 01, 2025
  • Title: Reinventing Operations Management’s Research and Practice with Data Science

    Time: 03:00pm 

    Venue: HW312, Haking Wong Building

    Speaker(s): Prof. David Simchi-Levi

    Remark(s): 

    Abstract

    In this talk we show how data-driven research fosters the development of new engineering and scientific methods that explain, predict, and change behavior. We report on a few projects with online and brick-and-mortar retailers where we combine machine learning, optimization and econometrics techniques to improve business performance.

    About the speaker

    Prof. David Simchi-Levi is the MIT William Barton Rogers Professor, named after MIT’s founder and first president, and a Professor of Engineering Systems at MIT. He also leads the MIT Data Science Lab and is widely recognized as a leading authority in supply chain management and business analytics.

    His Ph.D. graduates hold faculty positions at top institutions, including UC Berkeley, Carnegie Mellon, Columbia, Cornell, Duke, Georgia Tech, Harvard, Illinois Urbana-Champaign, Michigan, Purdue, and Virginia Tech.

    Prof. Simchi-Levi served as Editor-in-Chief of Management Science (2018–2023) and previously led Operations Research (2006–2012) and Naval Research Logistics (2003–2005). In 2023, he was elected to the National Academy of Engineering. He received the INFORMS Impact Prize (2020) for pioneering risk mitigation strategies in global supply chains and is an INFORMS Fellow and MSOM Distinguished Fellow. His accolades include the Koopman Award (2020), Ford Engineering Excellence Award (2015), and multiple INFORMS practice prizes.

    An entrepreneur, he founded LogicTools (acquired by IBM in 2009), co-founded OPS Rules (joined Accenture in 2016), and Opalytics (acquired by Accenture Applied Intelligence in 2018).

     

  • Title: The Three Faces of Networking

    Time: 10:30am 

    Venue: CB 308

    Speaker(s): Prof. Ang Chen

    Remark(s): 

    Abstract

    What does a network connect: people? machines? infrastructures? All three are true, but one tends to dominate at any given time in response to societal needs. Telephony networks interconnected people, but the Internet recast communication as connecting machines. This was a subtle yet profound shift: machines have different failures, misbehaviors, and performance goals, which translated into the network design and still define much of our problem space today. As of late, however, another quiet change is playing out which demands a rethinking of networks. Societal infrastructures, such as power grids, water systems, and datacenters, are increasingly interdependent, but historically they were never designed for coordinated operation. Networking at the "infrastructure nexus" is becoming a pressing need, bringing with it a fresh source of research problems.

    About the speaker

    Ang Chen is an Associate Professor in Computer Science and Engineering at the University of Michigan, Ann Arbor. Prior to this, he received his PhD at the University of Pennsylvania, and was a faculty member at Rice University. His research interests are in computer systems, networking, and security. He has received an NSF CAREER Award, a VMware Early Career Faculty Grant, Best/Distinguished Paper Awards at FAST, APNet, and USENIX Security, and the ACM SIGCOMM Rising Star Award.

November 28, 2025
  • Title: Mortality Surface Modeling With Gaussian Processes

    Time: 11:00am 

    Venue: Room 301, Run Run Shaw Building

    Speaker(s): Prof. Mike Ludkovski

    Remark(s): 

    Abstract

    I will discuss several interrelated projects on the use of Gaussian Process (GP) models for longevity analysis. The underlying Age-Period-Cohort structure is well suited to being captured by a GP in order to address the common actuarial tasks of nowcasting the latest mortality rates and probabilistically projecting them into the future. I will review the GP spatial covariance framework in the context of mortality surfaces and the key steps of kernel and prior mean selection, improvement factor computation, and posterior sampling. Among the various GP implementations we have developed, I will highlight: (i) multi-output GPs for joint analysis of several dozen populations, hierarchically arranged along nationalities, genders and causes-of-death; (ii) compositional GP kernel search to identify the fittest kernels matching the spatio-temporal mortality dynamics in different countries; (iii) deflator GP models to capture the relative mortality of a small pension fund population vis-a-vis a national mortality table. Plentiful illustrations using Human Mortality Database datasets and corresponding insights into evolving mortality patterns will be given. Co-authors include Nhan Huynh, Jimmy Risk, Rodrigo Targino and Eduardo de Melo.

  • Title: Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws

    Time: 10:30am 

    Venue: CB 308

    Speaker(s): Prof. Jian Li

    Remark(s): 

    Abstract

    Large Language Models (LLMs) have demonstrated remarkable capabilities across numerous tasks, yet principled explanations for their underlying mechanisms and several phenomena, such as scaling laws, hallucinations, and related behaviors, remain elusive. In this work, we revisit the classical relationship between compression and prediction, grounded in Kolmogorov complexity and Shannon information theory, to provide deeper insights into LLM behaviors. By leveraging the Kolmogorov Structure Function and interpreting LLM compression as a two-part coding process, we offer a detailed view of how LLMs acquire and store information across increasing model and data scales, from pervasive syntactic patterns to progressively rarer knowledge elements. Motivated by this theoretical perspective and natural assumptions inspired by Heaps’ and Zipf’s laws, we introduce a simplified yet representative hierarchical data-generation framework called the Syntax-Knowledge model. Under the Bayesian setting, we show that prediction and compression within this model naturally lead to diverse learning and scaling behaviors of LLMs. In particular, our theoretical analysis offers intuitive and principled explanations for both data and model scaling laws, the dynamics of knowledge acquisition during training and fine-tuning, and factual knowledge hallucinations in LLMs. The experimental results validate our theoretical predictions.

    About the speaker

    Jian Li is a professor at the Institute for Interdisciplinary Information Sciences, Tsinghua University. His research focuses on theoretical computer science, artificial intelligence, FinTech and databases. He has published over 100 papers in major international conferences and journals. His work has received the Best Paper Award at the VLDB conference and the European Symposium on Algorithms (ESA), as well as the Best Newcomer Award at the International Conference on Database Theory (ICDT). Multiple papers of his have been selected for oral presentations or highlighted as spotlight papers. He has led several research projects, including those funded by NSFC and industry projects with companies such as Baidu, Ant Group, ByteDance, E-Fund Management, and Huatai Securities.

     

November 27, 2025
  • Title: Biomedicine in the Age of AI and Foundation Models

    Time: 02:00pm 

    Venue: HW312, Haking Wong Building

    Speaker(s): Prof. Lei Xing

    Remark(s): 

    Abstract

    AI, driven by deep learning, has garnered significant attention in recent years and is increasingly being adopted for various applications in medical imaging and multi-omics data analysis in biomedicine. The remarkable success of AI and deep learning can be attributed to their unique ability to extract essential features from big data and make accurate inferences. This talk aims to update the audience on the latest advancements in the field of omics data analysis, including foundation models and large language models. It will also address the pitfalls of current data-driven approaches, summarize recent developments in interpretable AI, and offer perspectives on the applications of AI in multi-omics data analysis and precision oncology.

    About the speaker

    Prof. Lei Xing is the Jacob Haimson & Sarah S. Donaldson Professor and Director of the Medical Physics Division of the Department of Radiation Oncology at Stanford University. He also holds affiliate faculty positions in the Department of Electrical Engineering, the Institute for Computational and Mathematical Engineering (ICME), and the Molecular Imaging Program at Stanford (MIPS). Prof. Xing obtained his PhD from the Johns Hopkins University in 1992. His research has focused on AI, biomedical data science, medical imaging and image-guided interventions, treatment planning and clinical decision-making. Prof. Xing is an author on more than 500 publications in high impact journals, an inventor on many issued and pending patents, and an investigator on numerous research grants. He is a fellow of AAPM, ASTRO, and AIMBE. He is the recipient of the 2023 Edith Quimby Lifetime Achievement Award of AAPM, which denotes outstanding scientific achievements in medical physics, influence on the professional development of others, and organizational leadership.

     

November 26, 2025
  • Title: Statistical Analysis of Large-Scale Item Response Data Under Measurement Noninvariance

    Time: 02:30pm 

    Venue: Room 301, Run Run Shaw Building

    Speaker(s): Prof. Jing Ouyang

    Remark(s): 

    Abstract

    International Large-Scale Assessments collect valuable data on educational quality and performance across countries, enabling education systems to share effective techniques and policies. A key analytical tool is the generalized factor model, which measures individuals’ latent traits such as skills and abilities. However, a major challenge arises from Differential Item Functioning (DIF), where different groups (e.g., genders and countries) may have different probabilities of correctly answering the items after controlling for individual latent abilities. To address this challenge, we consider a covariate-adjusted generalized factor model and develop novel and interpretable conditions to address the identifiability issue. Based on the identifiability conditions, we propose a joint maximum likelihood estimation method and establish estimation consistency and asymptotic normality results for the covariate effects under a practical yet challenging asymptotic regime. Furthermore, we derive estimation and inference results for latent factors and the factor loadings. In a related line of work, we propose a novel estimation approach for multi-group DIF analysis that estimates the performance distributions of different groups and produces fair group rankings. The proposed method is applied to PISA 2022 data from the mathematics, science, and reading domains, providing insights into their DIF structures and performance rankings of countries.

    About the speaker

    Dr. Jing Ouyang is an Assistant Professor of Innovation and Information Management at the Business School of the University of Hong Kong. Prior to joining HKU, Jing received a Ph.D. in Statistics from the University of Michigan and a BSc in Mathematics and Economics from the Hong Kong University of Science and Technology. Jing is generally interested in latent variable models, psychometrics, high-dimensional statistical inference, and statistical machine learning. Specifically, her research focuses on developing statistical theory, novel methodology and efficient computing tools for latent variable models to analyze high-dimensional and complex data, with interdisciplinary applications in large-scale educational assessments, psychological measurements, and biomedical sciences.

November 24, 2025
  • Title: The Implications of Side Bequest Motives on the Life Insurance Decisions of Retired Couples

    Time: 02:30pm 

    Venue: Room 301, Run Run Shaw Building

    Speaker(s): Prof. Ki Wai Chau

    Remark(s): 

    Abstract

    Recent empirical evidence shows that the death of a first spouse in retired couples leads to a sharp decline in wealth, reflecting not only reduced income but also additional transfers to heirs outside the couple. Such ‘side’ bequests have significant financial consequences for a surviving spouse, but the existing literature on financial decision-making does not account for them. To fill this gap, we build a model for optimal life insurance, consumption and portfolio decisions of a retired couple, with side bequest motives. Using analytical results and numerical simulations, we show that side bequests substantially alter couples’ optimal life insurance and consumption decisions. In particular, we show that life insurance is an important tool that allows couples to balance their side bequest motive with the utility of a surviving spouse. Our model, therefore, highlights the importance of accounting for side bequests when making these decisions.

November 21, 2025
  • Title: Causal Representation Learning

    Time: 10:30am 

    Venue: Room 301, Run Run Shaw Building

    Speaker(s): Dr. Guangyi Chen

    Remark(s): 

    Abstract

    Traditional deep learning methods heavily rely on statistical correlations, often at the expense of generalization, robustness, and interpretability. In contrast, classical causal discovery techniques are well-suited for identifying causal relationships in structured tabular data but face significant challenges when applied to unstructured, high-dimensional inputs such as images and videos. Causal representation learning bridges this gap by uncovering the latent causal structure underlying observations. In this talk, we introduce the foundational principles of causal representation learning and its growing importance in trustworthy AI systems. Specifically, we discuss two central research questions:

    1. Under what theoretical conditions can causal factors be identified from observed unstructured data?
    2. How can learned causal representations improve the transferability, transparency, controllability, and attribution of AI systems in real-world applications?

    About the speaker

    Guangyi Chen is a postdoctoral research fellow at Carnegie Mellon University (CMU) and a research scientist at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). He currently co-leads the Causal Learning and Reasoning (CLeaR) Group with Prof. Kun Zhang. Prior to that, he received both his Ph.D. and B.S. degrees from Tsinghua University. His research interests include causality, representation learning, and visual understanding. A central focus of his work is to develop principled and practical methods for learning meaningful representations from visual data that support understanding, generation, and reasoning. He has published over 50 papers in top-tier machine learning and computer vision conferences, including NeurIPS, CVPR, and ICLR, with several recognized as highlights or oral presentations. He also co-organized the Causal Representation Learning workshops at NeurIPS 2024 and ICDM 2024.




Division of Computer Science,
School of Computing and Data Science

Rm 207 Chow Yei Ching Building
The University of Hong Kong
Pokfulam Road, Hong Kong

Email: csenq@hku.hk
Telephone: 3917 3146
