Latest News

Dealing with Latent Pre-exposure to Information Treatments

Abstract: In the social sciences, many experiments rely on responses to information treatments. Experimental subjects in the treatment group receive some information that subjects in the control group don't. Often, the proportion of people in the treatment and control groups who were pre-exposed to the information is unknown and uncontrolled by the researchers. If that pre-exposure...

Machine Learning Guided Modeling of Ligand-Protein Binding Energy Landscape: Applications in Small Molecule and Protein-based Drug Design.

Abstract: Molecules in cells constantly move. The motions of proteins in living cells can be simple fluctuations or functional. Therefore, investigating protein dynamics is crucial for understanding protein function and for accurately computing the ligand-protein binding free energy landscape. Because experimental structures are static conformations, classical or enhanced molecular dynamics (MD) simulations are commonly used for...

Some Thoughts on Data Science - Population Health Collaborations.

Abstract: Data science involves the application of knowledge from the fields of computer science (on how to manage data) and statistics (on how to analyze data) to solve theoretical and practical problems. The field of population health involves investigation of health outcomes, patterns of health determinants, and policies and interventions that link them (Kindig and...

Remote Sensing of plant and soil for precision agriculture

Abstract: Agricultural systems are often characterized by high spatial and temporal variability in the factors that determine crop yield. In particular, the variability of soil and other environmental factors affecting yield is notoriously hard to characterize at very high spatial resolution. Recent high-resolution satellites (e.g., Sentinel and PlanetScope) may be useful tools for monitoring crops...

Illuminating metabolomics dark matter - Reshaping how to mine and reuse big mass spectrometry data for small molecule discovery

Abstract: High-throughput mass spectrometry has enabled unprecedented depth and versatility to observe the molecules in the world around us. Traditionally, a handful of molecules were detected in a typical measurement. Today, this has grown to thousands of molecules in a few minutes. The growth in data presents new opportunities for discovery but also challenges in...

The Age of Creative AI?

Abstract: Generative models have made significant advances in recent years, sparking an explosion of new applications with far-reaching societal implications. I will discuss the mathematical intuition behind diffusion models, the core technology behind recent art generation tools like DALL-E-2, Imagen, Stable Diffusion, Dreambooth, Lensa, and others. These applications introduce new technical challenges both for computational...

Are hallucinations in text generation always undesirable? A perspective from text elaboration

Abstract: Recent developments in deep learning have led to exponential improvements in Natural Language Generation (NLG), particularly in terms of fluency and coherency. On the other hand, deep learning-based text generation is also susceptible to hallucinating unintended text that is not directly supported by the source document. These unsupported texts are called hallucinations and are...

How to survive Google taking over your research field and (perhaps) thrive

Abstract: Two years ago, a Google team made an incredible advance in structural biology, practically solving the protein folding problem (predicting protein structure from its amino acid sequence). The AI-based AlphaFold algorithm was shown to produce protein models comparable in quality to the experimental ones. It shook up the field of structural biology, which now must...

New Regression Model: Modal Regression

Abstract: Built on the ideas of mean and quantile, mean regression and quantile regression are extensively investigated and popularly used to model the relationship between a dependent variable Y and covariates x. However, the research about the regression model built on the mode is rather limited. In this talk, we propose a new regression tool...

Scalable Privacy-Aware Collaborative Learning

Abstract: Privacy-preserving collaborative learning allows multiple data-owners to jointly train machine learning models while keeping their individual datasets private from each other. The main bottleneck against the scalability of such systems to a large number of participants is their communication cost. In this talk, we will introduce novel distributed training frameworks that can achieve scalability...

Functional Ultrasound Imaging (fUSI): A game changer in neuroscience and medicine

Abstract: Recent advances in neuroimaging technology have significantly contributed to a better understanding of human brain organization, and the development and application of more efficient clinical programs. However, the limitations and tradeoffs inherent to the existing techniques prevent them from providing large-scale imaging of neural activity with high spatiotemporal resolution, deep penetration, and specificity in...

Statistical methods for analyzing and comparing single-cell gene expression data

Abstract: Single-cell RNA sequencing (scRNA-seq) experiments enable gene expression measurement at a single-cell resolution, and provide an opportunity to characterize the molecular signatures of diverse cell types and states in tissue development and disease progression. However, it remains a challenge to construct a comprehensive view of single cell transcriptomes in health and disease, due to...

Why 95% of papers on Time Series Anomaly Detection are Wrong (with more general lessons for Researchers).

Abstract: Time Series Anomaly Detection (TSAD) is the task of monitoring a time series, say an ECG, or the pressure in an industrial boiler, while attempting to recognize when there has been an anomalous event. The anomalies could be the beginning of a heart attack, or a leak in the boiler that will cause the industrial...

Estimation and Sensitivity Analysis for Causal Decomposition: Assessing Robustness Toward Omitted Variable Bias

Abstract: A key objective of decomposition analysis is to identify risks or resources (‘mediators’) that contribute to disparities between groups of individuals defined by social characteristics such as race, ethnicity, gender, class, and sexual orientation. In decomposition analysis, scholarly interest often centers on estimating how much the disparity (e.g., health disparities between Black women...

A Bayesian multilevel time-varying framework for joint modeling of hospitalization and survival in patients on dialysis.

Abstract: Over 782,000 individuals in the U.S. have end-stage kidney disease, with about 72% of patients on dialysis, a life-sustaining treatment. Dialysis patients experience high mortality and frequent hospitalizations, about twice per year. These poor outcomes are exacerbated at key time periods, such as the fragile period after transition to dialysis. In order to...

Deplatforming Right-Wing Extremists on Twitter Following the January 6 Insurrection

Abstract: What happened when Twitter deplatformed 70,000 right-wing extremists following the January 6 insurrection? Using a panel of over a half million active Twitter users and a sharp regression discontinuity design, we test the causal effects of this intervention on the circulation of misinformation by those deplatformed, and by users from adjacent groups such as...

Understanding Large ML Models through the Structure of Feature Covariance

Abstract: An overarching goal in machine learning is to enable accurate statistical inference in the setting where the sample size is less than the number of parameters. This overparameterized setting is particularly common in deep learning, where it is typical to train large neural nets with relatively small sample sizes and little concern about overfitting...

Multiview learning for knowledge discovery

Abstract: Extracting hidden patterns from multiview data containing heterogeneous feature representations is attracting more and more attention in various scientific fields such as image processing and natural language processing. In this talk we will present a comprehensive unsupervised framework that leverages existing and novel multiview learning models, towards obtaining a single node embedding from a...

Characterizing soil – plant – water relationships across scales for sustainable agricultural management

Abstract: Agricultural systems are pressured by growing global population, increasing water scarcity, and changing climate. In the pursuit of increasing food security, agriculture (especially intensive systems) should also minimize negative and undesired impacts on the environment and on rural societies. Part of the solution to this challenge lies in understanding how environmental factors such as...

Immune regulatory pathways in infection, inflammation and sepsis

Abstract: My lab investigates the immune responses to infection and inflammation using mouse models of parasitic worm infection and clinical samples from sepsis patients. Our ultimate goal is to identify protective or pathogenic immune pathways that we can target for diagnostic or therapeutic purposes. In our mouse infection models we investigate macrophages as first responders...