Latest News

Illuminating metabolomics dark matter - Reshaping how to mine and reuse big mass spectrometry data for small molecule discovery

Abstract: High-throughput mass spectrometry has enabled unprecedented depth and versatility to observe the molecules in the world around us. Traditionally, a handful of molecules were detected in a typical measurement. Today, this has grown to thousands of molecules in a few minutes. The growth in data presents new opportunities for discovery but also challenges in...

The Age of Creative AI?

Abstract: Generative models have made significant advances in recent years, sparking an explosion of new applications with far-reaching societal implications. I will discuss the mathematical intuition behind diffusion models, the core technology behind recent art generation tools like DALL-E-2, Imagen, Stable Diffusion, Dreambooth, Lensa, and others. These applications introduce new technical challenges both for computational...

Are hallucinations in text generation always undesirable? A perspective from text elaboration

Abstract: Recent developments in deep learning have led to exponential improvements in Natural Language Generation (NLG), particularly in terms of fluency and coherency. On the other hand, deep learning-based text generation is also susceptible to hallucinating unintended text that is not directly supported by the source document. These unsupported texts are called hallucinations and are...

How to survive Google taking over your research field and (perhaps) thrive

Abstract: Two years ago, a Google team made an incredible advance in structural biology, practically solving the protein folding problem (predicting protein structure from its amino acid sequence). The AI-based AlphaFold algorithm was shown to produce protein models comparable in quality to the experimental ones. It shook up the field of structural biology, which now must...

New Regression Model: Modal Regression

Abstract: Built on the ideas of mean and quantile, mean regression and quantile regression are extensively investigated and popularly used to model the relationship between a dependent variable Y and covariates x. However, the research about the regression model built on the mode is rather limited. In this talk, we propose a new regression tool...

Scalable Privacy-Aware Collaborative Learning

Abstract: Privacy-preserving collaborative learning allows multiple data-owners to jointly train machine learning models while keeping their individual datasets private from each other. The main bottleneck against the scalability of such systems to a large number of participants is their communication cost. In this talk, we will introduce novel distributed training frameworks that can achieve scalability...

Functional Ultrasound Imaging (fUSI): A game changer in neuroscience and medicine

Abstract: Recent advances in neuroimaging technology have significantly contributed to a better understanding of human brain organization, and the development and application of more efficient clinical programs. However, the limitations and tradeoffs inherent to the existing techniques prevent them from providing large-scale imaging of neural activity with high spatiotemporal resolution, deep penetration, and specificity in...

Statistical methods for analyzing and comparing single-cell gene expression data

Abstract: Single-cell RNA sequencing (scRNA-seq) experiments enable gene expression measurement at a single-cell resolution, and provide an opportunity to characterize the molecular signatures of diverse cell types and states in tissue development and disease progression. However, it remains a challenge to construct a comprehensive view of single cell transcriptomes in health and disease, due to...

Why 95% of papers on Time Series Anomaly Detection are Wrong (with more general lessons for Researchers).

Abstract: Time Series Anomaly Detection (TSAD) is the task of monitoring a time series, say an ECG, or the pressure in an industrial boiler, while attempting to recognize when there has been an anomalous event. The anomalies could be the beginning of a heart attack, or a leak in the boiler that will cause the industrial...

Estimation and Sensitivity Analysis for Causal Decomposition: Assessing Robustness Toward Omitted Variable Bias

Abstract: A key objective of decomposition analysis is to identify risks or resources (‘mediators’) that contribute to disparities between groups of individuals defined by social characteristics such as race, ethnicity, gender, class, and sexual orientation. In decomposition analysis, scholarly interest often centers on estimating how much the disparity (e.g., health disparities between Black women...

A Bayesian multilevel time-varying framework for joint modeling of hospitalization and survival in patients on dialysis.

Abstract: Over 782,000 individuals in the U.S. have end-stage kidney disease with about 72% of patients on dialysis, a life-sustaining treatment. Dialysis patients experience high mortality and frequent hospitalizations, at about twice per year. These poor outcomes are exacerbated at key time periods, such as the fragile period after transition to dialysis. In order to...

Deplatforming Right-Wing Extremists on Twitter Following the January 6 Insurrection

Abstract: What happened when Twitter deplatformed 70,000 right-wing extremists following the January 6 insurrection? Using a panel of over a half million active Twitter users and a sharp regression discontinuity design, we test the causal effects of this intervention on the circulation of misinformation by those deplatformed, and by users from adjacent groups such as...

Understanding Large ML Models through the Structure of Feature Covariance

Abstract: An overarching goal in machine learning is to enable accurate statistical inference in the setting where the sample size is less than the number of parameters. This overparameterized setting is particularly common in deep learning where it is typical to train large neural nets with relatively smaller sample sizes and little concern about overfitting...

Multiview learning for knowledge discovery

Abstract: Extracting hidden patterns of multiview data containing heterogeneous feature representations is attracting more and more attention in various scientific fields such as image processing and natural language processing. In this talk we will present a comprehensive unsupervised framework that leverages existing and novel multiview learning models, towards obtaining a single node embedding from a...

Characterizing soil – plant – water relationships across scales for sustainable agricultural management

Abstract: Agricultural systems are pressured by a growing global population, increasing water scarcity, and a changing climate. In the pursuit of increasing food security, agriculture (especially intensive systems) should also minimize negative and undesired impacts on the environment and on rural societies. Part of the solution to this challenge lies in understanding how environmental factors such as...

Immune regulatory pathways in infection, inflammation and sepsis

Abstract: My lab investigates the immune responses to infection and inflammation using mouse models of parasitic worm infection and clinical samples from sepsis patients. Our ultimate goal is to identify protective or pathogenic immune pathways that we can target for diagnostic or therapeutic purposes. In our mouse infection models we investigate macrophages as first responders...

Learning Binary Code Representations for Security Applications

Abstract: Learning a numeric representation (also known as embedded vector, or simply embedding) for a piece of binary code (an instruction, a basic block, a function, or even an entire program) has many important security applications, ranging from vulnerability search and plagiarism detection to malware classification. By reducing a binary code with complex control-flow and data-flow...

Too Many Dimensional Gas Chromatography/Mass Spectrometry Analysis of Compounds in Smoke Samples from Wildland Fires

The air quality and fire management communities are faced with increasingly difficult decisions regarding critical fire management activities, given the potential contribution of wildland fires to fine particulate matter (PM2.5). Unfortunately, in model frameworks used for air quality management, the ability to represent PM2.5 from fires is severely limited. This is due in part to...

Lost in translation: The challenges and benefits of understanding complex insect societies

Social insects include the termites, ants and the social bees and wasps, which are a very large and ecologically very successful group of animals. They are also of tremendous importance for humans. Whereas some social insects are serious pest species that become increasingly difficult to control, others are of central importance for agricultural food production...

Outcomes from an experiment in creating data science centers

The Berkeley Institute for Data Science (or BIDS) was founded as part of a high-profile, multi-university initiative funded by the Moore and Sloan Foundations, collectively known as the Moore-Sloan Data Science Environments (or MSDSE), with the mission of creating "institutional change" around data science in academia. I will discuss some of the lessons learned in...