The complement system is a part of innate immunity that rapidly removes invading pathogens and impaired host cells. Activation of the complement system is balanced under homeostasis by regulators that protect healthy host cells. Impairment of complement regulators tilts the balance, favoring activation and propagation that lead to inflammatory and autoimmune diseases. To understand the dynamics of...
Adaptive Boosting, or AdaBoost, introduced by Freund and Schapire (1996), has proven effective for high-dimensional binary classification and binary prediction problems. Friedman, Hastie, and Tibshirani (2000) show that AdaBoost builds an additive logistic regression model by minimizing the ‘exponential loss’. We show that the exponential loss in AdaBoost is equivalent...
The explosion of Internet of Things and cloud computing applications has generated a huge demand for multi-tenant collocation data centers everywhere, extending the Internet edge beyond the traditional hub locations. As one would expect, securing data centers against cyber attacks is extremely important, and so is providing a reliable power supply to servers. While the threat...
It is well known that relationships between data points (i.e., context) in structured data can be exploited to obtain better recognition performance. In our recent work, we have explored a different, but related, problem: how can these interrelationships be used to efficiently learn and continuously update a recognition model, with minimal human labeling effort...
Microblog data, e.g., tweets, reviews, news comments, and social media comments, has gained considerable attention in recent years due to its popularity and rich content. Nowadays, microblog applications span a wide spectrum of interests, including analyzing events and user activities and critical applications like discovering health issues and rescue services. Consequently, major research efforts are...
Increases in greenhouse gas concentrations are expected to impact the terrestrial hydrologic cycle through changes in radiative forcings and plant physiological and structural responses. As a result, projections of future changes in water resources become complicated due to the tight coupling between the biosphere and terrestrial hydrologic cycle. In recent years a number of physically...
Environmental sensing has expanded rapidly for more than a decade. I will provide an overview of the dimensions of this data revolution within the ecological sciences. I will then describe a specific evaluation of the water-ecosystem service trade-offs for the use of urban vegetation to cool cities. Vegetation interacts strongly with urban water sustainability. Plants...
I propose a new method to place lobbyists into standard common space measures for ideology scores, leveraging responses from former members of the U.S. Congress to a survey containing a battery of ideology attitude measures, along with a flexible Bayesian statistical model. The statistical model incorporates estimation uncertainty into the imputed lobbyist ideology measures and...
The product composition of bilateral trade encapsulates complex relationships about comparative advantage, global production networks, and domestic politics. Yet, despite the availability of product-level trade data, most researchers rely on either the total volume of trade or certain sets of aggregated products. We develop a new dynamic clustering method to effectively summarize this massive amount...
Insects use the sense of smell to identify their host animals and plants. The ability to detect and discriminate thousands of odorants from their hosts uses a very large family of transmembrane odorant receptors and complex neuronal circuitry. The study of olfaction has benefited from computational approaches to identify important principles: protein sequences of receptors...
Time series data mining is a perennially popular research topic, due to the ubiquity of time series in medical, financial, industrial, and scientific domains. There are about a dozen major time series data mining tasks, including: • Time Series Motif Discovery • Time Series Joins • Time Series Classification (shapelet discovery) • Time Series Density...
We face two broad challenges as we design the next generation of intelligent and interconnected devices: On one extreme, these systems will collect an enormous amount of data from a multitude of sources and require low-complexity, versatile algorithms that can make sense of all the data. On the other extreme, certain physical or system constraints...
Tensors and tensor decompositions have been very popular and effective tools for analyzing multi-aspect data in a wide variety of fields, ranging from Psychology to Chemometrics, and from Signal Processing to Data Mining and Machine Learning. In this talk, first I will motivate the use of tensors as an effective data analytic tool in a...
The explosion in the amount of spatial data in recent years has urged researchers to build specialized systems for big spatial data. This talk will have two parts. In the first part, we describe SpatialHadoop, the most comprehensive open source system for big spatial data. We describe how SpatialHadoop managed to achieve simplicity and efficiency...
For decades, cognitive psychologists and linguists have studied language development by testing theories of learning and development in highly controlled behavioral experiments. Much has been learned from this approach. However, Big Data and computational models allow us to investigate language development in a radically different way: by collecting large datasets of actual speech to children...
Modern electronic structure methods allow for reasonably accurate computation of the quantum mechanical behavior of atoms and electrons in a material with almost any chemical composition, using virtually no input parameters. These methods allow the design of new materials even before they are made in the laboratory. The only input parameters for these methods are...
My lab investigates new high-throughput functional genomics methods to study gene regulation in the major human malaria parasite species (P. falciparum and P. vivax) and the zoonotic species, P. knowlesi. In particular, my laboratory generates large genome-wide data sets using next-generation sequencing and proteomics technologies along with novel computational biology approaches to better...
Microbial eukaryotes (organisms <1 mm, such as nematodes, fungi, protists, and other ‘minor’ metazoan phyla) are abundant and ubiquitous in marine sediments, performing key functions such as nutrient cycling and sediment stability in marine habitats. Yet, their unexplored diversity represents one of the major challenges in biology and currently limits our capacity to understand, mitigate and...