The usefulness of two-class statistical classifiers is limited when one or both of the conditional misclassification rates are unacceptably high. Incorporating a neutral zone region into the classifier provides a mechanism to refer ambiguous cases to follow-up where additional information might be obtained to clarify the classification decision. Through the use of the neutral zone...
Plants show a remarkable ability to survive in diverse environments. Our work seeks to identify the genetic basis of plant adaptation through computational analysis of whole genome sequences isolated from thousands of individuals. I will discuss the unequal retention of genetic variation in the genomes of plant populations, the potential for ancestral variation to facilitate...
The process of galaxy formation and evolution involves a number of interesting and complicated physical processes. However, we have only limited information on the real galaxies we would like to compare to our theories: often just the shape, size, and amount of light given off in a few different color filters. This data set behaves...
While the amount of data that we store and consume is consistently growing, a similar trend is visible in the scale of modern machine learning (ML) algorithms. Fueled by big data, these algorithms use many parameters to capture the intricate latent structure in the data. Hence, data continues to fuel the success of machine...
Neural recordings in behaving animals have revealed the types of sensory and motor information represented by cortical neurons. However, we do not know the functional organization of the neural circuits that generate these representations. Neural circuits with different structures produce different activity dynamics. Therefore, analyses of the dynamics of neural activity may provide insights into...
Quasars, light sources powered by black holes, are amongst the brightest objects in the universe, and are used as searchlights to illuminate the contents of the universe between Earth and the quasar. Historically, features in these quasars were detected by visually inspecting the spectra! However, astronomers now have surveys containing 10^5 quasars, which makes this...
While autonomous systems that perform solo missions can yield significant benefits, greater efficiency and operational capability will be realized from teams of autonomous systems operating in a coordinated fashion. Potential applications for networked multiple autonomous systems include environmental monitoring, search and rescue, space-based interferometers, hazardous material handling, and combat, surveillance, and reconnaissance systems. Networked multi-agent...
Magnetic resonance imaging (MRI) allows for the detection of structural changes due to disease or the inspection of neural patterns that underlie cognition. Prior hardware and software limitations kept the dimensionality of MRI datasets low and reduced the statistical power of MRI datasets. However, recent technological advancements in MRI have allowed for the development of large scale...
Since the sequencing of the first human genome in 2001, more than 3000 human genomes have been sequenced and more than 150 million SNPs have been identified in those genomes. Many of the SNPs lie in regions of genes that encode proteins and potentially impact function and contribute to disease. However, the vast majority of...
Texts are increasingly used to make causal inferences, with the document serving as either the treatment or the outcome. We introduce a new conceptual framework to understand all text-based causal inferences, demonstrate fundamental problems that arise when using manual or computational approaches applied to text for causal inference, and provide solutions to the problems we...
Robot motion planning and control in real-world settings is hindered, in part, by uncertainty. Dealing with uncertainty is a difficult problem because it invalidates the performance guarantees often available in deterministic cases, while its precise effect on motion cannot be predicted. Further, (autonomous) robot performance often emerges through the interaction of multiple components, mainly including...
Hunter-gatherer adaptations are tied to the way that climate and environment shape the food and technological resource base. Discovering the relation between climate and environmental change and human origins must be grounded in a causal understanding of the connection between climate, environment, resource patterning, and human behavior. To better understand the origins of modern humans...
Multielectrode arrays (MEA) allow recording of electroencephalogram (EEG) signals from multiple sites simultaneously. We have implemented skull surface MEA in a mouse model of Fragile X syndrome (Fmr KO mouse). This enables unprecedented electrophysiological characterization of normal vs. Fmr KO mice. In this presentation, we will describe the rationale and our early data implementing MEA...
Food intake and energy balance are controlled by a dynamic interplay of gut-brain signaling pathways that are poorly defined. Recent work from the DiPatrizio lab, however, suggests that our bodies' own cannabis-like signaling molecules, the endocannabinoids, control gut-brain signaling important for food intake. Furthermore, this signaling becomes upregulated in diet-induced obesity and causes overeating. These...
Evolutionary biology and systematics seek to understand how organisms are related and the processes that lead to species, populations, and associated traits of organisms. Using whole genome sequencing to inventory the DNA, computational tools to assemble and annotate genomes, and analyses to identify shared gene sequences among organisms, we are assembling the fungal tree of life...
De novo genome assembly is a challenging computational problem due to the high repetitive content of eukaryotic genomes and the imperfections of sequencing technologies. Several assembly tools are currently available, each of which has strengths and weaknesses in dealing with the tradeoff between maximizing contiguity and minimizing assembly errors (e.g., mis-joins). In order to obtain...
High-throughput methods based on chromosome conformation capture technologies have greatly advanced our understanding of the three-dimensional (3D) organization of genomes and demonstrated that genome architecture strongly influences gene regulation. However, methods to analyze the 3D chromatin spatial organization data are still in their infancy. In this talk, I will first present a wavelet approach for...
We explore the NYSE Trades and Quotes (TAQ) database that contains tick-by-tick transaction information of stocks traded on the New York Stock Exchange and NASDAQ stock markets. Consistent with asymmetric information models of market infrastructure, we analyze the role of trading intensity, as a proxy for latent information, on the value of financial assets. We...
Markov chain Monte Carlo (MCMC) produces a correlated sample for estimating expectations with respect to a target distribution. A fundamental question is when sampling should stop so that we have good estimates of the desired quantities. The key to answering this question lies in assessing the Monte Carlo error through a multivariate Markov chain central...
Here I will discuss some of the projects that the Brain Game Center is working on and areas where there are significant advantages to moving beyond traditional approaches to data analytics. Questions that we are trying to answer include: how can one classify people into subgroups based upon a collection of tests? What rehabilitation approaches...