Universal properties of genotype-phenotype maps (Royal Society Interface 2014)
We introduce a simple genotype-phenotype (GP) map for biological self-assembly on a lattice, and show that it shares many properties with the well-established GP maps of both RNA secondary structure and the HP model. These properties include a heavily skewed distribution of the number of genotypes per phenotype, disconnected neutral components, and shape space covering. The fact that these important properties emerge in three very different GP maps underlines their fundamental importance for biological evolution. It also means that the lattice model, which is highly simplified and therefore tractable, can be used to study a wide variety of evolutionary phenomena.
Network analysis identifies the sustainers of historical underground communities (Leonardo 2014; English Literary History forthcoming)
We apply network analysis to a curated social network of the Protestant underground community during the reign of Mary I of England (1553-1558), derived from the contents of several hundred letters sent by members of this community. This quantitative approach identifies individuals in the network who did not necessarily have many connections to others, but who nevertheless occupied strategically important positions in the network. The importance of these individuals is confirmed by historical evidence of their role as sustainers who passed messages, provided shelter and financial support, and continued to hold the network together after most of the leading figures had been executed by Mary I.
This work was also covered in the New Scientist.
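One way to make "few connections but a strategically important position" concrete is to compare each person's degree with their betweenness centrality, i.e. the number of shortest paths between other people that pass through them. The summary above does not name the measures used in the papers, so betweenness is an assumption made here for illustration, and the toy network and its labels (including `courier`) are hypothetical. A minimal Python sketch:

```python
from collections import deque

def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes' algorithm) for an
    unweighted, undirected graph given as {node: set_of_neighbours}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was counted twice.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy network: two tight groups joined only via "courier".
edges = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
         ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
         ("a1", "courier"), ("courier", "b1")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

degree = {v: len(nbrs) for v, nbrs in adj.items()}
bc = betweenness(adj)
top = max(bc, key=bc.get)
```

In this toy network `courier` has only two connections, fewer than `a1` or `b1`, yet lies on every shortest path between the two groups and therefore has the highest betweenness.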
Power graph compression of networks reveals dominant relationships (Scientific Reports 2014; Molecular BioSystems 2013; see also Nature 2014)
We show that compression of complex networks into power graphs with freely overlapping power nodes allows us to detect dominant connectivity patterns in a wide range of different networks. This approach can be applied to undirected, directed and bipartite networks such as social networks, food webs and recipe-ingredient networks. When applied to genetic transcription networks we can assign meaning to power nodes by using GO term enrichment, which reveals that functional modules in genetic transcription networks are highly overlapping.
This method has also been used to map the functional organisation of the gene regulatory network in Arabidopsis responsible for xylem specification and secondary wall biosynthesis (accepted for publication in Nature).
Network analysis of chemical flavour compounds (Flavour 2013; Scientific Reports 2011)
Using network analysis, we investigate the widespread hypothesis that foods with compatible flavours share chemical flavour compounds. Until now this hypothesis has relied on anecdotal rather than quantitative evidence. We construct a bipartite network of flavour compounds and ingredients, and compare it to large recipe data sets. This reveals that the shared compound hypothesis holds in some regional cuisines but not in others. More generally, our analysis demonstrates how the type of large-scale data analysis that has transformed biology in recent years can lead to new results in other fields, such as food science.
Our article in Scientific Reports was the most downloaded article across all Nature Publishing Group journals in December 2011, exceeding 100,000 downloads in the first four weeks following publication. It also received attention from the Scientific American, Nature News, New Scientist, The Huffington Post, The Technology Review, BioTechniques, and Ingeniøren, among others.
A poster of the network between food ingredients can be downloaded here.
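The shared compound hypothesis can be illustrated with a small calculation: represent the bipartite network as a map from each ingredient to its set of flavour compounds, then average the number of compounds shared by each pair of ingredients in a recipe. The following Python sketch uses made-up ingredients, compound labels and recipes, and the scoring function `mean_shared` is a hypothetical simplification for illustration rather than the exact statistic used in the papers.

```python
from itertools import combinations

# Hypothetical toy bipartite network: ingredient -> set of flavour compounds.
compounds = {
    "tomato": {"c1", "c2", "c3"},
    "basil": {"c2", "c3", "c4"},
    "garlic": {"c5"},
    "cheese": {"c1", "c5", "c6"},
}

def shared_compounds(a, b, compounds):
    """Number of flavour compounds that ingredients a and b have in common."""
    return len(compounds[a] & compounds[b])

def mean_shared(recipe, compounds):
    """Average number of shared compounds over all ingredient pairs in a recipe."""
    pairs = list(combinations(recipe, 2))
    if not pairs:
        return 0.0
    return sum(shared_compounds(a, b, compounds) for a, b in pairs) / len(pairs)

# Hypothetical recipes, each a list of ingredients.
recipes = [["tomato", "basil", "garlic"], ["tomato", "cheese"]]
scores = [mean_shared(r, compounds) for r in recipes]
```

Comparing such scores for real recipes against those of randomly assembled ingredient lists would indicate whether a cuisine tends to combine ingredients that share compounds; that comparison is not carried out in this sketch.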
Self-assembly, modularity and physical complexity (Physical Review E 2011; Physical Review E 2010)
Self-assembly is not just a ubiquitous phenomenon in biology and physics; it is also a language that can be used to describe a physical structure, and measure its complexity and modularity. To illustrate this, we introduce a versatile lattice model of self-assembly, before applying our approach to more general structures such as molecules and protein complexes. In further work we show that genetic algorithms can be used in conjunction with our lattice model to answer questions about the emergence of symmetry and modularity in biological evolution.
In this context I was also one of the organisers of the ESFCB 2012 conference on the Evolution of Structural and Functional Complexity in Biology, together with Ard Louis, Iain Johnston, and Thomas Fink.
Time scales in microarray data (BMC Genomics 2010)
Biological processes take place on a vast range of time scales, and many of them occur simultaneously in the living organism. Gene expression measurements, such as microarrays, have the potential to capture many of these processes in parallel. The challenge, however, is to separate these processes and their time scales in the data. We introduce a method for detecting different time scales in time-series gene expression data, by identifying expression patterns that are temporally shifted between replicate datasets, and show that the time scales we find in data from S. cerevisiae and A. thaliana can be associated with particular biological functions.
Pattern detection in microarray data (Science 2010; PLoS One 2008; Bioinformatics 2006)
Over the last decade, microarrays have generated an unprecedented amount of gene expression data. Here we introduce an approach for detecting statistically significant patterns in these datasets without making prior assumptions about the nature of the pattern. This method is based on concepts from Algorithmic Information Theory.
Classifying directed networks (Physical Review E 2008)
Directed networks - in which every edge is an arrow, rather than an undirected link - exhibit a much more complicated connectivity than undirected networks. We show that a set of four directed clustering coefficients provides a useful space for classifying a wide range of real-world directed networks, ranging from social networks to transcription networks, language networks and food webs.
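As an illustration of what four directed clustering coefficients can look like, the Python sketch below computes the cycle, middleman, in and out coefficients of each node in an unweighted directed network. The definitions follow one common formulation from the literature (Fagiolo's); this is an assumption, and the definitions used in the paper may differ in detail. The function name `directed_clustering` and the example network are hypothetical.

```python
def directed_clustering(edges):
    """Per-node cycle, middleman, in and out clustering coefficients for an
    unweighted directed graph given as (source, target) pairs.
    Self-loops are ignored; a coefficient with no possible triangles is 0.0."""
    nodes = {n for e in edges for n in e}
    succ = {n: set() for n in nodes}  # succ[i]: nodes that i points to
    pred = {n: set() for n in nodes}  # pred[i]: nodes that point to i
    for u, v in edges:
        if u != v:
            succ[u].add(v)
            pred[v].add(u)

    def ratio(count, possible):
        return count / possible if possible else 0.0

    result = {}
    for i in nodes:
        d_in, d_out = len(pred[i]), len(succ[i])
        d_bi = len(pred[i] & succ[i])  # neighbours linked to i in both directions
        # cycle:     i -> j, j -> k, k -> i
        cyc = sum(1 for j in succ[i] for k in succ[j] if i in succ[k])
        # middleman: k -> i, i -> j, with a direct edge k -> j
        mid = sum(1 for j in succ[i] for k in pred[i] if j in succ[k])
        # in:        j -> i, k -> i, with an edge j -> k
        inn = sum(1 for j in pred[i] for k in pred[i] if k in succ[j])
        # out:       i -> j, i -> k, with an edge j -> k
        out = sum(1 for j in succ[i] for k in succ[i] if k in succ[j])
        result[i] = {
            "cycle": ratio(cyc, d_in * d_out - d_bi),
            "middleman": ratio(mid, d_in * d_out - d_bi),
            "in": ratio(inn, d_in * (d_in - 1)),
            "out": ratio(out, d_out * (d_out - 1)),
        }
    return result

# Hypothetical example: a feed-forward loop 0 -> 1 -> 2 with a shortcut 0 -> 2.
ffl = directed_clustering([(0, 1), (1, 2), (0, 2)])
# A directed 3-cycle for comparison.
loop = directed_clustering([(0, 1), (1, 2), (2, 0)])
```

The feed-forward loop has no cycle clustering, while its three nodes score on the out, middleman and in coefficients respectively; the 3-cycle scores only on the cycle coefficient. Each network can then be placed in a four-dimensional space by averaging the four coefficients over its nodes.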
Predicting genome statistics (Journal of Theoretical Biology 2008)
The role of non-coding DNA in the genomes of multicellular organisms remains largely unknown. We show that eukaryotes (encompassing practically all multicellular species from yeast to humans) appear to require a minimum amount of non-coding DNA, and propose a model which predicts this minimum by using a simple growth model of genetic regulatory networks.
I am also interested in cellular automata, Boolean networks and Gaussian processes, among other things, and am co-organiser of the Cambridge Networks Network meetings. Past research interests of mine include quantum measurement and molecular dynamics.
Some of my collaborators, past and present:
Links to pages on various scientific and non-scientific topics.
Imbrella - A free and invisible umbrella
How to play Go on a Hypercube
John Baez's Homepage
The Chocolate Revolution
The biggest number
The Clay Millennium Prize
The Klein Bottle Shop
The Complexity Zoo
'Math In LaTeX'
The CSS Zen Garden
The Simulation Argument
Minds, Machines and Gödel by John Lucas
Robert J. Lang's Origami Designs
The elgooG Google mirror
57 Optical Illusions
An Arabidopsis Gene Regulatory Network for Xylem Specification and Secondary Wall Biosynthesis
accepted for publication in Nature (2014)
Social Network Analysis Predicts Health Behaviours and Self-Reported Health in African Villages
PLoS ONE 9(7): e103500 (2014)
A community under attack: Protestant letter networks in the reign of Mary I
Leonardo 47, 275 (2014)
Generalised power graph compression reveals dominant relationship patterns in complex networks
Scientific Reports 4, 4385 (2014)
A tractable genotype-phenotype map modelling the self-assembly of protein quaternary structure
Journal of the Royal Society Interface 11, 20140249 (2014)
Power graph compression reveals dominant relationships in genetic transcription networks
Molecular BioSystems 9, 2681 (2013)
Low-Temperature Behaviour of Social and Economic Networks
Entropy 15, 3148 (2013)
The Flavor Network
Leonardo 46, 272 (2013)
Protein complexes are under evolutionary selection to assemble via ordered pathways
Cell 153, 461 (2013)
The rich club of the C. elegans neuronal connectome
Journal of Neuroscience 33, 6380 (2013)
Network analysis and data mining in food science: the emergence of computational gastronomy
Flavour 2:4 (2013)