PřF: Bi4013


Building an annotation package for microbial consortia (with the associated workflows to generate
it)


When the goal is to formulate hypotheses about the functional potential that a microbial
community can express as a whole, collecting information on the individual microbes that make up
the community and then trying to merge it may not be effective. Flexible data structures that let
us navigate and integrate the information at different levels of biological complexity could be a
better approach.


Here we propose to follow the general idea that inspired the design of the Bioconductor
annotation packages for organisms. These packages are “gene-centered”, which creates some
difficulty in our setting, where multiple organisms cooperate or compete to meet the challenges
posed by the environment. A solution could be to adopt the unifying view of sets of orthologous
genes (called KOs in the KEGG database). As an example of a “Bioconductor-style” design, we
propose the following key/value pairs of biological entities, each corresponding to a
well-defined data structure:


KOs2AAseq

KOs2rxns

KOs2metaboNet

species2KOs

metagenomes2KOs
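
As a rough illustration of the key/value design listed above, each mapping can be modelled as a dictionary from a key to a set of values. The sketch below is written in Python only for brevity (the project itself targets R). All species, KO, and reaction identifiers are made-up placeholders, the helper names `community_KOs` and `community_rxns` are hypothetical, and reading `KOs2rxns` as "KO to reactions" is an assumption about what that structure is meant to hold.

```python
# Toy key -> set-of-values mappings. Every identifier is an illustrative
# placeholder, not real annotation data.
species2KOs = {
    "species_A": {"KO_a", "KO_b"},
    "species_B": {"KO_b", "KO_c"},
}
KOs2rxns = {  # assumed meaning: KO -> reactions it can catalyse
    "KO_a": {"rxn_1"},
    "KO_b": {"rxn_2", "rxn_3"},
    "KO_c": {"rxn_4"},
}


def community_KOs(members, species2KOs):
    """Pool the KO sets of all community members into one set."""
    kos = set()
    for member in members:
        kos |= species2KOs.get(member, set())
    return kos


def community_rxns(members, species2KOs, KOs2rxns):
    """Collect the reactions linked to any KO found in the community."""
    rxns = set()
    for ko in community_KOs(members, species2KOs):
        rxns |= KOs2rxns.get(ko, set())
    return rxns


consortium = ["species_A", "species_B"]
print(sorted(community_KOs(consortium, species2KOs)))
print(sorted(community_rxns(consortium, species2KOs, KOs2rxns)))
```

Because every mapping is keyed by KO or maps onto KOs, the community-level view is obtained by chaining lookups rather than by merging per-organism gene annotations.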


The project will implement, in R, both the workflows that build the data structures and the data
structures themselves. For testing purposes, the metagenomic collection provided by Almeida et al.
will be used to “instantiate” the annotation package and to test its ability to provide the
building blocks needed to move a step forward in exploring the functional potential of a
microbial consortium of interest.
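
One possible shape for a build step is sketched here, in Python for brevity rather than in the R the project will use. It assumes a hypothetical two-column, tab-separated table (metagenome identifier, KO identifier). The actual layout of the files in the collection is not described in this proposal and would need to be checked, and the function names `build_key2values` and `invert` are illustrative only.

```python
import csv
import io
from collections import defaultdict


def build_key2values(lines):
    """Group a two-column, tab-separated table into key -> set of values."""
    mapping = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        key, value = row[0].strip(), row[1].strip()
        mapping[key].add(value)
    return dict(mapping)


def invert(mapping):
    """Turn key -> values into value -> keys (e.g. KO -> metagenomes)."""
    inverted = defaultdict(set)
    for key, values in mapping.items():
        for value in values:
            inverted[value].add(key)
    return dict(inverted)


# Placeholder input standing in for a real annotation file (assumed format).
table = io.StringIO(
    "# metagenome\tKO\n"
    "mg_1\tKO_a\n"
    "mg_1\tKO_b\n"
    "mg_2\tKO_b\n"
)
metagenomes2KOs = build_key2values(table)
KOs2metagenomes = invert(metagenomes2KOs)
print(metagenomes2KOs)
print(KOs2metagenomes)
```

The same two helpers could in principle produce any of the listed structures from a suitable two-column table, which keeps the build workflow separate from the resulting data structures.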


References

https://www.nature.com/articles/s41586-019-0965-1


ftp://ftp.ebi.ac.uk/pub/databases/metagenomics/umgs_analyses/functional_analyses/


https://www.bioconductor.org/