doi: 10.17113/ftb.55.02.17.4749

MEGGASENSE – The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses



Ranko Gacesa1,2, Jurica Zucko1,3, Solveig K. Petursdottir4, Elisabet Eik Gudmundsdottir4, Olafur H. Fridjonsson4, Janko Diminic1,3, Paul F. Long2,5, John Cullum6, Daslav Hranueli1,3, Gudmundur O. Hreggvidsson4,7 and Antonio Starcevic1,3*


1SemGen Ltd., Lanište 5/D, HR-10 000 Zagreb, Croatia
2Institute of Pharmaceutical Science, King’s College London, Franklin-Wilkins Building, Stamford Street, London SE1 9NH, UK
3Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, HR-10 000 Zagreb, Croatia
4Matis Ltd., Vínlandsleið 12, IS-113 Reykjavík, Iceland
5Faculty of Life Sciences and Medicine, King’s College London, Franklin-Wilkins Building, Stamford Street, London SE1 9NH, UK
6Department of Genetics, University of Kaiserslautern, Postfach 3049, DE-67653 Kaiserslautern, Germany
7Faculty of Life and Environmental Sciences, University of Iceland, Sturlugötu 7, IS-101 Reykjavík, Iceland




Article history:
Received April 22, 2016
Accepted January 17, 2017



Key words:
bioprospecting, carbohydrate-modifying enzymes, DNA assembly


Summary:
The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel ‘functional’ assembly was developed, in which screening of reads with the HMM profiles occurs before the assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes and it is illustrated how the combination of HMM and BLAST analyses helps identify interesting genes. A variety of different proteome and metagenome databases have been generated by MEGGASENSE.




*Corresponding author:
Phone: +385 1 4605 147
Fax: +385 1 4836 083