bTSSfinder is a novel tool that predicts putative promoters for five classes of sigma factors in E. coli and in Cyanobacteria. bTSSfinder also classifies cyanobacterial promoters. Compared with currently available tools, bTSSfinder achieves higher accuracy and has a broad scope.


This system performs a thorough analysis of ChIP-Seq peaks and identifies the dominant sequence motif families as potential binding sites of DNA-interacting proteins.


A tool that demarcates, with very high accuracy, those genomic regions that are unlikely to promote the initiation of transcription. Our machine learning algorithm uses various constraining properties of features identified in the upstream and downstream regions around verified TSSs, as well as statistical analyses of these surrounding regions.

If you are using this resource in your research please cite:

Schaefer U, Kodzius R, Kai C, Kawai J, Carninci P, Hayashizaki Y, Bajic VB (November 2010) High sensitivity TSS prediction: estimates of locations where TSS cannot occur. PLoS One 5(11): e13934. Epub 2010 Nov 15. doi:10.1371/journal.pone.0013934.


Dragon Motif Finder is a simple ab initio tool for finding motifs in DNA sequences. It allows large sequence sets to be processed in a relatively short amount of time on the web. It is heavily used in the FANTOM5 consortium project for the analysis of promoter sequences.


The Dragon PolyA Spotter is a tool for predicting polyadenylation signal variants in human genomic DNA sequences based on two machine learning algorithms. The tool displays predicted polyA signal variants and their positions in each submitted DNA sequence.
Dragon PolyA Spotter: predictor of poly(A) motifs within human genomic DNA sequences. Kalkatawi M, Rangkuti F, Schramm M, Jankovic BR, Kamau A, Chowdhary R, Archer JA, Bajic VB. Bioinformatics. 2013 Jun 1;29(11):1484. doi: 10.1093/bioinformatics/btt161.


A tool that aims to predict the DNA binding sites of peroxisome proliferator-activated receptors (PPARs) with extremely high accuracy.


Dragon TIS Spotter searches for Translation Initiation Sites (TISs) in plant genomic sequences provided in FASTA format. The tool analyzes the content of sliding windows of 300 bp of DNA sequence, assuming the TIS is located at positions 150-152 of the window, counted from the 5' end. The machine learning prediction algorithm is trained on the Arabidopsis genome and tested on genomic sequences of three plant genomes.
Dragon TIS Spotter: an Arabidopsis-derived predictor of translation initiation sites in plants. Magana-Mora A, Ashoor H, Jankovic BR, Kamau A, Awara K, Chowdhary R, Archer JA, Bajic VB. Bioinformatics. 2013 Jan 1;29(1):117-8. doi: 10.1093/bioinformatics/bts638. 
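
The window layout described above can be illustrated with a minimal Python sketch. It is not the Dragon TIS Spotter implementation: the function name `tis_windows`, the restriction to ATG codons, and the 1-based position convention are assumptions made for illustration only. The sketch merely extracts the 300 bp windows in which a candidate start codon sits at positions 150-152; the actual tool then classifies such windows with its trained model.

```python
def tis_windows(seq, window=300, tis_start=150):
    """Yield (position, window_sequence) for each ATG that can be placed at
    1-based positions tis_start..tis_start+2 of a full-length window.

    Hypothetical helper: the ATG filter and coordinates are assumptions,
    not details taken from the tool's documentation.
    """
    seq = seq.upper()
    offset = tis_start - 1  # 0-based offset of the codon inside the window
    for start in range(len(seq) - window + 1):
        win = seq[start:start + window]
        if win[offset:offset + 3] == "ATG":
            # Report the 1-based position of the 'A' in the input sequence.
            yield start + offset + 1, win


if __name__ == "__main__":
    # Toy sequence: one ATG flanked by 200 filler bases on each side.
    toy = "C" * 200 + "ATG" + "C" * 200
    for pos, win in tis_windows(toy):
        print(pos, len(win), win[149:152])
```

Candidate codons closer than 149 bp to the 5' end or 148 bp to the 3' end of the input do not fit into a full window and are skipped in this sketch.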


The program is a pipeline for genetic algorithms for optimization of decision tree structures.


Histone Modification in Cancer (HMCan) is a Hidden Markov Model-based tool developed to detect histone modifications in cancer ChIP-seq data. It applies three correction steps to the data: copy number correction, GC bias correction and noise level correction. To run HMCan, one needs a ChIP-seq target alignment file and a control alignment file.
HMCan: a method for detecting chromatin modifications in cancer samples using ChIP-seq data. Ashoor H, Hérault A, Kamoun A, Radvanyi F, Bajic VB, Barillot E, Boeva V. Bioinformatics. 2013 Dec 1;29(23):2979-86. doi: 10.1093/bioinformatics/btt524


Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences.


LD motif finder


LigandRFs is a random forest-based approach to predict protein-ligand binding sites.


Dimitrios Kleftogiannis, Panos Kalnis and Vladimir B. Bajic

A fundamental problem in bioinformatics is genome assembly. Next-Generation Sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.
Comparing memory-efficient genome assemblers on stand-alone and cloud infrastructures. Kleftogiannis D, Kalnis P, Bajic VB. PLoS One. 2013 Sep 27;8(9):e75505. doi: 10.1371/journal.pone.0075505.


miRNAVISA is a web-based tool that allows customized interrogation and comparisons of miRNA families for hypotheses generation, and comparison of the per-species chromosomal distribution of miRNA genes in different families.
Exploration of miRNA families for hypotheses generation. Kamanu TK, Radovanovic A, Archer JA, Bajic VB. Sci Rep. 2013 Oct 15;3:2940. doi: 10.1038/srep02940.


A framework for scalable parameter estimation of gene circuit models using structural information.


Genome-wide analysis of alternative TSSs - Improved recognition of industrially important enzymes


The program is able to predict the 12 main variants of human poly(A) motifs, i.e., AATAAA, ATTAAA, AAAAAG, AAGAAA, TATAAA, AATACA, AGTAAA, ACTAAA, GATAAA, CATAAA, AATATA, and AATAGA.
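
As a minimal illustration of the 12 hexamers listed above, the following Python sketch locates every occurrence of these variants in a DNA sequence. It is only a plain string scan under stated assumptions: the names `POLYA_MOTIFS` and `find_polya_motifs` are hypothetical, and the scan does not reproduce the program's prediction step, which decides which occurrences are likely to be genuine poly(A) signals.

```python
# The 12 main human poly(A) motif variants named in the description above.
POLYA_MOTIFS = (
    "AATAAA", "ATTAAA", "AAAAAG", "AAGAAA", "TATAAA", "AATACA",
    "AGTAAA", "ACTAAA", "GATAAA", "CATAAA", "AATATA", "AATAGA",
)


def find_polya_motifs(seq):
    """Return (1-based position, hexamer) for every motif occurrence.

    Hypothetical helper for illustration; it reports candidates only and
    applies no scoring or classification.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        if hexamer in POLYA_MOTIFS:
            hits.append((i + 1, hexamer))
    return hits


if __name__ == "__main__":
    for pos, motif in find_polya_motifs("ccAATAAAggATTAAA"):
        print(pos, motif)
```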


Our method trains a two-round support vector regression model for predicting protein-DNA binding affinity.


Fast and scalable pathogen discovery program with accurate genome relative abundance estimation.

If you are using this resource in your research please cite:

Naeem R, Rashid M, Pain A. (Nov 2012) READSCAN: A fast and scalable pathogen discovery program with accurate genome relative abundance estimation. Bioinformatics. 2012 Nov 28. [Epub ahead of print]


INDIGO enables the integration of annotations for the exploration and analysis of newly sequenced microbial genomes. It provides a keyword search, or a Query builder for more complex data-mining queries, and allows results to be saved or exported. INDIGO is developed at CBRC.
INDIGO - INtegrated data warehouse of microbial genomes with examples from the red sea extremophiles. Alam I, Antunes A, Kamau AA, Ba Alawi W, Kalkatawi M, Stingl U, Bajic VB. PLoS One. 2013 Dec 6;8(12):e82210. doi: 10.1371/journal.pone.0082210.


Dragon Database for Methylated Genes and Diseases (DDMGD) provides associations between methylated genes and diseases. The associations were extracted automatically from PubMed abstracts using the Dragon Extractor of Methylated Genes in Diseases (DEMGD) tool. You can search by gene, disease and/or species.
DDMGD: the database of text-mined associations between genes methylated in diseases from different species. Bin Raies A, Mansour H, Incitti R, Bajic VB. Nucleic Acids Res. 2015 Jan;43(Database issue):D879-86. doi: 10.1093/nar/gku1168.


This knowledgebase efficiently and conveniently integrates expert-reviewed summary information about the genes implicated in prostate cancer, including comprehensive information about each gene. It is a centralized resource for researchers that supports functional characterization and analysis of molecular processes related to prostate cancer. It is compiled based on extensive text-mining and data-mining.
DDPC: Dragon Database of Genes associated with Prostate Cancer. Maqungo M, Kaur M, Kwofie SK, Radovanovic A, Schaefer U, Schmeier S, Oppon E, Christoffels A, Bajic VB. Nucleic Acids Res. 2011 Jan;39(Database issue):D980-5. doi: 10.1093/nar/gkq849.


This database is a resource for systems biology studies that facilitates exploration of transcription regulation networks and includes information about proteins involved in the regulation of transcription in humans. It contains hand-curated information about transcription factor (TF) proteins (those that directly bind to DNA and affect transcription initiation) and transcription co-factor (TcoF) regulatory proteins, which bind to TFs and affect transcription initiation but do not bind DNA directly.
TcoF-DB: dragon database for human transcription co-factors and transcription factor interacting proteins. Schaefer U, Schmeier S, Bajic VB. Nucleic Acids Res. 2011 Jan;39(Database issue):D106-10. doi: 10.1093/nar/gkq945.


This database is a resource for systems biology studies that facilitates exploration of transcription regulation networks and includes information about promoter regions of human miRNA genes, SNPs, and predicted TFBSs in the promoter regions. The web interface allows users to explore the effect of SNPs on the transcriptional regulation of miRNA genes.
dPORE-miRNA: polymorphic regulation of microRNA genes. Schmeier S, Schaefer U, MacPherson CR, Bajic VB. PLoS One. 2011 Feb 4;6(2):e16657. doi: 10.1371/journal.pone.0016657.


DENdb is a centralized on-line repository of predicted enhancers derived from multiple human cell-lines. DENdb integrates enhancers predicted by five different methods (ChromHMM, Segway, RFECS, CSI-ANN, and ENCODE integrated annotation) generating an enriched catalogue of enhancers for each of the analyzed cell-lines.


DESMSCI is a complex knowledgebase that provides information about the associations of sponge compounds with different biological entities, such as human genes, proteins, diseases and pathways, based on the scientific literature available in PubMed and information deposited in other databases. It is compiled based on extensive text-mining and data-mining.
Dragon exploration system on marine sponge compounds interactions. Sagar S, Kaur M, Radovanovic A, Bajic VB. J Cheminform. 2013 Feb 16;5(1):11. doi: 10.1186/1758-2946-5-11.


Specialized knowledgebase aimed at researchers investigating sickle-cell disease. It is a centralized resource that supports functional characterization and analysis of molecular processes related to sickle-cell disease. It is compiled based on extensive text-mining and data-mining.
Information exploration system for sickle cell disease and repurposing of hydroxyfasudil. Essack M, Radovanovic A, Bajic VB. PLoS One. 2013 Jun 10;8(6):e65190. doi: 10.1371/journal.pone.0065190.


This knowledgebase is a resource intended to facilitate research in reproductive toxicity. It is a complex information system that conveniently and efficiently integrates various information related to reproductive toxicity. Its primary goal is to enable researchers to query the human reproductive toxicity literature efficiently and rapidly, in an innovative manner and at a deeper level than that of articles, from differing gene/protein-, protein-, metabolite/enzyme-, biological-, chemical/toxin-, disease- and human anatomical-centric perspectives.
DESTAF: a database of text-mined associations for reproductive toxins potentially affecting human fertility. Dawe AS, Radovanovic A, Kaur M, Sagar S, Seshadri SV, Schaefer U, Kamau AA, Christoffels A, Bajic VB. Reprod Toxicol. 2012 Jan;33(1):99-105. doi: 10.1016/j.reprotox.2011.12.007.


This is a comprehensive knowledgebase on Hepatitis C Virus (HCV) based on extensive text-mining and data-mining. It integrates information efficiently and conveniently using specified concepts, keywords and phrases. Each concept search generates text-derived association networks and hypotheses that can be tested to identify potentially novel relationships between different concepts.

If you are using this resource in your research please cite:

Kwofie SK, Radovanovic A, Sundararajan VS, Maqungo M, Christoffels A, Bajic VB (June 2011) Dragon Exploratory System on Hepatitis C Virus (DESHCV). Infect Genet Evol 11(4):734-9. doi:10.1016/j.meegid.2010.12.006.


Dragon Explorer of Enzymes and Compounds Of Industrial Importance (DEECOII) is an online resource that highlights industrially important biological entities (enzymes, microorganisms or chemical compounds).


Dragon Explorer of Osmoprotection Pathways (DEOP) is a unique web system focused on the exploration of manually-curated information related to the production of osmoprotectants captured from species and includes pathways, genes/enzymes, compounds and reactions. The system allows the user to test their genomic sequences in relation to osmoprotectant pathways.
DEOP: a database on osmoprotectants and associated pathways. Bougouffa S, Radovanovic A, Essack M, Bajic VB. Database (Oxford). 2014 Oct 17;2014. pii: bau100. doi: 10.1093/database/bau100.


FARNA is a database that contains functional annotations (GO terms, pathways, disease information) for human non-coding RNA (ncRNA) transcripts. Two classes of ncRNAs are included: microRNA (miRNA) and long ncRNA (lncRNA). Functional annotation of transcripts is obtained based on the novel method we developed for function inference that relies on the regulatory network of the transcript.


Comprehensive and integrated knowledgebase on Hepatitis C Virus (HCV) protein interactions. It contains manually verified information from the literature and databases on curated interactions between HCV proteins and host human cellular proteins.
HCVpro: hepatitis C virus protein interaction database. Kwofie SK, Schaefer U, Sundararajan VS, Bajic VB, Christoffels A. Infect Genet Evol. 2011 Dec;11(8):1971-7. doi: 10.1016/j.meegid.2011.09.001.


We present the Homo Sapiens Comprehensive Model Collection (HOCOMOCO) of transcription factor (TF) binding models obtained by careful integration of data from different sources. HOCOMOCO contains 426 non-redundant curated binding models for 401 human TFs.
HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Kulakovskiy IV, Medvedeva YA, Schaefer U, Kasianov AS, Vorontsov IE, Bajic VB, Makeev VJ. Nucleic Acids Res. 2013 Jan;41(Database issue):D195-202. doi: 10.1093/nar/gks1089.


Meta-INDIGO enables the integration of annotations for the exploration and analysis of newly sequenced microbial genomes.


Portal for Microbial Knowledge Exploration Systems is a platform for discovery, analysis and exploration of information from a number of topic-specific microbial knowledgebases. The knowledgebases include up-to-date information from published scientific literature and a number of major biological databases.


ACRE is a JAVA-based tool that enumerates all the biochemical reaction networks that consist of user-created nodes from user-selected modules under user-specified constraints.


GA based optimizer of decision tree parameters and structure


Dragon Extractor of Methylated Genes in Diseases (DEMGD) was developed for mining associations between methylated genes and diseases from free-text abstracts submitted by users. DEMGD presents the extracted associations in summary tables and full reports, and tags the text with evidence for genes, diseases and methylation words. DEMGD allows research scientists to extract associations between methylated genes and diseases efficiently and from the most recent literature.
Combining position weight matrices and document-term matrix for efficient extraction of associations of methylated genes and diseases from free text. Bin Raies A, Mansour H, Incitti R, Bajic VB. PLoS One. 2013 Oct 16;8(10):e77848. doi: 10.1371/journal.pone.0077848.


Dragon Feature Selector Based on Biclustering (DFSBBC)


Dragon Text Mining PWMs Generator (DTMPG) was developed for generating Position Weight Matrices (PWMs) from free text sentences. Also, DTMPG provides functionalities to normalize the generated PWMs from text and use these matrices to compute matching scores for sentences. Using DTMPG, you can view your input data, the generated PWMs and the computed matching scores.


Using a novel methodology, this tool predicts the functional class of a binding partner of a transcription factor (TF) or transcription co-factor (TcoF). It consists of two models. Model 1 predicts whether the interacting partner of a TF is a TcoF. Model 2 predicts whether the interacting partner of a TcoF is a TF.
Simplified method for predicting a functional class of proteins in transcription factor complexes. Piatek MJ, Schramm MC, Burra DD, Binshbreen A, Jankovic BR, Chowdhary R, Archer JA, Bajic VB. PLoS One. 2013 Jul 12;8(7):e68857. doi: 10.1371/journal.pone.0068857. 


DragonWFS is a feature selection (FS) web tool based on randomized wrapper models. It is designed for researchers who deal with high-dimensional data and is applicable to a variety of problems from the broad fields of biological and biomedical research.
DWFS: a wrapper feature selection tool based on a parallel genetic algorithm. Soufan O, Kleftogiannis D, Kalnis P, Bajic VB. PLoS One. 2015 Feb 26;10(2):e0117988. doi: 10.1371/journal.pone.0117988.


This tool has two components: one for NMR slice picking and the other for resonance assignment based on the picked slices. The required inputs are CBCACONH.ucsf and HNCACB.ucsf.


Search engine for biological and medical databanks. Use it to search well over a terabyte of indexed text.


Internal wiki for the KAUST Computational Bioscience Research Center.