[PWM] HOmo sapiens COmprehensive MOdel COllection (HOCOMOCO) contains transcription factor (TF) binding models represented as classic Position Weight Matrices (PWMs, also known as Position-Specific Scoring Matrices, PSSMs) and precalculated score thresholds.
Dinucleotide PWMs, which provide better TF binding site (TFBS) recognition quality, are also available for selected TFs, primarily those with ChIP-Seq data (see the HOCOMOCO-v10 paper for details).
[Motif finding; Sequence scanning] Currently the HOCOMOCO database provides PWMs and precomputed score thresholds for download. Stand-alone tools, such as SPRY-SARUS, should be used to scan a given sequence for putative TFBSs. The web interface in BioUML is also suitable for this task.
PWMs in HOCOMOCO were derived from various types of experimental data through data integration with the ChIPMunk motif discovery tool.
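The scanning step described above can be illustrated with a minimal sketch. It is not the SPRY-SARUS implementation: the matrix values, the threshold, and the names `TOY_PWM`, `score_window`, and `scan` are hypothetical and chosen only for illustration. The sketch assumes a mononucleotide PWM stored as one score table per motif position and reports every window, on either strand, whose summed score reaches the threshold.

```python
# Toy PWM: one score table per motif position (values are made up, not from HOCOMOCO).
TOY_PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def score_window(pwm, window):
    """Sum the per-position scores of the nucleotides in one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, strand, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous symbols such as N.
        if any(base not in "ACGT" for base in window):
            continue
        for strand, word in (("+", window), ("-", reverse_complement(window))):
            score = score_window(pwm, word)
            if score >= threshold:
                hits.append((start, strand, score))
    return hits


if __name__ == "__main__":
    # The threshold is an arbitrary illustrative value, not a HOCOMOCO threshold.
    for hit in scan(TOY_PWM, "TTACGTT", threshold=3.0):
        print(hit)
```

In practice the matrix and the threshold would be taken from the downloaded HOCOMOCO files rather than written by hand, and a dedicated tool is preferable for long sequences.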
[Quality score] Each PWM has a quality rating from A to D, where A marks motifs with the highest confidence and D marks motifs that describe the binding pattern only weakly and have limited applicability for quantitative analyses. The special S quality marks single-box motifs. A flowchart of the quality assignment used in HOCOMOCO-v10 is available as a Supplementary Figure. Details on quality assignment can be found in the 'Assembling the final collection' section of the HOCOMOCO-v10 paper.
[AUC, wAUC] The wAUC (weighted Area Under Curve for ROC) represents the power of a given model to discriminate true positive ChIP-Seq segments from random noise. It aggregates information from all tested ChIP-Seq data sets for a particular transcription factor. The best AUC is the highest value reached on any single data set for the selected transcription factor. Details are given in the 'Model and dataset benchmarking' section of the HOCOMOCO-v10 paper.
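To make the AUC part of this entry concrete, the sketch below computes a plain ROC AUC for a single data set from two lists of model scores. It is an illustration only, not the HOCOMOCO benchmarking code: the function name `roc_auc` and the example scores are hypothetical, and the weighting used to combine per-data-set values into the wAUC is described in the paper and is not reproduced here. The sketch uses the standard equivalence between ROC AUC and the probability that a randomly chosen positive scores higher than a randomly chosen negative.

```python
def roc_auc(positive_scores, negative_scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    correct = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Made-up best-hit scores for bound segments and for background sequences.
    bound = [0.9, 0.4]
    background = [0.5, 0.1]
    print(roc_auc(bound, background))
```

A value of 1.0 means every positive segment outscores every background sequence, and 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of sequences, so a rank-based computation would be preferable for large data sets.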