CAT (Composition Analysis Toolkit) is a software package that implements a novel measure of codon usage bias, the Codon Deviation Coefficient (CDC). Unlike previous measures, CDC accounts for background nucleotide composition when estimating codon usage bias, and it uses a bootstrap procedure to assess the statistical significance of that bias.
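To make the idea concrete, here is a minimal C++ sketch of a CDC-style statistic. It rests on two assumptions, and CAT's exact formulation may differ from both:

- The expected frequency of each codon is the product of the position-specific nucleotide frequencies of the same sequence.
- The statistic is one minus the cosine similarity between the observed and expected codon frequencies.

Stop-codon removal and the bootstrap test are deliberately omitted. "cdcSketch" and "baseIndex" are hypothetical names, not CAT functions.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical helper: maps A/C/G/T (either case) to 0..3, anything else to -1.
int baseIndex(char b) {
    switch (b) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default: return -1;
    }
}

// Sketch of a CDC-style statistic (assumption, not CAT's code):
// 1 - cosine similarity between observed codon frequencies and codon
// frequencies expected from the sequence's position-specific base composition.
double cdcSketch(const std::string& cds) {
    std::array<double, 64> observed{};               // codon counts, index = 16*b1 + 4*b2 + b3
    std::array<std::array<double, 4>, 3> posFreq{};  // base counts at codon positions 1-3
    int codons = 0;
    for (std::size_t i = 0; i + 2 < cds.size(); i += 3) {
        int b[3];
        bool valid = true;
        for (int p = 0; p < 3; ++p) {
            b[p] = baseIndex(cds[i + p]);
            if (b[p] < 0) valid = false;
        }
        if (!valid) continue;  // skip codons containing ambiguous bases
        observed[b[0] * 16 + b[1] * 4 + b[2]] += 1.0;
        for (int p = 0; p < 3; ++p) posFreq[p][b[p]] += 1.0;
        ++codons;
    }
    assert(codons > 0 && "sequence must contain at least one complete codon");

    double dot = 0.0, normObs = 0.0, normExp = 0.0;
    for (int c = 0; c < 64; ++c) {
        // Expected codon frequency: product of the three positional base frequencies.
        double expected = (posFreq[0][c / 16] / codons) *
                          (posFreq[1][(c / 4) % 4] / codons) *
                          (posFreq[2][c % 4] / codons);
        double obs = observed[c] / codons;
        dot += obs * expected;
        normObs += obs * obs;
        normExp += expected * expected;
    }
    return 1.0 - dot / std::sqrt(normObs * normExp);
}
```

Some example values from this sketch:

- "ATGATGATG" scores 0, because its single codon is exactly what its positional composition predicts.
- "AAATTT" scores 0.5, because only two of the eight A/T codons expected from its composition are observed.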
Zhang, Z., Li, J., Cui, P., Ding, F., Li, A., Townsend, J.P., and Yu, J. (2011) Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance, under review.
The CAT package (version 1.0), including source code, compiled executables, and documentation, is freely available for academic use only.
A version of CAT 1.0 for Mac OS with a graphical user interface (GUI) is also available for download.
- Copyright & License
CAT is distributed as open-source software and licensed under the GNU General Public License (Version 3; http://www.gnu.org/licenses/gpl.txt), in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Commercial use of CAT requires a special contract.
For efficiency and portability across platforms, CAT is written in standard C++. The package is normally named CATXXX.tar.gz, where XXX stands for the version number.
- 4.1 Compiled Executables
Executables have been precompiled for Linux/Unix/Mac/Windows. Unpack CATXXX.tar.gz (see below) and you will find the compiled executables in the folder "CATXXX/bin/".
- 4.2 Compiling on Linux/Unix/Mac/Windows
To compile CAT on your own platform, follow the steps below.
(1) Unpack CATXXX.tar.gz, for example with the command "tar -zxvf CATXXX.tar.gz".
(2) On Linux/Unix/Mac OS, compile the program in the source code folder with the g++/gcc compiler.
That's it. You will then find an executable named "CAT" in this folder.
Note for Mac users: Mac OS may use a case-insensitive file system, in which case "CAT" and the system command "cat" are treated as the same name. When running CAT, specify the path to the executable (for example, "./CAT" from within its folder) so that the system "cat" is not run instead.
- Setting Parameters
CAT allows the user to customize parameters. The parameter settings are listed below and can also be displayed by typing "CAT -h".
- -i input fasta file name [string, required]
- -o output file name [string, optional], default = input file name with the characters ".cat" appended
- -b bootstrap replications [integer, optional], default = 10000
- -c genetic code to be used [integer, optional], default = 1 (the standard code). Genetic code numbers follow the genetic code tables published by NCBI.
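The documented defaults can be expressed as a small C++ sketch:

- "-i" is required.
- "-o" defaults to the input name with ".cat" appended.
- "-b" defaults to 10000.
- "-c" defaults to 1.

"CatOptions" and "parseOptions" are hypothetical names, and this is not CAT's own argument parser. "-h" and any further validation CAT may perform are not modeled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the four documented CAT options.
struct CatOptions {
    std::string input;       // -i, required
    std::string output;      // -o, defaults to input + ".cat"
    int bootstrap = 10000;   // -b, bootstrap replications
    int geneticCode = 1;     // -c, NCBI genetic code number
};

// Parses "-flag value" pairs; illustrates the documented defaults only.
CatOptions parseOptions(const std::vector<std::string>& args) {
    CatOptions opt;
    for (std::size_t k = 0; k < args.size(); k += 2) {
        const std::string& flag = args[k];
        if (k + 1 >= args.size())
            throw std::invalid_argument("missing value for " + flag);
        const std::string& value = args[k + 1];
        if (flag == "-i") opt.input = value;
        else if (flag == "-o") opt.output = value;
        else if (flag == "-b") opt.bootstrap = std::stoi(value);
        else if (flag == "-c") opt.geneticCode = std::stoi(value);
        else throw std::invalid_argument("unknown option " + flag);
    }
    if (opt.input.empty())
        throw std::invalid_argument("-i <input FASTA file> is required");
    if (opt.output.empty()) opt.output = opt.input + ".cat";  // default output name
    return opt;
}
```

Inside a main function, the arguments would be passed as std::vector<std::string>(argv + 1, argv + argc). For example, "-i genes.fasta" alone yields the output name "genes.fasta.cat", 10000 bootstrap replications and genetic code 1.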
- Input File
CAT accepts FASTA files (http://www.ncbi.nlm.nih.gov/BLAST/fasta.shtml) containing multiple nucleotide coding sequences. Stop codons are excluded from the analysis.
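Below is a minimal C++ sketch of reading such a file and tallying codons with stop codons removed. It makes two illustrative assumptions:

- It uses the standard genetic code (code 1, whose stop codons are TAA, TAG and TGA).
- It takes the gene ID to be the header text up to the first whitespace.

"readFasta" and "countSenseCodons" are hypothetical helpers, not CAT functions.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// One FASTA record: ID (header text up to the first whitespace) and sequence.
struct FastaRecord {
    std::string id;
    std::string seq;
};

// Reads a multi-record FASTA stream; sequence lines are concatenated.
std::vector<FastaRecord> readFasta(std::istream& in) {
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        if (line.empty()) continue;
        if (line[0] == '>') {
            FastaRecord rec;
            std::istringstream header(line.substr(1));
            header >> rec.id;
            records.push_back(rec);
        } else if (!records.empty()) {
            records.back().seq += line;  // lines before the first header are ignored
        }
    }
    return records;
}

// Counts in-frame codons, skipping the stop codons of the standard code
// (NCBI code 1). Other genetic codes (-c) define different stop codons.
std::map<std::string, int> countSenseCodons(const std::string& seq) {
    static const std::set<std::string> kStops = {"TAA", "TAG", "TGA"};
    std::map<std::string, int> counts;
    for (std::size_t i = 0; i + 2 < seq.size(); i += 3) {
        std::string codon = seq.substr(i, 3);
        for (char& c : codon)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        if (kStops.count(codon) == 0) ++counts[codon];
    }
    return counts;
}
```

To read from disk, open the file with std::ifstream and pass the stream to readFasta.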
An example data file as well as its results file accompanies the CAT package in the folder "CATXXX/example/".
- Format of Output
CAT output is a tab-delimited text file with one header row. Each subsequent row reports the results for a single gene: gene ID, gene length (bp), GC and purine contents, the CDC estimate and its P-value. The observed and expected compositions of nucleotides, codons and amino acids are also provided.
The description for each column is listed as follows.
- ID, Length: gene ID and gene length (bp).
- GC, AG: GC content and purine content.
- GCi, AGi: GC content and purine content at codon position i (i = 1, 2, 3).
- CDC: Codon Deviation Coefficient as a measure of codon usage bias
- P(CDC): P-value of CDC
In addition, the observed and expected compositions of nucleotides (4 nucleotides at each of 3 codon positions), codons (64) and amino acids (20) are reported.
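The composition columns can be illustrated with a short C++ sketch. It computes GC, AG, GC1-GC3 and AG1-AG3 as fractions of unambiguous A/C/G/T bases in complete codons. This section does not say whether CAT includes stop codons or ambiguous bases in these columns, so treat those choices as assumptions. "Composition" and "composition" are hypothetical names.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical summary of the GC, AG, GCi and AGi output columns.
struct Composition {
    double gc = 0.0, ag = 0.0;       // whole coding sequence
    std::array<double, 3> gcPos{};   // GC1, GC2, GC3
    std::array<double, 3> agPos{};   // AG1, AG2, AG3
};

// Fractions in [0, 1], counting only A/C/G/T bases inside complete codons.
Composition composition(const std::string& cds) {
    std::array<int, 3> total{}, gc{}, ag{};
    for (std::size_t i = 0; i + 2 < cds.size(); i += 3) {
        for (int p = 0; p < 3; ++p) {
            char b = static_cast<char>(std::toupper(static_cast<unsigned char>(cds[i + p])));
            if (b != 'A' && b != 'C' && b != 'G' && b != 'T') continue;  // skip ambiguous bases
            ++total[p];
            if (b == 'G' || b == 'C') ++gc[p];  // strong (GC) bases
            if (b == 'A' || b == 'G') ++ag[p];  // purines
        }
    }
    Composition out;
    int allTotal = 0, allGc = 0, allAg = 0;
    for (int p = 0; p < 3; ++p) {
        if (total[p] > 0) {
            out.gcPos[p] = static_cast<double>(gc[p]) / total[p];
            out.agPos[p] = static_cast<double>(ag[p]) / total[p];
        }
        allTotal += total[p];
        allGc += gc[p];
        allAg += ag[p];
    }
    if (allTotal > 0) {
        out.gc = static_cast<double>(allGc) / allTotal;
        out.ag = static_cast<double>(allAg) / allTotal;
    }
    return out;
}
```

For "ATGGCC" this sketch gives the following values:

- GC1 = GC2 = 0.5 and GC3 = 1.0.
- AG1 = 1.0, AG2 = 0.0 and AG3 = 0.5.
- Overall AG is 0.5.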
By default, the output file name is the input file name with the characters ".cat" appended. The output file name can also be customized with the parameter "-o output_filename"; see the section "Setting Parameters" for details.
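Because the output is a plain tab-delimited table, it is easy to post-process. The following C++ sketch returns the genes whose P(CDC) falls below a chosen threshold. It assumes the header labels are exactly "ID", "CDC" and "P(CDC)", as in the column list above. "significantGenes", "splitTabs" and "GeneBias" are hypothetical names, and the 0.05 cutoff used in the check below is only an example.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one gene of interest.
struct GeneBias {
    std::string id;
    double cdc;
    double p;
};

// Splits one line on tab characters.
std::vector<std::string> splitTabs(std::string line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) fields.push_back(field);
    return fields;
}

// Returns the genes whose P(CDC) is below alpha.
// Assumes the header row labels the columns "ID", "CDC" and "P(CDC)".
std::vector<GeneBias> significantGenes(std::istream& in, double alpha) {
    std::string line;
    if (!std::getline(in, line)) throw std::runtime_error("empty CAT output");
    const std::vector<std::string> header = splitTabs(line);
    auto column = [&header](const std::string& name) {
        for (std::size_t k = 0; k < header.size(); ++k)
            if (header[k] == name) return k;
        throw std::runtime_error("missing column: " + name);
    };
    const std::size_t idCol = column("ID");
    const std::size_t cdcCol = column("CDC");
    const std::size_t pCol = column("P(CDC)");

    std::vector<GeneBias> hits;
    while (std::getline(in, line)) {
        const std::vector<std::string> f = splitTabs(line);
        if (f.size() <= idCol || f.size() <= cdcCol || f.size() <= pCol) continue;  // skip blank or short rows
        const double p = std::stod(f[pCol]);
        if (p < alpha) hits.push_back({f[idCol], std::stod(f[cdcCol]), p});
    }
    return hits;
}
```

Looking columns up by header name, rather than by fixed position, keeps the sketch independent of how many composition columns follow the main results.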
We thank Joe Yu for constructive comments on this work and George Marselis for providing assistance on web page hosting. We also thank many users for reporting bugs and sending suggestions.
- Contact Information
Please send bug reports or suggestions to Dr. Zhang Zhang (firstname.lastname@example.org, email@example.com).