Example of creating gene catalogs for public shotgun metagenomes

Use the advanced search function at EBI to create a list of FASTQ files for public metagenomes. The search was restricted to records under the metagenomes taxon (tax_tree 408169) and to paired-end Illumina shotgun (WGS) libraries from a metagenomic source, with a nominal length of at least 100 and a base count of at least 200,000,000, as shown in the following query:
tax_tree(408169) AND library_layout="PAIRED" AND instrument_platform="ILLUMINA" AND library_strategy="WGS" AND library_source="METAGENOMIC" AND nominal_length>=100 AND base_count>=200000000
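
The same query could also be submitted from the command line instead of the web form. The sketch below assumes the ENA Portal API search endpoint and the read_run result type, neither of which is named in this protocol. The field list, the limit=0 setting, and the function name fetch_run_table are illustrative choices, not part of the original workflow.

```shell
# Query string copied from the advanced search above.
ENA_QUERY='tax_tree(408169) AND library_layout="PAIRED" AND instrument_platform="ILLUMINA" AND library_strategy="WGS" AND library_source="METAGENOMIC" AND nominal_length>=100 AND base_count>=200000000'

# Assumed endpoint and report fields (not stated in the original protocol).
ENA_API='https://www.ebi.ac.uk/ena/portal/api/search'
ENA_FIELDS='run_accession,sample_accession,study_accession,tax_id,fastq_ftp'

# fetch_run_table OUTFILE: write one tab-separated row per run to OUTFILE.
# limit=0 is assumed to request all matching records.
fetch_run_table() {
    curl --silent --show-error --fail --get "$ENA_API" \
        --data-urlencode "result=read_run" \
        --data-urlencode "query=$ENA_QUERY" \
        --data-urlencode "fields=$ENA_FIELDS" \
        --data-urlencode "format=tsv" \
        --data-urlencode "limit=0" \
        --output "$1"
}
```

Calling `fetch_run_table runs.tsv` would then save the run table locally. Letting curl URL-encode each parameter avoids hand-escaping the quotes and comparison operators in the query.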

Download metagenomes using wget and GridFTP, while keeping track of the ENA run, sample, project, and metagenome taxon identifiers.
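
One way to keep the identifiers attached to each downloaded file is to write a manifest before fetching anything. The sketch below is only an illustration: it assumes a tab-separated run table with a header row and the columns run accession, sample accession, study (project) accession, taxon ID, and a semicolon-separated list of FASTQ paths without a URL scheme, in that order. The function names and the manifest.tsv file name are hypothetical, and only the wget route is shown (GridFTP is not covered).

```shell
# list_fastq_urls: read the run table on stdin and print one line per FASTQ
# file: run, sample, project (study), taxon ID, and download URL.
list_fastq_urls() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        NR == 1 { next }                      # skip the header row
        $5 == "" { next }                     # run without FASTQ files
        {
            n = split($5, files, ";")         # one entry per FASTQ file
            for (i = 1; i <= n; i++)
                print $1, $2, $3, $4, "ftp://" files[i]
        }'
}

# download_all TABLE OUTDIR: save the manifest, then fetch every file with wget.
download_all() {
    mkdir -p "$2" || return 1
    list_fastq_urls < "$1" > "$2/manifest.tsv"
    cut -f 5 "$2/manifest.tsv" | wget --continue --directory-prefix="$2" --input-file=-
}
```

With this layout, `download_all runs.tsv fastq` would leave the FASTQ files and a manifest.tsv side by side, so every file can be traced back to its run, sample, project, and taxon.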

Read cleaning using BBDuk (http://jgi.doe.gov/data-and-tools/bb-tools/):
$bbduk in=$r1 in2=$r2 overwrite=true prealloc=t k=23 ktrim=r mink=11 hdist=1 qtrim=rl trimq=10 minlength=60 threads=4 ref=adapters.fa,phix174_ill.ref.fa.gz tbo tpe |$repair.sh in=stdin.fq out=$r1.trim.filt_1.fq.gz out2=$r1.trim.filt_2.fq.gz outs=$r.single.fq.gz overwrite=true

Assembly using MEGAHIT:
$megahit -1 $r1.trim.filt_1.fq.gz -2 $r1.trim.filt_2.fq.gz --continue $ksteps -t 64 --min-contig-len 500 --out-prefix $sample.min500 -o $assemblyDir/$sample.megahit

Gene prediction using Prodigal:
prodigal -c -p m -i $infile -a $outdir/${id}_prodigal.aa -d $outdir/${id}_prodigal.fna -f gff -o $outdir/${id}_prodigal.gff -m -q

Gene clustering using CD-HIT-EST:
cd-hit-est -i $in.fna -o $out.fna -c 0.95 -T 32 -M 0 -G 0 -aS 0.9 -g 1 -r 1 -d 0 -s 0.8
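
CD-HIT-EST writes the cluster membership next to the representative sequences, in $out.fna.clstr. As a quick sanity check on the catalog, cluster sizes can be tabulated from that file. The sketch below assumes the usual .clstr layout (a ">Cluster N" line followed by one line per member, with the representative line ending in "*"); the function name cluster_sizes is hypothetical.

```shell
# cluster_sizes: read a .clstr file on stdin and print, for each cluster,
# the representative gene name and the number of member sequences.
cluster_sizes() {
    awk '
        /^>Cluster/ {                         # a new cluster starts here
            if (rep != "") print rep "\t" n   # flush the previous cluster
            rep = ""; n = 0; next
        }
        {
            n++                               # every other line is one member
            if ($NF == "*") {                 # representative line ends in "*"
                name = $3
                sub(/^>/, "", name)           # drop the leading ">"
                sub(/\.\.\.$/, "", name)      # drop the trailing "..."
                rep = name
            }
        }
        END { if (rep != "") print rep "\t" n }'
}
```

For example, `cluster_sizes < $out.fna.clstr | sort -k2,2nr | head` would list the largest clusters. Because the command above uses -d 0, names in the cluster file run up to the first whitespace of the FASTA header, which is why taking the third whitespace-separated field is sufficient here.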

Read mapping to the gene catalog using BBMap:
$bbmap threads=8 in=$r1.trim.filt_1.fq.gz in2=$r1.trim.filt_2.fq.gz ref=$geneCatalog nodisk rpkm=$fpkm ambig=toss idfilter=0.9 tossbrokenreads
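
The file named by rpkm= holds one row per catalog gene. A per-sample abundance table could be pulled from it roughly as follows. This is a sketch that assumes the file is tab-separated, that comment lines start with "#", and that the column header line begins with "#Name" and includes a column called FPKM; check the header of your own output before relying on it. The function name gene_fpkm is hypothetical.

```shell
# gene_fpkm: read a BBMap rpkm report on stdin and print gene name and FPKM
# for every gene with a non-zero FPKM value.
gene_fpkm() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        /^#Name/ {                            # locate the FPKM column by name
            for (i = 1; i <= NF; i++) if ($i == "FPKM") col = i
            next
        }
        /^#/ { next }                         # skip the remaining header lines
        col && $col + 0 > 0 { print $1, $col }'
}
```

For example, `gene_fpkm < $fpkm > $sample.fpkm.tsv` would keep only the genes detected in that sample. Looking the column up by name rather than by position keeps the sketch working if the column order differs between BBMap versions.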