13th November 2013: release v22.214.171.124. Get it here. Minor release with updates to the 1000 Genomes Phase3 pipeline. These changes will migrate into run_calls in a forthcoming release.
4th August 2013: release v126.96.36.199. I have again made two releases in quick succession. These are small incremental releases. I've modified the VCF filtering so that it now depends on whether a joint or independent workflow is being used. See the Release Notes for details. Also there is now a --subsample option, which is sometimes useful for examining how things (e.g. power) vary with coverage. I'm afraid v188.8.131.52 had slightly broken VCF filtering, hence the rapid new release.
10th July 2013: reissued release v184.108.40.206. Apologies everyone - on July 1st I mis-bundled the v220.127.116.11 zip file and left out one of the new scripts. I've just fixed this and reissued the release.
1st July 2013: release v18.104.22.168. I've just brought out version 22.214.171.124 and then very rapidly brought out v.126.96.36.199 also. There are some small bugfixes (see the release notes), but the main benefit of this release is a more robust choice of error-cleaning threshold in the face of real-life heterogeneous data. run_calls now looks at the coverage distribution (as always) and makes a better choice of threshold. It also dumps a PDF for each sample, showing the distribution and where the line was drawn. There is also an important bugfix for a bug I introduced by reverting a previous fix: essentially I was not allowing error-cleaning of contigs longer than a SNP. This can affect high-diversity species as well as microbial samples, where you often have low-coverage contigs of the host species.
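To give a feel for what "choosing a threshold from the coverage distribution" can mean, here is a minimal sketch of one common heuristic: take the first valley in the coverage histogram, between the low-coverage error peak and the main peak. This is an illustration only and is not necessarily what run_calls does; the function name and the example histogram are made up.

```python
def first_valley_threshold(counts):
    """Return the coverage value at the first local minimum of a histogram.

    counts[i] is the number of kmers (or contigs) seen at coverage i + 1.
    Hypothetical helper for illustration; not taken from Cortex itself.
    Returns None if the histogram has no interior local minimum.
    """
    for i in range(1, len(counts) - 1):
        # A valley: no higher than the bin before, strictly lower than the bin after.
        if counts[i] <= counts[i - 1] and counts[i] < counts[i + 1]:
            return i + 1  # convert 0-based index back to a coverage value
    return None


# Invented histogram: an error peak at coverage 1, then a main peak around 9.
example_counts = [1000, 400, 120, 40, 25, 30, 60, 110, 150, 140, 90]
threshold = first_valley_threshold(example_counts)
print("clean everything with coverage <=", threshold)
```

Real coverage histograms are noisy, so a heuristic like this usually needs smoothing or a sanity check against the expected depth, which is the sort of robustness the release is about.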
21 June 2013: release v188.8.131.52. This is a minor release. Bugfixes include: a bug in the VCF filters which was removing calls where the ref allele in the reference FASTA was lower case, and various memory leaks. For those interested in testing beta code, there are early versions of the new Cortex error correction code (used in the 1000 Genomes Project) and a simple pan-genome analysis function so you can compare a set of known genes with a set of samples, to see which samples have which genes/plasmids/whatever. Both of these functions (including the user-interface) will change in the next releases though, so bear that in mind.
6 February 2013: release v184.108.40.206. This is a bugfix release. Run_calls now allows you to specify the ASCII offset if you are parsing non-standard FASTQ quality encoding. It also now copes better when given a dataset with multiple samples with different read-lengths - if for example some have 50bp reads, and others have 100bp reads, it copes properly when the user has specified going up to k>50, so some samples have empty graphs. Some installation issues reported by a few users have been fixed by Isaac Turner. Some bugs in the 1000 Genomes Phase2b Cortex pipeline have been caught and fixed by Chunlin Xiao - this pipeline has been used so far to call variants on 1500 humans (mean depth 5x), so you might be interested if you want to work on a population of samples with large genomes. See the link elsewhere on this page (on the right) for docs on the pipeline itself. Various other small fixes, plus moved to using htslib for parsing BAMs. See the release notes for further details.
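For anyone unsure what the ASCII offset means: FASTQ stores each base quality as a single character, and the score is that character's ASCII code minus an offset. The offset is 33 in the standard (Sanger) encoding and 64 in some older Illumina files. A minimal sketch of the decoding follows; the function name is made up for illustration and is not part of Cortex or run_calls.

```python
def decode_qualities(quality_string, ascii_offset=33):
    """Convert a FASTQ quality string into a list of integer quality scores.

    Hypothetical helper for illustration; ascii_offset is 33 for standard
    (Sanger) encoding and 64 for older Illumina encodings.
    """
    scores = [ord(ch) - ascii_offset for ch in quality_string]
    if any(score < 0 for score in scores):
        # A negative score means the offset is too large for this file.
        raise ValueError("negative quality score: wrong ASCII offset?")
    return scores


# The same scores written in the two encodings.
print(decode_qualities("II5!", ascii_offset=33))  # [40, 40, 20, 0]
print(decode_qualities("hhT@", ascii_offset=64))  # [40, 40, 20, 0]
```

Reading an offset-64 file with offset 33 inflates every score by 31, which is why the offset needs to be specified for non-standard files.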
20th November 2012: New Cortex paper out! The next Cortex paper, "High-throughput microbial population genomics using the Cortex variation assembler" is out at Bioinformatics, early access. Get it here. Check it out to find out how we call at multiple kmers, the different discovery workflows you can use, different ways to integrate a reference assembly into your work, and how to scale to thousands of microbial samples. Also worth taking a look at the two case studies - for microbial samples the relatively high coverage compared with vertebrate sequencing, plus the relatively low repeat content, means you can attain comparable sensitivity with assembly as you do with mapping, plus of course the lower FDR, better access to non-SNP variation, and in fact the ability to compare a sample with the entire pan-genome of known sequence, not just a reference.
14 November 2012: release v220.127.116.11. Get it here. This is a bugfix release, fixing a crash (segfault) that could be caused when you set --max_var_len to very large (megabase) values, plus a few other small changes. See the release notes for further details.
12 October 2012: release v18.104.22.168. This release introduces support for reading GZIPPED FASTQ (and FASTA) and BAM files (alignment information in the BAM is ignored). There are also a number of bugfixes, notably for a nasty bug which created invalid graph binaries. I've also updated the description of the 1000 Genomes Cortex pipeline here. There are a few UI changes - the --format option has been removed, and --max_read_len is now only needed if you use --gt. Also the install process has been updated; you should now only need to run the install shell script, and then compile Cortex itself. run_calls should also now work without needing to set any environment variables.
23rd August 2012: Bugfix release v22.214.171.124. The main change in this release is in the scripts/1000genomes directory, which I have not advertised previously. It contains scripts for running Cortex on large numbers (tens, hundreds) of samples with large genomes - i.e. for the 1000 Genomes project. These are to allow collaborators across the world to reliably run a consistent Cortex pipeline on human populations. However this is the first time people other than me have done this, so I expect there may be some smoothing-out of issues in the near future. You can see a PDF describing the pipeline here. I've had enough people ask me about running Cortex on lots of samples with big genomes, that I thought people would find it useful to see the process.
This release is a bugfix for a script in that 1000 Genomes directory, plus fixes for a few potential bugs-in-waiting (array overflow errors) in Cortex itself.
17th August 2012: Bugfix release v126.96.36.199. Thanks to Akdes Serin for finding some bugs in run_calls.pl, and Fernando Cruz, for implicitly pointing out that the INSTALL file had some text referring to an old release. If you are not using run_calls.pl, there is NO benefit to upgrading to this release.
14th August 2012: I've just put a new release v188.8.131.52 up on Sourceforge. There are various new features for Cortex itself (genotyping separate from discovery, novel sequence calling), and a considerable performance improvement thanks to some changes made by Isaac Turner (e.g. if I/O is not a bottleneck, loading a human reference genome binary at k=31 now takes 15 minutes where it used to take 45 minutes). However the biggest change in this release is the introduction of a wrapper script that allows you to automate an entire analysis across many samples (from FASTQ to VCF). The manual has been completely overhauled also.
9th January 2012: The Cortex paper is now out in Nature Genetics! Check it out for a detailed description of our methods for variant discovery and genotyping.
3rd November 2011: We have just released v184.108.40.206 - new features are: better error cleaning, genotype calls and likelihoods, allows dumping of subgraphs found by alignment of sequence to the graph. There is a new dependency on the GNU Scientific Library, which for simplicity I have bundled with cortex_var - this means the zip is a lot bigger than before (about 24Mb). Apologies for this - I'll pare this down in future releases. Note I have modified the binary header for binary files slightly - Cortex will continue to be able to read old binaries.