Running freebayes

Now we need to start calling more SNPs on the indiviudals that have been mapped to the reference genome. I’m breaking this up into groups of individuals (200-300) across many populations. I’m also submitting this via sbatch rather than spinning up a srun /bin/bash approach… Failed.

Published

February 21, 2023

library( tidyverse )
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library( ggrepel )
theme_set( theme_minimal() )

As a first pass through the data, I split individual sequences into different groups to run bwa. This works on an individual-by-individual basis and with 5% of each individuals genome takes roughly 30 minutes per individual (*1419 individuals …). Doing it in smaller groups allows me to run multiple instances at the same time. See the table here for the groups and current status. At the time of this writing, this process is still waiting on the final 6 groups of samples to be processed.

For a subset of the data, I did call SNPs at the same time that

Batching freebayes

To batch these I made a new directory called snpcalling and linked the bam files from a set of indiviudals in several different populations to this folder.

for bam in ../samples/O/*.bam; do  
  ln -s $bam; 
done

I then made a small runFreeBayes.sh run file that would

#!/bin/bash
ref="../samples/all005/reference.fasta"
ls *.bam > bam.fofn
freebayes --fasta-reference ${ref} --bam-list bam.fofn --vcf Output.vcf

I then made it executable

chmod +x runFreeBayes.sh

And made a slurm batch file freebayes.sub which contains

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 36
#SBATCH -t 96:00:00
#SBATCH -J freebayesbaby
#SBATCH -o freebayes.o%j
#SBATCH -e freebayes.e%j
#SBATCH --mail-user=rjdyer@vcu.edu
#SBATCH --mail-type=ALL
ulimit -s unlimited
./runFreeBayes.sh
scontrol show job $SLURM_JOB_ID

and then dumped it off for batch processing

sbatch freebayes.sub

After this, you can see that the batch job has been queued up (#34827) and waiting to go.

[rjdyer@huff snpcalling]$ squeue
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  34827     basic freebaye   rjdyer PD       0:00      1 (Priority)
  34828     basic freebaye   rjdyer PD       0:00      1 (Priority)

So, now lets see if it actually works. This run has 293 samples and the next one (#34828) has 270. Failed because sbatch didn’t work properly.