Scripts-and-Protocols/RNAseq/Instructions/BWA.md at 5281745d92ffa80b4e8f41ec7f17061bafebbd41 · SinhaLab/Scripts-and-Protocols

#Using BWA for Mapping

##Resources

Using guide made by Mike Covington located at https://github.com/mfcovington/BIS180L-RNAseq

Documentation for tools used for mapping reads and visualizing alignments can be found online:

BWA: http://bio-bwa.sourceforge.net/bwa.shtml

Samtools: http://samtools.sourceforge.net/samtools.shtml

###Required Files

Download index file from ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG2.4_release/ITAG2.4_cds.fasta
All files that have been sorted by barcode.
1.0.batch_mapping.pl
1.1.BWA_RNAseq_mapping.pl

##Mapping reads to an indexed reference

First we are building the index.
```
 bwa index ITAG2.4_cds.fasta 
```
There should now be five new versions of the file (.amb, .ann, .bwt, .pac, sa).

Organize your files so that they are named how you would like them. In my case, I am going to name each barcode. For now I will just concatenate files with the same library and rep all in one .fq file. In order to name the libraries in order of their barcode, I wrote a script below:

 #!/usr/bin/env python
 #renameFiles.py
 #Ciera Martinez 
 #This script takes in a .csv file as a key on how to rename files in current directory.  

 import os
 import csv

 #open file
 with open('lane1Key.csv','rU') as csvfile:
         reader = csv.reader(csvfile, delimiter = ',')
         mydict = {rows[1]:rows[0] for rows in reader}

 # renaming
 for fileName in os.listdir( '.' ):
     newName = mydict.get(fileName) if mydict.get(fileName) else "empty" #can not read 'typeNone' from the keys that do not have matching files.
     list(newName)
     #print newName
     os.rename(fileName, newName)

Now I need to concatenate all the reads from each specific library/rep into one file ie combine the lanes. I used this shell command to accomplish this. I first make a key folder that contained empty names with all possible libraries.
```
mkdir libKey

ls libFastaLane1 | while read FILE; do 
touch "$FILE" 
```
Then I moved all the .fq files in to libKey. mv *.fq libKey. I repeated with each libFasta lane. mv all into libKey. If there were repeat files they were overwritten. Therefore I have a directory with all possible libraries so I could run the script below.
```
#!/bin/bash
#catFiles.sh
#This program concatenates files of the same name in multiple directories into a new file in a new directory. This takes about 30 min. 

ls libKey | while read FILE; do
  cat libFastaLane1/"$FILE" libFastaLane2/"$FILE" libFastaLane3/"$FILE" > libFastaAll/"$FILE"
done


#to append the A and B files to new files (with the same names) somewhere else.
```
Map reads in FASTQ file to FASTA reference. Need 1.0.batch_mapping.pl and 1.1.BWA_RNAseq_mapping.pl. Make sure you are specifying where you index is (from step 1) in 1.0.batch_mapping.pl script. Run the batch script whith the last argument being where your barcode files are.
```
 perl 1.0.batch_mapping.pl location/of/fasta/files
```
example
```
 perl 1.0.batch_mapping.pl mikeSplitter/
```
Now that I have all the files, I need to 1.)convert to .BAM files so I can view the reads on genome viewer and 2.) have all the .SAM files in one directory to run Sam2counts.R (aashish R script to count genes).

###1.) Convert to bam.

Gave up. Need to get this done. For now I just made a .sh script that had all the commands (toBam.sh).

Ideas to accomplish, but for some reason does not work. Why?

ls . | while read FILE; do
  samtools view -Sb "$FILE" > "$FILE"
done

###2.) Run sam2counts.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FilesExpand file tree

BWA.md

Latest commit

History

BWA.md

File metadata and controls