1. Running commands with scripts
Recall that in class we were able to look at a fastq file, in compressed format, using the program zcat. The output was printed to the terminal. Using Linux pipes, we were able to count the number of lines in the file with the program wc.
> zcat myfile.fastq.gz | wc -l
The output is the number of lines in the file, not the number of records (each fastq record spans four lines).
We also used the grep command to allow us to count only the header lines. The resulting line count represented the number of records in the file.
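The same two counts can also be obtained without leaving Python. The following is only a minimal sketch: the function name count_lines_and_records is made up for illustration, and it assumes every record occupies exactly four lines (header, sequence, separator, quality) with no wrapped sequences.

```python
import gzip


def count_lines_and_records(path):
    """Return (line_count, record_count) for a gzip-compressed fastq file.

    Assumption: each record is exactly four lines, so the record count
    is the line count divided by four.
    """
    # "rt" opens the compressed file in text mode, so iterating over the
    # handle yields one decoded line at a time.
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines, n_lines // 4
```

Calling count_lines_and_records("myfile.fastq.gz") should give the same line count as the zcat and wc pipeline above.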
The task is to write a short script, in the scripting language of your choice (likely Python for most of you) to do the following. (Start with A, then see if you can add to your script to do the other tasks, which will be progressively more difficult.)
A. Count the number of lines in each of a set of fastq files that are in a directory. For each file, print out the name of the file and the number of sequence records.
B. Test whether the numbers of lines in paired-end files are the same, and indicate this in the output.
C. Measure the length of each sequence in each file, and indicate the longest and the shortest read.
D. Look at the quality of the first base from each read and count how many are < 30.
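For tasks C and D, one possible starting point is sketched below. It is not a model answer. It assumes four-line records and Phred+33 quality encoding (the offset used by Sanger and recent Illumina files); if your files use a different encoding, change PHRED_OFFSET. The names read_fastq and summarize are hypothetical.

```python
import gzip

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def read_fastq(path):
    """Yield (header, sequence, quality) from a gzip-compressed fastq file.

    Assumes each record is exactly four lines.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file (or a blank line)
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # the '+' separator line, not needed here
            quality = handle.readline().rstrip("\n")
            yield header, sequence, quality


def summarize(path, threshold=30):
    """Return read count, shortest and longest read length, and the number
    of reads whose first base has a quality score below `threshold`."""
    n_records = 0
    shortest = None
    longest = None
    low_first_base = 0
    for _header, sequence, quality in read_fastq(path):
        n_records += 1
        length = len(sequence)
        if shortest is None or length < shortest:
            shortest = length
        if longest is None or length > longest:
            longest = length
        # Convert the first quality character to a numeric score.
        if quality and ord(quality[0]) - PHRED_OFFSET < threshold:
            low_first_base += 1
    return {
        "records": n_records,
        "shortest": shortest,
        "longest": longest,
        "low_first_base": low_first_base,
    }
```

Reading one record at a time keeps memory use small even for large files, which is why the sketch uses a generator rather than loading the whole file into a list.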
Some things you’ll need to know (or learn) how to do:
- read directory contents
- execute a system command and capture the results
- open a text file and read into your program
- split or extract substrings from a larger string
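Each of these building blocks is sketched below in Python. Treat it as a rough illustration under stated assumptions: all function names are hypothetical, the files are assumed to end in .fastq.gz, and paired-end files are assumed to be named like sample_R1.fastq.gz and sample_R2.fastq.gz (your files may follow a different convention). The shell command is the zcat and wc pipeline from class, so it needs a Linux system where zcat is installed.

```python
import glob
import os
import shlex
import subprocess


def list_fastq_files(directory):
    """Read directory contents: sorted paths of files ending in .fastq.gz."""
    return sorted(glob.glob(os.path.join(directory, "*.fastq.gz")))


def count_lines_with_shell(path):
    """Execute `zcat FILE | wc -l` and capture the result as an integer."""
    # shlex.quote protects file names that contain spaces or shell characters.
    command = f"zcat {shlex.quote(path)} | wc -l"
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def read_lines(path):
    """Open a plain (uncompressed) text file and read its lines into a list."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle]


def split_pair_name(path):
    """Extract substrings: 'dir/sample_R1.fastq.gz' -> ('sample', 'R1').

    Assumption: the read label follows the last underscore in the name.
    """
    name = os.path.basename(path)
    stem = name[: -len(".fastq.gz")]
    sample, _, read = stem.rpartition("_")
    return sample, read


def check_pairs(counts):
    """Given {path: line_count}, return {sample: True/False} saying whether
    the R1 and R2 files of each sample have the same number of lines.
    Samples without both R1 and R2 are left out."""
    by_sample = {}
    for path, n_lines in counts.items():
        sample, read = split_pair_name(path)
        by_sample.setdefault(sample, {})[read] = n_lines
    return {
        sample: reads["R1"] == reads["R2"]
        for sample, reads in by_sample.items()
        if "R1" in reads and "R2" in reads
    }
```

A script could then build the counts with {path: count_lines_with_shell(path) for path in list_fastq_files(directory)}, print each file name with its count divided by four, and pass the same dictionary to check_pairs.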
Note that I don’t yet know if you’ve learned enough Python to do all these tasks. Feel free to do this work after the course is over and you’ve learned more, or do it sooner if you do have the necessary knowledge. As I noted in class, I don’t program in Python, but feel free to send me the scripts that you are working on and I am certain I can help with issues you are having trouble with.
Assignment 1: GWAS Homework
Assignment 2: Practising with Ensembl