Homework 2 (L3)
Homework Assignment #2 (50 points)
This assignment is due on Brightspace by 5PM on Monday, January 27th. Late assignments will NOT be accepted.
Directions for Students:
Open a new Microsoft Word Document and submit answers to the questions below. The first four lines of your document should contain the following:
- Your name
- MMG3320/5320
- Today's date
- Homework Assignment #2
Part A: Practice using Less
- This is a multi-part question:
  a. Navigate into the genomics_data folder.
  b. Use the less command to open up the file Encode-hesc-Nanog.bed.
  c. Use the shortcut to get to the end of the file.
  d. Search for the string chr11.
  e. Report two rows that start with chr11. Include the start and end position in your answer.
  Exit the less buffer.
- Print to screen the last 5 lines of the file Encode-hesc-Nanog.bed. Submit a screenshot of the output as your answer.
- How many commands have you typed after going through this exercise? Submit a screenshot of the output as your answer.
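As a rough sketch of the kind of commands Part A relies on, the block below builds a small made-up BED file and runs the non-interactive pieces on it. The file name demo.bed and every coordinate in it are invented for illustration and are not taken from Encode-hesc-Nanog.bed. The less keystrokes appear only as comments because less is interactive.

```shell
# Build a tiny stand-in BED file (chrom, start, end); all values are invented.
workdir=$(mktemp -d)
printf 'chr1\t100\t200\nchr2\t300\t400\nchr11\t500\t600\nchr11\t700\t800\nchr12\t900\t1000\nchrX\t1100\t1200\n' > "$workdir/demo.bed"

# Interactive viewing with less (not run here):
#   less demo.bed   open the file
#   G               jump to the end of the file
#   /chr11          search forward for "chr11"
#   ?chr11          search backward (useful when already at the end)
#   n               repeat the previous search
#   q               quit less

# Print the last 5 lines of the stand-in file.
last_five=$(tail -n 5 "$workdir/demo.bed")
echo "$last_five"

# Non-interactive counterpart of the search: rows whose first column is chr11.
chr11_rows=$(awk '$1 == "chr11"' "$workdir/demo.bed")
echo "$chr11_rows"
```

On the real file the same commands apply, but the rows and positions you report must come from your own output.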
Part B: Generating your own script
You got the following lines of code from a trusted source but need to modify them before you can submit them to the VACC-Bluemoon server. You decide it's time to make your own script. Follow the steps below:
- Create a new file in the other directory called script.sh.
  - The .sh file extension typically indicates that a file is a shell script.
  - In Unix-like operating systems (such as Linux and macOS), shell scripts are plain text files containing a sequence of commands that can be executed by a shell.
- Paste in the code below to script.sh.

    STAR --runThreadN 4 \
    --runMode genomeGenerate \
    --genomeDir /username/chr1_hg19_STAR_index/ \
    --genomeFastaFiles /username/reference_data_ensembl/Homo_sapiens.GRCh19.dna.chromosome.1.fa \
    --sjdbGTFfile /username/reference_data_ensembl/Homo_sapiens.GRCh19.gtf

- Replace every occurrence of "username" with your netid.
- Delete the line containing --runMode.
- Change the value of --runThreadN from 4 to 6.
- You would also like to use the newest genome assembly, human reference 38 (hg38/GRCh38). Change this as well in your script.
- Submit a screenshot of your script in the Nano buffer as homework Part B.
  Save the file and EXIT.
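To illustrate the earlier point that a shell script is just a plain text file of commands, here is a minimal sketch. The file name hello.sh and its contents are hypothetical and unrelated to the STAR command. The only point is that the same text file can be displayed like any other file and also handed to bash to run.

```shell
# Write a two-command script to a plain text file (hello.sh is a made-up name).
demo_dir=$(mktemp -d)
cat > "$demo_dir/hello.sh" <<'EOF'
#!/bin/bash
# Each line below is a command the shell runs in order.
echo "step 1"
echo "step 2"
EOF

# The script is ordinary text, so cat can display it.
cat "$demo_dir/hello.sh"

# Run it by passing the file to bash; no execute permission is needed this way.
script_output=$(bash "$demo_dir/hello.sh")
echo "$script_output"
```

Opening hello.sh in nano would show exactly the text printed by cat, which is why a text editor is all you need to write or modify a script.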
Please Take Note:
- The argument --genomeDir is pointing to an entire directory while --genomeFastaFiles is pointing to a specific file. This is really important, as the program is looking for specific files or entire directories (with files in them!) to run successfully.
- Each line of the command except the last ends with a \. The \ is an escape character: placed at the very end of a line, it tells the shell to ignore the line break, so the command continues on the next line.
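Both notes can be tried out with a small sketch. The names index_dir and genome.fa below are hypothetical stand-ins for the kind of paths given to --genomeDir and --genomeFastaFiles, created in a temporary folder. Nothing here runs STAR itself.

```shell
# A trailing backslash escapes the line break, so these three lines are one command.
joined=$(echo one \
  two \
  three)
echo "$joined"   # prints: one two three

# Create one directory and one file to compare (hypothetical names).
tmp=$(mktemp -d)
mkdir "$tmp/index_dir"
touch "$tmp/genome.fa"

# -d is true only for a directory, -f only for a regular file.
if [ -d "$tmp/index_dir" ]; then dir_check="directory"; fi
if [ -f "$tmp/genome.fa" ]; then file_check="file"; fi
echo "index_dir is a $dir_check; genome.fa is a $file_check"
```

Checking a path with -d or -f before submitting a job is a quick way to catch a mistyped directory or file name.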