Helpful Tips HW7
Helpful Tips for HW#7
Step 1: Is Trimming Required or Not?
Determine whether your samples require trimming. If trimming is necessary, we covered this process here.. In this tutorial, you were asked to make a copy of the trimmomatic_exercise/
folder. This folder contains a subfolder called trimmomatic_adapters/
. You will need to specify the path to trimmomatic_adapters/
along with the appropriate FASTA file containing the adapters to trim. See your options below:

Step 2: Aligning using HISAT2
During Week 6, we covered the aligner HISAT2. There were two key exercises:
HISAT2_example
(L11) – For single-end (SE) samplesHISAT2_modify
(L13) – For paired-end (PE) samples
Use the appropriate script based on your sample type:
- SE samples → Use the script in
HISAT2_example
- PE samples → Use the modified script in
HISAT2_modify
Tip
Need a refresher? Review the slides on Alignment Outputs (L15) to revisit the differences between these scripts.
During alignment, you will need to specify the location to the Genomic Index. More information regarding the indexes and where to find them can be found here.
Warning
You will ONLY need to specify the LOCATION/PATH of the Genomic Index when running HISAT2. Please DO NOT make a copy of these indexes! They are extremely large and should not be duplicated.
Strandedness: Don't Forget this Step!
-
Before aligning your full dataset, run the alignment on one sample first to determine the strandedness of your samples.
-
If you do not specify
--rna-strandness
during alignment, HISAT2 assumes unstranded data, which can lead to incorrect alignments and downstream errors.
How to Specify Strandedness in HISAT2
--rna-strandness <string>
For single-end reads, use F
or R
.
F
means a read corresponds to a transcript.R
means a read corresponds to the reverse complement.
For paired-end reads, use either FR
or RF
.
Tip
Need a refresher? Review the slides on Alignment Outputs (L15) to revisit the differences between these scripts.
RSeQC and MultiQC Struggles
MultiQC sometimes struggles with processing RSeQC outputs.
+ To determine strandedness, continue using the [Sequera.io MultiQC](https://seqera.io/multiqc/) website.
Step 3: Counting Reads
- Your
htseq_2025_demo
folder contains the script you created to run htseq-count on your samples. - Each sample takes approximately 30 minutes to process.
- To be safe, request
24:00:00
for time.
- To be safe, request
- More information regarding the GTF file and its location can be found: