Introduction

Feature counting is the process of quantifying the number of sequencing reads that map to specific genomic features, such as genes, exons, or transcripts. It is a critical step in RNA sequencing analysis, as it transforms raw alignment data into meaningful metrics for gene expression studies. Accurate feature counting requires precise mapping of reads and careful handling of overlaps, ambiguous reads, and multimapping. The resulting counts are used for downstream analyses, such as differential expression or functional annotation, providing insights into biological processes and gene regulation. This step bridges the gap between raw sequencing data and interpretable biological results.

Feature Counts

Get the data

For this hands-on we could continue from previous exercise 02_mapping. You will need aligned bam and bai (index) files and gtf file.

If you’re unable to finish previous example, you can work on precomputed data. Just copy them to your designated working folder.

Don’t forget to ask for interactive job!

qsub -I -l select=1:ncpus=4:mem=8gb:scratch_local=10gb -l walltime=1:00:00
cd /path/to/your/working/directory 

cp -r /storage/brno2/home/bartonv/E5444/featureCounts/input ${PWD}

Inspect copied data if it contains everything needed.

featureCounts

We will continue with actual counting. Please add featureCounts from package subread, and run it on your data.

module add subread

# inspect featureCounts options
featureCounts --help

# run featureCounts on our data
featureCounts \
  -t exon \
  -g gene_id \
  -p -P \
  -C \
  -s 0 \
  -T 4 \
  -Q 0 \
  -d 1 \
  -D 25000 \
  -a genomic.gtf \
  -o example.featureCounts \
  *Aligned.sortedByCoord.out.bam

Questions:

  1. What does parameter -t do? How is it connected to the gtf file?

  2. What does parameters -d and -D mean? How would you correct its values?

Results

Inspect the contents of the file:

cat example.featureCounts.summary | head

cat example.featureCounts | head

Questions:

  1. What information do you find useful in the summary file?

  2. What information does the featureCounts output file give you? What is the meaning of individual columns?

Differential expression analysis

Get the data

To continue with example of processing featureCounts data, please copy precomputed counts for a full experiment.

cd /path/to/your/working/directory 

cp -r /storage/brno2/home/bartonv/E5444/featureCounts/input_full ${PWD}

Inspect copied data.

Questions:

  1. What information does design.txt file provide?

  2. How the counts file is organised?

R processing

For R processing, we use onDemand app by metacentrum. Login in and select RStudio.

Please select 1 hour, 2 CPU, 8GB of memory, 0 GPU, 10GB of scratch local. Select the newest version of R (4.4.1) and working directory according to your preference.

Questions:

  1. Is the FBgn0003360 gene differentially expressed because of the treatment? If yes, how much?

  2. Is the Pasilla gene (ps, FBgn0261552) downregulated by the RNAi treatment?

  3. Where is the most over-expressed gene located? What is the name of the gene?

  4. How many genes have a significant change in gene expression between conditions?