Heatmaps

Prep

This week uses the pheatmap package. If you haven’t installed it yet, uncomment the line in the chunk below and run it.

#install.packages('pheatmap')
library(pheatmap)

Also, you’ll need to download the data table from the meetup website and load it in the chunk below.

library(tidyverse)  # provides read_tsv(), %>%, column_to_rownames(), and ggplot()
data <- read_tsv('heatmap_tbl.tsv')

## Parsed with column specification:
## cols(
##   gene = col_character(),
##   dmso_1 = col_double(),
##   dmso_2 = col_double(),
##   dmso_3 = col_double(),
##   treat1_1 = col_double(),
##   treat1_2 = col_double(),
##   treat1_3 = col_double(),
##   treat2_1 = col_double(),
##   treat2_2 = col_double(),
##   treat2_3 = col_double()
## )

The Basics

To plot a heatmap with pheatmap(), all that’s needed is a completely numeric table. Row and column names will become labels in the heatmap.

### look at the data and make sure it's totally numeric
data

## # A tibble: 20 x 10
##    gene    dmso_1  dmso_2  dmso_3 treat1_1 treat1_2 treat1_3 treat2_1
##    <chr>    <dbl>   <dbl>   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
##  1 gene1 -0.983   -0.690   0.641   -2.25   -2.64e+0   -2.52  -0.00343
##  2 gene2 -0.871    0.474   0.0122  -2.15   -2.19e+0   -1.85   0.299  
##  3 gene3 -0.838   -0.666  -0.0857  -1.75   -1.44e+0   -2.00  -0.323  
##  4 gene4 -0.820    0.445  -1.20    -1.42   -1.70e+0   -1.71  -0.275  
##  5 gene5 -0.763   -0.375   0.0625  -2.00   -1.75e+0   -2.47  -0.254  
##  6 gene6 -0.760   -0.287   0.0668  -1.84   -1.83e+0   -1.75   1.05   
##  7 gene7 -0.435    1.03   -0.0966  -2.05   -9.36e-1   -1.50   0.669  
##  8 gene8 -0.390    0.121  -1.66    -1.87   -1.24e+0   -1.01   1.56   
##  9 gene9 -0.107   -0.505   0.652   -2.04   -6.44e-1   -1.96   1.64   
## 10 gene… -0.0942   0.722  -1.23    -1.61   -1.33e+0   -1.58   1.58   
## 11 gene… -0.0610  -0.115   0.0411  -1.91   -1.33e+0   -1.44   1.46   
## 12 gene… -0.00656  0.280   0.238   -1.02   -8.40e-1   -0.897  1.29   
## 13 gene…  0.0139  -0.702  -1.89    -0.533  -1.47e+0   -0.724  0.734  
## 14 gene…  0.0953  -0.437   1.03    -1.06   -6.68e-1   -0.765  1.57   
## 15 gene…  0.662    0.0429  0.638   -0.461   4.35e-2   -0.943  2.48   
## 16 gene…  0.741   -0.692   1.32    -0.0387  1.71e-1   -0.754  1.73   
## 17 gene…  0.817   -0.836   0.607   -0.0627 -6.86e-1   -0.620  1.42   
## 18 gene…  0.940    0.774   0.300   -0.942   2.75e-1   -0.346  2.40   
## 19 gene…  1.07    -0.627   0.528   -0.645  -2.35e-1    0.140  2.58   
## 20 gene…  1.69     0.513  -0.834    0.869   4.45e-4    0.917  2.43   
## # ... with 2 more variables: treat2_2 <dbl>, treat2_3 <dbl>

This table stores the gene names in a column. Either drop that column or convert it to row names, which keeps the genes as row labels on the heatmap.

data %>%
# tibbles don't keep rownames, so convert to a data frame first
  as.data.frame() %>%
# convert the gene column to rownames
  column_to_rownames('gene') %>%
# plot the heatmap
  pheatmap(.)
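The same preparation can be sketched in base R on a toy table. The names toy_tbl and toy_mat and all the values are made up for illustration and are not part of the meetup data; the final check confirms that everything left in the table is numeric, which is what pheatmap() needs.

```r
# Toy stand-in for the tutorial table (hypothetical names, made-up values)
toy_tbl <- data.frame(
  gene   = c("geneA", "geneB", "geneC"),
  dmso_1 = c(-0.9, 0.1, 1.2),
  trt_1  = c(-2.1, -1.0, 0.4),
  stringsAsFactors = FALSE
)

# Keep only the measurement columns, then use the gene names as row names
toy_mat <- toy_tbl[, setdiff(names(toy_tbl), "gene")]
rownames(toy_mat) <- toy_tbl$gene

# Every remaining column should be numeric before plotting
all_numeric <- all(vapply(toy_mat, is.numeric, logical(1)))
all_numeric
```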

Modifying Appearance

Before playing with the appearance, save a copy of the data in the format pheatmap() expects.

data %>%
  as.data.frame() %>%
  column_to_rownames('gene') -> data_mat

Change the color palette for the heatmap by supplying a different R color palette to the color argument.

### use an existing color palette like viridis (from the viridis package)
library(viridis)
pheatmap(data_mat, color = viridis(50))

### changing the number after the palette changes how many colors/breaks there are in the scale
pheatmap(data_mat, color = viridis(5))

### pick colors for high, medium, low
# colorRampPalette takes a vector of colors and returns a function; calling that function with n gives n interpolated colors
pheatmap(data_mat, color = colorRampPalette(c("navy", "white", "firebrick3"))(50))
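To see why the call above ends in a second pair of parentheses, here is a small sketch of what colorRampPalette() produces on its own. The names make_palette, pal_5, and pal_50 are made up for this example.

```r
# colorRampPalette() returns a function, not a vector of colors
make_palette <- colorRampPalette(c("navy", "white", "firebrick3"))

# Calling that function with n gives n interpolated hex colors
pal_5  <- make_palette(5)
pal_50 <- make_palette(50)

length(pal_5)   # 5
length(pal_50)  # 50

# With an odd n, the middle entry is the middle color (white here)
pal_5
```

Either vector can be passed to the color argument; the longer one just gives a smoother gradient.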

Cut the heatmap to emphasize regions of interest. The cutree_rows and cutree_cols arguments cut the hierarchical clustering tree at the level where the number of branches matches the number you supply, then draw a gap between the resulting groups.

### cut by columns
pheatmap(data_mat, color = magma(50), cutree_cols = 3)

### cut by rows
pheatmap(data_mat, color = magma(50), cutree_rows = 2)

### both
pheatmap(data_mat, color = magma(50), cutree_cols = 2, cutree_rows = 2)
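The grouping behind the cuts can be reproduced with base R's hclust() and cutree(). This is a sketch on a made-up toy matrix (toy_mat is not the meetup data), and it assumes pheatmap's default euclidean distance and complete linkage rather than reproducing pheatmap's internals.

```r
# Toy matrix: rows g1-g3 sit near 0, rows g4-g6 sit near 10 (made-up values)
toy_mat <- rbind(
  g1 = c(0.1, 0.2, 0.0),
  g2 = c(0.0, 0.1, 0.3),
  g3 = c(0.2, 0.0, 0.1),
  g4 = c(10.1, 9.8, 10.0),
  g5 = c(9.9, 10.2, 10.1),
  g6 = c(10.0, 10.0, 9.7)
)

# Cluster the rows, then cut the tree into 2 groups
row_tree   <- hclust(dist(toy_mat), method = "complete")
row_groups <- cutree(row_tree, k = 2)

# Named vector giving the group number assigned to each row
row_groups
```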

Add color annotation to row/columns

pheatmap() takes annotations as a separate table and adds them to the heatmap. The table must be a data frame, because its row names have to match either the column names or the row names of the main heatmap table (whichever you want to annotate). We’ll set up an annotation table for the columns in the chunk below.

# make a data frame with columns listing what you want to annotate
data.frame(treat_type = c(rep('DMSO', 3), rep('treat1', 3), rep('treat2', 3)), 
# make the row names of the table the same as the column names
           row.names = colnames(data_mat)) -> col_anno

Add the annotation onto the heatmap with the annotation_col argument.

pheatmap(data_mat, color = magma(50), annotation_col = col_anno)
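Because the annotation is matched by name, it can help to check that the row names of the annotation table line up with the column names of the heatmap table before plotting. The sketch below uses a made-up matrix (toy_mat) and derives the group labels from the sample names by stripping the replicate suffix, which assumes a `group_replicate` naming scheme like the one in this dataset.

```r
# Toy heatmap table with sample names in the column names (made-up values)
toy_mat <- matrix(
  seq_len(12),
  nrow = 2,
  dimnames = list(
    c("geneA", "geneB"),
    c("dmso_1", "dmso_2", "dmso_3", "treat1_1", "treat1_2", "treat1_3")
  )
)

# Build the annotation: drop the trailing "_<number>" to get the group label
toy_anno <- data.frame(
  treat_type = sub("_[0-9]+$", "", colnames(toy_mat)),
  row.names  = colnames(toy_mat)
)

# TRUE when the annotation rows line up with the heatmap columns
names_match <- identical(rownames(toy_anno), colnames(toy_mat))
names_match
```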

Select your own annotation colors by making a list where the annotation categories are assigned colors.

anno_colors <- list(treat_type = c(DMSO = 'cornsilk3', 
                                   treat1 = 'orange2', 
                                   treat2 = 'midnightblue'))

Then supply the annotation color list to the annotation_colors argument in pheatmap().

pheatmap(data_mat, 
         color = viridis(50), 
         annotation_col = col_anno, 
         annotation_colors = anno_colors)

Combine everything

pheatmap(data_mat, 
         color = viridis(10),
         cutree_rows = 2,
         cutree_cols = 2,
         annotation_col = col_anno, 
         annotation_colors = anno_colors)

Clustering options

Remember from last week (hierarchical clustering) that clustering happens in two steps: first a measure of similarity (a distance) is calculated between every pair of items, then the clusters are built from those distances. pheatmap() calls dist() and hclust() under the hood, and you can pass options through to both from within pheatmap().

Change the distance calculation

You can change the distance measure, or use a correlation-based distance instead, by supplying different arguments to clustering_distance_rows and/or clustering_distance_cols. Check the documentation for both pheatmap() and dist() for all the options.

### default
pheatmap(data_mat, clustering_distance_rows = 'euclidean')

### correlation
pheatmap(data_mat, clustering_distance_rows = 'correlation')

### or any other option from dist(), like manhattan
pheatmap(data_mat, clustering_distance_rows = 'manhattan')
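The choice matters because euclidean distance compares magnitudes while a correlation-based distance compares the shape of each profile. The sketch below shows the difference on a made-up matrix; it computes the correlation distance as 1 minus the Pearson correlation between rows, which is an assumption about how the 'correlation' option behaves rather than something stated in this handout.

```r
# Three made-up profiles: two rise (at very different levels), one falls
toy_mat <- rbind(
  low_up   = c(1, 2, 3, 4),
  high_up  = c(101, 102, 103, 104),
  low_down = c(4, 3, 2, 1)
)

# Euclidean distance: low_up is far from high_up, close to low_down
d_euc <- as.matrix(dist(toy_mat, method = "euclidean"))

# Correlation distance (assumed here to be 1 - Pearson r between rows):
# low_up and high_up have the same shape, low_up and low_down are opposite
d_cor <- as.matrix(as.dist(1 - cor(t(toy_mat))))

round(d_euc, 2)
round(d_cor, 2)
```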

Change the clustering method

You can change the clustering method by supplying a different argument to clustering_method in pheatmap(). Check the documentation for both pheatmap() and hclust() for all the options.

### default
pheatmap(data_mat, clustering_method = 'complete')

### different popular method
pheatmap(data_mat, clustering_method = 'ward.D2')

### third option; see hclust() documentation for complete list
pheatmap(data_mat, clustering_method = 'average')
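The methods differ in how they measure the distance between two clusters once points have been merged. A tiny example with three made-up points on a line (0, 1, and 3) makes this visible: every method merges a and b first at height 1, but the height at which c joins depends on the method.

```r
# Three made-up points on a line; dist() gives a-b = 1, b-c = 2, a-c = 3
toy_dist <- dist(c(a = 0, b = 1, c = 3))

# Merge heights from hclust() for three linkage methods
heights <- sapply(
  c("complete", "single", "average"),
  function(m) hclust(toy_dist, method = m)$height
)

# Row 1: first merge (a with b). Row 2: c joining the {a, b} cluster.
# complete uses the largest pairwise distance (3), single the smallest (2),
# average the mean of the two (2.5)
heights
```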

No clustering

If you don’t want your heatmap clustered, you can set the cluster_rows and/or cluster_cols arguments to FALSE.

pheatmap(data_mat, cluster_rows = FALSE)

kmeans

You can do kmeans clustering within pheatmap() as well. Here it clusters the rows by kmeans, then displays one aggregated row per cluster on the heatmap instead of the original rows.

pheatmap(data_mat, kmeans_k = 3)
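To get a feel for what the aggregated rows are, here is a sketch using base R's kmeans() on a made-up matrix. It assumes the rows drawn on the heatmap correspond to the cluster centers (the per-cluster column means); toy_mat and the choice of 2 clusters are for illustration only. kmeans starts from random centers, so set a seed if you want the same result each time.

```r
# Toy matrix with two obvious groups of rows (made-up values)
toy_mat <- rbind(
  g1 = c(0.1, 0.2, 0.0),
  g2 = c(0.0, 0.1, 0.3),
  g3 = c(0.2, 0.0, 0.1),
  g4 = c(10.1, 9.8, 10.0),
  g5 = c(9.9, 10.2, 10.1),
  g6 = c(10.0, 10.0, 9.7)
)

# Fix the random starting centers so the result is reproducible
set.seed(1)
km <- kmeans(toy_mat, centers = 2, nstart = 10)

# Which cluster each row was assigned to
km$cluster

# One row per cluster: the column means of the rows in that cluster
km$centers
```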

Scaling

Not clustering, but in the same vein, data can be centered and scaled by either rows or columns.

### default is no scaling
pheatmap(data_mat, scale = 'none')

### scale rows
pheatmap(data_mat, scale = 'row')

### scale columns
pheatmap(data_mat, scale = 'column')
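Row scaling can be sketched in base R with scale(), which centers and scales columns, so the matrix is transposed before and after. This is an illustration on made-up values (toy_mat), not a copy of what pheatmap() does internally. After scaling, each row has mean 0 and standard deviation 1, so genes with very different magnitudes become comparable.

```r
# Made-up matrix: geneA and geneB have the same pattern at different scales
toy_mat <- rbind(
  geneA = c(s1 = 1,  s2 = 2,  s3 = 3),
  geneB = c(s1 = 10, s2 = 20, s3 = 30),
  geneC = c(s1 = 5,  s2 = 5,  s3 = 8)
)

# scale() works on columns, so transpose, scale, and transpose back
row_scaled <- t(scale(t(toy_mat)))

# geneA and geneB now have identical values
round(row_scaled, 2)

# Each row has mean 0 and standard deviation 1 after scaling
row_means <- rowMeans(row_scaled)
row_sds   <- apply(row_scaled, 1, sd)
```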

geom_tile()

ggplot() has a geom that can draw a heatmap, geom_tile(). However, it only draws blocks of color; it doesn’t cluster the rows and columns, so they appear in their default order.

data %>% gather(sample, expression, 2:10) %>% 
  ggplot(aes(x = sample, y = gene, fill = expression)) + 
    geom_tile() +
    scale_fill_viridis() +
    theme(axis.text.x = element_text(angle = 45, vjust = 0.6))
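If you want clustered rows with geom_tile(), one possible workaround is to compute the row order yourself and store it in the factor levels of the gene column. The sketch below does this in base R on a made-up matrix (toy_mat, toy_long, and gene_order are names invented for this example); it is one approach, not the only one.

```r
# Made-up matrix where similar rows are interleaved (g1 ~ g3, g2 ~ g4)
toy_mat <- rbind(
  g1 = c(s1 = 0.1,  s2 = 0.2,  s3 = 0.0),
  g2 = c(s1 = 10.1, s2 = 9.8,  s3 = 10.0),
  g3 = c(s1 = 0.2,  s2 = 0.0,  s3 = 0.1),
  g4 = c(s1 = 9.9,  s2 = 10.2, s3 = 10.1)
)

# Cluster the rows and read off the dendrogram order
row_tree   <- hclust(dist(toy_mat))
gene_order <- rownames(toy_mat)[row_tree$order]

# Long format: one row per gene/sample pair (as.vector() goes column by column)
toy_long <- data.frame(
  gene       = rep(rownames(toy_mat), times = ncol(toy_mat)),
  sample     = rep(colnames(toy_mat), each = nrow(toy_mat)),
  expression = as.vector(toy_mat)
)

# Store the clustered order in the factor levels
toy_long$gene <- factor(toy_long$gene, levels = gene_order)
levels(toy_long$gene)
```

Mapping this gene column to y in the ggplot() call should then draw the rows in the clustered order, since ggplot() orders a discrete axis by factor level.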