This week uses the package pheatmap, so if you haven’t installed it already, uncomment the code in the chunk below and install it.
#install.packages('pheatmap')
Also, you’ll need to download the data table from the meetup website and load it in the chunk below.
library(tidyverse); library(pheatmap); library(viridis)  # packages used throughout
data <- read_tsv('heatmap_tbl.tsv')
## Parsed with column specification:
## cols(
## gene = col_character(),
## dmso_1 = col_double(),
## dmso_2 = col_double(),
## dmso_3 = col_double(),
## treat1_1 = col_double(),
## treat1_2 = col_double(),
## treat1_3 = col_double(),
## treat2_1 = col_double(),
## treat2_2 = col_double(),
## treat2_3 = col_double()
## )
To plot a heatmap with pheatmap(), all that’s needed is a completely numeric table. Row and column names will become labels in the heatmap.
### look at the data and make sure it's totally numeric
data
## # A tibble: 20 x 10
## gene dmso_1 dmso_2 dmso_3 treat1_1 treat1_2 treat1_3 treat2_1
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 gene1 -0.983 -0.690 0.641 -2.25 -2.64e+0 -2.52 -0.00343
## 2 gene2 -0.871 0.474 0.0122 -2.15 -2.19e+0 -1.85 0.299
## 3 gene3 -0.838 -0.666 -0.0857 -1.75 -1.44e+0 -2.00 -0.323
## 4 gene4 -0.820 0.445 -1.20 -1.42 -1.70e+0 -1.71 -0.275
## 5 gene5 -0.763 -0.375 0.0625 -2.00 -1.75e+0 -2.47 -0.254
## 6 gene6 -0.760 -0.287 0.0668 -1.84 -1.83e+0 -1.75 1.05
## 7 gene7 -0.435 1.03 -0.0966 -2.05 -9.36e-1 -1.50 0.669
## 8 gene8 -0.390 0.121 -1.66 -1.87 -1.24e+0 -1.01 1.56
## 9 gene9 -0.107 -0.505 0.652 -2.04 -6.44e-1 -1.96 1.64
## 10 gene… -0.0942 0.722 -1.23 -1.61 -1.33e+0 -1.58 1.58
## 11 gene… -0.0610 -0.115 0.0411 -1.91 -1.33e+0 -1.44 1.46
## 12 gene… -0.00656 0.280 0.238 -1.02 -8.40e-1 -0.897 1.29
## 13 gene… 0.0139 -0.702 -1.89 -0.533 -1.47e+0 -0.724 0.734
## 14 gene… 0.0953 -0.437 1.03 -1.06 -6.68e-1 -0.765 1.57
## 15 gene… 0.662 0.0429 0.638 -0.461 4.35e-2 -0.943 2.48
## 16 gene… 0.741 -0.692 1.32 -0.0387 1.71e-1 -0.754 1.73
## 17 gene… 0.817 -0.836 0.607 -0.0627 -6.86e-1 -0.620 1.42
## 18 gene… 0.940 0.774 0.300 -0.942 2.75e-1 -0.346 2.40
## 19 gene… 1.07 -0.627 0.528 -0.645 -2.35e-1 0.140 2.58
## 20 gene… 1.69 0.513 -0.834 0.869 4.45e-4 0.917 2.43
## # ... with 2 more variables: treat2_2 <dbl>, treat2_3 <dbl>
This table has genes as a column, so that column needs to be either converted to row names (to keep the genes as labels on the heatmap) or dropped.
data %>%
# tibbles don't allow rownames, so you have to convert to a dataframe first
as.data.frame() %>%
# convert the gene column to rownames
column_to_rownames('gene') %>%
# plot the heatmap
pheatmap(.)
Before playing with the appearance, save the data as a modified table that’s in the correct format for pheatmap().
data %>%
as.data.frame() %>%
column_to_rownames('gene') -> data_mat
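Before plotting, it can help to confirm that the converted table really is all numeric. The sketch below uses base R only and a small made-up table (`toy_tbl` and its values are illustrative, not the meetup data).

```r
# hypothetical toy table standing in for the meetup data
toy_tbl <- data.frame(gene = c("gene1", "gene2", "gene3"),
                      dmso_1 = c(-0.98, -0.87, -0.84),
                      treat1_1 = c(-2.25, -2.15, -1.75))

# move the gene column into the row names
# (a base R stand-in for column_to_rownames())
toy_mat <- toy_tbl[, -1]
rownames(toy_mat) <- toy_tbl$gene

# TRUE only if every remaining column is numeric
all_numeric <- all(sapply(toy_mat, is.numeric))
all_numeric
```

If this returns FALSE, a leftover character column (such as the gene names) is the likely culprit.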
Change the color palette for the heatmap by supplying a different R color palette to the color argument.
### use an existing color palette like viridis
pheatmap(data_mat, color = viridis(50))
### changing the number after the palette changes how many colors/breaks there are in the scale
pheatmap(data_mat, color = viridis(5))
### pick colors for high, medium, low
# colorRampPalette takes a vector of colors and returns a function that interpolates them into a continuous palette
pheatmap(data_mat, color = colorRampPalette(c("navy", "white", "firebrick3"))(50))
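With a diverging palette like navy/white/firebrick, it is often useful for the pale middle of the palette to line up with zero. One way to sketch this is with the `breaks` argument of pheatmap(), which expects one more break than there are colors. The matrix `toy_mat` here is made up for illustration.

```r
library(pheatmap)

# hypothetical toy matrix; values are random, for illustration only
set.seed(1)
toy_mat <- matrix(rnorm(40), nrow = 8,
                  dimnames = list(paste0("gene", 1:8), paste0("sample", 1:5)))

# diverging palette with 50 colors
pal <- colorRampPalette(c("navy", "white", "firebrick3"))(50)

# breaks need one more element than the palette; making them symmetric
# around zero puts the near-white middle of the palette at zero
limit <- max(abs(toy_mat))
brks <- seq(-limit, limit, length.out = length(pal) + 1)

pheatmap(toy_mat, color = pal, breaks = brks)
```

Without `breaks`, the palette is stretched over the observed range of the data, so the middle color only lands on zero if the data happen to be symmetric.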
Cut the heatmap to emphasize regions of interest. The cutree_* arguments cut based on the hierarchical clustering at the level where the number of branches matches the number you supply.
### cut by columns
pheatmap(data_mat, color = magma(50), cutree_cols = 3)
### cut by rows
pheatmap(data_mat, color = magma(50), cutree_rows = 2)
### both
pheatmap(data_mat, color = magma(50), cutree_cols = 2, cutree_rows = 2)
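To see which rows end up in which piece of a cut, the same kind of cut can be sketched in base R with hclust() and cutree(). This assumes the euclidean distance and complete linkage listed as the defaults further down, and uses a made-up matrix `toy_mat`.

```r
# hypothetical toy matrix; values are random, for illustration only
set.seed(1)
toy_mat <- matrix(rnorm(40), nrow = 8,
                  dimnames = list(paste0("gene", 1:8), paste0("sample", 1:5)))

# cluster the rows (assumed defaults: euclidean distance, complete linkage)
row_tree <- hclust(dist(toy_mat, method = "euclidean"), method = "complete")

# cut the tree at the level that gives 2 branches
row_groups <- cutree(row_tree, k = 2)

row_groups         # group number for each gene
table(row_groups)  # how many genes fall in each group
```

The resulting group labels could also be turned into a row annotation table if you want the groups marked on the heatmap itself.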
pheatmap() will take annotation in a separate table and add it to the heatmap. The table must be a dataframe because the rownames of the annotation table must match either the column names or the row names (whichever one you want to annotate) of the main heatmap table. We’ll set up an annotation table for the columns in the chunk below.
# make a data frame with columns listing what you want to annotate
data.frame(treat_type = c(rep('DMSO', 3), rep('treat1', 3), rep('treat2', 3)),
# make the row names of the table the same as the column names
row.names = colnames(data_mat)) -> col_anno
Add the annotation onto the heatmap with the annotation_col argument.
pheatmap(data_mat, color = magma(50), annotation_col = col_anno)
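Typing the labels out with rep() works when the columns are in a known order. As an alternative sketch, the labels can be derived from the sample names themselves, followed by a quick check that the row names line up. The `sample_names` vector below is a hypothetical stand-in for `colnames(data_mat)`, and note that this version yields lowercase `dmso` rather than `DMSO`.

```r
# hypothetical column names standing in for colnames(data_mat)
sample_names <- c(paste0("dmso_", 1:3),
                  paste0("treat1_", 1:3),
                  paste0("treat2_", 1:3))

# derive the treatment label by dropping the replicate suffix (_1, _2, _3)
toy_anno <- data.frame(treat_type = sub("_[0-9]+$", "", sample_names),
                       row.names = sample_names)

# every heatmap column should have a matching annotation row
names_match <- all(sample_names %in% rownames(toy_anno))
names_match

toy_anno
```

Any color list used with this table would need names that match these labels exactly, including case.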
Select your own annotation colors by making a list where the annotation categories are assigned colors.
anno_colors <- list(treat_type = c(DMSO = 'cornsilk3',
treat1 = 'orange2',
treat2 = 'midnightblue'))
Then supply the annotation color list to the annotation_colors argument in pheatmap().
pheatmap(data_mat,
color = viridis(50),
annotation_col = col_anno,
annotation_colors = anno_colors)
pheatmap(data_mat,
color = viridis(10),
cutree_rows = 2,
cutree_cols = 2,
annotation_col = col_anno,
annotation_colors = anno_colors)
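Because colors are matched to annotation categories by name, a typo in either place is an easy mistake to make. A small base R check, sketched here on a made-up annotation table and color list (`toy_anno` and `toy_colors` are illustrative names), can catch it before plotting.

```r
# hypothetical annotation table and color list, mirroring the structure above
toy_anno <- data.frame(treat_type = rep(c("DMSO", "treat1", "treat2"), each = 3),
                       row.names = paste0("sample", 1:9))
toy_colors <- list(treat_type = c(DMSO = "cornsilk3",
                                  treat1 = "orange2",
                                  treat2 = "midnightblue"))

# categories present in the annotation but missing from the color list
missing_colors <- setdiff(unique(as.character(toy_anno$treat_type)),
                          names(toy_colors$treat_type))
missing_colors

# col2rgb() throws an error if any color name is not one R recognizes
col2rgb(toy_colors$treat_type)
```

An empty `missing_colors` means every category has a color assigned.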
Remember from last week (hierarchical clustering) that clustering happens in two steps: first a measure of similarity is calculated, then the clustering itself is calculated from it. pheatmap() calls dist() and hclust() under the hood, and you can pass arguments to them from within pheatmap().
You can change the distance calculated, or calculate a correlation instead, by supplying different arguments to clustering_distance_rows and/or clustering_distance_cols. Check the documentation for both pheatmap() and dist() for all the options.
### default
pheatmap(data_mat, clustering_distance_rows = 'euclidean')
### correlation
pheatmap(data_mat, clustering_distance_rows = 'correlation')
### or any other option from dist(), like manhattan
pheatmap(data_mat, clustering_distance_rows = 'manhattan')
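Besides a method name, clustering_distance_rows should also accept a precomputed distance object, which opens up measures that are not built in. The sketch below computes a Pearson correlation distance by hand (1 minus the correlation between rows) and then a Spearman version, on a made-up matrix `toy_mat`.

```r
library(pheatmap)

# hypothetical toy matrix; values are random, for illustration only
set.seed(1)
toy_mat <- matrix(rnorm(40), nrow = 8,
                  dimnames = list(paste0("gene", 1:8), paste0("sample", 1:5)))

# cor() works on columns, so transpose to correlate the rows (genes)
row_cor_dist <- as.dist(1 - cor(t(toy_mat)))

# pass the dist object in place of a method name
pheatmap(toy_mat, clustering_distance_rows = row_cor_dist)

# the same idea with Spearman correlation
row_spearman_dist <- as.dist(1 - cor(t(toy_mat), method = "spearman"))
pheatmap(toy_mat, clustering_distance_rows = row_spearman_dist)
```

With this definition, perfectly correlated rows have distance 0 and perfectly anti-correlated rows have distance 2.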
You can change the clustering method by supplying a different argument to clustering_method in pheatmap(). Check the documentation for both pheatmap() and hclust() for all the options.
### default
pheatmap(data_mat, clustering_method = 'complete')
### a different popular method
pheatmap(data_mat, clustering_method = 'ward.D2')
### third option; see hclust() documentation for complete list
pheatmap(data_mat, clustering_method = 'average')
If you don’t want your heatmap clustered, you can set the cluster_rows and/or cluster_cols arguments to FALSE.
pheatmap(data_mat, cluster_rows = FALSE)
You can do k-means clustering within pheatmap() as well. Here it clusters rows by k-means, then displays the aggregated rows on the heatmap.
pheatmap(data_mat, kmeans_k = 3)
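k-means starts from randomly chosen centers, so the clusters can differ from run to run unless a seed is set. To see what the aggregated rows represent, the sketch below runs kmeans() from base R directly on a made-up matrix `toy_mat`; this is presumably close to what pheatmap() does internally, but treat that as an assumption.

```r
# hypothetical toy matrix; values are random, for illustration only
set.seed(42)
toy_mat <- matrix(rnorm(60), nrow = 12,
                  dimnames = list(paste0("gene", 1:12), paste0("sample", 1:5)))

# setting the seed right before clustering makes the result repeatable
set.seed(42)
km <- kmeans(toy_mat, centers = 3)

km$cluster  # which cluster each gene was assigned to
km$centers  # mean profile per cluster: one row per cluster, one column per sample
```

Each row of `km$centers` is the average of the genes in that cluster, which is the kind of aggregated row shown on the heatmap.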
Not clustering, but in the same vein, data can be centered and scaled by either rows or columns.
### default is no scaling
pheatmap(data_mat, scale = 'none')
### scale rows
pheatmap(data_mat, scale = 'row')
### scale columns
pheatmap(data_mat, scale = 'column')
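Row scaling turns each row into z-scores: subtract the row mean, then divide by the row standard deviation. A base R sketch of the same calculation, on a made-up matrix `toy_mat`, shows what the scaled values look like.

```r
# hypothetical toy matrix; values are random, for illustration only
set.seed(1)
toy_mat <- matrix(rnorm(40, mean = 5, sd = 2), nrow = 8,
                  dimnames = list(paste0("gene", 1:8), paste0("sample", 1:5)))

# scale() works on columns, so transpose, scale, and transpose back
row_scaled <- t(scale(t(toy_mat)))

round(rowMeans(row_scaled), 10)  # each row now has mean 0
apply(row_scaled, 1, sd)         # and standard deviation 1
```

After scaling, colors compare each gene to its own average rather than to the other genes, which is why a scaled heatmap can look so different from an unscaled one.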
geom_tile()
ggplot() sort of has a heatmap geom, geom_tile(). However, it only creates blocks of color and doesn’t cluster the rows and columns.
data %>% gather(sample, expression, 2:10) %>%
ggplot(aes(x = sample, y = gene, fill = expression)) +
geom_tile() +
scale_fill_viridis() +
theme(axis.text.x = element_text(angle = 45, vjust = 0.6))
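If you want a geom_tile() plot to follow a clustered order, one option is to compute the order yourself with hclust() and use it to set the factor levels. This sketch uses a made-up matrix `toy_mat` and pivot_longer(), the newer tidyr replacement for gather(). It orders the tiles but draws no dendrograms.

```r
library(tidyverse)

# hypothetical toy matrix; values are random, for illustration only
set.seed(1)
toy_mat <- matrix(rnorm(40), nrow = 8,
                  dimnames = list(paste0("gene", 1:8), paste0("sample", 1:5)))

# leaf order from hierarchical clustering of rows and of columns
gene_order <- rownames(toy_mat)[hclust(dist(toy_mat))$order]
sample_order <- colnames(toy_mat)[hclust(dist(t(toy_mat)))$order]

# reshape to long format and fix the plotting order through factor levels
toy_long <- toy_mat %>%
  as.data.frame() %>%
  rownames_to_column("gene") %>%
  pivot_longer(-gene, names_to = "sample", values_to = "expression") %>%
  mutate(gene = factor(gene, levels = gene_order),
         sample = factor(sample, levels = sample_order))

ggplot(toy_long, aes(x = sample, y = gene, fill = expression)) +
  geom_tile()
```

ggplot() draws factor levels in order along each axis, so similar genes and samples end up next to each other.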
You can drastically change the appearance of a heatmap. The same data can appear totally different depending on the parameters selected, as seen in the examples below.
pheatmap(data_mat,
annotation_col = col_anno,
annotation_colors = anno_colors,
cutree_rows = 2,
cutree_cols = 3)
pheatmap(data_mat,
color = viridis(50),
scale = 'column',
annotation_col = col_anno,
annotation_colors = anno_colors,
clustering_distance_rows = 'euclidean',
clustering_distance_cols = 'canberra',
cutree_rows = 3)