5  Creating graphs with ggplot

Base R can draw many kinds of figures, but the ggplot2 package is very useful for drawing attractive ones. Plotting with ggplot2 is one of the major strengths of using R for data analysis.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom)
library(palmerpenguins)
library(ggthemes)
library(ggpubr)
library(patchwork)
penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
ggplot()

ggplot(data = penguins)

ggplot(data = penguins, aes(x = flipper_length_mm))

ggplot(data = penguins, aes(x = flipper_length_mm)) + 
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 2 rows containing non-finite values (`stat_bin()`).
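The message above asks for an explicit binwidth, and the warning reports the two rows with missing flipper lengths. A minimal sketch of addressing both follows; the 5 mm bin width is only an illustrative assumption, not a value taken from the text.

```r
library(ggplot2)
library(palmerpenguins)

# binwidth = 5 (mm) is an illustrative choice; na.rm = TRUE drops the
# rows with missing flipper_length_mm without emitting a warning.
p_hist <- ggplot(data = penguins, aes(x = flipper_length_mm)) +
  geom_histogram(binwidth = 5, na.rm = TRUE)
p_hist
```

Trying a few bin widths is usually worthwhile, since the apparent shape of the distribution can change with the choice.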

ggplot(data = penguins, aes(x = flipper_length_mm)) + 
  geom_density()
Warning: Removed 2 rows containing non-finite values (`stat_density()`).

ggplot(data = penguins, aes(x = flipper_length_mm)) + 
  geom_boxplot()
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).

ggplot(data = penguins, aes(y = flipper_length_mm)) + 
  geom_boxplot()
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).

To remove the numeric values from the x-axis, set x = "".

ggplot(data = penguins, aes(x = "", y = flipper_length_mm)) + 
  geom_boxplot()
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).

With the patchwork package, plot objects can be combined simply by joining them with +. The following code creates three plot objects, p1, p2, and p3, and combines them as p1 + p2 + p3.

p1 <- ggplot(data = penguins, aes(x = "", y = flipper_length_mm)) + 
  geom_jitter() + 
  ggtitle(label = "(A) Distribution")
p2 <- ggplot(data = penguins, aes(y = flipper_length_mm)) + 
  geom_histogram() + 
  ggtitle(label = "(B) Histogram")
p3 <- ggplot(data = penguins, aes(x = "", y = flipper_length_mm)) + 
  geom_boxplot() + 
  ggtitle(label = "(C) Boxplot")
p1 + p2 + p3
Warning: Removed 2 rows containing missing values (`geom_point()`).
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 2 rows containing non-finite values (`stat_bin()`).
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).

p4 <- ggplot(data = penguins, aes(x = flipper_length_mm)) + 
  geom_histogram() + 
  ggtitle(label = "(B) Histogram")
(p1 + p3) / p4
Warning: Removed 2 rows containing missing values (`geom_point()`).
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 2 rows containing non-finite values (`stat_bin()`).
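Beyond + and /, patchwork also provides plot_layout() and plot_annotation() for controlling relative panel sizes and adding panel tags. The following is a small sketch; the object names p_a, p_b, and combined, the width ratio, and the bin width are assumptions made for illustration.

```r
library(ggplot2)
library(palmerpenguins)
library(patchwork)

p_a <- ggplot(penguins, aes(x = "", y = flipper_length_mm)) +
  geom_boxplot(na.rm = TRUE)
p_b <- ggplot(penguins, aes(x = flipper_length_mm)) +
  geom_histogram(binwidth = 5, na.rm = TRUE)

# Give the histogram twice the width of the box plot and
# tag the panels automatically as A, B, ...
combined <- p_a + p_b +
  plot_layout(widths = c(1, 2)) +
  plot_annotation(tag_levels = "A")
combined
```

With tag_levels, the "(A)", "(B)" prefixes no longer need to be typed into each title by hand.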

5.1 Frequency distribution

Draw a bar chart of the frequency of each value.

ggplot(data = penguins, aes(x = species)) + 
  geom_bar()

Count the frequency of each value first, then draw a bar chart by passing the counts explicitly.

penguins |> 
  count(species) |> 
  ggplot(aes(x = species, y = n)) + 
  geom_col()
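The two approaches above should produce the same bar heights: geom_bar() counts rows itself, while geom_col() plots the counts it is given. One way to check this, sketched here with the hypothetical names p_bar, bar_counts, and tab_counts, is to compare the values ggplot2 computes with the output of count().

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

p_bar <- ggplot(data = penguins, aes(x = species)) +
  geom_bar()

# Counts computed internally by geom_bar() (via stat_count)
bar_counts <- layer_data(p_bar)$count

# Counts computed explicitly with dplyr
tab_counts <- penguins |>
  count(species) |>
  pull(n)

all(bar_counts == tab_counts)
```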

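6 Two variables

6.1 Categorical and continuous variables

geom_density and geom_histogram show the distribution of a single variable, but drawing them separately for each value of another categorical variable reveals how the distribution of the continuous variable differs across categories.

Specifying a variable with group draws a separate curve for each value of that variable. However, the color, line type, and fill are identical for every group, so no legend is created.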

ggplot(data = penguins, aes(x = flipper_length_mm,
                            group = species)) + 
  geom_density()
Warning: Removed 2 rows containing non-finite values (`stat_density()`).

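To make clear which curve corresponds to which value of the categorical variable, vary the color, line type, or fill by the value of the categorical variable.

Vary the color (color) by the value of species.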

ggplot(data = penguins, aes(x = flipper_length_mm,
                            color = species)) + 
  geom_density() + 
  scale_color_colorblind()
Warning: Removed 2 rows containing non-finite values (`stat_density()`).

Vary the line type (linetype) by the value of species.

ggplot(data = penguins, aes(x = flipper_length_mm,
                            linetype = species)) + 
  geom_density()
Warning: Removed 2 rows containing non-finite values (`stat_density()`).

Vary the fill (fill) by the value of species.

ggplot(data = penguins, aes(x = flipper_length_mm,
                            fill = species)) + 
  geom_density(alpha = .5) + 
  scale_fill_colorblind()
Warning: Removed 2 rows containing non-finite values (`stat_density()`).

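Vary the color, line type, and fill all by the value of species. Specifying any one of them is enough to show which curve corresponds to which value of the categorical variable, so going this far is usually unnecessary.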

ggplot(data = penguins, aes(x = flipper_length_mm,
                            fill = species,
                            color = species,
                            linetype = species)) + 
  geom_density(alpha = .5) + 
  scale_color_colorblind() + 
  scale_fill_colorblind()
Warning: Removed 2 rows containing non-finite values (`stat_density()`).

Draw a figure with the categorical variable on the x-axis and the continuous variable on the y-axis.

Draw a scatter plot.

ggplot(data = penguins, aes(x = species, y = flipper_length_mm)) + 
  geom_point()
Warning: Removed 2 rows containing missing values (`geom_point()`).

Draw a box plot.

ggplot(data = penguins, aes(x = species, y = flipper_length_mm)) + 
  geom_boxplot()
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).

ggplot(data = penguins, aes(x = species, y = flipper_length_mm)) + 
  geom_boxplot() + 
  geom_jitter(color = "skyblue",
              alpha = .7)
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).

ggplot(data = penguins, aes(x = species, y = flipper_length_mm, color = species)) + 
  geom_boxplot() + 
  geom_jitter(color = "skyblue",
              alpha = .7) + 
  scale_color_colorblind()
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).

ggplot(data = penguins, aes(x = species, y = flipper_length_mm, fill = species)) + 
  geom_boxplot() + 
  geom_jitter(color = "skyblue",
              alpha = .7) + 
  scale_fill_colorblind()
Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).

ggplot(data = penguins, aes(x = species, y = flipper_length_mm)) + 
  stat_summary(fun.data = "mean_cl_boot")
Warning: Removed 2 rows containing non-finite values (`stat_summary()`).

Compute the mean and its 95% confidence interval.

penguins |> 
  summarise(Mean = mean(flipper_length_mm, na.rm = TRUE),
            SD = sd(flipper_length_mm, na.rm = TRUE),
            n = sum(!is.na(flipper_length_mm)),
            ll = Mean + qt(0.025, df = n - 1) * SD / sqrt(n),
            ul = Mean + qt(0.975, df = n - 1) * SD / sqrt(n))
# A tibble: 1 × 5
   Mean    SD     n    ll    ul
  <dbl> <dbl> <int> <dbl> <dbl>
1  201.  14.1   342  199.  202.
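The interval above is the usual t-based confidence interval for a mean, so it can be cross-checked against base R's t.test(). The following is a minimal sketch of that check; the variable names x, se, manual_ci, and builtin_ci are introduced here for illustration.

```r
library(palmerpenguins)

x <- penguins$flipper_length_mm
n <- sum(!is.na(x))                 # number of non-missing observations
se <- sd(x, na.rm = TRUE) / sqrt(n) # standard error of the mean

# Same formula as in the summarise() call: mean + t quantile * SE
manual_ci <- mean(x, na.rm = TRUE) + qt(c(0.025, 0.975), df = n - 1) * se

# t.test() drops missing values and reports a 95% interval by default
builtin_ci <- as.numeric(t.test(x)$conf.int)

all.equal(manual_ci, builtin_ci)
```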

Specify the categorical variable as the grouping variable with .by, and compute the mean and confidence interval for each group.

tab_flipper_length_mm_by_species <- 
  penguins |> 
  summarise(Mean = mean(flipper_length_mm, na.rm = TRUE),
            SD = sd(flipper_length_mm, na.rm = TRUE),
            n = sum(!is.na(flipper_length_mm)),
            ll = Mean + qt(0.025, df = n - 1) * SD / sqrt(n),
            ul = Mean + qt(0.975, df = n - 1) * SD / sqrt(n),
            .by = species)
tab_flipper_length_mm_by_species
# A tibble: 3 × 6
  species    Mean    SD     n    ll    ul
  <fct>     <dbl> <dbl> <int> <dbl> <dbl>
1 Adelie     190.  6.54   151  189.  191.
2 Gentoo     217.  6.48   123  216.  218.
3 Chinstrap  196.  7.13    68  194.  198.

Draw a figure from the summarized values.

tab_flipper_length_mm_by_species |> 
  ggplot(aes(x = species, y = Mean, ymin = ll, ymax = ul)) + 
  geom_pointrange()
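The summarize-then-plot steps above can probably also be folded into a single stat_summary() call by supplying a function that returns y, ymin, and ymax. The helper mean_ci_t below is a hypothetical name introduced for this sketch, not a ggplot2 function, and it assumes the same t-based 95% interval as the summarise() code.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical helper: t-based 95% CI in the y / ymin / ymax format
# that stat_summary(fun.data = ...) expects. Missing values are removed
# by stat_summary() before this function is called.
mean_ci_t <- function(y) {
  n <- length(y)
  se <- sd(y) / sqrt(n)
  data.frame(y = mean(y),
             ymin = mean(y) + qt(0.025, df = n - 1) * se,
             ymax = mean(y) + qt(0.975, df = n - 1) * se)
}

p_ci <- ggplot(data = penguins, aes(x = species, y = flipper_length_mm)) +
  stat_summary(fun.data = mean_ci_t, geom = "pointrange", na.rm = TRUE)
p_ci
```

Unlike fun.data = "mean_cl_boot", this version does not rely on the Hmisc package and is not a bootstrap interval, so the two plots may differ slightly.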

6.2 Two continuous variables

ggplot(data = penguins, aes(x = bill_length_mm, 
                            y = flipper_length_mm)) + 
  geom_point()
Warning: Removed 2 rows containing missing values (`geom_point()`).

To suppress the warning, use geom_point(na.rm = TRUE).
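As a sketch of the two ways of handling the missing rows, the code below either silences the warning inside the layer or drops incomplete rows beforehand with tidyr::drop_na(). The object names p_quiet, penguins_complete, and p_complete are assumptions made for illustration.

```r
library(ggplot2)
library(tidyr)
library(palmerpenguins)

# Option 1: remove missing values silently inside the layer
p_quiet <- ggplot(data = penguins, aes(x = bill_length_mm,
                                       y = flipper_length_mm)) +
  geom_point(na.rm = TRUE)

# Option 2: drop incomplete rows first, so the removal is explicit
penguins_complete <- penguins |>
  drop_na(bill_length_mm, flipper_length_mm)

p_complete <- ggplot(data = penguins_complete, aes(x = bill_length_mm,
                                                   y = flipper_length_mm)) +
  geom_point()
p_complete
```

The second option makes the number of plotted rows easy to report, at the cost of an extra data step.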

Change the color according to the value of another variable.

ggplot(data = penguins, aes(x = bill_length_mm, 
                            y = flipper_length_mm,
                            color = species)) + 
  geom_point()
Warning: Removed 2 rows containing missing values (`geom_point()`).

ggplot(data = penguins, aes(x = bill_length_mm, 
                            y = flipper_length_mm,
                            color = species)) + 
  geom_point() + 
  geom_smooth(method = lm)
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).

ggplot(data = penguins, aes(x = bill_length_mm, 
                            y = flipper_length_mm)) + 
  geom_point(aes(color = species)) + 
  geom_smooth(aes(color = species), 
              method = lm) + 
  geom_smooth(method = lm)
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).

ggplot(data = penguins, aes(x = bill_length_mm, 
                            y = flipper_length_mm,
                            group =  interaction(year, species, sep = ": "))) + 
  geom_point(aes(shape = species, color = species)) + 
  geom_smooth(aes(color = species,
                  linetype = factor(year)), 
              method = lm) +
  labs(linetype = "year") +
  scale_color_colorblind()
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).

7 Creating panels by the value of a variable

Use facet_wrap or facet_grid.

ggplot(data = penguins, aes(x = bill_length_mm, 
                            y = flipper_length_mm)) + 
  geom_point() + 
  geom_smooth(method = lm) + 
  facet_wrap(vars(species), 
             labeller = "label_both")
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).
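By default, facet_wrap() uses the same axis ranges in every panel. When the groups occupy different ranges, the scales argument can let each panel choose its own axes. A minimal sketch, with the object name p_free as an assumption:

```r
library(ggplot2)
library(palmerpenguins)

# scales = "free" lets both axes vary by panel;
# "free_x" or "free_y" frees only one of them.
p_free <- ggplot(data = penguins, aes(x = bill_length_mm,
                                      y = flipper_length_mm)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = lm, formula = y ~ x, na.rm = TRUE) +
  facet_wrap(vars(species),
             scales = "free",
             labeller = "label_both")
p_free
```

Free scales make within-panel patterns easier to see but make comparisons across panels harder, so shared scales are often preferable when the goal is to compare groups.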

ggplot(data = penguins, aes(x = bill_length_mm, 
                            y = flipper_length_mm)) + 
  geom_point(data = penguins |> select(-species), 
             color = "grey") +   # drop species from the data so that facet_wrap does not split these points, then draw the scatter plot
  geom_point() + 
  geom_smooth(method = lm) + 
  facet_wrap(vars(species), 
             labeller = "label_both")
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 6 rows containing missing values (`geom_point()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).

ggplot(data = penguins, aes(x = bill_length_mm, 
                            y = flipper_length_mm)) + 
  geom_point(data = penguins |> select(-species), 
             color = "grey") + 
  geom_point() + 
  geom_smooth(method = lm) + 
  facet_wrap(vars(year, species), 
             labeller = "label_both")
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 6 rows containing missing values (`geom_point()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).

ggplot(data = penguins, aes(x = bill_length_mm, 
                            y = flipper_length_mm)) + 
  geom_point(data = penguins |> select(-species), 
             color = "grey") + 
  geom_point() + 
  geom_smooth(method = lm) + 
  facet_grid(species ~ year, 
             labeller = "label_both")
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 6 rows containing missing values (`geom_point()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).

ggplot(data = penguins, aes(x = bill_length_mm, 
                            y = flipper_length_mm)) + 
  geom_point(data = penguins |> select(-species), 
             color = "grey") + 
  geom_point(aes(color = factor(year)), alpha = .5) + 
  geom_smooth(method = lm, 
              color = "black", 
              linewidth = .5) + 
  facet_wrap(vars(species), 
             labeller = "label_both") +
  labs(color = "year") + 
  scale_color_colorblind()
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 6 rows containing missing values (`geom_point()`).
Warning: Removed 2 rows containing missing values (`geom_point()`).

8 Avoiding overlapping x-axis labels

penguins |> 
  ggplot(aes(x = interaction(island, sex, species, sep = "\n"),
             y = bill_length_mm,
             color = factor(year))) + 
  geom_jitter(alpha = .5) + 
  labs(x = "Island, Sex, and Species") + 
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) + 
  theme(legend.position = "bottom",
        legend.margin = margin(t = -5, r = 0, b = 0, l = 0, unit = "pt"),
        legend.key.width = unit(40, "pt")) +
  scale_color_colorblind()
Warning: Removed 2 rows containing missing values (`geom_point()`).
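The code above staggers the labels over two rows with n.dodge. Another option offered by guide_axis() is to rotate the labels with its angle argument. The sketch below assumes a 45-degree rotation and a single-line separator, both chosen only for illustration.

```r
library(ggplot2)
library(palmerpenguins)

# Rotate the x-axis labels instead of dodging them onto two rows
p_angle <- ggplot(data = penguins,
                  aes(x = interaction(island, sex, species, sep = " / "),
                      y = bill_length_mm)) +
  geom_jitter(alpha = .5, na.rm = TRUE) +
  labs(x = "Island / Sex / Species") +
  scale_x_discrete(guide = guide_axis(angle = 45))
p_angle
```

Rotated labels take more vertical space, so dodging may still be the better choice when the labels are short.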