1 Packageの読み込み
2 データを読み込む
- 2.1 smokeデータについて
3 スクリプト内でデータを作る
4 CAを実行する前に：行分析
5 対応分析の処理
- 5.1 helper factoextra を使って描画
- 5.2 “colprincipal” は、列を主座標、行を標準座標。
6 Resultへのアクセス
- 6.1 まず、一覧表を！
7 χ二乗検定
- 7.1 元表の慣性(inertia)を計算
8 CAのresultの詳細を見る
- 8.1 resultのリスト構造表示 strは、structure
9 まず、慣性（inertia）に注目！
10 寄与率（contribution）を見る
11 行の標準座標を求めてみます。
- 11.1 行について、標準座標と主座標をあわせてplotしてみる
12 改めてresultを調べましょう
- 12.1 固有値λ
- 12.2 factoextraのfunctionで表示
13 行に関するresult
- 13.1 行慣性
- 13.2 列慣性
14 explorによる動的検討
15 参考

1 Packageの読み込み

library(tidyverse) # dplyr,ggplot2 などRによるデータ処理の基本ツール群
library(readxl) # Excelファイルの読み込み
library(ggrepel) # グラフで文字表示
library(FactoMineR)# CA、MCAなど対応分析function
library(factoextra)# FactoMineRのresultを図示するhelperパッケージ
library(vcd)# mosaci plotなどのカテゴリカルデータ表示function群
library(RColorBrewer) # カラーコードのセット
library(showtext) # mosaic やggplot2で日本語を表示させる
showtext_auto(TRUE)

2 データを読み込む

2.1 smokeデータについて

『対応分析の理論と実践』の第9章「２次元表示」p65〜の事例

行変数

SM シニアマネージャー（年配管理職）
JM ジュニアマネージャー（若手管理職）
SE シニアエンプロイイー（年配社員）
JE ジュニアエンプロイイー（若手社員）
SC セクレタリー（秘書）

列変数

none
light
medium
heavy

3 スクリプト内でデータを作る

このデータ(smoke)は、Greenacre2017でも紹介されている「標準データ」。しかし、内容は架空。
Excelでも提供されているが（最後の参考でExcelを読み込む方法を説明）、ここではスクリプト内で記述することにした。

     喫煙レベル
職制 none light medium heavy Sum
  SM     4     2      3     2  11
  JM     4     3      7     4  18
  SE    25    10     12     4  51
  JE    18    24     33    13  88
  SC    10     6      7     2  25
  Sum   61    45     62    25 193

3.1 データを記述する

.d <- matrix(c(4,2,3,2,     #　c()でデータの内容を行方向（byrow=TRUE）で記述し、5,4の行列matrixとして生成している。
               4,3,7,4,
               25,10,12,4,
               18,24,33,13,
               10,6,7,2),
             byrow=TRUE, 5,4)
# 上の処理では、セルの値は入っているが、行名、列名がないので、dimnamesで記述
dimnames(.d) <- list(職制=c("SM","JM","SE","JE","SC"), 
                    喫煙レベル=c("none","light","medium","heavy"))
.d

##     喫煙レベル
## 職制 none light medium heavy
##   SM    4     2      3     2
##   JM    4     3      7     4
##   SE   25    10     12     4
##   JE   18    24     33    13
##   SC   10     6      7     2

.d %>% addmargins() # 行和、列和を追記した表にした。総数N=193になっている。

##      喫煙レベル
## 職制 none light medium heavy Sum
##   SM     4     2      3     2  11
##   JM     4     3      7     4  18
##   SE    25    10     12     4  51
##   JE    18    24     33    13  88
##   SC    10     6      7     2  25
##   Sum   61    45     62    25 193

3.2 周辺度数を得る margin.table(margin=1,2)

.d %>% margin.table(1) -> 職制度数
.d %>% margin.table(2) -> 喫煙レベル
職制度数

## 職制
## SM JM SE JE SC 
## 11 18 51 88 25

喫煙レベル

## 喫煙レベル
##   none  light medium  heavy 
##     61     45     62     25

3.3 棒グラフで分布を確認

showtext_auto(FALSE)
oldpar <- par(mfrow=c(1,2)) # グラフを1行2列で表示する設定
barplot(職制度数,main="職制の分布")
barplot(喫煙レベル,main="喫煙レベルの分布")

par(oldpar) #もとに戻している。
showtext_auto(TRUE)

4 CAを実行する前に：行分析

各職制ごとの「喫煙程度」を調べる

4.1 行分析

拡張された帯棒グラフであるmosaic plotを使う。これは、vcd::mosaicで、baseに入っているmosaicplotよりも機能が拡張されている。

vcd::mosaic(.d, main="smokeの行分析")

4.2 参考までに列分析：各喫煙レベル内の職制割合

vcd::mosaic(t(.d), main="smokeの列分析")

4.3 期待値との残差を図示

vcd::mosaic(.d, main="smokeの行分析",shade=TRUE)

4.4 期待値表示

mosaic(.d,type="expected")

4.5 期待値との残差

4.5.1 shade=TRUE で残差を色分けで表示できる。

あわせて、セルの値を表示した。

vcd::mosaic(.d, main="smokeの行分析",shade=TRUE,
            labeling = labeling_values) # セルに値を表示させた。

4.5.2 セルには、計算した値（たとえば、行比率）を表示することもできる。

手順

prop.table(.d,margin = 1) %>% `*`(100) %>% round(1) -> .d.pp
.d.pp

##     喫煙レベル
## 職制 none light medium heavy
##   SM 36.4  18.2   27.3  18.2
##   JM 22.2  16.7   38.9  22.2
##   SE 49.0  19.6   23.5   7.8
##   JE 20.5  27.3   37.5  14.8
##   SC 40.0  24.0   28.0   8.0

tab <- ifelse(.d.pp < 3, NA, .d.pp) 

mosaic(.d, pop = FALSE, shade=TRUE)
labeling_cells(text = tab,clip = FALSE)(as.table(.d))

4.6 【参考】セルの色分けもできます

fill_color <- matrix(rep(brewer.pal(4,"Set3"),5),byrow=TRUE,5,4) # 5行4列の色コードセット

mosaic(.d,gp=gpar(fill=fill_color,col="grey"),main="色わけ版",pop=FALSE)　# col=で枠の色を選べます。0なし、1黒、2赤...="grey"と文字で色しても可能。
labeling_cells(text = tab,clip = FALSE)(as.table(.d)) # セルに数値を書くときは、親functionで、pop=FALSE を指定。

5 対応分析の処理

CA functionは、FactoMiner::CA　を使います。
そのresultは、res.CAに格納されています。

res.CA <- FactoMineR::CA(.d,graph = FALSE) # CAではグラフを描画しない。
plot(res.CA,title="職制と喫煙レベルの対応分析") # これで描画

5.1 helper factoextra を使って描画

5.2 “colprincipal” は、列を主座標、行を標準座標。

つまり、行ポイント（職制）が張る空間に列ポイントをplotしている。

fviz_ca(res.CA,map="colprincipal",arrow=c(TRUE,FALSE),title="行：標準座標、列：主座標") + coord_fixed(ratio = 1)

##　“rowprincipal” は、行変数は主座標。

ということは列変数は、標準座標。標準座標がつくった（張った？）空間に、行ポイントをプロットしている。 - これは、列分析

fviz_ca(res.CA,map="rowprincipal",arrow=c(FALSE,TRUE),title="行：主座標、列：標準座標") + coord_fixed(ratio = 1)　#

6 Resultへのアクセス

resultは、リスト（データセットをまとめたもの）になっています。
それぞれ、固有値に関係するもの（$eig）、列に関係するもの（$col）、行に関係するもの（$row）、CAを呼び出したデータに関するもの($call) としてまとめられている。
res.CAとたたくと、その一覧がでてきます。

6.1 まず、一覧表を！

summary(res.CA)

## 
## Call:
## FactoMineR::CA(X = .d, graph = FALSE) 
## 
## The chi square of independence between the two variables is equal to 16.44164 (p-value =  0.1718348 ).
## 
## Eigenvalues
##                        Dim.1   Dim.2   Dim.3
## Variance               0.075   0.010   0.000
## % of var.             87.756  11.759   0.485
## Cumulative % of var.  87.756  99.515 100.000
## 
## Rows
##          Iner*1000    Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3
## SM     |     2.673 | -0.066  0.330  0.092 |  0.194 21.356  0.800 |  0.071
## JM     |    11.881 |  0.259  8.366  0.526 |  0.243 55.115  0.465 | -0.034
## SE     |    38.314 | -0.381 51.201  0.999 |  0.011  0.300  0.001 | -0.005
## JE     |    26.269 |  0.233 33.097  0.942 | -0.058 15.177  0.058 |  0.003
## SC     |     6.053 | -0.201  7.006  0.865 | -0.079  8.052  0.133 | -0.008
##           ctr   cos2  
## SM     69.433  0.107 |
## JM     25.619  0.009 |
## SE      1.698  0.000 |
## JE      1.205  0.000 |
## SC      2.045  0.001 |
## 
## Columns
##          Iner*1000    Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3
## none   |    49.186 | -0.393 65.400  0.994 |  0.030  2.934  0.006 | -0.001
## light  |     7.059 |  0.099  3.085  0.327 | -0.141 46.317  0.657 |  0.022
## medium |    12.610 |  0.196 16.562  0.982 | -0.007  0.174  0.001 | -0.026
## heavy  |    16.335 |  0.294 14.954  0.684 |  0.198 50.575  0.310 |  0.026
##           ctr   cos2  
## none    0.061  0.000 |
## light  27.282  0.016 |
## medium 51.140  0.017 |
## heavy  21.517  0.005 |

この一覧表の値は、小数点以下３桁で丸めてあるので、参照して計算したい場合などは、res.CAのリストにアクセスすればよい。

7 χ二乗検定

chisq.test(.d)

## Warning in chisq.test(.d): Chi-squared approximation may be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  .d
## X-squared = 16.442, df = 12, p-value = 0.1718

#chisq.test(.d) %>% str

7.1 元表の慣性(inertia)を計算

χ^2値をNで割る。

total_inertia <- chisq.test(.d)$statistic / sum(.d) # 慣性 x N  = $\chi ^2$ 
names(total_inertia) <- "慣性(inertia)"
total_inertia

## 慣性(inertia) 
##    0.08518986

8 CAのresultの詳細を見る

8.1 resultのリスト構造表示 strは、structure

str(res.CA)

## List of 5
##  $ eig : num [1:3, 1:3] 7.48e-02 1.00e-02 4.14e-04 8.78e+01 1.18e+01 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:3] "dim 1" "dim 2" "dim 3"
##   .. ..$ : chr [1:3] "eigenvalue" "percentage of variance" "cumulative percentage of variance"
##  $ call:List of 9
##   ..$ X        :'data.frame':    5 obs. of  4 variables:
##   .. ..$ none  : num [1:5] 4 4 25 18 10
##   .. ..$ light : num [1:5] 2 3 10 24 6
##   .. ..$ medium: num [1:5] 3 7 12 33 7
##   .. ..$ heavy : num [1:5] 2 4 4 13 2
##   ..$ marge.col: Named num [1:4] 0.316 0.233 0.321 0.13
##   .. ..- attr(*, "names")= chr [1:4] "none" "light" "medium" "heavy"
##   ..$ marge.row: Named num [1:5] 0.057 0.0933 0.2642 0.456 0.1295
##   .. ..- attr(*, "names")= chr [1:5] "SM" "JM" "SE" "JE" ...
##   ..$ ncp      : num 3
##   ..$ row.w    : num [1:5] 1 1 1 1 1
##   ..$ excl     : NULL
##   ..$ call     : language FactoMineR::CA(X = .d, graph = FALSE)
##   ..$ Xtot     :'data.frame':    5 obs. of  4 variables:
##   .. ..$ none  : num [1:5] 4 4 25 18 10
##   .. ..$ light : num [1:5] 2 3 10 24 6
##   .. ..$ medium: num [1:5] 3 7 12 33 7
##   .. ..$ heavy : num [1:5] 2 4 4 13 2
##   ..$ N        : num 193
##  $ row :List of 4
##   ..$ coord  : num [1:5, 1:3] -0.0658 0.259 -0.3806 0.233 -0.2011 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:5] "SM" "JM" "SE" "JE" ...
##   .. .. ..$ : chr [1:3] "Dim 1" "Dim 2" "Dim 3"
##   ..$ contrib: num [1:5, 1:3] 0.33 8.37 51.2 33.1 7.01 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:5] "SM" "JM" "SE" "JE" ...
##   .. .. ..$ : chr [1:3] "Dim 1" "Dim 2" "Dim 3"
##   ..$ cos2   : num [1:5, 1:3] 0.0922 0.5264 0.999 0.9419 0.8653 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:5] "SM" "JM" "SE" "JE" ...
##   .. .. ..$ : chr [1:3] "Dim 1" "Dim 2" "Dim 3"
##   ..$ inertia: num [1:5] 0.00267 0.01188 0.03831 0.02627 0.00605
##  $ col :List of 4
##   ..$ coord  : num [1:4, 1:3] -0.3933 0.0995 0.1963 0.2938 0.0305 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:4] "none" "light" "medium" "heavy"
##   .. .. ..$ : chr [1:3] "Dim 1" "Dim 2" "Dim 3"
##   ..$ contrib: num [1:4, 1:3] 65.4 3.08 16.56 14.95 2.93 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:4] "none" "light" "medium" "heavy"
##   .. .. ..$ : chr [1:3] "Dim 1" "Dim 2" "Dim 3"
##   ..$ cos2   : num [1:4, 1:3] 0.99402 0.32673 0.98185 0.6844 0.00597 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:4] "none" "light" "medium" "heavy"
##   .. .. ..$ : chr [1:3] "Dim 1" "Dim 2" "Dim 3"
##   ..$ inertia: num [1:4] 0.04919 0.00706 0.01261 0.01633
##  $ svd :List of 3
##   ..$ vs: num [1:4] 2.73e-01 1.00e-01 2.03e-02 5.84e-17
##   ..$ U : num [1:5, 1:3] -0.241 0.947 -1.392 0.852 -0.735 ...
##   ..$ V : num [1:4, 1:3] -1.438 0.364 0.718 1.074 0.305 ...
##  - attr(*, "class")= chr [1:2] "CA" "list"

9 まず、慣性（inertia）に注目！

行空間の慣性（res.CA$row$inertiaの和）と列空間の慣性（res.CA$col$inertiaの和）は等しく、これは、元表の慣性とも同じ値（$\chi ^2/N$）
行、列の両空間の慣性が、各カテゴリポイントに分解されている。その割合は、ctr（寄与率）でわかる。

9.1 行慣性

res.CA$row$inertia

## [1] 0.002672932 0.011881177 0.038314129 0.026268627 0.006052995

res.CA$row$inertia %>% sum

## [1] 0.08518986

9.2 列慣性

res.CA$col$inertia

## [1] 0.049186258 0.007058828 0.012610242 0.016334533

res.CA$col$inertia %>% sum

## [1] 0.08518986

このように、行慣性と列慣性は同じ。それは、元のデータ表がもっていた慣性と同じ。

9.3 固有値（eignevalue）

res.CA$eig %>% addmargins(1)

##         eigenvalue percentage of variance cumulative percentage of variance
## dim 1 0.0747591059             87.7558731                          87.75587
## dim 2 0.0100171805             11.7586535                          99.51453
## dim 3 0.0004135741              0.4854734                         100.00000
## Sum   0.0851898605            100.0000000                         287.27040

第1列にある、dim1〜dim3が体現する慣性の総和が、全体の慣性になっている。

10 寄与率（contribution）を見る

列和（軸ごとの和）はみな、100%

res.CA$row$contrib %>% addmargins(1)

##           Dim 1       Dim 2      Dim 3
## SM    0.3297658  21.3557601  69.433113
## JM    8.3658712  55.1150552  25.618603
## SE   51.2005549   0.2997604   1.698418
## JE   33.0973947  15.1772191   1.204516
## SC    7.0064134   8.0522052   2.045351
## Sum 100.0000000 100.0000000 100.000000

10.1 軸１上の距離は、軸１の座標

行の座標をみる

res.CA$row$coord

##          Dim 1       Dim 2        Dim 3
## SM -0.06576838  0.19373700  0.070981028
## JM  0.25895842  0.24330457 -0.033705190
## SE -0.38059489  0.01065991 -0.005155757
## JE  0.23295191 -0.05774391  0.003305371
## SC -0.20108912 -0.07891123 -0.008081076

10.2 距離（座標）と質量（周辺割合）から慣性を計算し、軸のcontribution を計算する

caik = rifik^2/λk2

diag(res.CA$call$marge.row) %*% res.CA$row$coord ^2  %*% diag(res.CA$eig[,1]^(-1))

##             [,1]        [,2]       [,3]
## [1,] 0.003297658 0.213557601 0.69433113
## [2,] 0.083658712 0.551150552 0.25618603
## [3,] 0.512005549 0.002997604 0.01698418
## [4,] 0.330973947 0.151772191 0.01204516
## [5,] 0.070064134 0.080522052 0.02045351

10.3 行慣性（inertia）の分解

列方向に加算すると、Dimごとの慣性（固有値）と同じ値
行方向に加算すると、各行ポイント（職制ポイント）ごとの慣性（これが、$row$inertial）
総和（つまり、行和の和、もしくは列和の和）は、全体の慣性（=分散）に等しい。
これが、行空間での分散の分解

(diag(res.CA$call$marge.row) %*% res.CA$row$coord ^2 ) %>% addmargins() -> rowInertia
rownames(rowInertia) <- c(rownames(.d),"Sum")
rowInertia

##           Dim 1        Dim 2        Dim 3         Sum
## SM  0.000246530 2.139245e-03 2.871574e-04 0.002672932
## JM  0.006254251 5.520975e-03 1.059519e-04 0.011881177
## SE  0.038277077 3.002754e-05 7.024216e-06 0.038314129
## JE  0.024743316 1.520329e-03 4.981565e-06 0.026268627
## SC  0.005237932 8.066039e-04 8.459041e-06 0.006052995
## Sum 0.074759106 1.001718e-02 4.135741e-04 0.085189860

10.4 mosaic plot で分布の様子を見る。

#mosaic(rowInertia[1:5,1:3],main = "相対的寄与率cos2")

fill_color <- matrix(rep(brewer.pal(3,"Set3"),5),byrow=TRUE,5,3) # 5行4列の色コードセット

mosaic(rowInertia[1:5,1:3],gp=gpar(fill=fill_color,col="grey"),main="色わけ版",pop=FALSE)　# col=で枠の色を選べます。0なし、1黒、2赤...="grey"と文字で色しても可能。

#labeling_cells(text = tab,clip = FALSE)(as.table(.d.mrx)) # セルに数値を書くときは、親functionで、pop=FALSE を指定。

fill_color <- matrix(rep(brewer.pal(5,"Set3"),3),byrow=TRUE,3,5) # 3行5列の色コードセット

mosaic(t(rowInertia[1:5,1:3]),gp=gpar(fill=fill_color,col="grey"),main="絶対的寄与率ctr",rot_labels = c(left = 0, top = 45,right=0),)

10.5 行の座標を取得したければ、res.CA$row$coord。

res.CA$row$coord

##          Dim 1       Dim 2        Dim 3
## SM -0.06576838  0.19373700  0.070981028
## JM  0.25895842  0.24330457 -0.033705190
## SE -0.38059489  0.01065991 -0.005155757
## JE  0.23295191 -0.05774391  0.003305371
## SC -0.20108912 -0.07891123 -0.008081076

11 行の標準座標を求めてみます。

主座標F =　行標準座標$\Phi D_{\alpha}$ ですから、右から$D_{\alpha}^{−1}$をかけます。

rowstdcoord <- res.CA$row$coord %*% diag(1/sqrt(res.CA$eig[,1]))  # 特異値αは、固有値（eig）の平方根

11.1 行について、標準座標と主座標をあわせてplotしてみる

x0 <- 0
y0 <- 0
res.CA %>% fviz_ca_row(map="symmetric") + geom_point(aes(x=rowstdcoord[,1],y=rowstdcoord[,2])) +
  geom_text_repel(aes(x=rowstdcoord[,1],y=rowstdcoord[,2],label = str_c(rownames(.d),"_std")), size = 4) +
  annotate("segment",x=rowstdcoord[1,1],xend = x0,y=rowstdcoord[1,2],yend = y0,linetype = 3) +
  annotate("segment",x=rowstdcoord[2,1],xend = x0,y=rowstdcoord[2,2],yend = y0,linetype = 3) +
  annotate("segment",x=rowstdcoord[3,1],xend = x0,y=rowstdcoord[3,2],yend = y0,linetype = 3) +
  annotate("segment",x=rowstdcoord[4,1],xend = x0,y=rowstdcoord[4,2],yend = y0,linetype = 3) +
  annotate("segment",x=rowstdcoord[5,1],xend = x0,y=rowstdcoord[5,2],yend = y0,linetype = 3) +
  coord_fixed(ratio = 1)

  #  geom_line(aes(x=rowstdcoord[,1],y=rowstdcoord[,2]),colour="blue",linetype = 3)

res.CA %>% fviz_ca_row(map="colprincipal",arrow=c(TRUE,FALSE)) + geom_point(aes(x=res.CA$row$coord[,1],y=res.CA$row$coord[,2])) + coord_fixed(ratio=1)

12 改めてresultを調べましょう

12.1 固有値λ

res.CA$eig

##         eigenvalue percentage of variance cumulative percentage of variance
## dim 1 0.0747591059             87.7558731                          87.75587
## dim 2 0.0100171805             11.7586535                          99.51453
## dim 3 0.0004135741              0.4854734                         100.00000

第１列：固有値そのもの特異値２乗が固有値。
第２列：固有値合計に対する各次元の割合。
第３列：割合の累積比率。

このデータは、３軸で100％の分散を体現。元の行列は、５行、4列。その場合のデータの次元は、少ない方の数字−1次元で100%を表現。つまり３次元。

12.2 factoextraのfunctionで表示

res.CA %>% fviz_eig(addlabels = TRUE)

これだと、累積％がでてない。表をみればいいのですが…

res.CA$eig %>% as_tibble %>% rename(eig=1,eigprop=2,cumprop=3) %>% ggplot(aes(x=1:3,y=eigprop)) + geom_col(fill="lightblue") + 
  geom_point(aes(x=1:3,y=cumprop)) + geom_line(aes(x=1:3,y=cumprop)) + 
  geom_text_repel(aes(y=cumprop,label = round(cumprop,2),), size = 3)

このeig（固有値）の表、グラフから、１、２軸を評価すれば、元データ表の分散の99.5％を把握できていることがわかる。

13 行に関するresult

res.CA$row

## $coord
##          Dim 1       Dim 2        Dim 3
## SM -0.06576838  0.19373700  0.070981028
## JM  0.25895842  0.24330457 -0.033705190
## SE -0.38059489  0.01065991 -0.005155757
## JE  0.23295191 -0.05774391  0.003305371
## SC -0.20108912 -0.07891123 -0.008081076
## 
## $contrib
##         Dim 1      Dim 2     Dim 3
## SM  0.3297658 21.3557601 69.433113
## JM  8.3658712 55.1150552 25.618603
## SE 51.2005549  0.2997604  1.698418
## JE 33.0973947 15.1772191  1.204516
## SC  7.0064134  8.0522052  2.045351
## 
## $cos2
##         Dim 1        Dim 2        Dim 3
## SM 0.09223203 0.8003363898 0.1074315846
## JM 0.52639991 0.4646824609 0.0089176266
## SE 0.99903295 0.0007837197 0.0001833323
## JE 0.94193412 0.0578762422 0.0001896393
## SC 0.86534551 0.1332569974 0.0013974967
## 
## $inertia
## [1] 0.002672932 0.011881177 0.038314129 0.026268627 0.006052995

3つでてきました。

$coord は、各職制ポイントの座標です（主座標）
$contrib は、それぞれの職制ポイントが、各軸の生成にどれだけ寄与（contibution）しているかをみています。列を加算すると100になるのは、表示が％だからです。

res.CA$row$contrib %>% addmargins(margin = 1)

##           Dim 1       Dim 2      Dim 3
## SM    0.3297658  21.3557601  69.433113
## JM    8.3658712  55.1150552  25.618603
## SE   51.2005549   0.2997604   1.698418
## JE   33.0973947  15.1772191   1.204516
## SC    7.0064134   8.0522052   2.045351
## Sum 100.0000000 100.0000000 100.000000

cos2 は、各ポイントがどの軸で表現されているかを示しています。ですから行（職制）で加算すると、1になります。

res.CA$row$cos2 %>% addmargins(margin = 2)

##         Dim 1        Dim 2        Dim 3 Sum
## SM 0.09223203 0.8003363898 0.1074315846   1
## JM 0.52639991 0.4646824609 0.0089176266   1
## SE 0.99903295 0.0007837197 0.0001833323   1
## JE 0.94193412 0.0578762422 0.0001896393   1
## SC 0.86534551 0.1332569974 0.0013974967   1

inertia は、行（各職制）ポイントの慣性（分散）です。

res.CA$row$inertia %>% sum() -> .tmp
.tmp * .tmp

## [1] 0.007257312

.d %>% sum()*(.tmp *.tmp)

## [1] 1.400661

(chisq.test(.d)$statistic / .d %>% sum()) %>% sqrt()

## Warning in chisq.test(.d): Chi-squared approximation may be incorrect

## X-squared 
##  0.291873

13.1 行慣性

res.ca <- res.CA
bb<-round(cbind.data.frame(res.ca$call$marge.row, 
                             sqrt(res.ca$row$inertia/res.ca$call$marge.row), 
                             res.ca$row$inertia,
                             res.ca$row$inertia/sum(res.ca$row$inertia)),4)
colnames(bb)<-c("Weight","Distance","Inertia","% of inertia")
bb %>% summarize_all(.funs = sum)

##   Weight Distance Inertia % of inertia
## 1      1   1.4105  0.0853       1.0001

bb

##    Weight Distance Inertia % of inertia
## SM 0.0570   0.2166  0.0027       0.0314
## JM 0.0933   0.3569  0.0119       0.1395
## SE 0.2642   0.3808  0.0383       0.4497
## JE 0.4560   0.2400  0.0263       0.3084
## SC 0.1295   0.2162  0.0061       0.0711

bb %>% summarize_all(.funs = sum)

##   Weight Distance Inertia % of inertia
## 1      1   1.4105  0.0853       1.0001

13.2 列慣性

res.ca <- res.CA
bb<-round(cbind.data.frame(res.ca$call$marge.col, 
                             sqrt(res.ca$col$inertia/res.ca$call$marge.col), 
                             res.ca$col$inertia,
                             res.ca$col$inertia/sum(res.ca$col$inertia)),4)
colnames(bb)<-c("Weight","Distance","Inertia","% of inertia")
bb

##        Weight Distance Inertia % of inertia
## none   0.3161   0.3945  0.0492       0.5774
## light  0.2332   0.1740  0.0071       0.0829
## medium 0.3212   0.1981  0.0126       0.1480
## heavy  0.1295   0.3551  0.0163       0.1917

bb %>% summarize_all(.funs = sum)

##   Weight Distance Inertia % of inertia
## 1      1   1.1217  0.0852            1

14 explorによる動的検討

library(explor)
explor(res.CA)

15 参考

##　Excelから読み込む

read_excle で読み込むとオブジェクトの属性がtibbleなので、as.matrixでmatrixに変更。

.d0 <- read_excel("All_DataSetv0.9/smoke.xls")

## New names:
## • `` -> `...1`

.d0 %>% rename("rn"=1) %>% column_to_rownames("rn") -> .d01 #CAでは dataframe にしないといけないので、rn列を行名にします。
.d01 # 確認

##    none light medium heavy
## SM    4     2      3     2
## JM    4     3      7     4
## SE   25    10     12     4
## JE   18    24     33    13
## SC   10     6      7     2

.d01 %>% as.matrix() -> .d

喫煙データ（smoke）をCAする

/Users/kazuo/Dropbox/RStudio/計量分析セミナー2023/CA_step1_smoke.Rmd

kazuo.fujimoto2007@gmail.com

Sun Sep 3 21:20:52 JST 2023