Chapter 2: Population and Sample

2.1 Relationship between population and sample

In [1]:
# Load the packages used in this chapter; returns TRUE for each one that loads
sapply(c("pipeR", "dplyr", "tidyr", "ggplot2"), require, character.only = TRUE)
Loading required package: pipeR
Loading required package: dplyr

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Loading required package: tidyr
Warning message:
"package 'tidyr' was built under R version 3.3.3"
Loading required package: ggplot2
Warning message:
"package 'ggplot2' was built under R version 3.3.3"
  pipeR   dplyr   tidyr ggplot2
   TRUE    TRUE    TRUE    TRUE
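  • A complete enumeration (census) is often impractical, so we survey a sample instead (random sampling)
  • Defining the population
    • Trade-off between arbitrariness and generality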
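
To make random sampling concrete, here is a minimal sketch. The population is hypothetical: 10,000 simulated values whose size, mean, and standard deviation are assumptions chosen only for illustration, as is the sample size of 100.

```r
# Hypothetical population: 10,000 simulated values (an assumption for illustration)
set.seed(1)
population <- rnorm(10000, mean = 50, sd = 10)

# Simple random sampling without replacement:
# every unit in the population is equally likely to be drawn
smp <- sample(population, size = 100, replace = FALSE)

# Compare the sample mean with the population mean
c(population = mean(population), sample = mean(smp))
```

The sample mean should land close to the population mean even though only 1% of the units were examined, which is the practical reason for sampling instead of taking a census.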

2.2 Properties of a sample

In [2]:
A <- c(3, 4, 4, 5, 5, 5, 5, 6, 6, 7)
B <- c(1, 2, 4, 4, 5, 5, 5, 6, 8, 10)
(df1 <- data.frame(SampleA = A, SampleB = B))
SampleA  SampleB
      3        1
      4        2
      4        4
      5        4
      5        5
      5        5
      5        5
      6        6
      6        8
      7       10
In [14]:
options(repr.plot.width = 6, repr.plot.height = 3)
In [15]:
df1 %>>% gather(k, v) %>>% 
    ggplot(aes(v)) + 
    geom_histogram(binwidth = 1) + 
    facet_wrap(~k)
In [9]:
A <- c(43.3, 43.1, 42.6, 42.4, 42.2, 41.8, 41.7, 41.6, 41.5, 41.4, 
       40.8, 40.6, 40.5, 40.4, 40.4, 40.3, 40.2, 39.9, 39.9, 39.8,
       39.7, 39.6, 39.6, 39.5, 39.4, 39.3, 38.9, 38.9, 38.8, 38.8,
       38.7, 38.7, 38.6, 38.6, 38.5, 38.4, 38.3, 38.2, 38.1, 38.1,
       37.6, 37.4, 37.1, 37.8, 37.6, 37.5, 37.4, 37.3, 37.2, 37.1,
       37.1, 36.6, 36.5, 36.5, 36.4, 36.3, 36.2, 36.1, 35.4, 35.3,
       35.2, 35.1, 35.1, 34.7, 34.3, 34.2, 33.2, 33.1, 32.7, 31.5)
B <- c(47.3, 46.1, 45.6, 45.1, 44.5, 44.4, 43.7, 42.6, 42.5, 42.5, 
       41.4, 41.8, 41.6, 41.5, 40.7, 40.5, 40.4, 40.3, 40.1, 40.1,
       39.9, 39.8, 39.7, 39.6, 39.6, 39.5, 39.4, 38.9, 38.8, 38.8,
       38.7, 38.7, 38.6, 38.4, 38.2, 38.1, 38.1, 37.8, 37.7, 37.5,
       37.5, 37.4, 37.3, 37.3, 37.1, 36.8, 36.8, 36.7, 36.6, 36.4,
       36.2, 35.4, 35.4, 35.4, 35.3, 35.2, 34.9, 34.8, 34.7, 34.7,
       33.9, 33.8, 33.7, 33.3, 33.1, 32.8, 32.5, 32.1, 31.7, 29.5)
In [16]:
data.frame(A = A, B = B) %>>% gather(k, v) %>>% 
    ggplot(aes(v)) + 
    geom_histogram(binwidth = 1) + 
    facet_wrap(~k)

2.2.1

  • Mean, median, mode
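
The three measures can be computed for the two small samples from section 2.2 roughly as follows. `stat_mode` is a hypothetical helper written for this sketch; base R's `mode()` returns an object's storage type, not the most frequent value. The names `sample_a` and `sample_b` are used so the larger `A` and `B` vectors defined above are not overwritten.

```r
# The two 10-value samples from section 2.2
sample_a <- c(3, 4, 4, 5, 5, 5, 5, 6, 6, 7)
sample_b <- c(1, 2, 4, 4, 5, 5, 5, 6, 8, 10)

# Hypothetical helper: the most frequent value(s) in x
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# One column per sample, one row per measure
centers <- sapply(list(SampleA = sample_a, SampleB = sample_b), function(x) {
  c(mean = mean(x), median = median(x), mode = stat_mode(x))
})
centers
```

All three measures equal 5 for both samples, so measures of central tendency alone cannot distinguish them. The difference lies in how widely the values are spread.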

2.2.2

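  • Sum of squares (of deviations from the mean)
  • Sample variance: does not depend on the number of data points (unlike the sum of squares)
    • Divide by n or by n-1
  • Standard deviation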
In [17]:
sd(A)
2.59325610491384
In [18]:
sd(B)
3.68342940395402
  • Range
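
The dispersion measures listed in this subsection can be computed step by step as in the sketch below, using the small SampleA from section 2.2. The variable names are illustrative. Note that R's built-in `var()` and `sd()` use the n-1 divisor.

```r
x <- c(3, 4, 4, 5, 5, 5, 5, 6, 6, 7)   # SampleA from section 2.2
n <- length(x)

ss     <- sum((x - mean(x))^2)  # sum of squared deviations; grows as n grows
var_n  <- ss / n                # variance with divisor n
var_n1 <- ss / (n - 1)          # variance with divisor n - 1, same as var(x)
sd_n1  <- sqrt(var_n1)          # standard deviation, same as sd(x)
rng    <- diff(range(x))        # range: maximum minus minimum

c(ss = ss, var_n = var_n, var_n1 = var_n1, sd = sd_n1, range = rng)
```

Dividing the sum of squares by n (or n-1) is what makes the variance comparable across samples of different sizes.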

2.2.3 Unbiased variance
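
As a rough illustration of why the n-1 divisor is called unbiased, the simulation below repeatedly draws small samples from a population with known variance and averages the two estimators. The normal population, its variance of 4, the sample size of 5, and the 20,000 replications are all assumptions chosen for this sketch.

```r
set.seed(42)
true_var <- 4   # variance of the assumed population N(0, sd = 2)
n <- 5          # deliberately small so the bias is visible

# Sum of squared deviations for each of 20,000 simulated samples
ss <- replicate(20000, {
  x <- rnorm(n, mean = 0, sd = 2)
  sum((x - mean(x))^2)
})

est_n  <- mean(ss / n)        # average of the divide-by-n estimator
est_n1 <- mean(ss / (n - 1))  # average of the divide-by-(n-1) estimator

c(divide_by_n = est_n, divide_by_n_minus_1 = est_n1, true = true_var)
```

On average the divide-by-n estimator comes out below the true variance, by a factor of about (n-1)/n, while the divide-by-(n-1) estimator averages close to it.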

2.3 Computing with R
