
Probability and Mathematical Statistics 4: Continuous Random Variables and Probability Distributions (Part 1)

Date: 2021-01-10 20:43:00


Probability Theory: Sections 4.1, 4.2, 4.3

4.1 Probability Density Functions
    Probability Distributions for Continuous Variables
4.2 Cumulative Distribution Functions and Expected Values
    The Cumulative Distribution Function
    Using F(x) to Compute Probabilities
    Obtaining f(x) from F(x)
    Percentiles of a Continuous Distribution
    Expected Values
4.3 The Normal Distribution
    The Standard Normal Distribution
    Percentiles of the Standard Normal Distribution
    $z_{\alpha}$ Notation for z Critical Values
    Nonstandard Normal Distributions
    Percentiles of an Arbitrary Normal Distribution
    The Normal Distribution and Discrete Populations
    Approximating the Binomial Distribution

4.1 Probability Density Functions

A discrete random variable (rv) is one whose possible values either constitute a finite set or else can be listed in an infinite sequence (a list in which there is a first element, a second element, etc.). A random variable whose set of possible values is an entire interval of numbers is not discrete.

Probability Distributions for Continuous Variables

DEFINITION:

Let X be a continuous rv. Then a probability distribution or probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with $a \leq b$,

$$P(a \leq X \leq b) = \int_{a}^{b} f(x)\,dx$$

That is, the probability that X takes on a value in the interval [a, b] is the area above this interval and under the graph of the density function, as illustrated in the figure below. The graph of f(x) is often referred to as the density curve.

For f(x) to be a legitimate pdf, it must satisfy the following two conditions:

1. $f(x) \ge 0$ for all x
2. $\int_{-\infty}^{\infty} f(x)\,dx = \text{area under the entire graph of } f(x) = 1$

DEFINITION:

A continuous rv X is said to have a uniform distribution on the interval [A, B] if the pdf of X is

$$f(x;A,B) = \begin{cases} \dfrac{1}{B-A}, & A \leq x \leq B \\ 0, & \text{otherwise} \end{cases}$$
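The two legitimacy conditions and the area interpretation can be checked numerically. The following is a minimal sketch, not part of the original notes: the endpoints A = 0 and B = 360, the helper name `integrate`, and the midpoint rule are all illustrative assumptions.

```python
def uniform_pdf(x, A, B):
    """Density of the uniform distribution on [A, B]."""
    return 1.0 / (B - A) if A <= x <= B else 0.0


def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Hypothetical endpoints chosen only for illustration.
A, B = 0.0, 360.0

# Condition 2: the total area under the density curve should be 1.
total_area = integrate(lambda x: uniform_pdf(x, A, B), A, B)

# P(90 <= X <= 180) is the area under the curve above [90, 180].
prob_90_180 = integrate(lambda x: uniform_pdf(x, A, B), 90.0, 180.0)

print(round(total_area, 6))   # close to 1
print(round(prob_90_180, 6))  # close to (180 - 90) / 360 = 0.25
```

Because the uniform density is constant on [A, B], the probability of any subinterval is simply its length divided by B - A, which is what the numerical integral recovers.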

When X is a discrete random variable, each possible value is assigned positive probability. This is not true of a continuous random variable (that is, $P(X = c) = 0$ for every number c), because the area under a density curve that lies above any single value is zero:

$$P(X=c)=\int_{c}^{c}f(x)\,dx=\lim_{\epsilon \to 0} \int_{c-\epsilon}^{c+\epsilon}f(x)\,dx=0$$

The fact that $P(X=c)=0$ when X is continuous has an important practical consequence: the probability that X lies in some interval between a and b does not depend on whether the lower limit a or the upper limit b is included in the probability calculation:

$$P(a \leq X \leq b)=P(a < X < b)=P(a < X \leq b)=P(a \leq X < b)$$

4.2 Cumulative Distribution Functions and Expected Values

The Cumulative Distribution Function

DEFINITION:

The cumulative distribution function (cdf) F(x) for a continuous rv X is defined for every number x by

$$F(x)=P(X \leq x)=\int_{-\infty}^{x}f(y)\,dy$$

For each x, F(x) is the area under the density curve to the left of x. This is illustrated in the figure below, where F(x) increases smoothly as x increases.

Using F(x) to Compute Probabilities

PROPOSITION:

Let X be a continuous rv with pdf f(x) and cdf F(x). Then for any number a,

$$P(X > a)=1-F(a)$$

and for any two numbers a and b with a < b,

$$P(a \leq X \leq b)=F(b)-F(a)$$

The figure below illustrates the second part of this proposition; the desired probability is the shaded area under the density curve between a and b, and it equals the difference between the two shaded cumulative areas. This is different from what is appropriate for a discrete integer-valued random variable (e.g., binomial or Poisson): $P(a \leq X \leq b) = F(b) - F(a - 1)$ when a and b are integers.
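As a small illustration of the proposition, the sketch below evaluates both formulas with the cdf of a uniform distribution. The interval [0, 10] and the cut points 7, 2 and 6 are assumed values picked for the example, not taken from the notes.

```python
def uniform_cdf(x, A, B):
    """cdf of the uniform distribution on [A, B]: area to the left of x."""
    if x < A:
        return 0.0
    if x > B:
        return 1.0
    return (x - A) / (B - A)


# Hypothetical interval chosen only for illustration.
A, B = 0.0, 10.0

# P(X > 7) = 1 - F(7)
p_greater = 1.0 - uniform_cdf(7.0, A, B)

# P(2 <= X <= 6) = F(6) - F(2)
p_between = uniform_cdf(6.0, A, B) - uniform_cdf(2.0, A, B)

print(round(p_greater, 4))  # approximately 0.3
print(round(p_between, 4))  # approximately 0.4
```

For a continuous rv the endpoints carry no probability, so the same two numbers also give P(X >= 7) and P(2 < X < 6).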

Obtaining f(x) from F(x)

PROPOSITION:

If X is a continuous rv with pdf f(x) and cdf F(x), then at every x at which the derivative F’(x) exists, F’(x)=f(x).
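The relation F'(x) = f(x) can be checked numerically with a central difference. The sketch below assumes an illustrative distribution with cdf F(x) = x^2 on [0, 1], whose pdf is f(x) = 2x there; this example distribution and the function names are assumptions made for the demonstration.

```python
def cdf(x):
    """Assumed example cdf: F(x) = x^2 on [0, 1], 0 to the left, 1 to the right."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return x * x


def pdf(x):
    """Matching pdf: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def numerical_derivative(F, x, h=1e-6):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)


# Compare F'(x) with f(x) at a few interior points where F is differentiable.
points = (0.2, 0.5, 0.9)
gaps = [abs(numerical_derivative(cdf, x) - pdf(x)) for x in points]
print(max(gaps))  # tiny: the two agree up to rounding error
```

At x = 1 this F has a corner (slope 2 from the left, 0 from the right), so F'(1) does not exist; that is the kind of point the proposition excludes.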

Percentiles of a Continuous Distribution

DEFINITION:

Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by $\eta(p)$, is defined by

$$p=F(\eta(p))=\int_{-\infty}^{\eta(p)}f(y)\,dy$$

DEFINITION:

The median of a continuous distribution, denoted by $\tilde{\mu}$, is the 50th percentile, so $\tilde{\mu}$ satisfies $.5=F(\tilde{\mu})$. That is, half the area under the density curve is to the left of $\tilde{\mu}$ and half is to the right of $\tilde{\mu}$.

A continuous distribution whose pdf is symmetric (the graph of the pdf to the left of some point is a mirror image of the graph to the right of that point) has median $\tilde{\mu}$ equal to the point of symmetry, since half the area under the curve lies to either side of this point.
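When F cannot be inverted by hand, the defining equation p = F(η(p)) can be solved numerically. The following sketch uses bisection; the example cdf F(x) = x^2 on [0, 1] and the helper name `percentile` are assumptions for illustration, and the method relies on F being continuous and nondecreasing on the search interval.

```python
def percentile(cdf, p, lo, hi, tol=1e-10):
    """Solve cdf(x) = p by bisection.

    Assumes cdf is continuous and nondecreasing on [lo, hi]
    with cdf(lo) <= p <= cdf(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < p:
            lo = mid   # too little area to the left: move right
        else:
            hi = mid   # enough area to the left: move left
    return (lo + hi) / 2.0


def example_cdf(x):
    """Assumed example cdf: F(x) = x^2 on [0, 1]."""
    return min(max(x, 0.0), 1.0) ** 2


median = percentile(example_cdf, 0.5, 0.0, 1.0)
p90 = percentile(example_cdf, 0.9, 0.0, 1.0)

print(round(median, 4))  # sqrt(0.5), about 0.7071
print(round(p90, 4))     # sqrt(0.9), about 0.9487
```

For this cdf the answer is available in closed form, η(p) = sqrt(p), which makes it easy to confirm that the bisection converges to the right value.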

Expected Values

DEFINITION:

The expected or mean value of a continuous rv X with pdf f(x) is

$$\mu_X=E(X)=\int_{-\infty}^{\infty}x\cdot f(x)\,dx$$

PROPOSITION:

If X is a continuous rv with pdf f(x) and h(X) is any function of X, then

$$E[h(X)]=\mu_{h(X)}=\int_{-\infty}^{\infty}h(x)\cdot f(x)\,dx$$

DEFINITION:

The variance of a continuous random variable X with pdf f(x) and mean value $\mu$ is

$$\sigma_{X}^{2}=V(X)=\int_{-\infty}^{\infty}(x-\mu)^2\cdot f(x)\,dx=E[(X-\mu)^2]$$

The standard deviation (SD) of X is $\sigma_X=\sqrt{V(X)}$.

PROPOSITION:

$$V(X)=E(X^2)-[E(X)]^2$$
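The definitions of the mean and variance, and the shortcut formula, can be compared numerically. This is a rough sketch under assumed values: a uniform density on [0, 10] and a midpoint-rule integrator named `integrate`, neither of which comes from the notes.

```python
def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Hypothetical uniform distribution on [A, B].
A, B = 0.0, 10.0


def pdf(x):
    return 1.0 / (B - A) if A <= x <= B else 0.0


# E(X): integral of x * f(x). The density is zero outside [A, B].
mean = integrate(lambda x: x * pdf(x), A, B)

# V(X) from the definition: integral of (x - mu)^2 * f(x).
var_definition = integrate(lambda x: (x - mean) ** 2 * pdf(x), A, B)

# V(X) from the shortcut: E(X^2) - [E(X)]^2.
mean_of_square = integrate(lambda x: x * x * pdf(x), A, B)
var_shortcut = mean_of_square - mean ** 2

sd = var_definition ** 0.5

print(round(mean, 4))            # (A + B) / 2 = 5.0
print(round(var_definition, 4))  # (B - A)^2 / 12, about 8.3333
print(round(var_shortcut, 4))    # same value via the shortcut
print(round(sd, 4))              # about 2.8868
```

Both routes to the variance agree up to the integration error, and they match the closed-form value (B - A)^2 / 12 for a uniform distribution.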

4.3 The Normal Distribution

DEFINITION:

A continuous rv X is said to have a normal distribution with parameters $\mu$ and $\sigma$ (or $\mu$ and $\sigma^2$), where $-\infty < \mu < \infty$ and $0 < \sigma$, if the pdf of X is

$$f(x;\mu,\sigma)= \frac{1}{\sqrt{2\pi}\,\sigma}e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty$$

The statement that X is normally distributed with parameters $\mu$ and $\sigma^2$ is often abbreviated $X \sim N(\mu,\sigma^2)$.

The Standard Normal Distribution

DEFINITION:

The normal distribution with parameter values $\mu=0$ and $\sigma=1$ is called the standard normal distribution. A random variable having a standard normal distribution is called a standard normal random variable and will be denoted by Z. The pdf of Z is

$$f(z;0,1)=\frac{1}{\sqrt{2\pi}}e^{-z^2/2}, \quad -\infty < z < \infty$$

The graph of f(z; 0, 1) is called the standard normal (or z) curve. Its inflection points are at 1 and -1. The cdf of Z is $P(Z \leq z)=\int_{-\infty}^{z}f(y;0,1)\,dy$, which we will denote by $\Phi(z)$.
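The integral defining $\Phi(z)$ has no elementary closed form, but it can be written in terms of the error function as $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The sketch below uses that identity with Python's standard `math.erf`; the function name `phi` is just a label chosen here.

```python
import math


def phi(z):
    """Standard normal cdf via the error function identity."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(round(phi(0.0), 4))    # 0.5, by symmetry of the z curve
print(round(phi(1.96), 4))   # 0.975
print(round(phi(-1.0), 4))   # 0.1587

# Symmetry check: Phi(-z) = 1 - Phi(z)
print(abs(phi(-1.25) - (1.0 - phi(1.25))) < 1e-12)  # True
```

These are the same values a printed standard normal table would give for z = 0, 1.96 and -1.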

Percentiles of the Standard Normal Distribution

For any p between 0 and 1, Appendix Table A.3 can be used to obtain the (100p)th percentile of the standard normal distribution.

$z_{\alpha}$ Notation for z Critical Values

In statistical inference, we will need the values on the horizontal z axis that capture certain small tail areas under the standard normal curve.

Notation:

$z_{\alpha}$ will denote the value on the z axis for which $\alpha$ of the area under the z curve lies to the right of $z_{\alpha}$.

The $z_{\alpha}$'s are usually referred to as z critical values. Table 4.1 lists the most useful z percentiles and $z_{\alpha}$ values.
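Since $z_{\alpha}$ leaves area $\alpha$ to its right, it is the 100(1 - α)th percentile of the standard normal distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library (available from Python 3.8) follows; the helper name `z_critical` and the particular α values are assumptions chosen for illustration.

```python
from statistics import NormalDist


def z_critical(alpha):
    """z_alpha: the point with area alpha to its right under the z curve."""
    return NormalDist().inv_cdf(1.0 - alpha)


# Commonly used tail areas (an illustrative selection).
for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
    print(alpha, round(z_critical(alpha), 3))

# Sanity check: the area to the right of z_alpha is alpha.
right_tail = 1.0 - NormalDist().cdf(z_critical(0.05))
print(round(right_tail, 6))  # 0.05
```

The loop yields roughly 1.282, 1.645, 1.960, 2.326 and 2.576, which are the familiar critical values for these tail areas.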

Nonstandard Normal Distributions

When $X \sim N(\mu,\sigma^2)$, probabilities involving X are computed by "standardizing". The standardized variable is $(X - \mu)/\sigma$. Subtracting $\mu$ shifts the mean from $\mu$ to zero, and then dividing by $\sigma$ scales the variable so that the standard deviation is 1 rather than $\sigma$.

PROPOSITION:

If X has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then

$$Z=\frac{X-\mu}{\sigma}$$

has a standard normal distribution. Thus

$$P(a \leq X \leq b)=P\left(\frac{a-\mu}{\sigma}\leq Z \leq \frac{b-\mu}{\sigma}\right) = \Phi\left(\frac{b-\mu}{\sigma}\right)-\Phi\left(\frac{a-\mu}{\sigma}\right)$$

$$P(X \leq a)=\Phi\left(\frac{a-\mu}{\sigma}\right) \qquad P(X\ge b)=1-\Phi\left(\frac{b-\mu}{\sigma}\right)$$

If the population distribution of a variable is (approximately) normal, then

- Roughly 68% of the values are within 1 SD of the mean.
- Roughly 95% of the values are within 2 SDs of the mean.
- Roughly 99.7% of the values are within 3 SDs of the mean.
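Standardizing and the 68/95/99.7 rule can both be verified in a few lines. In the sketch below, the mean 1.25 and standard deviation 0.46 are hypothetical values chosen for illustration, and `normal_prob` is a helper name introduced here; `statistics.NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1


def normal_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), computed by standardizing."""
    return Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)


# Hypothetical parameters, for illustration only.
mu, sigma = 1.25, 0.46

# An interval probability obtained through the standardized variable.
p_interval = normal_prob(1.00, 1.75, mu, sigma)
print(round(p_interval, 4))

# Probability of falling within k SDs of the mean, k = 1, 2, 3.
within = [normal_prob(mu - k * sigma, mu + k * sigma, mu, sigma) for k in (1, 2, 3)]
print([round(p, 4) for p in within])  # about 0.6827, 0.9545, 0.9973
```

The within-k-SD probabilities do not depend on the chosen mu and sigma, because standardizing always reduces them to P(-k <= Z <= k).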

Percentiles of an Arbitrary Normal Distribution

The (100p)th percentile of a normal distribution with mean $\mu$ and standard deviation $\sigma$ is easily related to the (100p)th percentile of the standard normal distribution.

PROPOSITION:

$$(100p)\text{th percentile for normal } (\mu,\sigma) = \mu + [(100p)\text{th percentile for standard normal}] \cdot \sigma$$
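The proposition amounts to "un-standardizing" a z percentile. A short sketch follows, with assumed parameters mu = 64 and sigma = 0.78 and a helper name `normal_percentile` introduced here; it also cross-checks the result against the percentile computed directly from the nonstandard distribution.

```python
from statistics import NormalDist


def normal_percentile(p, mu, sigma):
    """(100p)th percentile of N(mu, sigma^2) from the standard normal percentile."""
    z_p = NormalDist().inv_cdf(p)  # (100p)th percentile of Z
    return mu + z_p * sigma


# Hypothetical parameters, for illustration only.
mu, sigma = 64.0, 0.78

x99 = normal_percentile(0.99, mu, sigma)
direct = NormalDist(mu, sigma).inv_cdf(0.99)

print(round(x99, 2))     # about 65.81
print(round(direct, 2))  # same value, computed without standardizing
```

With the 99th z percentile near 2.33, the result is about 64 + 2.33 × 0.78.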

The Normal Distribution and Discrete Populations

The normal distribution is often used as an approximation to the distribution of values in a discrete population. In such situations, extra care should be taken to ensure that probabilities are computed in an accurate manner.

The correction for discreteness of the underlying distribution is often called a continuity correction. It is useful in the following application of the normal distribution to the computation of binomial probabilities.

Approximating the Binomial Distribution

PROPOSITION:

Let X be a binomial rv based on n trials with success probability p, and let $q = 1 - p$. Then if the binomial probability histogram is not too skewed, X has approximately a normal distribution with $\mu=np$ and $\sigma=\sqrt{npq}$. In particular, for any possible value x of X,

$$P(X \leq x)=B(x,n,p) \approx \left(\text{area under the normal curve to the left of } x + .5\right)=\Phi\left(\frac{x+.5-np}{\sqrt{npq}}\right)$$

In practice, the approximation is adequate provided that both $np \ge 10$ and $nq \ge 10$, since there is then enough symmetry in the underlying binomial distribution.
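To see how close the approximation is, the exact binomial cdf can be compared with the continuity-corrected normal value. The sketch below assumes n = 50, p = 0.25 and x = 10 purely for illustration (so np = 12.5 and nq = 37.5, both at least 10); the function names are introduced here and are not from the notes.

```python
import math
from statistics import NormalDist


def binom_cdf(x, n, p):
    """Exact P(X <= x) for a binomial rv: sum of the pmf from 0 to x."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_approx_cdf(x, n, p):
    """Normal approximation to P(X <= x) with the continuity correction x + .5."""
    q = 1.0 - p
    return NormalDist().cdf((x + 0.5 - n * p) / math.sqrt(n * p * q))


# Hypothetical values, for illustration only.
n, p, x = 50, 0.25, 10

exact = binom_cdf(x, n, p)
approx = normal_approx_cdf(x, n, p)

print(round(exact, 4))
print(round(approx, 4))
print(round(abs(exact - approx), 4))  # size of the approximation error
```

Dropping the + 0.5 term moves the cutoff from 10.5 to 10 and typically makes the approximation noticeably worse, which is the practical reason for the continuity correction.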
