Chapter 8 .pdf
Original filename: Chapter 8.pdf
This PDF 1.3 document has been generated by Canon SC1011 / MP Navigator EX, and has been sent on pdf-archive.com on 14/01/2016 at 11:46, from IP address 63.241.x.x.
File size: 79.6 MB (120 pages).
VIII. ANALYZE
BOK
VI.A.2
MEASURING RELATIONSHIPS/REGRESSION
Analyze
Analyze is presented in the following topic areas:
• Measuring and modeling relationships between variables
• Hypothesis testing
• Failure mode and effects analysis (FMEA)
• Additional analysis methods
Measuring and Modeling Relationships Between Variables
Measuring and modeling relationships between variables is reviewed in the
following topic areas:
• Regression
• Correlation coefficient
• Multivariate tools
Note that the authors have presented regression ahead of the correlation coefficient
for ease of explanation.
Regression
Simple linear regression and multiple linear regression will be discussed here. Note
that non-linear regression models will not be tested on the CSSBB exam, and will not
be described here. The material on linear regression may be found in several
statistics books, including Triola (1994)²⁴.
Simple Linear Regression Model
Consider the problem of predicting the test results (y) for students based upon an
input variable (x), the amount of preparation time in hours, using the data presented
in Table 8.1. Please note that these data are hypothetical and are not based on actual
results.
CSSBB 2014    VIII - 2    © QUALITY COUNCIL OF INDIANA
Student    Study Time (Hours)    Test Results (%)
1          60                    67
2          40                    61
3          50                    73
4          65                    80
5          35                    60
6          40                    55
7          50                    62
8          30                    50
9          45                    61
10         55                    70
Table 8.1 Study Time Versus Test Results
An initial approach to the analysis of the data in Table 8.1 is to plot the points on a
graph known as a scatter diagram, as shown in Figure 8.2. Observe that y appears
to increase as x increases. One method of obtaining a prediction equation relating
y to x is to place a ruler on the graph and move it about until it seems to pass
through the majority of the points, thus providing what is regarded as the "best fit"
line.
[Scatter diagram: Study Time (Hours), X, on the horizontal axis (30 to 65); test results, Y, on the vertical axis]
Figure 8.2 Plot of Study Time Versus Test Results
The mathematical equation of a straight line is:
$$y = \beta_0 + \beta_1 x$$
Where $\beta_0$ is the y intercept when x = 0 and $\beta_1$ is the slope of the line. Please note in
Figure 8.2 that the x axis does not go to zero so the y intercept appears too high.
The equation for a straight line in this example is too simplistic. There will actually
be a random error which is the difference between an observed value of y and the
mean value of y for a given value of x. One assumes that for any given value of x,
the observed value of y varies in a random manner and possesses a normal
probability distribution. The concept is illustrated in Figure 8.3:
[Figure: variation of y about the line at x = x1, x2, and x3]
Figure 8.3 Variation in y as a Function of x
The probabilistic model for any particular observed value of y is:
y = (mean value of y for a given value of x) + (random error)
$$y = \beta_0 + \beta_1 x + \varepsilon$$
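To make the probabilistic model concrete, the following minimal Python sketch draws a few simulated observations of y at fixed values of x. The parameter values (BETA0, BETA1, SIGMA), the chosen x values, and the function name simulate_y are illustrative assumptions and are not taken from the text.

```python
import random

def simulate_y(x, beta0, beta1, sigma, rng):
    """Return one observation y = beta0 + beta1*x + e, with e ~ Normal(0, sigma)."""
    mean_y = beta0 + beta1 * x       # mean value of y for the given x
    error = rng.gauss(0.0, sigma)    # random error, normally distributed
    return mean_y + error

# Hypothetical parameter values chosen only for illustration.
BETA0, BETA1, SIGMA = 30.0, 0.7, 4.5

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
for x in (30, 45, 60):
    draws = [simulate_y(x, BETA0, BETA1, SIGMA, rng) for _ in range(5)]
    print(x, [round(y, 1) for y in draws])
```

Each row of output scatters around the mean value for that x, which is the behavior Figure 8.3 is meant to convey.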
The Method of Least Squares
The statistical procedure of finding the "best-fitting" straight line is, in many
respects, a formalization of the procedure used when one fits a line by eye. The
objective is to minimize the deviations of the points from the prospective line.
If one denotes the predicted value of y obtained from the fitted line as $\hat{y}$, the
prediction equation is:
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
Where: $\hat{\beta}_0$ and $\hat{\beta}_1$ represent estimates of the true $\beta_0$ and $\beta_1$, as shown in Figure 8.4.
[Scatter diagram with the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$: Study Time (Hours), X, on the horizontal axis; test results, Y, on the vertical axis]
Figure 8.4 Study Time Versus Test Results
Having decided to minimize the deviation of the points in choosing the best fitting
line, one must now define what is meant by "best."
The criterion of goodness of fit employed is known as the principle of least squares:
Choose, as the best fitting line, the line that minimizes the sum of squares of
the deviations of the observed values of y from those predicted.
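Expressed mathematically, minimize the sum of squared errors given by:
$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Substituting for $\hat{y}_i$, one obtains the following expression:
$$\text{Sum of squared errors} = \mathrm{SSE} = \sum_{i=1}^{n} \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right]^2$$
The least squares estimators of $\beta_0$ and $\beta_1$ are calculated as follows:
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}$$
$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left( \sum_{i=1}^{n} x_i \right)\left( \sum_{i=1}^{n} y_i \right)}{n}$$
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
Once $\hat{\beta}_0$ and $\hat{\beta}_1$ have been computed, substitute their values into the equation of a
line to obtain the least squares prediction equation, or regression line.
As noted earlier, the prediction equation for $\hat{y}$ is:
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
Where: $\hat{\beta}_0$ and $\hat{\beta}_1$ represent estimates of the true $\beta_0$ and $\beta_1$.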
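The estimator formulas follow from minimizing SSE. As a brief sketch of the intermediate step (standard calculus, not reproduced from the text), set the partial derivatives of SSE with respect to each estimate to zero and solve:

```latex
\begin{aligned}
\frac{\partial\,\mathrm{SSE}}{\partial \hat{\beta}_0}
  &= -2 \sum_{i=1}^{n} \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right] = 0
  \;\Rightarrow\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\
\frac{\partial\,\mathrm{SSE}}{\partial \hat{\beta}_1}
  &= -2 \sum_{i=1}^{n} x_i \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right] = 0
  \;\Rightarrow\; \sum_{i=1}^{n} x_i y_i - \hat{\beta}_0 \sum_{i=1}^{n} x_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = 0 \\
\text{Substituting } \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}:\quad
  & \sum_{i=1}^{n} x_i y_i - \frac{\left( \sum x_i \right)\left( \sum y_i \right)}{n}
    = \hat{\beta}_1 \left[ \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum x_i \right)^2}{n} \right]
  \;\Rightarrow\; \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}
\end{aligned}
```

The left side of the last line is $S_{xy}$ and the bracketed term on the right is $S_{xx}$, which gives the slope estimator stated above.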
Least Squares Example
Example 8.1: Obtain the least squares prediction line for the data below:
i      x_i    y_i    x_i^2     x_i y_i    y_i^2
1      60     67     3,600     4,020      4,489
2      40     61     1,600     2,440      3,721
3      50     73     2,500     3,650      5,329
4      65     80     4,225     5,200      6,400
5      35     60     1,225     2,100      3,600
6      40     55     1,600     2,200      3,025
7      50     62     2,500     3,100      3,844
8      30     50     900       1,500      2,500
9      45     61     2,025     2,745      3,721
10     55     70     3,025     3,850      4,900
Sum    470    639    23,200    30,805     41,529
Table 8.5 Data for the Study Time versus Test Results Example
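$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} = 23{,}200 - \frac{(470)^2}{10} = 1{,}110$$
$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left( \sum_{i=1}^{n} x_i \right)\left( \sum_{i=1}^{n} y_i \right)}{n} = 30{,}805 - \frac{(470)(639)}{10} = 772$$
$$\bar{x} = \frac{470}{10} = 47 \qquad \bar{y} = \frac{639}{10} = 63.9$$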
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{772}{1{,}110} = 0.6955$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 63.9 - (0.6955)(47) = 31.2115$$
$$\hat{y} = 31.2115 + 0.6955\,x$$
One may predict y for a given value of x by substitution into the prediction equation.
For example, if 60 hours of study time is allocated, the predicted test score would
be:
$$\hat{y} = 31.2115 + (0.6955)(60)$$
$$\hat{y} = 72.9415 \approx 73\%$$
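The hand calculation in Example 8.1 can be cross-checked with a short Python sketch. The function name least_squares_line and the variable names are illustrative choices; the data are those of Table 8.1.

```python
# Data from Table 8.1: study time in hours (x) and test results in percent (y).
x = [60, 40, 50, 65, 35, 40, 50, 30, 45, 55]
y = [67, 61, 73, 80, 60, 55, 62, 50, 61, 70]

def least_squares_line(x, y):
    """Return (s_xx, s_xy, b0, b1) for the least squares line y_hat = b0 + b1*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b1 = s_xy / s_xx                   # slope estimate
    b0 = sum_y / n - b1 * sum_x / n    # intercept estimate: y_bar - b1 * x_bar
    return s_xx, s_xy, b0, b1

s_xx, s_xy, b0, b1 = least_squares_line(x, y)
print(s_xx, s_xy)                    # sums of squares S_xx and S_xy
print(round(b1, 4), round(b0, 4))    # slope and intercept
print(round(b0 + b1 * 60, 2))        # predicted score for 60 hours of study
```

Because the sketch carries the slope at full precision instead of rounding it to 0.6955 first, the intercept can differ from the hand-calculated 31.2115 in the fourth decimal place.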
Hints on Regression Analysis
• Be careful of rounding errors. Normally, the calculations should carry a
minimum of six significant figures in computing sums of squares of
deviations. Note that the prior example consisted of convenient whole
numbers, which does not occur often.
• Always plot the data points and graph the least squares line. If the line does
not provide a reasonable fit to the data points, there may be a calculation
error.
• Projecting a regression line outside of the test area can be risky. The above
equation suggests that, without study, a student would make 31% on the test. The
odds favor 25% if answer a is selected for all questions. The equation also
suggests that with 100 hours of study the student should attain 100% on the
examination, which is highly unlikely.
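The extrapolation hint can be checked numerically. The following is a minimal sketch using the coefficients from Example 8.1; the helper name predict and the idea of flagging inputs outside the observed study times (30 to 65 hours in Table 8.1) are illustrative assumptions, not part of the text.

```python
B0, B1 = 31.2115, 0.6955    # coefficients from Example 8.1
X_MIN, X_MAX = 30, 65       # range of study times observed in Table 8.1

def predict(hours):
    """Return (predicted score, True if hours lies inside the observed range)."""
    inside = X_MIN <= hours <= X_MAX
    return B0 + B1 * hours, inside

for hours in (0, 60, 100):
    score, inside = predict(hours)
    note = "" if inside else "  (extrapolation, treat with caution)"
    print(f"{hours:>3} h: {score:6.2f}%{note}")
```

The predictions at 0 and 100 hours are the two cases the hint warns about: both lie well outside the range of the data used to fit the line.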
Calculating $s_\varepsilon^2$, an Estimator of $\sigma_\varepsilon^2$
Recall, the model for y assumes that y is related to x by the equation:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
If the least squares line is used:
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$
A random error, $\varepsilon$, enters into the calculations of $\beta_0$ and $\beta_1$. The random errors affect
the error of prediction. Consequently, the variability of the random errors (measured
by $\sigma_\varepsilon^2$) plays an important role when predicting by the least squares line.
The first step toward acquiring a boundary on a prediction error requires that one
estimate $\sigma_\varepsilon^2$. It is reasonable to use SSE (sum of squares for error) based on (n - 2)
degrees of freedom; two degrees of freedom are lost because two parameters, $\beta_0$ and $\beta_1$, are estimated from the data.
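An Estimator for $\sigma_\varepsilon^2$
$$\hat{\sigma}_\varepsilon^2 = \frac{\mathrm{SSE}}{n - 2}$$
$\hat{\sigma}_\varepsilon^2$ is sometimes shown as $s_\varepsilon^2$.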
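Formula for Calculating SSE
Sum of squared errors = SSE
$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right]^2$$
SSE may also be written:
$$\mathrm{SSE} = S_{yy} - \hat{\beta}_1 S_{xy} = S_{yy} - \frac{(S_{xy})^2}{S_{xx}}$$
Where:
$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left( \sum_{i=1}^{n} y_i \right)^2}{n}$$
$S_{xy}$ and $S_{xx}$ were previously defined.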
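Example 8.2: Calculate an estimated $\sigma_\varepsilon^2$ for the data in Table 8.5.
$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left( \sum_{i=1}^{n} y_i \right)^2}{n} = 41{,}529 - \frac{(639)^2}{10} = 696.9$$
$$\mathrm{SSE} = S_{yy} - \hat{\beta}_1 S_{xy} = 696.9 - (0.6955)(772) = 159.97$$
$$\hat{\sigma}_\varepsilon^2 = \frac{\mathrm{SSE}}{n - 2} = \frac{159.97}{10 - 2} = 19.996$$
$$\hat{\sigma}_\varepsilon = 4.47$$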
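The arithmetic of Example 8.2 can be reproduced with the short Python sketch below. It starts from the column sums of Table 8.5 and the rounded slope from Example 8.1; the variable names are illustrative.

```python
import math

# Summary values from Table 8.5 and Example 8.1.
n = 10
sum_y, sum_y2 = 639, 41529
s_xy, b1 = 772, 0.6955    # b1 rounded to four decimals, as in the text

s_yy = sum_y2 - sum_y ** 2 / n       # 41,529 - (639)^2 / 10
sse = s_yy - b1 * s_xy               # shortcut formula for SSE
variance_est = sse / (n - 2)         # estimate of the error variance
std_est = math.sqrt(variance_est)    # estimate of the error standard deviation

print(round(s_yy, 1), round(sse, 2), round(variance_est, 3), round(std_est, 2))
```

The variance estimate may differ from the hand value in the last printed digit, because the text rounds SSE to 159.97 before dividing by n - 2.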
How can one interpret the values of SSE and $\hat{\sigma}_\varepsilon^2$? Refer to Figure 8.4 and note the
deviations of the 10 points from the least squares line. The sum SSE = 159.97 is
equal to the sum of squares of the numerical values of these deviations.
$\hat{\sigma}_\varepsilon$ from the above calculation equals 4.47. Thus, most of the points will fall within $\pm 1.96\,\hat{\sigma}_\varepsilon$,
or 8.76, of the line. Approximately 95% of the values should be in this region. In
Figure 8.4, all of the values are within ±8.76 of the line. This estimate provides a
rough check on the calculated value of $\hat{\sigma}_\varepsilon$.
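This rough check can be carried out directly. The sketch below, with illustrative variable names, computes each point's deviation from the fitted line of Example 8.1, sums the squared deviations, and counts how many points lie within 1.96 estimated standard deviations of the line.

```python
# Data from Table 8.1 and results from Examples 8.1 and 8.2.
x = [60, 40, 50, 65, 35, 40, 50, 30, 45, 55]
y = [67, 61, 73, 80, 60, 55, 62, 50, 61, 70]
B0, B1 = 31.2115, 0.6955    # fitted intercept and slope
S = 4.47                    # estimated standard deviation of the random error

residuals = [yi - (B0 + B1 * xi) for xi, yi in zip(x, y)]
sse_direct = sum(r * r for r in residuals)    # SSE from its definition
limit = 1.96 * S                               # half-width of the band
inside = sum(1 for r in residuals if abs(r) <= limit)

print(round(sse_direct, 2))                    # compare with the shortcut SSE
print(round(limit, 2), inside, len(residuals))
```

The directly summed SSE agrees with the shortcut formula to within rounding, which is itself a useful guard against calculation errors.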
Inferences Concerning the Slope $\beta_1$ of a Line
The existence of a significant relationship between y and x can be tested by whether
$\beta_1$ is equal to 0. If $\beta_1 \neq 0$ there is a linear relationship. The null hypothesis and
alternative hypothesis are:
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$
The test statistic follows a t distribution with n - 2 degrees of freedom:
$$t = \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}}$$
where:
$$s_{\hat{\beta}_1} = \frac{\hat{\sigma}_\varepsilon}{\sqrt{S_{xx}}}$$
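As an illustration, the test statistic can be evaluated for the study time data using the results of Examples 8.1 and 8.2. This is a minimal sketch: the text above does not state a significance level, so the two-sided 5% level and the tabulated critical value of 2.306 (0.025 in each tail, 8 degrees of freedom) are assumptions made here.

```python
import math

# Values carried over from Examples 8.1 and 8.2.
b1 = 0.6955     # estimated slope
s = 4.47        # estimated standard deviation of the random error
s_xx = 1110     # sum of squares of x about its mean
n = 10

s_b1 = s / math.sqrt(s_xx)    # standard error of the slope estimate
t_stat = (b1 - 0.0) / s_b1    # test statistic under H0: slope = 0

# Assumption: two-sided test at the 5% level; 2.306 is the tabulated
# critical value of t with n - 2 = 8 degrees of freedom.
T_CRIT = 2.306
print(round(s_b1, 4), round(t_stat, 2), abs(t_stat) > T_CRIT)
```

Under these assumptions the statistic exceeds the critical value, which would lead to rejecting the null hypothesis of zero slope.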