# PDF Archive


## Chapter 8 .pdf

Original filename: Chapter 8.pdf

This PDF 1.3 document has been generated by Canon SC1011 / MP Navigator EX, and has been sent on pdf-archive.com on 14/01/2016 at 11:46, from IP address 63.241.x.x. The current document download page has been viewed 384 times.
File size: 79.6 MB (120 pages).
Privacy: public file

### Document preview

VIII. ANALYZE
MEASURING RELATIONSHIPS/REGRESSION
BOK VI.A.2

Analyze
Analyze is presented in the following topic areas:

- Measuring and modeling relationships between variables
- Hypothesis testing
- Failure mode and effects analysis (FMEA)

Measuring and Modeling Relationships Between Variables
Measuring and modeling relationships between variables is reviewed in the
following topic areas:

- Regression
- Correlation coefficient
- Multivariate tools

Note that the authors have presented regression ahead of the correlation coefficient
for explanatory purposes.

Regression
Simple linear regression and multiple linear regression will be discussed here. Note
that non-linear regression models will not be tested on the CSSBB exam, and will not
be described here. The material on linear regression may be found in several
statistics books, including Triola (1994) [24].

Simple Linear Regression Model
Consider the problem of predicting the test results (y) for students based upon an
input variable (x), the amount of preparation time in hours, using the data presented
in Table 8.1. Please note that this is hypothetical data, and is not based on actual
results.

CSSBB 2014 © QUALITY COUNCIL OF INDIANA

| Student | Study Time (Hours) | Test Results (%) |
|---------|--------------------|------------------|
| 1       | 60                 | 67               |
| 2       | 40                 | 61               |
| 3       | 50                 | 73               |
| 4       | 65                 | 80               |
| 5       | 35                 | 60               |
| 6       | 40                 | 55               |
| 7       | 50                 | 62               |
| 8       | 30                 | 50               |
| 9       | 45                 | 61               |
| 10      | 55                 | 70               |

Table 8.1 Study Time Versus Test Results

An initial approach to the analysis of the data in Table 8.1 is to plot the points on a
graph known as a scatter diagram, as shown in Figure 8.2. Observe that y appears
to increase as x increases. One method of obtaining a prediction equation relating
y to x is to place a ruler on the graph and move it about until it seems to pass
through the majority of the points, thus providing what is regarded as the "best fit"
line.

[Scatter diagram of the Table 8.1 data. Horizontal axis: Study Time (Hours), X, from 30 to 65.]

Figure 8.2 Plot of Study Time Versus Test Results

The mathematical equation of a straight line is:

$$y = \beta_0 + \beta_1 x$$

where $\beta_0$ is the y intercept when x = 0 and $\beta_1$ is the slope of the line. Please note in
Figure 8.2 that the x axis does not go to zero so the y intercept appears too high.
The equation for a straight line in this example is too simplistic. There will actually
be a random error which is the difference between an observed value of y and the
mean value of y for a given value of x. One assumes that for any given value of x,
the observed value of y varies in a random manner and possesses a normal
probability distribution. The concept is illustrated in Figure 8.3:

[Figure: distributions of y about the line at the values $x_1$, $x_2$ and $x_3$.]

Figure 8.3 Variation in y as a Function of x

The probabilistic model for any particular observed value of y is:

$$y = (\text{mean value of } y \text{ for a given value of } x) + (\text{random error})$$

$$y = \beta_0 + \beta_1 x + \varepsilon$$
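The probabilistic model can be illustrated with a small simulation. This is only a sketch: the parameter values `BETA0`, `BETA1` and `SIGMA` are hypothetical and are not taken from the text. It draws many observed values of y at one fixed x and shows that their average lands near the mean value $\beta_0 + \beta_1 x$.

```python
import random


def simulate_y(x, beta0, beta1, sigma, rng):
    """Return one observed y: the mean value at x plus a normal random error."""
    mean_y = beta0 + beta1 * x        # deterministic part of the model
    error = rng.gauss(0.0, sigma)     # random error with mean 0
    return mean_y + error


# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 30.0, 0.7, 4.0
rng = random.Random(0)                # fixed seed so the run is repeatable

samples = [simulate_y(50, BETA0, BETA1, SIGMA, rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)
print(round(sample_mean, 1))          # close to 30 + 0.7 * 50 = 65
```

Individual observations scatter around the line, but their average at a given x settles close to the line itself.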


The Method of Least Squares
The statistical procedure of finding the "best-fitting" straight line is, in many
respects, a formalization of the procedure used when one fits a line by eye. The
objective is to minimize the deviations of the points from the prospective line.

If one denotes the predicted value of y obtained from the fitted line as $\hat{y}$, the
prediction equation is:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

where $\hat{\beta}_0$ and $\hat{\beta}_1$ represent estimates of the true $\beta_0$ and $\beta_1$, as shown in Figure 8.4.

[Scatter diagram with the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$. Horizontal axis: Study Time (Hours), X, from 30 to 60.]

Figure 8.4 Study Time Versus Test Results

Having decided to minimize the deviation of the points in choosing the best fitting
line, one must now define what is meant by "best."

The best fit criterion of goodness known as the principle of least squares is
employed:

Choose, as the best fitting line, the line that minimizes the sum of squares of
the deviations of the observed values of y from those predicted.

Expressed mathematically, minimize the sum of squared errors given by:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Substituting for $\hat{y}_i$, one obtains the following expression:

$$SSE = \sum_{i=1}^{n} \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right]^2$$

The least squares estimators of $\beta_0$ and $\beta_1$ are calculated as follows:

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n}$$

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

Once $\hat{\beta}_0$ and $\hat{\beta}_1$ have been computed, substitute their values into the equation of a
line to obtain the least squares prediction equation, or regression line.

As noted earlier, the prediction equation for $\hat{y}$ is:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

Where: $\hat{\beta}_0$ and $\hat{\beta}_1$ represent estimates of the true $\beta_0$ and $\beta_1$.
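The estimator formulas translate almost line for line into code. The following is a minimal sketch, not a definitive implementation: the function name `least_squares_line` and the four-point data set are made up for illustration, and the data are chosen to lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) from the S_xx and S_xy formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # S_xx = sum of x squared minus (sum of x) squared over n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    # S_xy = sum of x*y minus (sum of x)(sum of y) over n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * (sum_x / n)   # y-bar minus slope times x-bar
    return intercept, slope


# Made-up data lying exactly on the line y = 1 + 2x.
b0, b1 = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)
```

Because the points fall exactly on a line, the fitted intercept and slope recover 1 and 2.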


Least Squares Example
Example 8.1: Obtain the least squares prediction line for the data below:
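| i   | $x_i$ | $y_i$ | $x_i^2$ | $x_i y_i$ | $y_i^2$ |
|-----|-------|-------|---------|-----------|---------|
| 1   | 60    | 67    | 3,600   | 4,020     | 4,489   |
| 2   | 40    | 61    | 1,600   | 2,440     | 3,721   |
| 3   | 50    | 73    | 2,500   | 3,650     | 5,329   |
| 4   | 65    | 80    | 4,225   | 5,200     | 6,400   |
| 5   | 35    | 60    | 1,225   | 2,100     | 3,600   |
| 6   | 40    | 55    | 1,600   | 2,200     | 3,025   |
| 7   | 50    | 62    | 2,500   | 3,100     | 3,844   |
| 8   | 30    | 50    | 900     | 1,500     | 2,500   |
| 9   | 45    | 61    | 2,025   | 2,745     | 3,721   |
| 10  | 55    | 70    | 3,025   | 3,850     | 4,900   |
| Sum | 470   | 639   | 23,200  | 30,805    | 41,529  |

N = 10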

Table 8.5 Data for the Study Time versus Test Results Example

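$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} = 23{,}200 - \frac{(470)^2}{10} = 1{,}110$$

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n} = 30{,}805 - \frac{(470)(639)}{10} = 772$$

$$\bar{x} = \frac{470}{10} = 47 \qquad \bar{y} = \frac{639}{10} = 63.9$$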

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{772}{1{,}110} = 0.6955$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 63.9 - (0.6955)(47) = 31.2115$$

$$\hat{y} = 31.2115 + 0.6955x$$

One may predict y for a given value of x by substitution into the prediction equation.
For example, if 60 hours of study time is allocated, the predicted test score would
be:

$$\hat{y} = 31.2115 + (0.6955)(60) = 72.9415 \approx 73\%$$

Hints on Regression Analysis

- Be careful of rounding errors. Normally, the calculations should carry a
  minimum of six significant figures in computing sums of squares of
  deviations. Note that the prior example consisted of convenient whole
  numbers, which does not occur often.

- Always plot the data points and graph the least squares line. If the line does
  not provide a reasonable fit to the data points, there may be a calculation
  error.

- Projecting a regression line outside of the test area can be risky. The above
  equation suggests that, without study, a student would make 31% on the test. The
  odds favor 25% if answer a is selected for all questions. The equation also
  suggests that with 100 hours of study the student should attain 100% on the
  examination, which is highly unlikely.
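The extrapolation hint can be made concrete by evaluating the Example 8.1 prediction equation inside and outside the range of study times in Table 8.1 (30 to 65 hours). This is a small sketch; the function name `predict` and the range constants are introduced here for illustration.

```python
def predict(hours):
    """Prediction equation from Example 8.1."""
    return 31.2115 + 0.6955 * hours


MIN_HOURS, MAX_HOURS = 30, 65    # range of study times in Table 8.1

for hours in (0, 60, 100):
    inside = MIN_HOURS <= hours <= MAX_HOURS
    label = "within data range" if inside else "extrapolation"
    print(hours, round(predict(hours), 2), label)
```

At 100 hours the equation returns a score above 100%, which is impossible and shows why predictions outside the observed range deserve caution.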


Calculating $s_\varepsilon^2$, an Estimator of $\sigma_\varepsilon^2$

Recall, the model for y assumes that y is related to x by the equation:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

If the least squares line is used:

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$

A random error, $\varepsilon$, enters into the calculations of $\beta_0$ and $\beta_1$. The random errors affect
the error of prediction. Consequently, the variability of the random errors (measured
by $\sigma_\varepsilon^2$) plays an important role when predicting by the least squares line.

The first step toward acquiring a boundary on a prediction error requires that one
estimates $\sigma_\varepsilon^2$. It is reasonable to use SSE (sum of squares for error) based on (n - 2)
degrees of freedom, one for each variable (x and y).

An Estimator for $\sigma_\varepsilon^2$

$$\hat{\sigma}_\varepsilon^2 = \frac{SSE}{n - 2}$$

$\hat{\sigma}_\varepsilon^2$ is sometimes shown as $s_\varepsilon^2$.

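Formula for Calculating SSE
Sum of squared errors = SSE

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right]^2$$

SSE may also be written:

$$SSE = S_{yy} - \hat{\beta}_1 S_{xy} = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$$

Where:

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left( \sum_{i=1}^{n} y_i \right)^2}{n}$$

$S_{xy}$ and $S_{xx}$ were previously defined.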
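The two ways of writing SSE can be compared numerically. The sketch below uses a made-up four-point data set and illustrative function names. It computes SSE once from the residuals and once from the shortcut form $S_{yy} - S_{xy}^2 / S_{xx}$, then divides by n - 2 to estimate the error variance.

```python
def sums_of_squares(xs, ys):
    """Return (S_xx, S_xy, S_yy) for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xx, s_xy, s_yy


# Made-up data, small enough to verify by hand.
xs, ys = [1, 2, 3, 4], [2, 4, 5, 9]
n = len(xs)
s_xx, s_xy, s_yy = sums_of_squares(xs, ys)

slope = s_xy / s_xx
intercept = sum(ys) / n - slope * sum(xs) / n

# SSE from the definition: sum of squared residuals about the fitted line.
sse_direct = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
# SSE from the shortcut form.
sse_shortcut = s_yy - s_xy ** 2 / s_xx

variance_estimate = sse_shortcut / (n - 2)
print(round(sse_direct, 6), round(sse_shortcut, 6), round(variance_estimate, 6))
```

Both routes give the same SSE up to floating-point rounding, which is a useful check when working by hand.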

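Example 8.2: Calculate an estimated $\sigma_\varepsilon^2$ for the data in Table 8.5.

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left( \sum_{i=1}^{n} y_i \right)^2}{n} = 41{,}529 - \frac{(639)^2}{10} = 696.9$$

$$SSE = S_{yy} - \hat{\beta}_1 S_{xy} = 696.9 - (0.6955)(772) = 159.97$$

$$\hat{\sigma}_\varepsilon^2 = \frac{SSE}{n - 2} = \frac{159.97}{10 - 2} = 19.996$$

$$\hat{\sigma}_\varepsilon = 4.47$$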
How can one interpret the values of SSE and $\hat{\sigma}_\varepsilon^2$? Refer to Figure 8.4 and note the
deviations of the 10 points from the least squares line. The sum SSE = 159.97 is
equal to the sum of squares of the numerical values of these deviations.

$\hat{\sigma}_\varepsilon$ from the above calculation equals 4.47. Thus, most of the points will fall within $\pm 1.96 \hat{\sigma}_\varepsilon$,
or 8.76 of the line. Approximately 95% of the values should be in this region. In
Figure 8.4, all of the values are within ± 8.76 of the line. This estimate provides a
rough check on the calculated value of $\hat{\sigma}_\varepsilon$.
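The rough check described above can be reproduced directly from the Table 8.1 data and the Example 8.1 prediction equation. This is a sketch with illustrative variable names. Since it sums the squared residuals directly instead of using the rounded shortcut, its SSE can differ from 159.97 in the second decimal place.

```python
import math

# Study time (hours) and test results (%) from Table 8.1.
hours = [60, 40, 50, 65, 35, 40, 50, 30, 45, 55]
scores = [67, 61, 73, 80, 60, 55, 62, 50, 61, 70]
B0, B1 = 31.2115, 0.6955          # prediction equation from Example 8.1

# Deviation of each observed score from the fitted line.
residuals = [y - (B0 + B1 * x) for x, y in zip(hours, scores)]
sse = sum(r * r for r in residuals)
sigma_hat = math.sqrt(sse / (len(hours) - 2))

bound = 1.96 * sigma_hat          # approximate 95% band about the line
all_inside = all(abs(r) <= bound for r in residuals)
print(round(sigma_hat, 2), round(bound, 2), all_inside)
```

Every residual falls inside the band, in agreement with the observation about Figure 8.4.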

Inferences Concerning the Slope $\beta_1$ of a Line
The existence of a significant relationship between y and x can be tested by whether
$\beta_1$ is equal to 0. If $\beta_1 \neq 0$ there is a linear relationship. The null hypothesis and
alternative hypothesis are:

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$

The test statistic follows a t distribution with n - 2 degrees of freedom:

$$t = \frac{\hat{\beta}_1 - \beta_1}{s_{\hat{\beta}_1}} \quad \text{where} \quad s_{\hat{\beta}_1} = \frac{\hat{\sigma}_\varepsilon}{\sqrt{S_{xx}}}$$
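As a rough illustration, the slope test can be applied to the study-time data using the sums from Examples 8.1 and 8.2. The 5% significance level and the tabled two-sided critical value of 2.306 for 8 degrees of freedom are assumptions introduced here, not values given in the text.

```python
import math

# Sums of squares and sample size from Examples 8.1 and 8.2.
S_XX, S_XY, S_YY, N = 1110.0, 772.0, 696.9, 10

slope = S_XY / S_XX                       # estimated slope
sse = S_YY - slope * S_XY                 # sum of squared errors
sigma_hat = math.sqrt(sse / (N - 2))      # estimated error standard deviation
se_slope = sigma_hat / math.sqrt(S_XX)    # standard error of the slope

t_stat = (slope - 0.0) / se_slope         # beta_1 = 0 under the null hypothesis

# Assumed: two-sided test at the 5% level, t table value for 8 degrees of freedom.
T_CRITICAL = 2.306
reject_null = abs(t_stat) > T_CRITICAL
print(round(t_stat, 2), reject_null)
```

Under these assumptions the statistic is well above the critical value, so the null hypothesis of zero slope would be rejected for this data set.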
