ANOVA(1) |STAT January 29, 1987
NAME
anova - multi-factor analysis of variance
SYNOPSIS
anova [factor names]
DESCRIPTION
anova does multi-factor analysis of variance on designs with within
groups factors, between groups factors, or both. anova allows
variable numbers of replications (averaged before analysis) on any
factor. All factors except the random factor must be crossed; some
nested designs are not allowed. Unequal group sizes on between groups
factors are allowed and are solved with the weighted means solution;
however, empty cells are not permitted.
Input Format. The input format was designed so that when the user
specifies the role individual data play in the overall design, anova
figures out the experimental design. This helps reduce design
specification errors. The input to anova consists of each datum on a
separate line, preceded by a list of index labels, one for each
factor, that specifies the level of each factor at which that datum
was obtained. By convention, data are always in the last column, and
indexes for the one allowable random factor must be in the first.
Data can be real numbers or integers. Indexes can be any character
string, so mnemonic labels can simplify reading the output. For
example:
fred 3 hard 10
indicates that "fred" at level "3" of the factor indexed by column two
and at level "hard" of the factor indexed by column three, scored 10.
Indexes and data on a line can be separated by tabs or spaces for
readability. Data from an experiment consist of a series of lines
like the one above. The order of these lines does not matter, so
additional data can be appended to the end of files. Replications are
coded by having more than one line with the same list of leading
indexes. With this information, anova determines the number of
factors, the number and names of levels of each factor, and whether a
factor is between groups or within groups so that error terms for F-
ratios can be chosen.
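
The classification described above can be illustrated with a short
Python sketch. It is not the code anova uses; the function name
classify_factors and the sample lines are hypothetical, and the sketch
assumes every line has the same number of columns. A factor is treated
as between groups when no unit of the random factor appears at more
than one of its levels, and as within groups when every unit appears
at all of its levels.

```python
from collections import defaultdict


def classify_factors(lines):
    """Label each factor column as 'between', 'within', or 'unbalanced'.

    Assumptions (not taken from the anova source): column 0 holds the
    random factor, the last column holds the datum, and all lines have
    the same number of whitespace-separated fields.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return {}
    width = len(rows[0])
    roles = {}
    for col in range(1, width - 1):
        all_levels = {row[col] for row in rows}
        # Levels of this factor seen for each unit of the random factor.
        per_unit = defaultdict(set)
        for row in rows:
            per_unit[row[0]].add(row[col])
        if all(len(seen) == 1 for seen in per_unit.values()):
            roles[col] = "between"
        elif all(seen == all_levels for seen in per_unit.values()):
            roles[col] = "within"
        else:
            roles[col] = "unbalanced"
    return roles


if __name__ == "__main__":
    # Hypothetical data: column 1 varies within each person,
    # column 2 is constant for each person.
    sample = [
        "fred 1 hard 10",
        "fred 2 hard 12",
        "judy 1 easy 7",
        "judy 2 easy 9",
    ]
    print(classify_factors(sample))
```

The result is keyed by column position, counting the random factor as
column 0.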
Names of independent and dependent variables can be supplied to anova,
providing mnemonic labels for the output. These names may be
truncated in the output. The names should have unique first
characters because only the first character is used in parts of the
F tables. For example, in a three factor design, the call to anova:
anova subjects group difficulty errors
would give the name "subjects" to the random factor, "group" and
"difficulty" to the next two, and "errors" to the dependent variable.
If names are not specified, the default name for the random factor is
RANDOM, for the dependent variable, DATA, and for the independent
variables, A, B, C, D, etc.
Output Format. The output from anova includes cell counts, means,
standard deviations, and standard errors for each source not involving
the random factor, a summary of design information, and an F table
testing main effects and interactions. Sums of squares, degrees of
freedom, mean squares, F ratio and significance level are reported for
each F test.
DIAGNOSTICS
anova will complain about "Ragged input" if the number of variables in
its input varies. anova will not print its F tables if it cannot make
sense out of the input specification ("Unbalanced factor or Empty
cell"). This can happen if there are missing data (detected when the
cell sizes of all the scores for a source do not add up to the
expected grand total). Unbalanced factors often are due to a
typographical error, but the empty cell message can be due to an
illegal nested design (only the random factor can be nested).
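
A data file can be checked for ragged input before it is given to
anova. The Python sketch below is only an illustration: the function
name check_ragged is hypothetical, the wording of the error beyond
"Ragged input" is invented, and skipping blank lines is an assumption
rather than documented anova behavior.

```python
def check_ragged(lines):
    """Return the common field count, or raise ValueError if it varies.

    Blank lines are skipped; this is an assumption of the sketch, not a
    statement about how anova treats them.
    """
    expected = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if expected is None:
            # The first non-blank line fixes the number of columns.
            expected = len(fields)
        elif len(fields) != expected:
            raise ValueError(
                f"Ragged input at line {number}: "
                f"expected {expected} fields, found {len(fields)}"
            )
    return expected


if __name__ == "__main__":
    # Hypothetical input with one index missing on the second line.
    try:
        check_ragged(["fred 3 hard 10", "fred hard 12"])
    except ValueError as err:
        print(err)
```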
anova uses a temporary file to store its input and will complain if it
is unable to create it. This may be because you are in some other
user's directory that is "write protected."
EXAMPLE
An experiment has two experimental factors: difficulty of material to
be learned, and amount of knowledge a person brings with him or her.
(This design is due to Naomi Miyake.) Each person is given two
learning tasks, one easy and one hard, so task difficulty is a within
groups factor. Two people are experts in the task domain, while three
are novices, so knowledge is a between groups factor with unequal
group sizes. The dependent variable is the amount of time it takes a
person to correctly work through a problem. The data are formatted as
follows: in column one is the name of the person (the random factor);
in column two is the level of the difficulty factor; in column three
is the level of the knowledge factor; and in column four is the time,
in seconds, to solve the problem. Fictitious data follow.
lucy easy novice 12
lucy hard novice 22
ethel easy novice 10
ethel hard novice 15
ricky easy novice 25
ricky hard novice 30
ernie easy expert 7
ernie hard expert 10
bert easy expert 12
bert hard expert 18
The call to anova to analyze the data would probably look like:
anova subjects difficulty knowledge time < data
"data" is the name of the file containing the above data. "subjects"
is the random factor so indexes for that factor appear in the first
column. Data, here called "time", must appear in the last column.
"difficulty" is a within groups factor because each person appears at
every level of that factor. In the third column are indexes for
"knowledge", a between groups factor, because no person appears at
more than one level of that factor.
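
As a rough cross-check of the cell counts and means anova would
report for these data, the Python sketch below averages any
replications for each person and then summarizes each difficulty by
knowledge cell. The function name cell_means is hypothetical, and the
sketch does not reproduce the F table or anova's output layout.

```python
from collections import defaultdict
from statistics import mean

# The fictitious data from this example.
DATA = """\
lucy easy novice 12
lucy hard novice 22
ethel easy novice 10
ethel hard novice 15
ricky easy novice 25
ricky hard novice 30
ernie easy expert 7
ernie hard expert 10
bert easy expert 12
bert hard expert 18
"""


def cell_means(text):
    """Map (difficulty, knowledge) to a (count, mean) pair."""
    # Step 1: average replications, i.e. lines that share all indexes.
    per_unit = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        *indexes, datum = fields
        per_unit[tuple(indexes)].append(float(datum))
    # Step 2: drop the random factor (column one) and pool each cell.
    cells = defaultdict(list)
    for indexes, scores in per_unit.items():
        cells[indexes[1:]].append(mean(scores))
    return {levels: (len(v), mean(v)) for levels, v in cells.items()}


if __name__ == "__main__":
    for levels, (count, average) in sorted(cell_means(DATA).items()):
        print(levels, count, round(average, 2))
```

The unequal counts (three novices, two experts) show up directly in
the count for each cell.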
FILES
UNIX /tmp/anova.????
MSDOS anova.tmp
ALGORITHM
Keppel (1973) Design and Analysis: A Researcher's Handbook.
WARNING
When unequal sized cell designs are used, the cell sizes must be in
the same proportion across all rows and columns of interactions, or
there may be marked distortions and the analysis may be invalid. This
applies only to designs with more than one between groups factor. See
Keppel's discussion of unequal cell designs.
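
One way to test the proportionality condition for two between groups
factors is sketched below in Python. The function name
cells_proportional and the example counts are hypothetical, and the
sketch uses the usual criterion that each cell size equals its row
total times its column total divided by the grand total; anova itself
is not documented to perform this check.

```python
from collections import defaultdict


def cells_proportional(counts):
    """Return True if cell sizes are proportional across rows and columns.

    counts maps (row_level, column_level) to the number of units in
    that cell. A cell absent from the mapping is treated as empty.
    """
    row_totals = defaultdict(int)
    col_totals = defaultdict(int)
    for (row, col), n in counts.items():
        row_totals[row] += n
        col_totals[col] += n
    total = sum(counts.values())
    # Integer form of n_ij == row_total * col_total / total.
    return all(
        counts.get((row, col), 0) * total == row_totals[row] * col_totals[col]
        for row in row_totals
        for col in col_totals
    )


if __name__ == "__main__":
    # Hypothetical cell sizes: each row is in the ratio 1:2.
    sizes = {("a", "x"): 2, ("a", "y"): 4, ("b", "x"): 3, ("b", "y"): 6}
    print(cells_proportional(sizes))
```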
LIMITS
Use the -L option to determine the program limits.
MISSING VALUES
Missing data values (NA) are counted but not included in the analysis.
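
The rule above can be pictured with a small Python sketch. The
function name split_missing is hypothetical, and the sketch only
shows NA values being counted separately from the scores that enter
the analysis; how anova reports the count is not specified here.

```python
def split_missing(values, marker="NA"):
    """Separate usable scores from missing ones.

    Returns (scores, n_missing): the numeric values as floats, and the
    number of entries equal to the missing-value marker.
    """
    scores = [float(v) for v in values if v != marker]
    n_missing = sum(1 for v in values if v == marker)
    return scores, n_missing


if __name__ == "__main__":
    # Hypothetical data column with one missing value.
    print(split_missing(["12", "NA", "10"]))
```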