Illustration of Exploratory Factor Analysis (EFA)

The analysis looks at all the music variables (bigband, blues, blues3, blugrass, classicl, classic3, country, hvymetal, jazz, jazz3, musicals, opera, rap, and rap3) with a view to understanding which subsets might be combined into a "music scale". Four other background variables are also entered: age, rincome91, sex, and educ. Variable labels that describe the variables are included in the dataset. (example designed by Dr. Garson of NC State)

[Data]

Move all the variables in the dataset over to the Variables list.

Go to the Extraction option and check Scree plot (we will keep the other default settings).

Go to the Rotation option and select the Varimax rotation.

Go to the Options window, check Sorted by size and Suppress absolute values less than, and type .40 as the cutoff.
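The effect of the two display options can be sketched in a few lines of Python. This is only an illustration: the loadings are made-up values, the function name sort_and_suppress is hypothetical, and the ordering rule (group variables by the factor on which they load highest, then sort by descending absolute loading) is an assumption about how the sorted display behaves, not SPSS's documented algorithm.

```python
# Hypothetical loadings (variable -> loadings on two factors); illustrative only.
loadings = {
    "jazz":     [0.12, 0.85],
    "opera":    [0.73, 0.20],
    "classicl": [0.91, 0.05],
    "bigband":  [0.53, 0.41],
}

def sort_and_suppress(loadings, cutoff=0.40):
    """Group variables by their dominant factor, sort by size, blank small loadings."""
    def sort_key(item):
        _, row = item
        sizes = [abs(v) for v in row]
        dominant = sizes.index(max(sizes))   # factor with the largest absolute loading
        return (dominant, -sizes[dominant])  # factor order first, then descending size
    ordered = sorted(loadings.items(), key=sort_key)
    # Loadings whose absolute value is below the cutoff are shown as empty cells.
    return [
        (name, [f"{v:.2f}" if abs(v) >= cutoff else "" for v in row])
        for name, row in ordered
    ]

table = sort_and_suppress(loadings)
for name, cells in table:
    print(f"{name:10s}" + "".join(f"{cell:>8s}" for cell in cells))
```

Suppressing small loadings changes only what is displayed; the underlying solution is unaffected.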

Output

The communalities, below, measure the proportion of variance in a given variable explained by all the factors jointly. That is, the communality is the squared multiple correlation for the variable using the factors as predictors. It is computed as the sum of squared factor loadings for that variable (row). For full orthogonal PCA, the communality will be 1.0 and all of the variance in the variables will be explained by all of the factors, which will be as many as there are variables. In the communalities table, SPSS labels this column the "initial" communalities. The "extraction" communalities are the proportion of variance in a given variable explained by the factors which are extracted, which will usually be fewer than all the possible factors, resulting in coefficients less than 1.0. (Dr. Garson)
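As a minimal sketch of the row-wise calculation described above, the following Python computes communalities from a small loading matrix. The loadings are hypothetical values chosen for illustration; they are not taken from the SPSS output.

```python
# Hypothetical loadings for four variables on two extracted factors
# (illustrative values only, not the SPSS results).
loadings = {
    "classicl": [0.90, 0.10],
    "opera":    [0.70, 0.20],
    "jazz":     [0.20, 0.80],
    "blues":    [0.10, 0.85],
}

def communality(row):
    """Sum of squared loadings across the factors for one variable (one row)."""
    return sum(value ** 2 for value in row)

communalities = {name: communality(row) for name, row in loadings.items()}

for name, h2 in communalities.items():
    print(f"{name:10s} {h2:.4f}")
```

With only two of the possible factors retained, every communality here is below 1.0, which mirrors the relationship between the initial and extraction columns.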

The "Total Variance Explained" table below shows the eigenvalues. Each eigenvalue is the amount of the total variance in all the variables which is accounted for by that factor. A factor's eigenvalue may be computed as the sum of its squared factor loadings for all the variables. A factor's eigenvalue divided by the number of variables (which equals the sum of variances because the variance of a standardized variable equals 1) is the proportion of variance in all the variables which it explains; multiplied by 100, it is the percent shown in the table. The ratio of eigenvalues is the ratio of explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue (<1.0), then it is contributing little to the explanation of variances in the variables and may be ignored as redundant with more important factors. The table shows 18 factors, one for each variable. However, only the first six are extracted for analysis because, under the Extraction options, SPSS was told to extract only factors with eigenvalues of 1.0 or higher.
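The column-wise calculation can be sketched the same way. In the Python below, the loading matrix is hypothetical (illustrative values, not the SPSS output); the only figures taken from this example are the 18 variables and the 75.454% cumulative variance reported for the six extracted factors.

```python
# Hypothetical loadings: rows are variables, columns are factors (illustrative only).
loadings = [
    [0.90, 0.10],
    [0.70, 0.20],
    [0.20, 0.80],
    [0.10, 0.85],
]

def eigenvalues_from_loadings(matrix):
    """Sum of squared loadings down each column, one value per factor."""
    n_factors = len(matrix[0])
    return [sum(row[j] ** 2 for row in matrix) for j in range(n_factors)]

def percent_of_variance(eigenvalue, n_variables):
    """Eigenvalue divided by the number of standardized variables, as a percent."""
    return 100.0 * eigenvalue / n_variables

eigenvalues = eigenvalues_from_loadings(loadings)
percents = [percent_of_variance(ev, len(loadings)) for ev in eigenvalues]
print(eigenvalues, percents)

# Working backward from the figures in this example: six factors explaining
# 75.454% of the variance in 18 variables have eigenvalues summing to about 13.58.
total_extracted = 75.454 / 100 * 18
print(round(total_extracted, 2))
```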

The "Initial Eigenvalues" and "Extraction Sums of Squared Loadings" columns are the same, except the latter only lists factors which have actually been extracted in the solution. The "Rotation Sums of Squared Loadings" give the eigenvalues after rotation improves the interpretability of the factors (we used Varimax rotation, which minimizes the number of variables which have high loadings on each given factor). Note that the total percent of variance explained is the same (see the cumulative value for factor 6 -- 75.454%) but rotation changes the eigenvalues for each of the extracted factors. That is, after rotation each extracted factor accounts for a different percentage of variance explained, even though the total variance explained is the same.
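Why the total stays fixed while the individual eigenvalues change can be checked with any orthogonal rotation, of which Varimax is one. The sketch below is not Varimax itself: it rotates a hypothetical two-factor loading matrix by an arbitrary angle (40 degrees, chosen only for illustration) and compares the sums of squared loadings before and after.

```python
import math

# Hypothetical unrotated loadings on two factors (illustrative values only).
unrotated = [
    [0.70,  0.50],
    [0.60,  0.50],
    [0.70, -0.40],
    [0.65, -0.50],
]

def rotate(matrix, degrees):
    """Apply an orthogonal (rigid) rotation to a two-column loading matrix."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c + b * s, -a * s + b * c] for a, b in matrix]

def column_sums_of_squares(matrix):
    """Eigenvalue-style sums of squared loadings, one per factor."""
    return [sum(row[j] ** 2 for row in matrix) for j in range(len(matrix[0]))]

def row_sums_of_squares(matrix):
    """Communalities, one per variable."""
    return [sum(v ** 2 for v in row) for row in matrix]

rotated = rotate(unrotated, 40)  # arbitrary angle, not the Varimax solution

before = column_sums_of_squares(unrotated)
after = column_sums_of_squares(rotated)
print("per-factor sums before:", before)
print("per-factor sums after: ", after)
print("totals:", sum(before), sum(after))
```

The per-factor sums differ, but their total and each variable's communality are unchanged, which is the pattern seen between the extraction and rotation columns.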

The Cattell scree test, below, plots the components on the X axis and the corresponding eigenvalues on the Y axis. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve makes an elbow toward less steep decline, Cattell's scree test says to drop all further components after the one starting the elbow. The location of the "elbow" is somewhat subjective, but in this case one would probably decide only the first three factors were worth retaining in the analysis. If one decided to use the second "elbow," one would retain five factors.

There are alternative criteria for deciding how many factors to retain. The Kaiser rule is to drop all components with eigenvalues under 1.0, which is what was specified under the "Extraction" options, resulting in six factors.
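Both retention criteria can be sketched on a list of eigenvalues. The values below are hypothetical and are not the eigenvalues from this analysis; the function names are illustrative. The Kaiser rule is applied directly, while for the scree test the code only lists the successive drops, since locating the elbow remains a judgment call.

```python
# Hypothetical eigenvalues in descending order (not the SPSS output).
eigenvalues = [4.2, 2.9, 2.1, 1.4, 1.2, 1.05, 0.9, 0.7]

def kaiser_retained(values, cutoff=1.0):
    """Component numbers (1-based) whose eigenvalue is at least the cutoff."""
    return [i + 1 for i, value in enumerate(values) if value >= cutoff]

def successive_drops(values):
    """Decrease in eigenvalue from each component to the next, for eyeballing an elbow."""
    return [values[i] - values[i + 1] for i in range(len(values) - 1)]

retained = kaiser_retained(eigenvalues)
drops = successive_drops(eigenvalues)

print("Kaiser rule retains components:", retained)
for i, drop in enumerate(drops, start=1):
    print(f"drop from component {i} to {i + 1}: {drop:.2f}")
```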

The "Component Matrix," below, gives the factor loadings. This is the central output for factor analysis. The factor loadings, also called component loadings in PCA, are the correlation coefficients between the variables (rows) and factors (columns). Factor loadings are the basis for imputing a label to the different factors. Loadings above .6 are usually considered "high" and those below .4 are "low." Note that the music variables were coded so that high values correspond to disliking that type of music. Therefore a positive loading corresponds to disliking that type of music, and a negative loading to liking.

The first table below gives the unrotated solution and the second the rotated solution. Normally the rotated solution will be significantly easier to interpret (indeed, often the researcher will not ask for the unrotated matrix, but we requested it here for instructional purposes).

Looking at the rotated matrix, the first factor has high loadings from four music variables (classical, classical(3), opera, and Broadway musicals) and a moderate loading from big bands. Because these five music items sort on the same factor, this is a justification for combining them in a scale which might be called a "general music appreciation scale." Naming the factor is a matter of subjectivity and dispute in many cases.

Blues and jazz (four variables) are associated strongly with the second factor.

Respondent's income and education are associated with the third factor, income strongly and education moderately.

Rap music (two variables) is associated strongly with the fourth factor.

The fifth factor is strongly associated with country western and bluegrass, but there is also a moderate tie to highest year of school completed, with more educated respondents less likely to like these types of music.

The sixth factor is strongly associated with heavy metal & age.

The rotated matrix is the easiest to interpret.

Results

    Principal components extraction with varimax rotation was performed through SPSS on the 14 music preference items and the 4 demographic items. [The discussion of assumptions, outliers, and missing data is not included here but would appear at this point in a write-up for publication.]

    Six factors were extracted. The total variance accounted for by the six factors was 75.45%. The variables were well defined by this factor solution, with all communality values exceeding .45. Loadings of variables on factors are reported in Table 1. Variables are ordered and grouped by size of loading to facilitate interpretation. Loadings under .40 were left blank. The first factor appears to measure "general music appreciation." The second factor is related to "jazz & blues." The third factor is related to "income & education." The fourth factor corresponds to "rap." The fifth factor measures "bluegrass & country." The last factor is associated with "heavy metal & age."

Table 1

Items                             Factor 1  Factor 2  Factor 3  Factor 4  Factor 5  Factor 6
Classical Music                   .91
Classical Music (3)               .89
Opera                             .73
Broadway Musicals                 .69
Bigband Music                     .53
Blues and R&B Music                         .85
Jazz Music (3)                              .85
Jazz Music                                  .85
Blues or R & B Music                        .84
Respondent's Income                                   .97
Highest Year of School Completed                      .48
Rap Music                                                       .95
Rap Music (3)                                                   .95
Bluegrass Music                                                           .80
Country Western Music                                                     .75
Heavy Metal Music                                                                   .75
Age of Respondent                                                                   .74