Last reviewed: December 4, 2008

Statistics-Multivariate Analysis

Research data collected using the quantitative approach can be analyzed and interpreted in different ways, using univariate, bivariate, or multivariate analysis.

Bivariate analysis looks at the relationship between two variables. This relationship is commonly analyzed and interpreted with the aid of a cross-tabulation (cross-tab), which allows the researcher to check the association between the two variables under study. These variables are called the independent (or predictor) variable and the dependent (or outcome) variable. Each cell of the cross-tab can be expressed as a frequency (raw count), a percentage, or both. It is critical in bivariate analysis to establish whether the observed relationship is statistically significant, commonly with a chi-square test of independence, because significance determines whether the percentage differences in the results are worth analyzing. (However, the researcher may opt to look into percentage differences even if the relationship is not significant, for directional or diagnostic use only.)

Multivariate analysis, meanwhile, looks at the relationships among more than two variables. What makes this form of statistical analysis useful is that it provides both breadth and depth in examining the relationships among the variables under study, revealing patterns that bivariate analysis cannot. Because analyzing more than two variables is a rigorous and complex process, different techniques are used under multivariate analysis, such as multiple regression, discriminant analysis, canonical correlation, factor analysis, and cluster analysis, among others.
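To make the cross-tab and significance check concrete, here is a minimal Python sketch using invented counts (a hypothetical gender by brand preference table). It computes row percentages and a Pearson chi-square test of independence, one common way to test significance in a cross-tab, without Yates' continuity correction. The p-value shortcut `p_value_df1` is a helper written for this illustration and is valid only for a 2x2 table (1 degree of freedom); larger tables need a general chi-square distribution function from a statistics library.

```python
import math

# Invented 2x2 cross-tab: rows = gender (independent variable),
# columns = brand preference (dependent variable), cells = raw counts.
observed = [
    [30, 20],  # male:   prefers brand A, prefers brand B
    [15, 35],  # female: prefers brand A, prefers brand B
]


def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    return [[100.0 * cell / sum(row) for cell in row] for row in table]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (count - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    # A chi-square variable with 1 df is the square of a standard normal Z,
    # so P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


stat, df = chi_square(observed)
p = p_value_df1(stat)  # valid here because a 2x2 table has df = 1
print("Row percentages:", row_percentages(observed))
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
```

With these invented counts the statistic is about 9.09 and p is well below 0.05, so the percentage difference (60% of one group versus 30% of the other preferring brand A) would be worth interpreting.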

In multivariate analysis, dependence and interdependence techniques are used, each having its own objective. Dependence techniques look at the relationship between a variable or set of variables designated as the dependent variable(s) and another set of variables designated as the independent variable(s). That is, a set of independent variables (the Xs) is used to explain or predict the dependent variable(s) (the Ys). Regression is an example of this technique: it not only describes the nature and strength of the relationship between the two sets of variables but also informs the researcher about the predictive power of the Xs on the Ys. Through regression, the researcher can also identify the contribution of one or more of the independent variables (Xs) to the model generated.
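A minimal sketch of simple (one-predictor) least-squares regression, assuming invented advertising and sales figures. The slope and intercept describe the nature and direction of the relationship, and R-squared (the share of variance in Y accounted for by X) gives one measure of its strength and predictive power.

```python
# Invented data: advertising spend (X, in thousands) and sales (Y, in units).
ad_spend = [1, 2, 3, 4, 5]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, plus R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y accounted for by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


intercept, slope, r_squared = simple_regression(ad_spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend  (R^2 = {r_squared:.3f})")
```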

The interdependence technique in multivariate analysis looks at the relationships among variables rather than at two different sets of variables categorized as dependent or independent. That is, interdependence techniques treat and analyze the variables under study as a single set. The most commonly used analysis of this kind is factor analysis, which is mainly used to reduce and summarize research data into a manageable form. Factor analysis helps the researcher determine and explain the relationships, specifically the correlations, that exist among the variables tested. This type of analysis is most useful in market research, where psychographic factors and attitude statements are often treated as one set of variables and factor-analyzed to generate a smaller set of variables that reflect, highlight, or explain differences among the variables and across respondent groups or profiles (Weiers, 1984:473).
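Since factor analysis starts from the correlations among the variables, a natural first step is the correlation matrix. The sketch below computes Pearson correlations among four hypothetical attitude items rated by six invented respondents on a 1-to-5 scale; items that correlate strongly with one another are candidates for the same underlying factor.

```python
import math

# Invented ratings (1 = strongly disagree, 5 = strongly agree) from six respondents.
items = {
    "reads food labels":       [5, 4, 4, 2, 1, 2],
    "exercises regularly":     [5, 5, 4, 2, 2, 1],
    "treats minor ills alone": [1, 2, 1, 4, 5, 4],
    "keeps own medicine kit":  [2, 1, 1, 5, 4, 5],
}


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Square matrix of pairwise Pearson correlations among the columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


names = list(items)
matrix = correlation_matrix(list(items.values()))
for name, row in zip(names, matrix):
    print(f"{name:<24}" + " ".join(f"{r:6.2f}" for r in row))
```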

As explained earlier, regression assesses the predictive power of the independent variables (Xs) on the dependent variable(s) (Ys) and provides information about the contribution of one or more of the independent variables to the relationship. Multiple linear regression is more powerful than simple (single-predictor) linear regression because it not only determines the contribution of each independent variable but also estimates the strength of that contribution while controlling for the other independent variables in the model. Discriminant analysis, meanwhile, is a dependence technique that looks at the relationship between a non-metric (categorical) dependent variable and metric (interval or ratio) independent variables. It is used when the researcher would like to discriminate between the categories of the dependent variable (or among the categories of Y, in multiple discriminant analysis).
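A minimal multiple regression sketch that solves the ordinary least squares normal equations (X'X)b = X'y directly with Gaussian elimination. The data are invented so that sales = 50 - 2 * price + 3 * advertising holds exactly, which makes it easy to check that each estimated coefficient is the effect of its variable with the other variable held constant. A statistics package would also report standard errors and significance tests, which this sketch omits.

```python
# Invented data built so that sales = 50 - 2 * price + 3 * advertising exactly.
price = [10, 12, 11, 15, 14, 9]
advertising = [5, 7, 4, 8, 6, 3]
sales = [45, 47, 40, 44, 40, 41]


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_regression(predictors, y):
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    rows = [[1.0, *values] for values in zip(*predictors)]  # leading 1 for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


intercept, b_price, b_adv = multiple_regression([price, advertising], sales)
# Each slope is the expected change in sales for a one-unit change in that
# predictor while the other predictor is held constant.
print(f"intercept = {intercept:.2f}, price = {b_price:.2f}, advertising = {b_adv:.2f}")
```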

Canonical correlation analysis is used to measure the strength of the association between a set of independent variables and a set of dependent variables. This type of analysis is most helpful in the validation of test results, wherein the researcher determines which variables are significantly related and then runs further canonical analysis with only those significant variables (Hair, 1995:187). Lastly, multivariate analysis of variance, or MANOVA, analyzes two or more metric dependent variables at once, measuring (1) the variance attributed to between-group differences and (2) the variance attributed to within-group differences. This analysis is useful when two or more dependent variables are related and group differences across all of them need to be examined together.
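The first canonical correlation is the largest correlation achievable between a weighted combination of the X set and a weighted combination of the Y set; its square is the largest eigenvalue of inv(Rxx) · Rxy · inv(Ryy) · Ryx, built from the within-set and between-set correlation matrices. The sketch below handles only the simple case of exactly two variables per set, so the 2x2 eigenvalue can be found from the trace and determinant. The household income, family size, card ownership, and spending figures are invented.

```python
import math

# Invented data for eight households.
income = [30, 45, 60, 75, 90, 40, 55, 70]                  # X set: income (thousands)
family_size = [2, 3, 4, 3, 5, 2, 4, 3]                     # X set
cards_owned = [1, 2, 3, 3, 4, 1, 2, 3]                     # Y set
monthly_spend = [200, 350, 500, 600, 800, 300, 420, 550]   # Y set


def corr(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def first_canonical_correlation(x_set, y_set):
    """Largest canonical correlation between two sets of exactly two variables each."""
    rxx = [[corr(a, b) for b in x_set] for a in x_set]
    ryy = [[corr(a, b) for b in y_set] for a in y_set]
    rxy = [[corr(a, b) for b in y_set] for a in x_set]
    ryx = [list(col) for col in zip(*rxy)]  # transpose of rxy
    m = matmul_2x2(matmul_2x2(inverse_2x2(rxx), rxy), matmul_2x2(inverse_2x2(ryy), ryx))
    # Eigenvalues of a 2x2 matrix from its trace and determinant; the larger
    # one is the squared first canonical correlation.
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    largest = (trace + math.sqrt(max(0.0, trace * trace - 4.0 * det))) / 2.0
    return math.sqrt(largest)


rc = first_canonical_correlation([income, family_size], [cards_owned, monthly_spend])
print(f"first canonical correlation = {rc:.3f}")
```

Because a single X variable and a single Y variable are themselves (trivial) weighted combinations, the first canonical correlation can never be smaller than the largest individual cross-set correlation, which makes a handy sanity check.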

Multiple regression, when applied to specific situations, could best answer problems in which the researcher must determine which of many factors are likely contributing to an outcome (the dependent variable). For example, is variation in sales driven by prices, distribution, or advertising? Is individual perception of a political personality determined by psychographic or demographic profile? And which variables within each factor contribute most strongly to the formation of this perception? Discriminant analysis is most useful in profiling specific groups of people or individuals based on specific characteristics, which may be demographic, psychographic, lifestyle, or other categories defined by the researcher as characteristic of the dependent variable (Malhotra, 1996:617). A popular application of discriminant analysis is in credit card applications, where credit card companies assess applicants' propensity or likelihood 'to pay' based on each applicant's demographic and psychographic profile.
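A minimal sketch of two-group discriminant analysis, using Fisher's linear discriminant (one standard formulation, assumed here) on an invented credit example with two metric predictors: annual income (in thousands) and years employed. The weights are the inverse of the pooled within-group covariance matrix times the difference in group means; assuming equal prior probabilities, an applicant is assigned according to which side of the midpoint cut-off their discriminant score falls on. The group labels and helper names are hypothetical.

```python
# Invented applicants: (annual income in thousands, years employed).
paid = [(60, 8), (55, 6), (70, 10), (65, 7), (58, 9)]
defaulted = [(30, 2), (35, 1), (28, 3), (40, 2), (33, 4)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_2d(rows, mean):
    """Within-group sums of squares and cross-products for two variables."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - mean[0], r[1] - mean[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_discriminant(group_a, group_b):
    """Weights w = inverse(pooled covariance) @ (mean_a - mean_b) and a midpoint cut-off."""
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter_2d(group_a, mean_a), scatter_2d(group_b, mean_b)
    dof = len(group_a) + len(group_b) - 2
    (p, q), (r, s) = [[(sa[i][j] + sb[i][j]) / dof for j in range(2)] for i in range(2)]
    det = p * s - q * r
    inv = [[s / det, -q / det], [-r / det, p / det]]
    diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    # Equal prior probabilities assumed: cut halfway between the projected group means.
    cutoff = (dot(w, mean_a) + dot(w, mean_b)) / 2.0
    return w, cutoff


def classify(applicant, w, cutoff):
    return "likely to pay" if dot(w, applicant) > cutoff else "likely to default"


w, cutoff = fisher_discriminant(paid, defaulted)
print(classify((62, 7), w, cutoff))
```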

Canonical correlation analysis can be useful when the researcher would like to relate, for example, a set of dependent variables, such as the number of credit cards owned and the expenditure on each, to a set of independent variables such as personal and/or household income, marital status, occupation, family size, and educational attainment, among others. MANOVA is applicable in experiments, especially when the researcher studies dependent variables that are related to each other in some way. For example, an ad test can use the following approach and apply MANOVA to test for group differences across the dependent variables under study: three (3) groups are each exposed to a different commercial, and every respondent is measured on three related outcomes: preference for the brand shown, preference for the company that manufactured the brand, and preference for the ad itself. This way, the researcher can compare the variance between and within the groups exposed to the stimuli (commercials) across all three measures at once.
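MANOVA's test statistic can be sketched with Wilks' lambda, one of several standard MANOVA statistics (Pillai's trace and Hotelling's trace are others): the determinant of the within-group sums-of-squares-and-cross-products (SSCP) matrix divided by the determinant of the total SSCP matrix. Values near 0 indicate large between-group differences relative to within-group variation, and a value of 1 means the group means are identical. The ratings below are invented, assuming each respondent in each of three groups rates brand, company, and ad preference on a 1-to-10 scale; converting lambda to an F statistic and p-value is left to a statistics package.

```python
# Invented ratings (brand, company, ad preference; 1 to 10) from three groups.
group_1 = [(7, 6, 8), (8, 7, 7), (6, 6, 9), (7, 5, 8)]
group_2 = [(5, 5, 6), (4, 6, 5), (5, 4, 6), (6, 5, 4)]
group_3 = [(3, 4, 3), (2, 3, 4), (4, 3, 2), (3, 2, 3)]


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def sscp(rows, mean):
    """Sums of squares and cross-products of deviations from the given mean."""
    p = len(mean)
    m = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[i] - mean[i] for i in range(p)]
        for i in range(p):
            for j in range(p):
                m[i][j] += d[i] * d[j]
    return m


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def wilks_lambda(groups):
    """det(W) / det(T): within-group SSCP over total SSCP (T = W + B)."""
    p = len(groups[0][0])
    within = [[0.0] * p for _ in range(p)]
    for g in groups:
        s = sscp(g, mean_vector(g))
        within = [[within[i][j] + s[i][j] for j in range(p)] for i in range(p)]
    everyone = [r for g in groups for r in g]
    total = sscp(everyone, mean_vector(everyone))
    return det(within) / det(total)


lam = wilks_lambda([group_1, group_2, group_3])
print(f"Wilks' lambda = {lam:.3f}")
```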

As explained earlier, factor analysis is an interdependence technique used to reduce and summarize data into a manageable form. Beyond reducing and summarizing data, factor analysis also gives the researcher an idea of the 'structure' or dimensionality of the data tested. Dimensionality means that, after factor analysis, a given set of variables in a study is reduced to only a few "different basic characteristics" describing the sample being studied (Weiers, 1984:473). Principal component analysis is a common approach to extracting the factors: each successive factor (component) is extracted so that it accounts for the maximum possible share of the variance remaining in the data, which gives the researcher an indication of how the data are spread. The resulting values, called factor loadings, are the correlations between the factors and the variables.
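A minimal sketch of principal component extraction, assuming three invented health-consciousness items that are all positively correlated. It finds only the first component of the correlation matrix, by power iteration. Its eigenvalue is the amount of variance (out of a total equal to the number of items) that the component accounts for, and each loading, the eigenvector entry times the square root of the eigenvalue, is the correlation between that item and the component. Statistical packages extract several components at once and usually offer rotation, which this sketch omits.

```python
import math

# Invented ratings (1 to 5) of three health-consciousness items by six respondents.
items = [
    [5, 4, 4, 2, 1, 2],  # "I read food labels"
    [5, 5, 4, 2, 2, 1],  # "I exercise regularly"
    [4, 5, 3, 2, 1, 1],  # "I get regular check-ups"
]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def first_principal_component(columns, iterations=500):
    """Leading eigenvalue of the correlation matrix and the matching factor loadings."""
    r = [[pearson(a, b) for b in columns] for a in columns]
    p = len(r)
    v = [1.0] * p  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' R v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * r[i][j] * v[j] for i in range(p) for j in range(p))
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings


eigenvalue, loadings = first_principal_component(items)
print(f"eigenvalue = {eigenvalue:.2f} ({100 * eigenvalue / len(items):.0f}% of total variance)")
print("loadings:", [round(x, 2) for x in loadings])
```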

Cluster analysis is another multivariate technique that is sometimes used in conjunction with factor analysis (after factor analysis). In other cases, cluster analysis is used on its own to group similar characteristics, concepts, or objects together. It differs from simple groupings using univariate analysis, and even from discriminant analysis, because in cluster analysis no prior information about group membership is used as the basis for the groupings. Groupings are usually identified based on the Euclidean distance between objects. Once the researcher decides on the number of clusters to be used in the study, the clusters are profiled based on demographic, psychographic, usage, and/or other variables not used during the clustering procedure. Multidimensional scaling (MDS) is a technique usually used to display perceptions or preferences visually through mapping. MDS is technically defined as a technique wherein "[p]erceived or psychological relationships among stimuli are represented as geometric relationships among points in a multidimensional space" (Malhotra, 1996:696). MDS helps the researcher identify the number of dimensions underlying respondents' perceptions of an object and identify the object's current and ideal positioning, based on the dimensions generated and chosen by the researcher.
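Cluster analysis can use several algorithms, and the text does not name one; the sketch below assumes k-means, a common non-hierarchical method that assigns each object to the nearest cluster centre by Euclidean distance and then recomputes the centres. The respondent factor scores are invented, the researcher is assumed to have chosen k = 2, and seeding with two arbitrary respondents is a simplification (real implementations often use better seeding or several restarts).

```python
import math

# Invented factor scores per respondent: (health consciousness, reliance on self-medication).
respondents = [(4.5, 1.2), (4.8, 1.0), (4.2, 1.5), (1.5, 4.4), (1.2, 4.8), (1.8, 4.1)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, initial_centroids, max_iterations=100):
    """Assign points to the nearest centroid, recompute centroids, repeat until stable."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda k: euclidean(point, centroids[k]))
            clusters[nearest].append(point)
        # New centroid = mean of its members (an empty cluster keeps its old centroid).
        updated = [
            [sum(coord) / len(members) for coord in zip(*members)] if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# The researcher chooses k = 2 and, here, seeds with two arbitrary respondents.
centroids, clusters = k_means(respondents, [respondents[0], respondents[3]])
for k, members in enumerate(clusters):
    print(f"cluster {k + 1}: centroid {[round(c, 2) for c in centroids[k]]}, members {members}")
```

The resulting clusters would then be profiled on variables not used in the clustering, such as demographics.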

Factor analysis is useful when the researcher would like to reduce and summarize attitude statements into their basic characteristics. For example, attitude statements about health may be grouped into the following characteristics, as indicated by the factor loadings and by the researcher's own logical groupings: a high degree of health consciousness, economy/cheap medical services, trust in medical/health practitioners, and reliance on self-medication.

In line with this example, a follow-up cluster analysis could be done to profile the identified factors, using demographic characteristics to generate the profiles. The results may show, for instance, that respondents who belong to the upper-middle to upper socio-economic class are characterized by a high degree of health consciousness, while respondents aged between 25 and 25 are the ones who rely most on self-medication. Multidimensional scaling, meanwhile, would be useful in this example for mapping these attitudes towards health, giving the researcher and the users of the research an idea of how these attitudes are spread in a multidimensional space, which dimensions are generated, and where the attitudes are positioned on those dimensions. Again, as with cluster analysis, MDS can map demographic characteristics against the attitude statements/characteristics (interesting analyses would be characteristics vs. geographic location, educational attainment, and age group membership, among others).
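MDS comes in several variants, and the text does not say which one it has in mind; the sketch below assumes classical (Torgerson) metric MDS, which double-centres the squared dissimilarities and takes the top eigenvectors (found here by power iteration with deflation) as map coordinates. For illustration only, the dissimilarity matrix is generated from hypothetical 2-D positions of five attitude statements, so the recovered map should reproduce those distances up to rotation and reflection; in a real study the dissimilarities would come from respondents' judgments.

```python
import math

# Hypothetical positions of five attitude statements, used here only to generate an
# example dissimilarity matrix; in practice the dissimilarities come from respondents.
hypothetical_points = [(0, 0), (6, 0), (6, 2), (0, 2), (3, 1)]
dissimilarities = [[math.dist(a, b) for b in hypothetical_points] for a in hypothetical_points]


def double_center(sq):
    """B = -1/2 * J D^2 J, where J is the centering matrix."""
    n = len(sq)
    row_means = [sum(row) / n for row in sq]
    col_means = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[-0.5 * (sq[i][j] - row_means[i] - col_means[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(m, iterations=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix by power iteration."""
    n = len(m)
    v = [math.sin(i + 1.0) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    value = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
    return value, v


def classical_mds(d, dims=2):
    """Classical (Torgerson) MDS: coordinates whose distances approximate d."""
    n = len(d)
    b = double_center([[d[i][j] ** 2 for j in range(n)] for i in range(n)])
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        value, vector = top_eigenpair(b)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i][k] = vector[i] * scale
        # Deflate so the next pass finds the next-largest dimension.
        b = [[b[i][j] - value * vector[i] * vector[j] for j in range(n)] for i in range(n)]
    return coords


coords = classical_mds(dissimilarities)
for i, (x, y) in enumerate(coords):
    print(f"statement {i + 1}: ({x:6.2f}, {y:6.2f})")
```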

Software programs such as SPSS and SAS have expanded their range of product offerings and now feature numerous products that allow users to run multivariate analysis on quantitative data. The increased use of multivariate analysis can be attributed to two related causes: (1) the prevalence of computer technology as a means of automating specific research processes such as data analysis, and (2) the increasing amount, availability, and accessibility of information, which prompted the use of multivariate analysis and, consequently, of software programs that feature these techniques.
