Statistical Consultant: Cluster Analysis

Statistics Solutions is the country's leader in statistical consulting and cluster analysis. Contact Statistics Solutions today for a free 30-minute consultation.

Cluster analysis serves several purposes in market research. This document discusses how it is used.

Cluster analysis can be used to segment consumers on the basis of the benefits they seek from purchasing a product. It can also be used to identify homogeneous groups of buyers in the market.

Cluster analysis is a technique used to discover or understand the structure within a complex body of data. In other words, cluster analysis segments the data into groups. There are three approaches in cluster analysis that serve this purpose: hierarchical methods, partitioning methods, and the two-step procedure.

Hierarchical methods in cluster analysis combine the cases (the rows of the data) step by step, merging the most similar cases or clusters at each stage. Partitioning methods (also called non-hierarchical methods), on the other hand, divide the data into a specified number of segments and then reassign cases between segments to improve a measure of effectiveness, such as how homogeneous each segment is. Finally, the two-step procedure determines the optimal number of clusters by comparing the values of model-choice criteria across different clustering solutions.
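To make the partitioning approach more concrete, here is a minimal sketch of k-means, one common partitioning method, written in plain Python. The function name `kmeans`, the sample data, and the choice of the first k cases as starting centroids are assumptions made for illustration only; they are not part of any particular statistical package.

```python
import math

def kmeans(points, k, iterations=100):
    """Partition points into k clusters; returns (labels, centroids)."""
    # Assumption for illustration: the first k cases serve as starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each case goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no case changed cluster, so the solution is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Hypothetical data: two variables measured on six cases.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The number of segments is fixed in advance (here `k=2`), and cases are reassigned until no case changes segment, which is the behavior that distinguishes partitioning methods from hierarchical ones.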

Performing cluster analysis involves formulating the problem, selecting a measure of distance, selecting a clustering procedure, determining the number of clusters, interpreting and profiling the clusters, and assessing the validity of the clustering.

As with any analysis, the researcher must first choose the variables to work with. In cluster analysis, the variables are typically selected on the basis of past research. An appropriate measure of distance is then chosen. A commonly used one is the Euclidean distance, which is the square root of the sum of the squared differences between two cases across the variables.

Cluster analysis works from a lower-triangle matrix called the similarity matrix (or distance coefficient matrix). It contains the pairwise distances between the objects or cases.
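As a small illustration of such a matrix, the sketch below computes the Euclidean distance for every pair of cases and stores only the lower triangle. The helper name `lower_triangle_distances` and the three sample cases are hypothetical.

```python
import math

def lower_triangle_distances(cases):
    """Row i holds the Euclidean distances from case i to cases 0..i-1."""
    return [
        [math.dist(cases[i], cases[j]) for j in range(i)]
        for i in range(len(cases))
    ]

# Hypothetical data: three cases measured on two variables.
cases = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
matrix = lower_triangle_distances(cases)
for row in matrix:
    print([round(d, 2) for d in row])
```

Only the lower triangle is needed because the distance from case A to case B equals the distance from B to A, and the distance from a case to itself is zero.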

The results of a hierarchical cluster analysis can be displayed with a graphical device called a dendrogram. The desirable clusters are those that are distinct and widely separated.
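A dendrogram is drawn from the sequence of merges that a hierarchical method produces, with each merge plotted at the distance at which it occurred. The sketch below records that merge history using single linkage, which is only one of several possible linkage rules and is chosen here as an assumption for brevity. The function name and data are hypothetical.

```python
import math

def single_linkage(cases):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [[i] for i in range(len(cases))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(cases[i], cases[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical data: two tight pairs of cases that lie far apart.
cases = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 2.0)]
merges = single_linkage(cases)
for left, right, height in merges:
    print(left, right, round(height, 2))
```

In this toy data the first two merges happen at small distances and the last one at a much larger distance. A large jump of this kind between merge heights is what widely separated clusters look like in a dendrogram. In practice, a library such as SciPy (`scipy.cluster.hierarchy.linkage` and `dendrogram`) would normally be used to compute and draw it.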

A researcher working with cluster analysis should keep in mind that no clustering solution should be accepted without some assessment of its reliability and validity. The formal procedures for assessing the reliability of clustering results are quite complex, but some simpler checks are available. In one, the researcher performs cluster analysis on the same data using different measures of distance and then compares the results across measures to determine the stability of the cluster solutions. In another, the researcher applies different clustering methods to the same data and compares the resulting solutions. Solutions that agree across distance measures and methods give more confidence in the validity of the cluster results.
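The comparison across distance measures can be sketched as follows. The same cases are clustered once with Euclidean distance and once with city-block distance, and the two solutions are compared with the Rand index, the share of case pairs on which both solutions agree. The choice of the Rand index, the single-linkage clustering, and all names and data below are assumptions for illustration; other agreement measures and clustering methods could be substituted.

```python
import math
from itertools import combinations

def euclidean(p, q):
    return math.dist(p, q)

def city_block(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def cluster_labels(cases, k, metric):
    """Single-linkage agglomeration stopped at k clusters; one label per case."""
    clusters = [[i] for i in range(len(cases))]
    while len(clusters) > k:
        # Find the two clusters whose closest members are nearest to each other.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(metric(cases[i], cases[j])
                               for i in clusters[ab[0]] for j in clusters[ab[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]  # safe because a < b
    labels = [0] * len(cases)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def rand_index(labels_a, labels_b):
    """Share of case pairs treated alike (together or apart) by both solutions."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical data: six cases measured on two variables.
cases = [(1.0, 1.0), (1.5, 1.2), (0.8, 1.4), (6.0, 6.0), (6.3, 5.8), (5.9, 6.4)]
by_euclidean = cluster_labels(cases, k=2, metric=euclidean)
by_city_block = cluster_labels(cases, k=2, metric=city_block)
stability = rand_index(by_euclidean, by_city_block)
print(stability)
```

A value of 1 means the two distance measures produce the same grouping. Values well below 1 suggest that the solution depends on the distance measure and should be treated with caution.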

Additionally, many researchers delete variables at random. The researcher then performs the clustering on the reduced set of variables and compares the result with the one obtained from the original set of variables.
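A minimal sketch of this check is shown below. To keep it short, the clustering itself is a deliberately simple stand-in (the two most distant cases act as seeds and every case joins the nearer seed), so it should be read as a placeholder for whatever clustering procedure is actually in use. All function names and data are hypothetical.

```python
import math
import random
from itertools import combinations

def two_clusters(cases):
    """Toy stand-in: seed with the two most distant cases, assign to the nearer seed."""
    s1, s2 = max(combinations(range(len(cases)), 2),
                 key=lambda ij: math.dist(cases[ij[0]], cases[ij[1]]))
    return [0 if math.dist(c, cases[s1]) <= math.dist(c, cases[s2]) else 1
            for c in cases]

def same_partition(labels_a, labels_b):
    """True when both solutions group the cases identically, ignoring label names."""
    return all(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in combinations(range(len(labels_a)), 2)
    )

def drop_variable(cases, index):
    """Return the cases with the variable at position index removed."""
    return [tuple(v for k, v in enumerate(c) if k != index) for c in cases]

rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical data: six cases measured on three variables.
cases = [(1.0, 2.0, 1.5), (1.2, 1.8, 1.4), (0.9, 2.2, 1.6),
         (7.0, 8.0, 7.5), (7.3, 7.9, 7.4), (6.8, 8.1, 7.7)]

full = two_clusters(cases)
dropped = rng.randrange(len(cases[0]))  # variable chosen at random for deletion
reduced = two_clusters(drop_variable(cases, dropped))
print(dropped, same_partition(full, reduced))
```

If the grouping survives the removal of a randomly chosen variable, the solution does not hinge on that variable. Repeating the check with different deleted variables gives a fuller picture of stability.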

Thursday, August 6, 2009
