Research Approach
A quantitative research method is used for this study to gather and analyze data. The method aims to discover behavioral group segments. To further analyze the data, this paper makes use of Descriptive Statistics, Factor Analysis and Cluster Analysis.
According to Ho and Yu (2015), descriptive statistics is the use of summary statistics that quantitatively describe or summarize the features of a collection of data. In this thesis, descriptive statistics are provided for demographics, product categories, and types of advertisement.
Factor analysis condenses a large set of observed variables into a small number of underlying factors (Brown, 2015), which is why it is often referred to as “dimension reduction”. In this research, the “dimensions” of the data were reduced to a small number of factors that were significant for the analysis.
For this study, cluster analysis was used: cases were grouped and regrouped iteratively, and the final clusters were analysed to identify which cluster carried the most weight.
The research relied on non-hierarchical clustering. According to Malinen and Fränti (2014), K-Means clustering aims to partition n objects into k clusters in which every object belongs to the cluster with the nearest mean. The procedure produces exactly k distinct clusters that are as different from one another as possible. The best number of clusters k, i.e. the one leading to the greatest separation, is not known a priori and must be determined from the data. The main aim of K-Means clustering is to minimize the total intra-cluster variance, i.e. the squared error function.
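The K-Means procedure described above can be illustrated with a minimal sketch. The code below is not the SPSS routine used in this study; it is a plain Lloyd-style iteration under assumed inputs. The function names `assign_to_nearest` and `k_means`, the toy points and the starting centers are hypothetical and serve only to show the assignment step, the update step and the squared error function that the algorithm minimizes.

```python
import math


def assign_to_nearest(points, centers):
    """Assignment step: put every point into the cluster with the nearest center."""
    clusters = [[] for _ in centers]
    for point in points:
        nearest = min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))
        clusters[nearest].append(point)
    return clusters


def k_means(points, initial_centers, max_iter=100):
    """Illustrative K-Means (Lloyd iteration); not the SPSS implementation."""
    centers = [tuple(map(float, c)) for c in initial_centers]
    for _ in range(max_iter):
        clusters = assign_to_nearest(points, centers)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    clusters = assign_to_nearest(points, centers)
    # Squared error function: total squared distance of points to their own center.
    sse = sum(
        math.dist(point, center) ** 2
        for cluster, center in zip(clusters, centers)
        for point in cluster
    )
    return centers, clusters, sse


# Hypothetical toy data: two visibly separated groups of respondents in 2-D.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, clusters, sse = k_means(points, initial_centers=[(1, 1), (9, 8)])
print(centers, [len(c) for c in clusters], round(sse, 3))
```

In this sketch k is fixed by the number of starting centers, which mirrors the point made above that k has to be decided before the partitioning can run.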
Data Collection
The data were collected through the digital distribution of an activities, interests, opinions and buying behaviour questionnaire (Appendix I). The questionnaire addresses generational lifestyle habits and is composed of several statements concerning the respondents’ interests, activities, opinions and buying behaviour in regard to cosmetic products. Since the questionnaire was distributed in Taiwan, the survey was made available in both English and Chinese.
The questionnaire is formulated in two sections to gain data on specific consumer behaviours towards cosmetics as well as demographic information about the subjects that were part of the study. The first section consisted of 40 questions to help assess the consumer’s buying behaviour towards cosmetics.
Questions 1 and 2 had the purpose of identifying the most used cosmetic categories and the advertisement channels that are most effective for finding out about a new cosmetic product. The questionnaire divides cosmetics into the following categories: Skin care products (creams, body lotions, face masks, gels, oils etc.), Hair care products (shampoos, conditioner, oils, hair coloring products, hair gels, hairspray etc.), Shaving products, Make-up products (also including make-up removers), Deodorants, Toilet soaps, Intimate hygiene products, Lip balms, Nail-care products, Sunbathing products and Tanning products.
The most important promotion channels presented in the questionnaire were: Social media ads, TV, Magazines, Street advertisement (billboards, posters etc.), Mobile ads and Promotional events.
The remaining 38 questions were based on a Likert scale from 1 to 5 (Strongly agree = 1, Agree = 2, Neutral = 3, Disagree = 4, Strongly disagree = 5) indicating the participant’s level of agreement with a specific statement about lifestyle choices and preferences regarding cosmetics. The questions explore several factors that are considered to be drivers of purchasing a product. In general terms, the participants were asked about brand importance, price incentives, willingness to spend on cosmetics, overall product characteristics ranging from design to product attributes and composition, purchases driven by personal recommendations, online reviews, opinions on advertisement, social media impact, natural and environmentally friendly cosmetics, and online versus in-store purchasing preferences. Additionally, questions about their general lifestyle concerning cosmetics were asked to determine their usage frequency, reliance on cosmetics and overall behaviour towards them.
The second section, on demographics, had the purpose of obtaining information about age, gender, profession and monthly spending on cosmetics. The variable “age” is used in this study to distinguish generations Y and Z from each other.
Sample and Participant Selection
The participants were selected first on the basis of the age criterion and second on the basis of their residence in, or origin from, Taiwan. The questionnaire further serves as a tool to find out more about the sample’s behaviour towards cosmetics and to segment the generations into groups.
The questionnaire was randomly distributed to 164 individuals between 16 and 38 years old. Subjects over 38 do not belong to generation Y and are therefore out of scope for this study. Furthermore, subjects under 16 are not taken into consideration because they do not possess significant buying power or their buying behaviour is not reliable for this study. All 164 questionnaires were retrieved; after eliminating one questionnaire because of incomplete responses, 163 valid questionnaires remained, specifically 82 participants for generation Y and 81 for generation Z.
Assessment and Measures
A reliability analysis was performed in SPSS for the sample data, using Cronbach’s α as a measure of the reliability and internal consistency of the 38 Likert-scale items. According to Guilford (1965), a Cronbach’s α coefficient higher than 0.70 indicates high reliability, a coefficient between 0.35 and 0.70 is still tolerable, and a value below 0.35 should not be accepted.
The sample data of 163 surveys, in which only the behavioural and psychographic questions were measured, resulted in a coefficient of 0.872 for Generation Y and 0.811 for Generation Z (Table 3.1). Since both values are higher than 0.70, the sample data can be considered reliable.
Table 3.1.
Reliability for Generation Y & Z data sets.
Segment         Cronbach's Alpha    Cronbach's Alpha Based on Standardized Items    N of Items
Generation Y    0.872               0.875                                           38
Generation Z    0.811               0.806                                           38
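The reliability coefficient reported in Table 3.1 was produced by SPSS. As a minimal sketch of how such a coefficient is obtained, the code below applies the standard Cronbach’s α formula, k / (k - 1) × (1 - sum of item variances / variance of total scores), to a small set of invented answers. The function names `cronbach_alpha` and `guilford_band` and the toy responses are assumptions for illustration; only the thresholds and the two α values in the last line come from the text above.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    # Variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def guilford_band(alpha):
    """Classify alpha with the thresholds cited in the text (Guilford, 1965)."""
    if alpha > 0.70:
        return "high"
    if alpha >= 0.35:
        return "tolerable"
    return "not acceptable"


# Hypothetical answers of five respondents to four Likert items (1-5).
toy_responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 3), guilford_band(alpha))
print(guilford_band(0.872), guilford_band(0.811))  # values reported in Table 3.1
```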
Data Analysis
Descriptive Statistics
Table 3.2.
Product Categories & Advertisement Channels for Generation Y
Product categories Yes No
Skin care products (creams, body lotions, face masks, gels, oils etc.) 95.12% 4.88%
Hair care products (shampoos, conditioner, oils, hair coloring products, hair gels, hairspray etc.) 93.90% 6.10%
Shaving products 37.80% 62.20%
Make-up products (also including make-up removers) 71.95% 28.05%
Deodorants 41.46% 58.54%
Toilet soaps 29.27% 70.73%
Intimate hygiene products 30.49% 69.51%
Lip balms 68.29% 31.71%
Nail-care products 37.80% 62.20%
Sunbathing products 54.88% 45.12%
Tanning products 13.41% 86.59%
Advertisement channels Yes No
Social media ads 75.60% 24.40%
TV 41.50% 58.50%
Magazines 25.60% 74.40%
Street advertisement (billboards, posters etc.) 48.80% 51.20%
Mobile ads 42.70% 57.30%
Promotional events 57.30% 42.70%
Table 3.3.
Product Categories & Advertisement Channels for Generation Z
Product categories Yes No
Skin care products (creams, body lotions, face masks, gels, oils etc.) 92.11% 7.89%
Hair care products (shampoos, conditioner, oils, hair coloring products, hair gels, hairspray etc.) 94.74% 5.26%
Shaving products 27.63% 72.37%
Make-up products (also including make-up removers) 73.68% 26.32%
Deodorants 27.63% 72.37%
Toilet soaps 14.47% 85.53%
Intimate hygiene products 22.37% 77.63%
Lip balms 75.00% 25.00%
Nail-care products 39.47% 60.53%
Tanning products 10.53% 89.47%
Advertisement channels Yes No
Social media ads 86.40% 13.60%
TV 48.10% 51.90%
Magazines 25.90% 74.10%
Street advertisement (billboards, posters etc.) 43.20% 56.80%
Mobile ads 60.50% 39.50%
Promotional events 71.60% 28.40%
An analysis of the distribution of the data based on the demographics of the participants was conducted. The generation Y dataset comprised 68.3% female and 31.7% male participants, and the generation Z dataset 72.8% female and 27.2% male participants. Additionally, on average, generation Y spends more on cosmetics per month than generation Z: the average monthly expense of generation Y is 1495.12 NTD, whereas that of generation Z is 1166.67 NTD.
Table 3.4.
Gender and Average Monthly Expenses on Cosmetics (NTD)
                      Gender              Average monthly expense on cosmetics (NTD)
Segments        N     Male      Female    Min    Max     Mean
Generation Y    82    31.7%     68.3%     0      8000    1495.12
Generation Z    81    27.2%     72.8%     100    6000    1166.67
Table 3.5.
Nationality and Profession
                      Nationality           Profession
Segments        N     Taiwanese    Other    Student    Employed
Generation Y    82    86.6%        14.4%    40.2%      59.8%
Generation Z    81    93.8%        6.2%     82.7%      17.3%
Factor Analysis
Generation Y dataset.
The factor analysis for the generation Y dataset focuses on the lifestyle, opinions, interests and buying behaviour of the sample. The factors were derived from principal component analysis, extracting only the components with an Eigenvalue higher than 1 and performing an orthogonal rotation, since a set of independent factors is of interest for this study. After performing the reliability analysis, the items could not be reduced according to the item-total correlation, since most of the items displayed a value lower than 0.50. Consequently, the item reduction was conducted based on the communalities and the rotated component matrix provided by the factor analysis. According to the percentage of variance, the three retained factors explain 68.829% of the total variance. The other components, which had low Eigenvalues (quality scores), were not assumed to represent real traits. Such components were considered “scree”, as shown in Figure 3.1.
Figure 3.1: Quality Scores for Generation Y. Adapted from: Own analysis.
A scree plot visualizes the Eigenvalues (quality scores) that were generated. In the scree plot, the first 3 components had Eigenvalues over 1 and were, therefore, considered “strong factors”. From component 4 onwards, the Eigenvalues dropped off substantially. The sharp drop between components 1-3 and components 4-13 strongly suggests that 3 factors underlie this analysis.
The coefficient of determination was used to assess the extent to which the 3 underlying factors account for the variance of the input variables. For instance, if the statement “I am willing to spend money on expensive cosmetic products” is predicted from the three components by multiple regression, the resulting R-squared equals 0.581, which is that statement’s communality. Variables with low communalities, i.e. lower than 0.50, did not contribute much to measuring the underlying factors and were removed iteratively.
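The link between loadings and communalities can be made concrete with a short sketch. After an orthogonal extraction, the communality of an item is the sum of its squared loadings on the retained components, which is the R-squared mentioned above. In the code below only the loading of 0.702 is taken from Table 3.6; the two cross-loadings of that item and the whole second item are hypothetical, and the cross-loadings were chosen merely so that the result lands on the reported 0.581. The function names are likewise assumptions.

```python
def communality(loadings):
    """Sum of squared loadings of one item over the retained orthogonal components."""
    return sum(value ** 2 for value in loadings)


def split_by_communality(loading_table, threshold=0.50):
    """Separate items to keep from items to drop using the 0.50 cut-off from the text."""
    keep, drop = {}, {}
    for item, loadings in loading_table.items():
        target = keep if communality(loadings) >= threshold else drop
        target[item] = round(communality(loadings), 3)
    return keep, drop


# Only the 0.702 loading comes from Table 3.6; every other number is hypothetical.
loading_table = {
    "willing to spend on expensive cosmetics": (0.702, 0.250, 0.160),
    "hypothetical weak item": (0.400, 0.300, 0.200),
}
keep, drop = split_by_communality(loading_table)
print(keep)
print(drop)
```

Because the removal was iterative, the factor analysis would be re-run after each dropped item and the communalities re-checked; the sketch shows a single pass only.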
Moreover, each factor has an acceptable reliability, meaning that its Cronbach’s Alpha is greater than 0.70. The KMO and Bartlett’s tests also present acceptable values, which indicate that the data are adequate for the analysis.
The Kaiser-Meyer-Olkin (KMO) test examines and evaluates the homogeneity of variables. Bartlett’s test of sphericity tests for correlation among the variables that were used in this study. The KMO value for the instrument was 0.827, and hence factor analysis was appropriate for the given data set. The chi-square statistic of Bartlett’s test of sphericity was 555.919, which indicated that the statements are correlated and hence suitable for structure detection.
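Both statistics were taken from the SPSS output. The sketch below shows, under stated assumptions, how they can be computed from a correlation matrix with the commonly used formulas: Bartlett’s chi-square is -[(n - 1) - (2p + 5) / 6] × ln|R| with p(p - 1) / 2 degrees of freedom, and the overall KMO compares squared correlations with squared partial correlations taken from the inverse of R. The 3 × 3 correlation matrix and all function names are hypothetical; the p-value lookup is omitted.

```python
import math


def invert_with_det(matrix):
    """Gauss-Jordan inversion with partial pivoting; returns (inverse, determinant)."""
    n = len(matrix)
    # Augment a copy of the matrix with the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot_value = aug[col][col]
        det *= pivot_value
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def bartlett_sphericity(corr, n_obs):
    """Chi-square statistic and degrees of freedom of Bartlett's test of sphericity."""
    p = len(corr)
    _, det = invert_with_det(corr)
    chi_square = -((n_obs - 1) - (2 * p + 5) / 6) * math.log(det)
    return chi_square, p * (p - 1) // 2


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    p = len(corr)
    inv, _ = invert_with_det(corr)
    squared_r = squared_partial = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                squared_r += corr[i][j] ** 2
                squared_partial += partial ** 2
    return squared_r / (squared_r + squared_partial)


# Hypothetical correlation matrix of three items; 82 matches the Generation Y sample size.
toy_corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
chi_square, dof = bartlett_sphericity(toy_corr, n_obs=82)
print(round(kmo(toy_corr), 3), round(chi_square, 3), dof)
```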
The component matrix shows the Pearson correlations between the items and the components; these correlations are called factor loadings. Table 3.6 shows which variables measure which factors and is adapted from the rotated component matrix of the analysis conducted.
After interpreting all components, the following factors were derived: “Knowledge on cosmetics”, “Social media impact” and “Environmental consciousness”. The variable labels were set after actually adding the factor scores to our data.
Table 3.6.
Results of Exploratory Factor Analysis for Generation Y (N=82)
Statement Component loading
It is important to me to know the latest cosmetic products. 0.74
I am willing to spend money on expensive cosmetic products. 0.702
I am more careful when choosing cosmetic products than other people. 0.692
I google a cosmetic product online before buying it in store. 0.543
Social media affects my decision when purchasing cosmetic products. 0.909
Social media helps me choose cosmetic products. 0.859
The social media peer pressure to look good affects my purchasing decision. 0.74
I purchase cosmetic products because I have seen them in online tutorials. 0.712
I would rather purchase natural cosmetics than non-natural ones. 0.858
I purchase cosmetic brands that care about the environment and animals.
KMO = 0.827; Bartlett’s Test = 555.919; Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.
Generation Z dataset.
Factor analysis for generation Z is done in the same way as for generation Y and equally focuses on the lifestyle, opinions, interests and buying behaviour of the sample. The factors were derived from principal component analysis, extracting only the components with an Eigenvalue higher than 1 and performing an orthogonal rotation, since a set of independent factors is of interest for this study. After performing the reliability analysis, the items could not be reduced according to the item-total correlation, since most of the items displayed a value lower than 0.50. Consequently, the item reduction was conducted based on the communalities and the rotated component matrix provided by the factor analysis. The KMO and Bartlett’s tests present acceptable values, which indicate that the data are adequate for the analysis.
The KMO value for the Generation Z dataset was obtained after several re-runs, during which it rose from 0.668 to 0.753. The re-runs were done after eliminating non-significant statements that had a very low correlation with the components (< 0.5), as observed in the communalities table. Given the KMO of 0.753 and Bartlett's test, which had a p-value of 0.00, it was appropriate for the factor analysis to proceed.
The Eigenvalue also helps to define the factors. A common rule of thumb was used: components whose Eigenvalue is at least 1 were selected, which is the case for the 4 underlying factors. The other components, which had low Eigenvalues (quality scores), were not assumed to represent real traits. Such components were considered "scree", as shown in Figure 3.2.
Figure 3.2: Quality Scores for Generation Z. Adapted from: Own analysis.
After eliminating variables according to communalities, we arrive at the final results (Table 3.7). Table 3.7 presents the results of the factor analysis for Generation Z. This table is adapted from the rotated component matrix and shows the Pearson correlations between the items and the components, which are called factor loadings.
After interpreting all components, the following factors were derived for the Generation Z dataset: “Knowledge on cosmetics”, “Attitude towards advertisement”, “Social media impact” and “Environmental consciousness”. According to the percentage of variance, these four factors explain 66.9% of the total variance. “% of variance” is the amount of variance attributable to each factor after extraction. On this basis, the four factors that show what influences customers to buy cosmetic products were determined. Moreover, each factor has an acceptable reliability, meaning that its Cronbach’s Alpha is greater than 0.70 or rounds up to this value.
Table 3.7.
Results of Exploratory Factor Analysis for Generation Z (N=81)
Statement Component loading
Social media helps me choose cosmetic products. .774
Social media affects my decision when purchasing cosmetic products. .807
The social media peer pressure to look good affects my purchasing decision. .769
I google a cosmetic product online before buying it in store. .817
I always look at the online reviews when buying a cosmetic product. .824
I purchase cosmetic brands that care about the environment and animals. .811
I like to try/test cosmetic products before I buy them. .612
I like to talk to my friends about cosmetics. .649
I think for a long time before buying a new cosmetic product. .743
I am more careful when choosing cosmetic products than other people. .735
I like to see ads about cosmetic products. .759
I like cosmetic advertisements with a foreign celebrity. .775
I like cosmetic advertisements with an Asian celebrity. .778
I am willing to spend money on expensive cosmetic products. .528
I would rather purchase natural cosmetics than non-natural ones. .795
Component 1 2 3 4
% of Variance 31.87 14.057 12.802 8.177
Eigenvalues 4.781 2.108 1.92 1.227
Cronbach’s Alpha .843 .758 .788 .654
KMO = 0.753; Bartlett’s Test = 538.725; Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization.
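The relation between the Eigenvalues and the “% of Variance” row in Table 3.7 can be checked with a few lines of arithmetic. The sketch below assumes that the 15 statements listed in Table 3.7 are the full set of items in the final Generation Z solution; under that assumption each standardized item contributes one unit of variance, so a component’s share is its Eigenvalue divided by 15. The variable names are illustrative.

```python
# Values of the four retained components as reported in Table 3.7.
eigenvalues = [4.781, 2.108, 1.92, 1.227]
n_items = 15  # assumption: the statements listed in Table 3.7 are all items in the solution

# Kaiser criterion: keep components whose Eigenvalue is at least 1.
retained = [value for value in eigenvalues if value >= 1.0]

# With standardized items, each item contributes one unit of variance,
# so a component's share of the variance is its Eigenvalue divided by the item count.
percent_of_variance = [100 * value / n_items for value in retained]
cumulative = sum(percent_of_variance)

for number, (value, share) in enumerate(zip(retained, percent_of_variance), start=1):
    print(f"Component {number}: Eigenvalue {value:.3f}, {share:.2f}% of variance")
print(f"Cumulative: {cumulative:.1f}%")
```

The shares computed this way agree with the reported row up to rounding of the Eigenvalues, and their sum corresponds to the 66.9% of variance stated in the text.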
Statements that portrayed knowledge on cosmetics were loaded to component 1, those that indicated advertisements influenced customers were loaded to component 2, and those showing social media influenced customers were loaded into component 3.
Finally, component 4 contained questions portraying customers who are environmentally conscious about cosmetics and purchase accordingly. Therefore, these new factors were further used for the segmentation of Generation Z according to their attitudes to shopping for different cosmetic products. All the factors found are acceptable considering their individual reliability test.
Cluster Analysis
A cluster analysis was conducted in SPSS in order to discover further market segments that display the same behaviour or characteristics. Initially, a hierarchical cluster analysis was performed in order to determine how many potential clusters could be defined for this data sample. The results of this initial analysis suggested that three clusters were of interest for this study; therefore, a K-means cluster analysis with a predetermined number of clusters was conducted to discover the different groups and their characteristics. The outcomes of the analyses for each generation are explored below.
Generation Y dataset.
Table 3.8.
Distances between Final Clusters (Generation Y)
Cluster    1        2        3
1                   9.41     3.997
2          9.41              6.191
3          3.997    6.191
Clusters 1 and 2 are the furthest apart and thus the most different from each other, while clusters 1 and 3 are the least different. The ANOVA table (Table 3.9) shows how strongly each statement influenced the formation of the clusters. At a significance level of 0.05, the statements with a Sig. value below 0.05 had a significant influence on the formation of the clusters.
Table 3.9.
ANOVA Table for Generation Y
Statement  Cluster Mean Square  df  Error Mean Square  df  F  Sig.
The brand of a cosmetic product is very important to me. 4.159 2 0.472 79 8.812 0.000
I purchase cosmetic products based on price. 0.135 2 0.757 79 0.178 0.837
I am willing to spend money on expensive cosmetic products. 17.601 2 0.706 79 24.93 0.000
I buy whatever cosmetic product that is on sale. 0.587 2 1.404 79 0.418 0.660
I purchase cosmetic products if I like their advertisement. 6.749 2 1.258 79 5.365 0.007
I purchase cosmetic products upon recommendation. 3.44 2 0.622 79 5.533 0.006
I do not pay attention to the attributes of a product when making a purchase. (e.g. smooth skin, silky hair, volume etc.) 8.118 2 1.022 79 7.943 0.001
Social media helps me choose cosmetic products.
The social media peer pressure to look good affects my purchasing decision. 16.535 2 0.755 79 21.89 0.000
I google a cosmetic product online before buying it in store. 23.927 2 0.659 79 36.284 0.000
I purchase cosmetic products because I have seen them in online tutorials.
I pay attention to the ingredients when I buy cosmetic products. 6.601 2 1.084 79 6.087 0.003
I purchase cosmetic products that have a nice design. 10.012 2 0.878 79 11.405 0.000
I always look at the online reviews when buying a cosmetic product.
I like to try/test cosmetic products before I buy them. 6.592 2 1.026 79 6.424 0.003
I think for a long time before buying a new cosmetic product. 6.446 2 1.106 79 5.826 0.004
I like cosmetic products that do not have side effects. 2.355 2 0.575 79 4.098 0.020
It is important to me to know the latest cosmetic products. 23.269 2 0.752 79 30.94 0.000
I feel that cosmetic products are not essentially important to me. 14.084 2 1.134 79 12.42 0.000
I rely on cosmetic products to look good. 12.735 2 1.113 79 11.445 0.000
I like cosmetic products that smell good. 0.034 2 1.112 79 0.031 0.970
I attend events where my hair, skin and make-up should look good. 6.501 2 1.119 79 5.812 0.004
I am more careful when choosing cosmetic products than other people. 14.194 2 0.579 79 24.525 0.000
I only buy the essential cosmetic products. 6.663 2 1.016 79 6.555 0.002
I don’t do research when buying cosmetic products. 16 2 1.138 79 14.059 0.000
I like receiving free samples to know a product.
I shop around for the best value on cosmetics. 12.1 2 1.163 79 10.408 0.000
I prefer an Asian brand rather than a foreign one. 0.983 2 0.75 79 1.31 0.276
The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this and cannot be interpreted as tests of the hypothesis that the cluster means are equal.
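The F values in Table 3.9 are the ratio of the cluster mean square to the error mean square, with degrees of freedom that follow from the 3 clusters and the 82 valid cases. The sketch below recomputes the ratio for three rows copied from the table; it is a descriptive check only, in line with the caveat above, and the short row labels are paraphrases introduced for illustration.

```python
n_cases, n_clusters = 82, 3
df_cluster = n_clusters - 1       # 2, as in Table 3.9
df_error = n_cases - n_clusters   # 79, as in Table 3.9

# (cluster mean square, error mean square, F as reported) copied from Table 3.9.
rows = {
    "brand importance": (4.159, 0.472, 8.812),
    "purchase based on price": (0.135, 0.757, 0.178),
    "willing to spend on expensive products": (17.601, 0.706, 24.93),
}

f_ratios = {}
for statement, (ms_cluster, ms_error, f_reported) in rows.items():
    f_ratios[statement] = ms_cluster / ms_error
    print(f"{statement}: F({df_cluster}, {df_error}) = "
          f"{f_ratios[statement]:.3f} (reported {f_reported})")
```

Small differences in the third decimal are expected, because the mean squares in the table are themselves rounded.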
The number of cases in each cluster found in the generation Y dataset is shown in Table 3.10.
Table 3.10.
Number of Cases in each Cluster
Cluster N
1 35
2 11
3 36
Valid 82
After clustering the data, the findings showed that the largest share of cases was grouped in cluster 3: out of 82 valid cases, 36 were similar to one another on the clustering statements.
The ANOVA test also showed which questions played a significant role in clustering the generation Y survey participants. The statements that served to further cluster generation Y included brand importance, willingness to spend on expensive cosmetics, attitude towards advertisements, online reviews and information, social media impact, product attributes and composition, location of purchase, product design and finally the overall presence of cosmetics in their lifestyles. Variables that had no significance in clustering were product price, sales on cosmetics, smell and preference of purchasing natural or environmentally friendly cosmetics. This will be further explored in the next chapter.
Generation Z dataset.
Table 3.11.
Distances between Final Cluster Centers
Cluster    1        2        3
1                   8.163    5.533
2          8.163             4.688
3          5.533    4.688
Table 3.11 shows the Euclidean distances between the final cluster centers. Greater distances between clusters correspond to greater dissimilarities. Clusters 1 and 2 were the furthest apart and thus the most different, while clusters 2 and 3 were the least different. The ANOVA table (Table 3.12) shows how strongly each question influenced the formation of the clusters. At a significance level of 0.05, the questions with a Sig. value below 0.05 had a significant influence on the formation of the clusters.