RUNNING HEAD: The Rosenberg Self-Esteem Scale

Title
On the Factor Structure of the Rosenberg (1965) General Self-esteem Scale

1

Abstract
Since its introduction, the Rosenberg General Self-esteem scale (RGSE, Rosenberg, 1965) has been
one of the most widely used measures of global self-esteem. We conducted four studies to
investigate (a) the goodness of fit of a bifactor model positing a general self-esteem factor (GSE)
and two specific factors grouping positive (MFP) and negative items (MFN), and (b) different kinds
of validity of the GSE, MFN and MFP factors of the RSGE. In the first study (n= 11,028), the fit of
the bifactor model was compared with those of nine alternative models proposed in literature for the
RGSE. In Study 2 (n = 357), the external validities of GSE, MFP and MFN were evaluated using
objective grade point average data and multi-method measures of prosociality, aggression, and
depression. In Study 3 (n = 565), the across-rater robustness of the bifactor model was evaluated. In
Study 4, measurement invariance of the RGSE was further supported across samples in three
European countries, Serbia (n= 1,010), Poland (n= 699), and Italy (n= 707), and in the United States
(n = 1,192). All in all, psychometric findings corroborate the value and the robustness of the
bifactor structure and its substantive interpretation.
Keywords. Bifactor model; self-esteem; Rosenberg self-esteem scale; method effects; method
factors.

2

Self-esteem reflects an overall subjective evaluation of personal worth (Marsh & O’Mara,
2008; Rosenberg, 1965). A considerable amount of research has investigated the nature of this
construct(Baumeister, Campbell, Krueger, &Vohs, 2003), which represents one of the most popular
individual differences constructs in psychology (see Donnellan, Trzesniewski, & Robins, 2011, for
a review). Self-esteem, similar to any other psychological construct, is a latent variable that is not
directly observable. Yet individuals’ standing on the latent attribute can be inferred through their
answers to statements intended to describe internal positive and negative states, such as feelings and
emotions about the self (Borsboom, Mellenbergh& van Heerden, 2003).
Since its introduction, the Rosenberg General Self-esteem scale (RGSE; Rosenberg, 1965)
has been one of the most popular and widely used measures of global self-esteem (Blascovich
&Tomaka, 1991; Donnellan et al., 2011; Schmitt &Allik, 2005).According to PsycInfo, the
instrument has been cited 3,016 times during the last five years (2010-2014).The scale assesses the
“feeling that one is good enough” (Rosenberg, 1965, p. 31), and consists of 10 items with a high
degree of face validity. A large body of empirical evidence supports the internal consistency of the
instrument (Byrne, 1983), its predictive validity (Kaplan, 1980), and its equivalence over time
(Marsh, Scalas, &Nagengast, 2010; Motl & DiStefano, 2002). The popularity of the 10-item RGSE
has been due in part to its long history of use, its uncomplicated language, and its brevity (it takes
only 1 or 2 minutes to be completed).
In addition to its privileged place in the literature, the RGSE offers other potential
advantages. For example, it was developed in accordance with the recommended strategy of
building instruments with a balanced number of positively and negatively worded items (Paulhus,
1991). This approach helps to address acquiescence response bias (Marsh, 1996). One perhaps
unexpected drawback of this otherwise desirable property is that balanced scales can introduce
complexities with regard to the dimensionality of the measure. Thus, it is not surprising that over
the years, several authors have championed different structural models for the RGSE involving
multiple factors. Recent studies have demonstrated that the deviation from the unidimensionality
observed for the RGSE is mostly due to the effect of items’ wordings (Marsh et al., 2010; Tomás &
3

Oliver, 1999). The upshot of this psychometrically-oriented debate is a lack of consensus on how
the observed factors should be conceived in substantive terms (Tafarodi& Milne, 2002).
The Dimensionality of the Rosenberg (1965) Self-esteem Scale
Several researchers have acknowledged the need to consider one or more method factors,
along with general self-esteem (GSE), in testing the dimensionality of the RSGE (Kuster & Orth,
2013; Orth, Robins, &Widaman, 2011). In the vast majority of studies, factors associated with
positively and negatively worded items have been considered as methodological artifactsthat should
be controlled to obtain a satisfactory model fit, and thus authors did not indulge in further
speculations on the foundations of these factors (Kuster & Orth, 2013; Orth et al.,2011).
The present research was designed to provide additional insight into the nature of the latent
factors associated with positively and negatively worded items of the RGSE. The starting point was
the notion that the label “method effect” is inadequate to depict these factors. Method effects refer
to defects in method for assessing constructs (Fiske, 1987). Typical method effects are the inflated
correlations between unrelated traits due to the use of the same informant (e.g. self-report; see
Kenny &Kashy, 1992). These kinds of measurement artifacts are unrelated to substantive construct
variance. However, the so-called method effect factors for the RGSE have shown psychometric
properties similar to the substantive GSE factor (see Motl & DiStefano, 2002).In this regard, a
number of investigators who have analyzed method effects associated with the RGSE have been
able to demonstrate: (a) the convergent validity across instruments for the factors viewed as
reflecting method effects (Horan, DiStefano, &Motl, 2003); (b) their long-term stability (Motl&
DiStefano, 2002; Saada, Bailly, Joulain, Hervé, & Alaphilippe, 2013); and (c) their criterion-related
validity (Quilty, Oakman, &Risko, 2006). Other researchers using other instruments have further
demonstrated that method factors are stable across observers (Alessandri, Vecchione, Tisak &
Barbaranelli, 2011)and moderately heritable (Alessandri et al., 2009).
A Theoretical Interpretation
Alessandri, Vecchione, Donnellan and Tisak (2013) recently offered an overarching
interpretation of the three factors assessed by the ten items of the RGSE in terms of a bifactor
4

model(Chen, West & Sousa, 2006; Reise, Morizot, Hays, 2007). Bifactor models can be considered
when (a) there is a general factor accounting for the commonality of the items; (b) there are
multiple domain specific factors; and (c) both the common factor and the domain specific factors
are interesting for researchers (Chen et al., 2006). In the case of RGSE, the proposed bifactor model
is comprised by a general self-esteem factor plus two substantive specific factors. Alessandri et al.
(2013) suggested that the method factor associated with negatively worded items (MFN) shares
similar characteristic as the self-derogation factor described by Kaplan and Pokorny (1969), who
interpreted it as, “the expression of intense negative affect towards general self-conception” (p.
425). This interpretation has been repeatedly put forth by several other authors (e.g., Epstein,
Griffin &Botvin, 2004; Kaplan, 1978; Kaplan, Martin & Robbins,1982), who referred to the
negative items of the RGSE as “self-derogation” and found that these items predict adolescents’
drug use, aggression, and violence, and are associated with perceived low levels of self-efficacy.
The method factor associated with positively worded items (MFP), by contrast, seems to capture
self-competence, an aspect of self-evaluation linked with the individual’s appraisal of his or her
own abilities (Diggory, 1966; Gecas, 1971). This interpretation is in line with the observation made
by Tafarodi and Swann (1995) that, “high self-competence has an intrinsically positive and
evaluative character” and “is cognitively characterized by the presence of a generalized expectancy
for success” (p. 325).
An important caveat in embracing such an interpretation is that the work of Alessandri et al.
(2013) addressed a daily version of the RGSE. In their measure, items and instructions were
modified so that participants were instructed to give the response that best reflected how they felt at
the moment they completed the measure. Thus, it is not entirely clear if their results translate to the
general version of RGSE. Instead, an attractive feature of this interpretation is that it casts the two
MFN and MFP specific factors in a manner consistent with previous empirical research that
obtained self-derogation and self-enhancement measures from the RGSE item pool. However, if
MFP and MFN assessed with the bifactor model represent measures of self-derogation and selfcompetence, they should be considered “purified” versions of those used in the past that were
5

obtained by merely summing either the positive items and the negative items.This procedure leaves
room for contamination of the scale scores used in previous studies because variance due to GSE is
not partialled-out of scales for MFP and MFN. This problem is avoided by the bifactor approach.
The Present Research
We conducted four studies with the aim to address several important psychometric issues
pertaining to the use of the RGSE. The first study was designed to provide empirical support for the
bifactor model described above. The goodness fit of this model was evaluated and comparedwith
those of nine alternative models. These models were already identified and tested in previous
studies (see, among others, Marsh, 1996, Marsh et al., 2010, Motl & DiStefano, 2002, and Tomás&
Oliver, 1999). However, most researchers considered only a subset of models at a time (see Motl &
Di Stefano, 2002), or used slightly different (Marsh et al., 2010) or shortened (i.e., Marsh et al.
1996) versions of the RGSE. Moreover, the appropriateness of the model proposed by Tafarodi and
colleagues (Tafarodi & Milne, 2002; Tafarodi& Swann, 1995) was rarely investigated empirically
(for exceptions, see Alessandri et al., 2013, and Marsh et al., 2010). Thus it seemed appropriate to
compare all alternative models addressed in the literature, by using a standard version of the RGSE
and a large representative sample (Study 1). The ten competing models are presented in detail in the
first study. Subsequent studies were designed to corroborate the psychometric properties of the
hypothesized model, along with the convergent and discriminant validity of the three factors (Study
2), their convergent validity across raters (Study 3),and their equivalence across cultures (Study 4).
Findings from these studies were expected to support the aforementioned theoretical arguments
regarding a bifactorial structure for the RGSE.
Study 1
This study was designed to compare the fit of the hypothesized model with that of nine
alternative models presented in the literature for the RGSE. The ten competing models are
presented in Figure 1. The first model is the one-factor model, which has been found to be
inadequate in most prior works (see Byrne &Shavelson, 1986; Shevlin, Bunting, & Lewis, 1995, for
exceptions). Model 2 posits two correlated factors, capturing positively and negatively worded
6

items, respectively. The emergence of these two factors has been interpreted as resulting from a
methodological artifact (Carmines & Zeller, 1979), or as reflecting a substantive distinction
between positive and negative self-esteem (Kaplan &Pokorny, 1969; Openshaw, Thomas, &
Rollins, 1981; Owens, 1994). Model 3 posits two factors based on the distinction between transient
(corresponding to a rather unstable evaluation of self-.esteem) and general (corresponding to
general self-esteem) evaluations introduced by Kaufmann, Rasinski, Lee and West (1991). Model 4
represents the two components that were described by Kaplan and Pokorny (1969) as defence of
individual self-worth, or defensive self-enhancement, and self-derogation. Models 5 through 9
include models that posit a GSE factor and use several methods to control for method effects due to
item wording (Marsh et al., 2010; Tomás& Oliver, 1999). Following Marsh (1996), method effects
are defined herein as the variance linked to the nature of the measurement procedures, namely, the
use of negatively and positively worded items to assess general self-esteem. These variance
components can be modelled either (1) as correlations among equally worded items uniqueness, or
(2) as additional factors influencing items sharing the same wording. Accordingly, Model 5 posits
correlated uniqueness among the residual variances of negatively worded items, whereas Model 6
posits correlated uniqueness among the residual variances of positively worded items1. Model 7
posits a trait factor representing self-esteem plus a method factor underlying negatively worded
items. Model 8 posits a trait factor representing self-esteem plus a method factor underlying
positively worded items (Tomás& Oliver, 1999). Psychometrically, Models 7 and 8 used a version
of the Correlated Trait-Correlated Method Minus One (CT-C(M-1)) framework (Eid, Lischetzke,
Nussbeck, &Trierweiler, 2003), where method refers to the direction of item wording. Both models
contain one method factor less than the number of methods included. In Model 7, the positive
wording method was chosen as the comparison standard and dropped from the model. Thus, the
latent self-esteem factor represents true score variance of the positively worded items. In Model 8,
1

We did not test a model with correlated uniqueness among both positively and negatively worded

items because, as noted by Marsh (1996), this model “is not identified (this is an inherent limitation
with the model and has nothing to do with the particular data being tested)” (p. 816).
7

by contrast, the negative wording method was chosen as the comparison standard. As a
consequence, the latent self-esteem factor represents true score variance of the negatively worded
items. Both models posited method factors as uncorrelated with the trait factor (i.e., self-esteem). In
Model 9 (the hypothesized model), each item is explained by a trait factor (GSE) plus two method
factors (i.e., MFP and MFN). This model is essentially a classical bifactor model (Chen et al., 2006;
Reise, et al. 2007), which accounted for the covariation among RGSE items in terms of a broad
general factor (self-esteem) reflecting the overlap across all ten items, and two correlated factors
that capture item wording effects. Model 10 is the five-factor model introduced by Tafarodi and
colleagues (Tafarodi & Milne, 2002; Tafarodi & Swann, 1995), which includes a general factor of
self-esteem, the two factors of self-acceptance and assessment, plus MFN and MFP.
Method
Participants and Procedures
The present study included 11,028 participants (43.7% males) recruited within a crosssectional nationwide survey conducted in Italy to examine the attitudes of children, parents,
grandparents, and teachers with regard to lifelong learning. Participants were recruited by a team of
researchers and agreed to complete a set of questionnaires at their homes. For their participation in
the study, participants were offered feedback about their psychological profile. The age range was
from 15 to 85 years, with a mean of 38.17 years (SD=18.95). Participants were residents in various
geographic areas of the country: 28% represent northern Italy, 40% central Italy, and 32% southern
Italy. They varied widely in demographic and socioeconomic backgrounds, although the sample
was homogeneous in terms of ethnicity (all participants were Caucasian). Fifty-four percent were
married, and 46% were either unmarried, divorced, or widows/widowers. Eleven percent were in
professional or managerial ranks, 22% were merchants or operators of other businesses, 36% were
skilled workers, 27% were unskilled workers, and 4% were retired. Years of education ranged from
5 to 18 years (M=10.32;SD=3.56); 34% completed only elementary or junior high school, 55%
completed high school, and 11% earned a university degree.
Measures
8

Self-Esteem. The RGSE is comprised of 10 items (Rosenberg, 1965) that measure the extent
to which participants feel they possess good qualities and have achieved personal success. This
scale has been validated in Italian (
et al., 2012).Items were preceded with the following statement: "Below is a list of
statements dealing with your general feelings about yourself."Each item is scored on a 4-point scale
ranging from 1 “strongly disagree” to 4“strongly agree,” Alpha coefficients were not computed
because they are potentially inappropriate given the proposed multidimensional structure of the
RGSE (see Sijtsma, 2009). Estimates of reliability derived from the measurement model are
presented in the results section.
Statistical Analysis
To investigate the fit of alternative models, Confirmatory Factor Analysis (CFA) was used,
using the program Mplus7.11 (Muthén & Muthén, 2012). Whereas there has been considerable
debate in the literature concerning the use of maximum likelihood estimation (ML) with ordinally
scaled variables treated as continuous (West, Finch, & Curran, 1995), different simulation studies
have found ML to perform well with variables with four or more categories (Bentler & Chou,
1987), and under less than optimal analytical conditions (for example, in the presence of small
samples sizes and moderate departures from normality). ML has been also the elective method
suited in previous studies addressing the psychometric properties of the RGSE (e.g., Marsh, 1996;
Marsh et al., 2010; Tomás& Oliver, 1999). Because the hypothesis of multivariate normality was
non-tenable in the present sample (see Figure 2 and the Online Appendix A for more information
about multivariate skewness and kurtosis across the four studies), we employed the Satorra-Bentler
(1988) scaled chi-square statistic (SBχ2) and standard errors, which takes into account the nonnormal distribution of the data (Mplus estimator=MLM)2. The same software and estimation
method were used in subsequent studies. Because the chi-square is highly sensitive to the size of the

2

As a sensitivity test, we also ran some of the models using the WLS estimator. The parameter

estimates were nearly identical, and thus we present results obtained using the MLM estimator.
9

sample,theχ2likelihood ratio statistic was supplemented with other indices of model fit, such as the
Comparative Fit Index (CFI), and the Root Mean Square Error of Approximation (RMSEA) with
associated 95% confidence intervals (CI). We accepted CFI values greater than .95 and RMSEA
values lower than.08 (Kline, 2010). The Akaike Information Criterion (AIC, Burnham & Anderson,
1998) was used to compare the alternative non-nested models proposed for the RGSE. The lower
the AIC, the better the fit of the model.
Results and Discussion
Table 1 reports goodness of fit indices for alternative models. Because Model 10 was notidentified, it was excluded from further consideration. As expected, Model 9 showed the best fit.
Unstandardized and completely standardized loadings for Model are presented in Figure 2, Panel A.
These loadings ranged (in a standardized metric) from .35 to .80 (M= .53; SD= .15) for the GSE,
from-.14 to.59 (M = .37; SD = .19) for the MFP, and from .30 to .69 (M = .46; SD = .18) for the
MFN. The positively worded item (henceforth labeled PO)number 7 (PO7),as well as PO6, did not
loaded significantly on MFP(all other items were significant).Of interest, MFP and MFN were not
significantly correlated (-.05, p = .21). Setting this correlation to zero did not degrade model fit
(SBΔχ2(1) = .74, p < .39).
Deconstructing RGSE Variance and Estimating Overall Scale Reliability
In order to clarify the relative weight of each factor in explaining the covariance among the
RGSE items, unstandardized estimates were used to perform a formal variance decomposition at
both item and scale levels. Procedures used for variance decomposition are detailed in the Online
Appendix C, and a detailed presentation of variance decomposition at the item level is given in
Online Appendix B, Table 1. Following the common definition of scale reliability as the ratio
between the “variance of the true” score and the “total score variance” (Lord &Novick, 1956),we
computed the total scale reliability for the RSGE as the sum of the variances of the latent factors of
GSE, MFP, and MFN divided by the total scale variance (see Online Appendix C). This index is
more appropriate than the Cronbach’s alpha coefficient for multidimensional scales(see Sijtsma
2009, and Alessandri et al., 2013). Results showed that items PO6 and PO7 were almost completely
10

composed by GSE variance (plus measurement error). Items NE10 and NE9 were primarily
measures of MFN, whereas item PO1 mostly captured MFP variance. All other items (i.e.,
PO2,PO4, NE3,NE5,and NE8) presented a higher proportion of GSE variance than of MFP or MFN
variance, plus measurement error. At the scale level, the RGSE functioned mainly as a measure of
GSE (about 67% of reliable variance); MFP (about 18%) and MFN (about 2%) explained a lower
proportion of variance. Scale reliability was quite high, being .87, above the value of .80,which is a
common result when Cronbach’s alpha is applied to the RGSE (Blascovich & Tomaka, 1991).
In conclusion, this study supported Model 9 as the best fitting model for the RGSE items.
Results from variance decomposition indicated that, at the scale level, the variance of the RGSE
items was mostly captured by GSE, MFP and measurement error. MFN explained only a small
proportion of variance. We discuss these results in detail in the General Discussion section.
Study 2
This study evaluated the construct validity of the bifactorial model including GSE, MFP,
and MFN in predicting different constructs, such as grade point average (GPA), prosocial behavior,
aggression, and depressive symptoms levels. These constructs were selected because they have been
consistently linked to self-esteem in previous studies. Specifically, high levels of self-esteem have
been related to high GPA (Baumeister et al., 2003), low levels of aggressive behavior (e.g.,
Donnellan, Trzesniewski, Robins, Moffitt &Caspi, 2005), high levels of prosocial orientation
(Bartko & Eccles, 2003), and low levels of depressive symptoms (Donnellan, et al. 2011). Indeed,
different authors have theorized that self-esteem might in fact represent one of the most important
determinants of GPA, aggression, prosociality and depression (for a discussion, see Donnellan et
al., 2011). These variables may, therefore, be considered as appropriate outcomes to test the
construct validity of the general self-esteem factor.
Of interest, Alessandri et al. (2013) recently provided proof of differential associations of
daily GSE, MFP and MFN with the above factors. In this study, the daily GSE latent construct was
negatively associated with depressive symptoms levels; MFP was positively related to GPA; and
MFN was negatively associated to implicit self-esteem (i.e., non-conscious, automatic, over-learned
11

self-evaluations affectively charged, see Alessandri, et al. 2013). The present aimed to assess
whether Alessandri et al.'s(2013) findings generalize to the classical version of the RGSE,
introduced by Rosenberg (1965) to assess trait self-esteem.
In light of the aforementioned results, and of the large literature linking self-esteem to GPA,
aggression, prosociality and depression, we expected a negative relation between depressive
symptoms and GSE. Furthermore, we expected a positive relation of GSE with prosociality and a
negative relation with aggression, but no relations of MFP and MFN with prosociality and
aggression. Whereas literature widely supports our statements about GSE, we speculated that the
portion of variance captured by MFP and MFN taps very specific sets of core self-attitudes
involving sensitive emotionally loaded, but more specific, aspects of self-evaluations. MFN, in
particular, seems to tap the experience of a variety of negative emotions associated to a general
negative view of oneself, as implied by lower scores on implicit self-esteem (Alessandri et al.,
2013). Accordingly, we hypothesized that this factor would reveal a significant positive association
with levels of depressive symptoms. Finally, MFP seems to tap a subjective specific evaluation of
one’s own competence and abilities. Indeed, this factor has been found to significantly and
positively predict GPA in previous studies (Alessandri et al., 2013). Therefore, it seemed reasonable
to expect a positive and significant association between MFP and GPA. To reduce the risk of
common method bias, we obtained other-ratings along with self-reports for prosociality, aggression,
and depressive symptoms levels. This multi-trait-multi-method design (Campbell and Fiske 1959)
is likely to increase the validity of results.
Method
Participants and Procedures
The sample was composed of 357 university students (52% females) from Italy recruited by
several psychology majors as a part of a course assignment. This is a different sample than in Study
1.Participants were aged from 19 to 22 years (M = 21.01;SD =.97), and were all Caucasian. They
were contacted and administered the RGSE along with other measures of interest (i.e., GPA,
depression, aggression, prosociality). Each participant was required to bring one peer rater who
12

knew them “very well” with them at the designated time for completing the questionnaire. The peer
raters (n = 357) were described by participants as friends or colleagues. They responded to two
items asking: (1) how well they knew the target participant, and (2) to what degree they felt
emotionally close to him or her. Possible responses ranged from 1 (not at all) to 10 (very much).
The mean scores were 9.01 (SD = 1.76) for the first item and 9.22 (SD = 1.98) for the second item.
This suggests that, on average, raters felt close to the target and knew them very well. Each pair
(i.e., participant plus peer rater) completed the questionnaires at the same time during specially
scheduled sessions. They were separated to prevent sharing of information and informed that they
would not be able to view the other’s questionnaires to prevent possible data manipulation. In this
study, we did not ask informants to report on self-esteem of participants, but only on the outcomes
of interest.
Grade point average. Grade point average was assessed using a single question asking
participants to report their actual academic grade point average.
Depression. Participants rated their depressive symptoms using the CES-D(Radloff, 1977).
This 20-item scale measures the symptoms that characterize depression, such as despondency,
hopelessness, loss of appetite and interest in pleasurable activities, sleep disturbance, crying bouts,
loss of initiative, and self-deprecation. For each symptom, respondents rated the frequency of
occurrence during the past week, using a Likert scale that ranged from 1 = “rarely or none of the
time (less than 1 day)” to 4 = “most or all of the time (5-7 days)”(α = .90).The same items, worded
in third person, were completed by the informants in regard to the target participant (α = .91).
Aggression. Aggression was assessed using the 29-item Buss-Perry Aggression
Questionnaire (AQ; Buss & Perry, 1992). Participants ranked each statement (e.g., “If I have to
resort to violence to protect my rights, I will”, “I can’t help getting into arguments when people
disagree with me,” and “When frustrated, I let my irritation show”) from "extremely
uncharacteristic of me (1)" to "extremely characteristic of me (5)”(α = .89). Friends/colleagues rated
participants on the same items worded in the third person(α = .93).

13

Prosociality. Participants rated (1 =“never/almost never true”; 5 = “almost always/always
true”)their prosociality on a 16-item scale that assesses the degree of engagement in actions aimed
at sharing, helping, taking care of others’ needs, and empathizing with their feelings (Caprara,
Steca, Zelli, &Capanna, 2005; α= .92). These same items, worded in third person, were completed
by the informants (α=.96).
Results and Discussion
Model 9 resulted in a good data fit: SBχ2(24) = 42.87; CFI = .986, TLI =.975, RMSEA =
.045(.020,.067). Loadings (presented in full detail in Figure 2, Panel B) ranged (in a completely
standardized metric) from .44 to .87 (M= .63; SD = .14) for GSE, from -.13 to .51 (M = .36; SD =
.21) for MFP, and from .24 to .65 (M= .44; SD = .17) for MFN (Table 2). As in Study 1, only
itemsPO6 and PO7 loaded significantly only on GSE. Only item NE10showed a primary loading on
MFN. Other items loaded primarily on GSE, although the primary loadings were quite similar in
size than the secondary loadings, differently from the first study. A further difference from Study 1
was that the primary loadings of itemsNE9 and PO1 were on GSE rather than on MFN and MFP,
respectively. MFP and MFN were not significantly correlated. As in Study 1, fixing this correlation
to zero did not degrade model fit significantly (SBΔχ2(1) = 3.23, p= .07).
Variance decomposition and scale reliability
Results from variance decomposition replicated those for Study 1 with some caveats (see
Online Appendix B, Table 1). First, GSE explained more variance for itemNE9 and item NE10 than
in Study 1. Second, GSE explained a consistently higher proportion of variance for all items. At the
scale level, GSE explained the higher proportion of variance (about 78%), MFP about 11%, and
MFN about 2%. Overall scale reliability was.91.
Empirical correlates of GSE, MFP and MFN
After having established the best fitting model, aggression, prosociality, and depression
were added as latent factors loaded by self- and other reports. Correlated uniquenesses were
included among the same items when reported by different informants (i.e., self and other). These
correlations allow for associations between the same item assessed with different reporter that are
14

due to the content shared by the couple of items (Kenny & Kashy, 1992). GPA was added as an
observed variable. Then, GSE, MFP, and MFN were specified as predictors of the other variables.
This model (Figure 3) resulted in a good fit,SBχ2(89) = 114.65, p>.05, CFI = .97,TLI = .96,
RMSEA = .042(.01,.06). Prosocial behavior was positively and significantly predicted by GSE
(.20;p <.05), but not with MFP (.06;p <.05) and MFN (.03;p <.05). Aggression was significantly
and negatively predicted by GSE (-.53;p <.05), but not with MFP (-.06; p <.05)or MFN (.04;p
<.05). Depression was negatively predicted by GSE (-.63;p <.05), positively predicted by MFN
(.32; p <.05), and unrelated with MFP (.04; p <.05). GPA was positively predicted by MFP (.21; p
<.05) and not predicted by GSE (.05; p <.05) or MFN (.07; p <.05). Summarizing, this study
provided further support for Model 9 as the best fitting model and corroborated the predictive and
discriminant validity of GSE, MFN and MFP factors. We discuss in more detail these results in the
“General Discussion” section.
Study 3
The third study investigated the robustness of the RGSE structure across self-and otherratings. As stated above, previous studies commonly relied on self-reports, and were therefore
unable to disentangle artifactual from substantive sources of covariance among RGSE items. A test
of the substantive nature of GSE, MFP, and MFN can be performed by using different methods for
assessing self-esteem, such as self- and other-ratings. This procedure allows computation of a multimethod matrix (Campbell & Fiske, 1959) in which the variance common to different informants
would represent construct variance and the correlations between measures of traits obtained by selfand other-ratings would reflect the substantive nature of the trait. We hypothesized that the
substantive nature of GSE, MPN and MFN would be further supported by both (1) the emergence
of the same factors in both self- and other-ratings, and (2) substantial inter-rater agreement (i.e.,
high and significant correlations between self- and other ratings of the same factors).
Method
Participants and Procedures
Participants were 565Italian adults (56% females) ranging in age from 19 to 61 years
15

(M=38.51;SD=10.91),and were all Caucasian. This is a different sample than in Studies 1 and 2.
Years of education ranged from 8 to 18; 18% completed junior high school, 60% completed high
school and 22% earned a university degree. Each participant was required to bring with them one
peer rater who knew them “very well” at the designated time for completing the questionnaire.
Procedures were identical to those described for Study 2: The peer raters (n = 565) were described
by the participants as friends or colleagues. They responded to the same two items as in Study 2,
designed to assess their knowledge of the target participant and their feeling of closeness to her/him
(using a scale from 1 to 10). On average, informants knew the target reasonably well (M = 8.54; SD
= 1.12), and felt emotionally close to them (M = 9.01; SD =1.09).
Measures
Self-esteem. Participants completed the 10 items of the RSGE, as in previous studies. The
same 10 items, worded in third person, were also completed by informants.
Results and Discussion
Model 9showed a good fit for both self-ratings: SBχ2(24) = 32.93, p= .13; CFI = 1.00, TLI
=.990, RMSEA = .025(<.01,.045), and other- ratings:χ2(24) = 41.67, p> .05; CFI = .990, TLI
=.983, RMSEA = .034(.01,.052). For self-report, the pattern of loadings was similar to that found
in Studies 1 and 2.In particular, completely standardized loadings (presented in full detail in Figure
2, Panel C) ranged from .31 to .90 (M = .56, SD = .17) for GSE, from -.22 to .64 (M = .10, SD =
.36) for MFP, and from .39 to .70 (M = .52, SD = .13) for MFN. PO6 and PO7 were markers of
GSE and, along withPO4, did not load significantly on MFP (all other loadings were
significant).For other-report data, loadings showed a quite similar pattern to that found for selfreported data. In particular, completely standardized loadings (presented in full detail in Figure 2,
Panel D) ranged from .38 to .61 (M = .41, SD = .09) for GSE, from -.07 to .36 (M = .15, SD = .23)
for MFP, and from .26 to .50 (M = .37, SD = .11) for MFN.PO6 and PO7 were markers of GSE and
did not loaded significantly on MFP(all other loadings were significant).MFP and MFN were not
significantly correlated in either the self-or the other-rated versions of the RGSE.Zeroing this
correlation did not degrade model fit for either the self- (SBΔχ2(1) = 1.88, p= .17) or for the other16

rated(SBΔχ2(1) = .03, p= .86) versions of the scale. All in all, despite minor differences, the structure
of loadings for the self- and the other-version of the RGSE was similar.
Variance decomposition and scale reliability
For self-report, results from variance decomposition replicated those for Study 1 (see Online
Appendix B, Table 1). At the scale level, GSE explained the higher proportion of variance (about
70%), MFP about 17%, and MFN about 2%. Scale overall reliability was.90. These results were
closely replicated for other-report at the item level. The exception was a higher proportion of GSE
variance with respect to MFP variance for item PO1.At the scale level, GSE explained the higher
proportion of variance (about 72%), MFP about 15%, and MFN about 2%. Scale reliability was .90.
Factors convergence across methods
The presence of a similar pattern of loadings in both methods of assessment suggests that the
same model was obtained for self- and other-rated data. To assess the degree of correspondence
across raters, we built a single-group Correlated-Uniqueness model (CT-CU; Kenny & Kashy,
1992) that included the hypothesized structure of the RSGE for both self- and other-ratings (see
Figure 4). In this model, the correlations between the latent factors of GSE, MFP and MFN reported
by the self and peers were freely estimated. The degree of inter-rater agreement was investigated by
looking at these correlations. Correlations between item-specific residuals in self-evaluations and
the corresponding residuals in other evaluations were also estimated. This allowed us to take into
account the association between the same item assessed by a different rater that is not accounted for
by the convergence in the underlying latent trait, but is due, for example, to the shared content of
the items. Finally, the across-rater covariance of GSE with both MFP and MFN was set to zero.
As a preliminary step, we examined the measurement invariance of the bifactor model
across raters.As the difference between two scaled chi-squares for nested models is not distributed
as a chi-square, the tenability of the constraints imposed for testing measurement invariance was
examined with the scaled difference chi-square (Satorra & Bentler, 2001). Moreover, as the SBΔχ2
test has substantial power in large samples (Kline, 2010), we supplemented this statistic with the
ΔCFI. In this regard, Cheung and Rensvold (2002) wrote that “it makes no sense to argue against
17

the usefulness of the chi-square and rely on various goodness-of-fit indices (GFI) to evaluate the
overall model fit, and then argue for the usefulness of the chi-square instead of various GFIs to test
for measurement invariance” (p. 252). On the basis of their simulation study, the authors
recommended that investigators consider a difference in CFI larger than .01 as indicative of a
meaningful change in model fit. Although we present both SBΔχ2 and ΔCFI, we based our
decisions on the equivalence of the models on the latter index, in accordance with the suggestion of
Cheung and Rensvold (2002).
The configural invariance model[χ2(133) = 182.20, p> .05, CFI = .987, TLI = .981,RMSEA
= .03(.02-.04)] showed a good fit to the data. We therefore proceeded with tests of measurement
invariance, by constraining factor loadings to be equal across raters (metric invariance model). In
this model, we freed the variances of the self-rated factors. This model showed a good fit [χ2(150)
=241.72, p> .05, CFI = .975, TLI =.969, RMSEA = .03(.03-.04)], although it was substantively
different from the configural model(i.e., Δχ2(17)= 55.51, p < .01;ΔCFI = -.012). Partial metric
invariance was established after allowing itemsPO4 andNE10 to have different loadings for selfand other-rated MFP and MFN[χ2(147) =198.61, p> .05, CFI = .986; TLI = .982, RMSEA =
.03(.02-.04); Δχ2(14)= 16.64, p = .28, ΔCFI =.001)].Next, we constrained item intercepts to be
equal across raters (scalar invariance model). In this model, we freed the latent means of the selfrated factors, keeping the means of the other-rated factors fixed to zero. Accordingly, the estimated
means of self-rated factors can be interpreted as the difference relative to other-ratings. Moreover,
because the measurement unit corresponds to the standard deviation of self-rated factors, these
scores correspond to standardized mean differences (SMDother).[χ2(152) =217.67, p> .05, CFI =
.982, TLI =.978, RMSEA = .03(.02-.04); Δχ2(5)= 20.63, p < .01,ΔCFI =-.004)]. At this point,
correlations suggested a high degree of convergence among observers for GSE and a moderately
high convergence for both MFP and MFN (Figure 3). Of interest, we found no mean-level
differences between self- and other-ratings of GSE, but significant differences for MFP
(SMDother.42;p < .05) and MFN (SMDother-.20;p < .05). These differences suggest that individuals

18

tend to overestimate their own competences, while underestimating their tendency to self-derogate
in comparison to an external observer.
To summarize, results from this study supported (1) the robustness of Model 9 across
methods of administration, (2) a good degree of convergence between self- and other-rated GSE,
MFP, and MFN, (3), the existence of mean-level differences between self- and other-ratings for
MFP and MFN (but not for GSE).
Study 4
The aim of this study was to investigate the cross-cultural invariance of the bifactorial
structure of the RGSE and its generalizability across four different language versions of the
instrument, (i.e., English[US], Italian, Polish and Serbian) and thus across three European and a
non-European country such as the US. These samples were chosen due to an established
collaboration among scientists from respective countries. These cultures are deeply different in
terms of language, ways of living, and cultural traditions. For example, Italians score relatively
higher than Poland and U.S. on values related to egalitarianism(Schwartz, 2006). In turn, Polish
people score higher in values related to social embeddedness and to respect of the hierarchy than
people in Serbia, Italy, and U.S. (Schwartz, 2006). The culture of the U.S. is, instead, especially
high in affective autonomy and mastery compared with the rest of the countries (Schwartz, 2006).
Different values on the cultural level are linked to different self-construals on the individual level
(Schwartz, 2006).We predicted that factor loadings and intercepts would be equivalent across the
samples, which would indicate that (1) Model 9 replicates across countries, and (2) mean scores on
the RGSE can be reliably compared across countries. A previous study (Schmitt &Allik, 2005)
provided information regarding plausible mean level differences in GSE, MFP and MFN across
these four countries. Schmitt and Allik (2005) reported higher levels of GSE, MFP and MFN, in
Serbia, followed by U.S., Italy and finally by Poland. In their study, measures of GSE, MFP and
MFN were computed as the sum of the items, all positively scored. Thus, despite possible
differences arising from the method used to compute measures of GSE, MFP and MFN, we
expected to replicate these results.
19

Method
Participants and procedures
U.S. participants were 520 men and 672 women ranging in age from 18 to 28 years (M =
18.62;SD = 2.52).Serbian participants were 501 men and 509 women between 19 and 29 years of
age (M = 23.12;SD= 4.63).Polish participants were 354women and 345 men ranging in age from 18
to 35 years (M = 21.55;SD = 2.13). In Italy, participants were 386 women and 321 men ranging in
age from 18 to 28 years (M = 19.21;SD = 1.40).This is a different sample than in Studies 1, 2, and 3.
All participants were college students and homogeneous in terms of ethnicity (all participants were
Caucasian).Data were collected as part of a course assignment at the Arizona State University
(United States), at the “Sapienza”, University of Rome(Italy), and at the Catholic University of
Lublin (Poland), and at the University of Novi Sad (Serbia). Students from Italy and U.S. received
course credits for their participation in the study.In all countries, we administered well validated
versions of the RGSE already available in English (Rosenberg, 1965), Polish (Łaguna, LachowiczTabaczek, Dzwonkowska, 2007), Serbian (Opačić, 1993), and Italian (Caprara, et al. 2012)
languages. Items were presented in the same order in all samples.
Results and Discussion
Results showed that Model 9 yielded a good fit within each of the four countries (Table
1).Factor loadings (factors loadings were presented in the Online Appendix A) on GSE were all
significant (M = .55; SD = .13), ranging from .25 (Italy) to .87 (Poland). Factor loadings of items
PO6 and PO7 on MFP were non-significant in U.S., Serbia, and significant (but negatively signed)
in Italy and Poland. The remaining loadings on MFP were positive and significant (M = .32, SD =
.12), ranging from .22 (U.S.) to .59 (Italy). Factor loadings on MFN are all positive and significant
(M = .44, SD = .18), ranging from .15 (U.S.) to .72 (Poland). In all countries, only itemsPO6 and
PO7 were pure markers of GSE, as in previous studies. However, four additional items emerged as
relatively pure markers of GSE in one or two countries. These were items NE3 and NE5 (U.S. and
Poland) and items PO1 and PO2 (U.S.). Item NE9 consistently showed a primary loading on MFN

20

in all countries. All other items loaded primarily on GSE (although the primary loadings are not
apparently higher than the secondary).
Variance decomposition and scale reliability
Results from variance decomposition replicated those of previous studies (Online Appendix
B, Table 2):items PO6 and PO7 were almost completely composed of GSE variance (plus
measurement error) in all countries. The highest proportion of MFN variance was observed for
NE10 in all countries, and for NE9 in U.S., Italy, and Poland. All other items were composed
primarily of GSE variance, with exception of PO1 in Italy, which was composed primarily by MFP
variance. At the scale level, GSE explained the highest proportion of variance (from 60% in Italy to
77% in U.S.), followed by MFP (from 10% in U.S. to 19% in Italy), and MFN (from 1% in U.S. to
8% in Italy). Scale overall reliability was high: .88 (U.S.), .87 (Italy), .86 (Poland), and .86 (Serbia).
Cross-cultural measurement invariance
As a next step, we used a multi-group CFA to assess the cross-cultural invariance of Model
9, following the same procedure as in Study3. The configural and the metric invariance models
fitted the data well (Table 1). However, the addition of equality constraints on item loadings
substantially worsened model fit (Table 1).Partial weak invariance was reached after releasing the
equality constraints of (1) the factor loading of the negatively worded item NE3 on MFN in Serbia
and Poland; (2) the factor loading of negatively worded item NE5 on MFN in Serbia and the U.S.;
(3) the factor loading of the positively worded item PO7on MFP in Poland and Serbia; (4) the factor
loading of NE10 on MFN in the U.S.; (5) the factor loading of NE8on MFN in Poland; (6) the
factor loading of NE8 in Serbia on GSE; (7) the factor loading of PO7 in Poland on GSE; and (8)
the factor loading of PO6 in Italy on GSE. This partial-metric-invariant model fit the data well, and
was not appreciably different from the configural invariance model according to the ΔCFI index
(Table 1).The scalar model invariance fit the data well, although significantly worse than the
previous partial scalar model (Table 1). To obtain scalar invariance, we released constraints
imposed on the intercepts of (1) NE9 in the U.S., (2) PO4 in Serbia, and (3) NE10 in Poland (Table
1).As a final step, we compared latent means for GSE, MFP and MFN. We selected the U.S. as the
21

reference group because previous findings have consistently reported higher levels of GSE for U.S.
than for other countries (see Schmitt &Allik, 2005).Accordingly, the estimated latent means can be
interpreted as standardized mean differences with respect to the U.S. sample (SMDusa).Constraining
latent means to equality resulted in a significant decrement of the fit, ΔCFI=.019, Δχ2(9)= 154.64; p<
.05. Eventually, equality constraints were maintained (1) for GSE between Serbia and Poland, and
(2) for MFP in Italy and Serbia, ΔCFI=.00;Δχ2(2)=.07, p =.97. The meaning of these constraints is
that Serbia and Poland did not differ in mean levels of self-esteem, and Italy and Serbia did not
differ in mean levels of MFP. As expected, the highest levels of self-esteem were found in the U.S.
and Italy (SMDusa = -.21, p = .14), followed by Serbia (SMDusa =-.30, p< .05), and Poland (SMDusa
=-.31, p< .05). The highest levels of MFP were found in Italy (SMDusa =.16, p< .05) and Serbia
(SMDusa =.14, p< .05), followed by the U.S. (SMDusa =.00) and Poland (SMDusa =-.07,p = .40).
Higher levels of MFN were found in Poland (SMDusa =.68,p< .05) and Serbia (SMDusa =.30,p< .05),
followed by the U.S (SMDusa =.00) and Italy (SMDusa =-.08,p = .30).In summary, this study
supported the viability of Model 9 across four different countries, along with the robustness of the
measurement properties of the RGSE, in terms of partial measurement invariance. Finally, this
study revealed a substantive pattern of mean level differences in GSE, MFP and MFN mean levels.
These results are discussed in full detail below.
General Discussion
The RSES is perhaps the most widely used instrument to assess self-esteem (Donnellan et
al., 2011; Gray-Little, Williams, and Hancock, 1997; Schmitt & Allik, 2005). It has been included
in important longitudinal studies (e.g., the Americans’ Changing Lives, Longitudinal Study of
Generations, or the National Longitudinal Survey of Youth) and probably has been subjected to
more psychometric analyses and empirical validation than any other self-esteem measure (Wylie,
1989). The goal of this investigation was to address unresolved issues related to the dimensionality
of the instrument by testing alternative models. Below we provide a detailed discussion of
resultsand suggestions for future research.
Is a One-factor Model Appropriate to Describe the RGSE Structure?
22

We compared ten structural solutions to the RGSE supported by previous studies (Marsh et
al., 2010). Our data clearly suggest that the RGSE does not have a one-factor structure. Across
different samples, raters, and cultures, results suggested that there are three dominant latent factors
assessed by the RGSE—a general self-esteem factor (GSE), plus two specific factors associated
with negatively (MFN), and with positively worded items (MFP). When computed with an
appropriate index, scale reliability was adequate, according to current standards (Kline, 2010).
These results have several implications for the use of the RGSE scale. First, the overall
mean score on the RGSE does not take into account the presence of specific factors associated with
positively and negatively worded items might generate biased results. As an example, we reestimated the model reported in Study 2, using the poor fitting Model 1 (see Alessandri et al., 2013,
for a similar approach). Of course, we are aware that a misspecified model may lead to biased
parameter estimates (Kline, 2010), but we believe that these data can be useful to understand the
effect of biases associated with collapsing scores. In this model, the significant relations of GSE
with aggression, depression, and prosociality remained unaltered. Yet, unlike what we found by
using a bifactorial model (i.e., Model 9), GPA was significantly predicted by GSE. All in all, it is
likely that collapsing in a single factor the proportion of variance belonging to different constructs
is responsible, for example, for some of the problems raised in the debate surrounding the relation
between GSE and GPA (Donnellan et al., 2011). This relation might be significantly attenuated
when measures of GSE and MFP are confounded. Anyway, it is possible that imposing a
unidimensional model for the RGSE items can mistakenly generate spurious relations between GSE
and other variables. A more nuanced approach in future studies might be the use of appropriate
methodologies to clarify how GSE, MFP, and MFN are related to outcomes of interest. On this
regard, even though the RGSE scale appears to be composed of three factors does not mean that
researchers should consider each of them in future studies. Instead, it seems reasonable that
researchers focus their attention on the factor that best matches their research interests. However,
we recommend that researchers compute measures of GSE using the appropriate bifactor model.

23

In short, we obtained evidence that the three factors assessed by the RGSE reflect different
psychological features. Most importantly, an interpretation of the general factor as “general selfesteem”, of MFP as a measure of “self-competence,” and of MFN as “self-derogation” seems
sensible in light of the data, and in line with our initial expectations. From a general stance, these
results align with earlier intuitions of Tafarodi and Swann (1995), and of Kaplan and Pokorny
(1969), who surmised that positive and negative worded items from the RGSE assessed aspects of
self-evaluations different from general self-esteem. Moreover, these data confirm previous studies
suggesting that MFP and MFN reflect different psychological features (DiStefano & Motl, 2009;
Quilty et al., 2006). In other words, these factors do not simply capture pure method variance that is
unrelated with substantive variables. Seemingly stable individual tendencies, apart from GSE level,
also influence individuals’ responses to RGSE items. In sum, it seems safe to conclude at this point
that individuals’ answers on items of the RGSE are influenced by a combination of individual selfesteem, perceptions of self-confidence (positively worded items), and feelings of self-derogation
(negatively worded items), plus, of course, measurement error.
Are MFP and MFN the Product of Biases Associated with the Method of Assessment?
We submitted the RGSE to a relatively strong test of the robustness of its structure. Based
on the results, we can confidently assert that the RGSE structure is relatively stable across observers
and cultures. Taxing tests of measurement invariance provided evidence of configural and (partial)
metric and scalar invariance. The newest and perhaps most interesting feature of this study was the
examination of cross-rater invariance of the RGSE. First, we found that psychometric properties
were basically preserved when the RGSE was used to evaluate others’ rather than one’s own selfesteem. In addition, we found a moderately high degree of cross-rater convergence for GSE (see
also Robins, Hendin & Trzesniewski,2001), and a moderate degree of convergence for both MFP
and MFN. Whereas care always should be taken when evaluating the nature of convergence among
psychometric factors (see Alessandri et al., 2010, for a discussion and a similar point), we believe
that from a theoretical point of view, these findings, along with the data corroborating the crosscultural generality of the model, suggest that it is highly debatable to consider MFP and MFN the
24

result of spurious method variance. We contend, instead, that the term specific factor should be
considered more appropriate when describing these factors, in line with their psychometric status
when considered under the lens of a classical bifactor model (see Chen et al., 1996).
Other interesting findings of this analysis concern the accuracy with which levels of GSE,
MFP, and MFN can be reliably assessed by an external observer. According to our results, it seems
that observers were quite accurate in evaluating GSE, and, in fact, we found non-significant mean
differences between self- and other-rated GSE. Instead, it seems that individuals (i.e., those self
reporting), in comparison to observers, generally hold a more positive attitude toward their
perceived capacities (i.e., higher MFP), and have a less negative attitude toward themselves (i.e.,
lower MFN). These results accord nicely with previous studies showing that self-assessments are
usually biased in the direction of positively distorted evaluations (e.g., Alicke & Govorun, 2005).
However, these results require further empirical replication and validation.
What is the Relative Importance of Each Factor?
One important feature of our study is that it offers a decomposition of observed variance into
GSE, MFP, MFN, and measurement error components. Across studies, we found that the RGSE
scale reflects a preponderance of self-esteem variance (M= 70.06%;min = 66.54%; max = 77.22%).
Although the MFP (M = 15.00%; min = 9.79%; max = 18.06%) and MFN (M = 3.19%; min =
1.45%; max = 4.17%) represent a small portion of variance in the items, they should not necessarily
be considered less important factors because they were predictive of distinct outcomes. Instead,
these results simply indicate that the RGSE provides a valid measure of GSE, and a less efficient
measure of MFP and MFN. However, the presence of MFP and MFN should be acknowledged and
explicitly modelled. Blending these variance components together has the potential to impact the
research questions that psychologists pose and, as a result, the trustworthiness of the answers they
obtain. Finally, looking at the item levels, we noted that only two items were consistently pure
(although noisy, and in fact contaminated by a high proportion of measurement error) measures of
global self-esteem. As it stands, these results seem to corroborate the idea that, at times, using short
measures to assess widely acknowledged constructs like self-esteem may be advantageous and
25

possible (Burisch, 1984; Robins et al., 2001). Future studies might test the idea that these two items
represent useful pure markers of self-esteem.
Is the Bifactor Structure Sufficiently Stable?
A significant contribution of this study is to the understanding of the degree of cross-cultural
invariance of the RGSE items. All in all, we found a satisfactory degree of invariance, pointing to a
satisfactory degree of robustness of the psychometric properties of the bifactor model. We also
found a pattern of interesting mean-level differences in GSE, MFP and MFN. In part, our results
accord nicely with what reported by Schmitt and Allik (2005). For example, participants from the
U.S. reported higher levels of self-esteem than participants from European countries. However, our
results were unique because we tested the ways in which MFP and MFN differed across countries.
For example, people in Italy and Serbia reported higher scores on MFP than in the US, and those
from Poland resulted in higher scores on MFN than in the U.S. Because scores on the MFP and
MFN factors can be interpreted as a measures of self-competence and self-derogation, the pattern
seems to mimic quite well differences across countries in values of egalitarianism, social
embeddedness, and autonomy(Schwartz, 2006). However, more data and representative samples are
necessary to put the above differences into a broader theoretical framework of reliable cultural
differences in self-derogation and self-competence. Most importantly, the bifactor structure of the
RGSE needs to be further evaluated for generalizability across a wider range of countries and
cultures.
What are the Practical implications of the Model?
We believe that our research could have wide relevance for the field of general self-esteem
research, which has classically relied on the RGSE scale. In the context of theory building, for
example, researchers should be aware that the scale does not measure a single, general factor. It
should be noted that present results do not necessarily question the practical usefulness of the RGSE
in applied contexts, and do not call into question its usefulness for screening purposes, given that
variance decomposition revealed that the GSE factor explained the lion’s share of items’ variance.
However, we surmise that using a total score as a basis for screening individuals at risk for low self26

esteem might be suboptimal, as the total score may be contaminated by different sources of
variance. Recognizing the different variance components tapped by the ten RGSE items may give
practitioners additional useful information.
In addition to these practical and theoretical benefits, there are other potential conceptual
advances that follow from carefully considering our bifactorial model. For instance, theories in
developmental psychology might benefit from simultaneously investigating the individual
differences in the development of each component of the RGSE to achieve a richer understanding
of how the evaluative components of the self develop over time. Likewise, theories in clinical
psychology could try to link each of the three latent factors composing the RGSE psychometric
structure to the same clinical phenomena (i.e., depression, anxiety, etc.) to better understand the
relations between different components of the evaluative self-concept.
There are also important psychometric considerations that follow from our results. For
instance, the bifactor model can be viewed as incorporating hypotheses concerning the way that
individuals answer the RGSE items. In this sense, variance decomposition at the item level
represents a suitable way to understand the degree of precision with which each item does capture
different components of self-evaluations. Results from this analysis might be further suited to build
a more refined understanding of the different factors that shape one’s evaluation of the self.
Conclusion
Even though several studies have investigated the dimensionality of RGSE, the issue seems
at present to be far from being definitively settled. We hope to have provided new data supporting
the view that this instrument is best represented by a bifactor model (Chen, et al. 2006), which is the
only model available in the literature able to capture the multifaceted sources of influence reflected
in these items. Each factor included in the model (i.e., GSE, MFP, MFN) has an apparently
straightforward interpretation that, of course, could be further refined in future studies. In particular,
the generalizability of observed results should be extended to countries with different cultural
background than Serbia, Poland, Italy, and the US. We emphasize that our current results do not
necessarily hamper the value of previous studies based on the RGSE total score. It is difficult on the
27

basis of this study to know to what extent the use of a more appropriate structure for the RGSE
would determine change to results in studies using a one-factor model. However, one should bear in
mind that the RGSE scale measures three distinct aspects of self-evaluations. Depending on the
specific research goals, the results of our study might provide a basis for redefining the current scale
to arrive at distinct measures of each of these factors.

28

References
Alessandri, G., Vecchione, M., Donnellan, M.B., &Tisak, J. (2013). An application of the LCLSTM framework to the self-esteem instability case. Psychometrika, 4, 769792.10.1007/s11336-013-9326-4
Alessandri, G., Vecchione, M., Fagnani, C., Bentler. P.M., Barbaranelli, C., Medda, E., … Caprara,
G.V. (2010). Much more than model fitting? Evidence for the heritability of method effect
associated with positively worded items of the revised Life Orientation Test. Structural Equation
Modeling, 17, 642-653.doi:10.1080/10705511.2010.510064
Alessandri, G., Vecchione, M., Tisak, J., & Barbaranelli, C. (2011). Investigating the nature of
method factors through multiple informants: Evidence for a specific factor? Multivariate
Behavioral Research, 46,625-642.doi:10.1080/00273171.2011.589272
Alicke, M.D., &Govorun, O. (2005). The better-than-average effect. In M.D. Alicke, D.A. Dunning,
& J.I. Krueger, (Eds.), The self in social judgment. New York: Psychology Press.
Bartko, W.T., &Eccles, J.S. (2003). Adolescent participation in structured and unstructured
activities: A person-oriented analysis. Journal of Youth and Adolescence, 32, 233241.doi:10.1023/A:1023056425648
Baumeister, R.F., Campbell, J.D., Krueger, J.I., &Vohs, K.D. (2003). Does high self-esteem cause
better performance, interpersonal success, happiness, or healthier lifestyles? Psychological
Science in the Public Interest, 4, 1-44.doi:10.1111/1529-1006.01431
Bentler, P.M. & Chou, C.P. (1987). Practical issues in structural modeling. Sociological Methods &
Research, 16, 78-117

Blascovich, J., &Tomaka, J. (1991). Measures of self-esteem. In J.P. Robinson, P.R. Shaver, & L.S.
Wrightsman (Eds.). Measures of personality and social psychological attitudes (vol.1, pp. 115160). San Diego, CA: Academic Press.
Borsboom, D., Mellenbergh, G.J., & Van Heerden, J. (2003). The theoretical status of latent
variables. Psychological Review, 110, 203-219.doi:10.1037/0033-295X.110.2.203.

29

Burisch, M. (1984). Approaches to personality inventory construction. A comparison of merits.
American Psychologist, 39, 214-227.doi:10.1037/0003-066X.39.3.214
Burnham, K.P. & Anderson, D.R. (2004). Multimodel inference: understanding AIC and BIC in model
selection. Sociological Methods and Research, 33, 261-304.doi:10.1177/0049124104268644
Buss, A.H., & Perry, M.P. (1992).The aggression questionnaire. Journal of Personality and Social
Psychology, 63, 452-459.doi:10.1037/0022-3514.63.3.452
Byrne, B.M. (1983). Investigating measures of self-concept. Measurement and Evaluation in
Guidance, 16, 115-2.
Byrne, B.M., &Shavelson, R. (1986). On the structure of adolescent self-concept. Journal of
Educational Psychology, 7, 474-81.doi:10.1037/0022-0663.78.6.474
Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the multitraitmultimethod matrix. Psychological Bulletin, 56, 81-105.
Caprara, G.V., Alessandri, G., Trommsdorff, G., Heikamp, T., Yamaguchi, S., & Suzuki, F. (2012).
Positive orientation across countries. Journal of Cross Cultural Psychology, 43, 7783.doi:10.1177/002202211142225
Caprara, G.V., Steca, P., Zelli, A., & Capanna, C. (2005). A new scale for measuring adult’s
Prosociality. European Journal of Psychological Assessment, 21, 77-89.doi:10.1027/10155759.21.2.77.
Carmines, E.G. & Zeller, R.A. (1979). Reliability and validity assessment. Newbury Park, Sage.
Chen, F.F., West. S.G., & Sousa, K.H. (2006). A comparison of bifactor and second-order models of
quality of life. Multivariate Behavioral Research, 41, 189-225.doi:10.1207/s15327906mbr4102_5
Cheung, G.W., & Rensvold, R.B. (2002). Evaluating goodness-of-fit indexes for testing
measurement invariance. Structural Equation Modeling, 9, 235-255.
Diggory, J.C. (1966). Self-evaluation: Concepts and studies. New York: Wiley.
DiStefano, C. & Motl, R.W. (2009). The relationship between personality factors and item phrasing.
Personality and Individual Differences, 46, 309-313.doi:10.1016/j.paid.2008.10.020

30

Donnellan, M.B., Trzesniewski, K.H., & Robins, R.W. (2011). Self-esteem: Enduring issues and
controversies. In T. Chamorro-Premuzic, S. von Stumm, & A. Furnham (Eds.). The WileyBlackwell handbook of individual differences (pp.718-746). New York: Wiley-Blackwell.
Donnellan, M.B., Trzesniewski, K.H., Robins, R.W., Moffitt, T.E., & Caspi, A. (2005). Low selfesteem is related to aggression, antisocial behavior, and delinquency. Psychological Science, 16,
328-335.doi:10.1111/j.0956-7976.2005.01535.x
Eid, M., Lischetzke, T., Nussbeck, F.W. &Trierweiler, L.I. (2003). Separating trait effects from
trait-specific method effects in multitrait-multimethod models: A multiple-indicator CT-C(M-1)
model. Psychological Methods, 8, 38-60.doi: dx.doi.org/10.1037/1082-989X.8.1.38
Epstein, J.A., Griffin, K.W., &Botvin, G.J. (2004). Efficacy, self-derogation, and alcohol use
among inner-city adolescents: Gender matters. Journal of Youth & Adolescence, 33, 159166.doi:10.1023/B:JOYO.0000013427.31960.c6
Fiske, D.W. (1987). Construct invalidity comes from method effects. Educational and
Psychological Measurement, 47, 285-336.doi:10.1177/0013164487472001
Gecas, V. (1971). Parental behavior and dimensions of adolescent self-evaluations. Sociometry, 34,
466-482.doi:10.2307/2786193
Horan, P.M., DiStefano, C., & Motl, R.W. (2003). Wording effects in self-esteem scales:
Methodological artifact or response style? Structural Equation Modeling, 10, 435455.doi:10.1207/S15328007SEM1003_6
Kaplan, H., &Pokorny, A. (1969). Self-derogation and psychosocial adjustment. Journal of Nervous
and Mental Disease, 149,421-434.doi:10.1097/00005053-196911000-00006
Kaplan, H.B. (1978). Deviant behavior and self-enhancement in adolescence. Journal of Youth and
Adolescence.7, 253-277.doi:10.1007/BF01537977
Kaplan, H.B. (1980). Self-attitudes and deviant behavior. Santa Monica, CA: Goodyear.
Kaplan, H.B., Martin, S.S., & Robbins, C. (1982). Application of a general theory of deviant
behavior: Self-derogation and adolescent drug use. Journal of Health and Social Behavior, 23,
274-294.doi:10.2307/2136487
31

Kaufman, E., Rasinski, K.A., Lee, R., & West, J. (1991). National Education Longitudinal Study of
1988. Quality of the responses of eighth-grade students in NELS88. Washington, DC: U.S.
Department of Education.
Kenny, D.A., & Kashy, D.A. (1992). Analysis of the multitrait-multimethod matrix by confirmatory
factor analysis. Psychological Bulletin, 112, 165-172.doi:10.1037/0033-2909.112.1.165
Kline, R.B. (2010). Principles and practices of structural equation modeling. New York: Guilford.
Kuster, F. &Orth, U. (2013). The long-term stability of self-esteem: Its time-dependent decay and
nonzero asymptote. Personality and Social psychology Bulletin,39,677690.doi:10.1177/0146167213480189
Łaguna, M., Lachowicz-Tabaczek, K., Dzwonkowska, I. (2007). Skalasamooceny SES
MorrisaRosenberga – polskaadaptacjametody [The Rosenberg Self-Esteem Scale: Polish
adaptation of the scale]. PsychologiaSpołeczna [Social Psychology], 2, 164-176.
Lord, F.M. &Novick, M.R. (1968). Statistical theories of mental test scores. Reading MA:
Addison-Welsley Publishing Company.
Marsh, H.W. (1996). Positive and negative self-esteem: A substantively meaningful distinction or
artifactors? Journal of Personality and Social Psychology, 70, 810-819.doi:10.1037/00223514.69.6.1151
Marsh, H.W., & O’Mara, A. (2008). Reciprocal effects between academic self-concept, self-esteem,
achievement, and attainment over seven adolescent years. Personality and Social Psychology
Bulletin, 34, 542-552.doi: 10.1177/0146167207312313
Marsh, H.W., Scalas, L.F., & Nagengast, B. (2010). Longitudinal tests of competing factor
structures for the Rosenberg Self-Esteem Scale: Traits, ephemeral artefacts, and stable response
styles. Psychological Assessment, 22, 366-381.doi: 10.1037/a0019225
Motl, R.W., & DiStefano, C. (2002). Longitudinal invariance of self-esteem and method effects
associated with negatively worded items. Structural Equation Modeling, 9,562578.doi:10.1207/S15328007SEM1003_6
Muthén, L., & Muthén, B. (2004). Mplus user’s guide. Los Angeles, CA: Muthén & Muthén.
32

Opačić, G. (1993). Porodične varijable i koncept o sebi kod adolescenata [Family variables and
self-concept of adolescents]. Research Report, University of Belgrade, Belgrade, Serbia.
Openshaw, D.K., Thomas, D.L., & Rollins, B.C. (1981). Adolescent self-esteem: A
multidimensional perspective. Journal of Early Adolescence, 1, 273282.doi:10.1177/027243168100100306
Orth, U., Robins, R.W., &Widaman, K.F. (2012). Life-span development of self-esteem and its
effects on important life outcomes. Journal of Personality and Social Psychology, 102, 12711288.doi:0.1037/a0025558
Owens, T.J. (1994). Two dimensions of self-esteem. American Sociological Review 59:391407.doi:10.2307/2095940
Paulhus, D.L. (1991). Measurement and control of response bias. In J.P. Robinson, P.R. Shaver &
L.S. Wrightsman, (Eds.), Measures of personality and social psychological attitudes, (Vol. 1, pp.
17–59). Academic Press, San Diego.
Quilty, L.C., Oakman, J.M., &Risko, E. (2006). Correlates of the Rosenberg self-esteem scale
method effects. Structural Equation Modeling, 13, 99-117.doi:10.1207/s15328007sem1301_5
Radloff, L.S. (1977). The CES-D Scale: A self-report depression scale for research in the general
population. Applied Psychological Measurement, 1, 385-401.doi:10.1177/014662167700100306
Reise, S.P., Morizot, J., & Hays, R.D. (2007). The role of the bifactor model in resolving
dimensionality issues in health outcomes measures. Quality of Life Research, 16, 1931.doi:10.1007/s11136-007-9183-7
Robins, R.W., Hendin, H.M., & Trzesniewski, K.H. (2001). Measuring global self-esteem:
Construct validation of a single-item measure and the Rosenberg Self-Esteem Scale. Personality
and Social Psychology Bulletin, 27, 151-161.doi:0.1177/0146167201272002
Rosenberg, M. (1965). Society and the adolescent self-image. Princeton, NJ: University Press.
Saada, G.K., Bailly, Y., Joulain, N., Hervé, M., &Alaphilippe, C.A. (2013). Longitudinal factorial
invariance of the Rosenberg Self-Esteem scale: Determining the nature of method effects due to
item wording. Journal of Research in Personality, 47, 406-416.doi:10.1016/j.jrp.2013.03.011
33

Satorra, A., & Bentler, P.M. (1988). Scaling corrections for chi-square statistics in covariance
structure analysis. ASA 1988 Proceedings of the Business and Economic StatisticsSection (308313). Alexandria, VA: American Statistical Association.
Satorra, A., & Bentler, P.M. (2001). A scaled difference chi-square test statistic for moment
structure analysis. Psychometrika, 66, 507-514.
Schmitt, D.P., & Allik, J. (2005). Simultaneous administration of the Rosenberg Self-Esteem Scale
in 53 nations. Journal of Personality and Social Psychology, 89, 623-642.doi:10.1037/00223514.89.4.623
Schwartz, S.H. (2006). Value orientations: Measurement, antecedents and consequences across
nations. In R. Jowell, C. Roberts, R., Fitzgerald & E. G (Eds.), Measuring attitudes crossnationally (pp. 169-203). London: Sage
Shevlin, M.E., Bunting, B.P., & Lewis, C.L. (1995). Confirmatory factor analysis of the Rosenberg
Self-Esteem Scale. Psychological Reports, 76, 707-710.
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach's alpha.
Psychometrika, 74, 107-120.doi:10.1007/s11336-008-9101-0
Tafarodi, R.W., & Milne, A.B. (2002). Decomposing global self-esteem. Journal of Personality, 70,
443-483.doi: 10.1111/1467-6494.05017
Tafarodi, R.W., & Swann, W.B., Jr. (1995). Self-liking and self-competence as dimensions of
global self-esteem: Initial validation of a measure. Journal of Personality Assessment, 65, 322342.doi:10.1207/s15327752jpa6502_8
Tomás, J., & Oliver, A. (1999). Rosenberg's self-esteem scale: Two factors or method effects.
Structural Equation Modeling, 6, 84-98.doi:10.1080/10705519909540120
West, S.G., Finch, J.F., & Curran, P.J. (1995). Structural equation models with non-normal variables:
Problems and remedies. In R. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and
applications (pp. 56-75). Thousand Oaks, CA: Sage.

Wylie, R.C. (1989). Measures of self-concept. Lincoln: University of Nebraska Press.

34

Table 1. Goodness of fit of alternative models for the RGSE(study 1) and cross-cultural invariance
of the best-fitting Model 9 (study 4).
Study 1: Goodness of fit of alternative models
χ2

df

CFI TLI

Model 1

7344.73*

35

.74

.67 .14(.13-.14)

200715.82

Model 2

3977.51*

34

.86

.82 .10(.10-.11)

196095.73

Model 3

9854.04*

34

.73

.65 .16(.16-.17)

200708.80

Model 4

3625.19*

34

.87

.83 .10(.09-.10)

195656.56

Model 5

1038.68*

25

.97

.95 .06(.05-.06)

191911.43

Model 6

3318.52*

25

.88

.79 .11(.10-.11)

195299.37

Model 7

2707.53*

30

.90

.86 .09(.08-.09)

194413.31

Model 8

3625.47*

30

.87

.81 .10(.10-.11)

195642.32

Model 9

483.40*

24

.98

.97 .04(.04-.05)

191493.54

RMSEA

AIC

Study 4: Cross-Cultural Invariance of Model 9
SBχ2

df

CFI TLI

RMSEA

13.52

24

1.00 1.02

.01(<.01-.01)

26.02

24

1.00 1.00

.01(<.01-.04)

35.12

24

.99

.98

.04(<.01-.06)

37.34*

24

.98

.97

.04(.01-.06)

Configural

142.17*

96

.993 .986 .026(.016, .035)

Metric

414.77*

147

.963 .954 .051(.045, .056)

280.74* 51 -.030

MetricPartial

268.74*

136

.984 .978 .037(.030, .044)

128.24* 40 -.009

Scalar

407.26*

147

.964 .955 .050(.044, .056)

178.83* 11 -.020

ScalarPartial

302.89*

144

.980 .974 .039(.033, .046)

40.69*

U.S
Serbia
Poland
Italy

ΔSBχ2

Δdf ΔCFI

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

8

-.004

Note.Model 10 was not-identified, it was excluded for further consideration.
* p< .05.
35

Figure 1. Alternative factorial models for Rosenberg General Self-Esteem Scale.

Note. PO = positively worded item; NE = negatively worded item; MFP = method factor associated
with positively worded items; MFN = method factor associated with positively worded items.
36

Figure 2. Unstandardized (in parentheses) and completely standardized loadings for Model 9 in the
first three studies

Note. Statistically significant (p< .05) coefficients are in bold and on solid lines; statistically not
significant (p> .05) coefficients are in italics and on dotted lines. Coefficients outside rounded
brackets are unstandardized; coefficients inside rounded brackets are completely standardized;
PO1 =I feel that I’m a person of worth, or at least on an equal plane with others; PO2 = I feel that I
have a number of good qualities; PO4 =I am able to do things as well as most other people; PO6 =
On the whole, I am satisfied with myself; PO7 =I take a positive attitude toward myself; NE10 = At
times I think I am no good at all; NE9 = I certainly feel useless at times; NE8 = I wish I could have
more respect for myself; NE5 = I feel I do not have much to be proud of; NE3 = All in all, I am
inclined to feel that I am a failure.
37

Figure 3. Predictive value of GSE, MFP, and MFN factors. All estimates presented in the figure are completely standardized

Note. Solid lines represent significant paths; dashed lines represent non-significant paths. S = self-rated; O = other rated; es-eo = error terms
associated with self-rated (es) and other rated items (eo).PO = positively worded item (self-rated); NE = negatively worded item (self-rated).GSE =
general self-esteem (self-rated); MFP = positive wording specific factor; MFN = negative wording specific factor. GPA = grade point average;
AGG = aggression; PRO = prosociality; DEP = depression. To simplify interpretation of direct effect of MFN, negative items were not reversely
scored. Items are indexed by their position in the scale.
38

Figure 4. Convergence of GSE, MFP and MFN across self- and other ratings. All estimates presented in the figure are completely standardized

Note. S = Self = self-rated; O= other rated; es = error terms associate with self rated items; eo = error terms associate with other rated items; PO =
positively worded item; NE = negatively worded item. GSE = general self-esteem; MFP = the specific factor associated with positively worded
items; MFN = the specific factor associated with negatively.
39