Psychology as a Natural Science. Part I: Measurement

I wished, by treating Psychology like a natural science, to help her to become one.

William James

The Problem

For more than a century, Psychologists have struggled to make their discipline a ‘proper science’. From introspection, to behaviorism and then to cognitivism, Psychology has fallen somewhat awkwardly between the biological and social sciences. Suffering existential doubt, and always looking over their shoulders, Psychologists have never quite found a place of comfort at the high table of Science. Three issues have contributed to this liminal status: measurement, theory, and paradigm.

In this article, I discuss measurement in Academic Psychology. The branch of Academic Psychology that is usually held up to be the most ‘scientific’ is Psychometrics, otherwise known as ‘Psychological Measurement’. Bizarrely, it is also the largest thorn in the side of Academic Psychology considered as a science. I explain some of the reasons for this curious state of affairs below.

S. S. Stevens – “Mass Delusion”

Attributes of the physical world are measured quantitatively. Attributes of the psychological world are more ‘sticky’ to deal with. For good reason, psychologists are unable to measure many of the most interesting psychological attributes in any direct and objective manner. Unfortunately, measurement in Psychology is an ‘Emperor’s new clothes’ story. The early years of the infant science were spent paddling at the shallow end of the pool, with attempts to make psychophysics and ability testing the showcases of a new quantitative science. But it was all downhill from there on.

In spite of limited successes, Psychology’s ‘measurement problem’ has never been satisfactorily resolved. S.S. Stevens’ Handbook of Experimental Psychology (1951) invoked ‘operationism’ as a potential solution and, since that time, Psychologists have assumed as an act of faith that measurement is the assignment of numbers to attributes according to rules. Sadly, Stevens’ solution is a mass delusion, a sleight of mind.

Joel Michell: “Thought Disorder”

Among his many in-depth writings about Psychological measurement, Joel Michell (1997) summarized the situation thus: “…establishing quantitative science involves two research tasks: the scientific one of showing that the relevant attribute is quantitative; and the instrumental one of constructing procedures for numerically estimating magnitudes. From Fechner onwards, the dominant tradition in quantitative Psychology ignored this task. Stevens’ definition rationalized this neglect. The widespread acceptance of this definition within Psychology made this neglect systemic, with the consequence that the implications of contemporary research in measurement theory for undertaking the scientific task are not appreciated…when the ideological support structures of a science sustain serious blind spots like this, then that science is in the grip of some kind of thought disorder.” (Michell, 1997).

A ‘kind of thought disorder’: strong terms, but accurate.

It is apparent that numbers can be readily allocated to attributes using a non-random rule (the operational definition of measurement) that would generate ‘measurements’ that are not quantitatively meaningful. For example, numerals can be allocated to colours: red = 1, blue = 2, green = 3, etc. The rule used to allocate the numbers is clearly not random, and the allocation therefore counts as measurement, according to Stevens. However, it would be patent nonsense to assert that ‘green is 3 × red’ or that ‘blue is 2 × red’, or that ‘green minus blue equals red’. Intervals and ratios cannot be inferred from a simple ordering of scores along a scale. Yet this is how psychological measurement is usually carried out.
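The colour example can be made concrete with a minimal Python sketch. The two coding rules and the helper `ratio` are illustrative names, not part of any published procedure; the second rule is simply another non-random allocation that is just as admissible under Stevens’ definition.

```python
# Two equally "valid" non-random rules for allocating numerals to colours.
# Both are hypothetical codings invented for this illustration.
rule_a = {"red": 1, "blue": 2, "green": 3}
rule_b = {"green": 1, "red": 2, "blue": 3}


def ratio(rule, colour_x, colour_y):
    """Return the ratio of the numerals allocated to two colours."""
    return rule[colour_x] / rule[colour_y]


# Under rule_a, 'green is 3 x red'; under rule_b, 'green is half of red'.
print(ratio(rule_a, "green", "red"))  # 3.0
print(ratio(rule_b, "green", "red"))  # 0.5
```

Because both rules are equally legitimate allocations, a ratio that changes with the rule says something about the coding, not about the colours.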

Stevens’ oxymoronic approach aimed to circumvent the requirement that only quantitative attributes can be measured, in spite of the self-evident fact that psychological constructs such as subjective well-being are nothing like physical variables (Michell, 1999, Measurement in Psychology). However, positivist psychometricians blithely treat qualitative psychological constructs as if they were quantitative in nature, and as amenable to measurement as physical characteristics, without ever demonstrating that this is so. For more than 60 years many psychologists have lived in a make-believe world where ‘measurement’ consists of numbers allocated to stimuli on ordinal or Likert-type scales. This feature alone cuts off at its roots the claim that Psychology is a quantitative science on a par with the natural sciences.

Measurement can be defined as the estimation of the magnitude of a quantitative attribute relative to a unit (Michell, 2003). Before quantification can happen, it is first necessary to obtain evidence that the relevant attribute is quantitative in structure. This has rarely, if ever, been carried out in Psychology. Arguably, the definition of measurement adopted within Psychology since Stevens’ (1951) operationism is incorrect, and Psychologists’ claims to be able to measure psychological attributes can be questioned (Michell, 1999, 2002). Contrary to common beliefs within the discipline, psychological attributes may not actually be quantitative at all, and hence may not be amenable to coherent numerical measurement or to statistical analyses that make unwarranted assumptions about the numbers collected as data.

Psychometric Myth

Psychometricians often make the precarious assumption that ordinal scales constitute a valid description of underlying quantitative attributes, and that psychological attributes are measurable on interval scales. Otherwise there can be no basis for quantitative measurement across large domains of the discipline. Michell (2012) argued that: “the most plausible hypothesis is that the kinds of attributes psychometricians aspire to measure are merely ordinal attributes with impure differences of degree, a feature logically incompatible with quantitative structure. If so, psychometrics is built upon a myth” (p. 255).

This view is supported by Klaas Sijtsma (2012) who argues that the real measurement problem in Psychology is the absence of well-developed theories about psychological attributes and a lack of any evidence to support the assumption that psychological attributes are continuous and quantitative in nature.

Scientific Psychosis

A person with delusions of grandeur can be labeled as suffering from psychosis. But what if a whole discipline has delusions of grandeur? In this case the term ‘Scientific Psychosis’ would not seem inappropriate.

Using ordinal data as if they were interval or ratio scale data leads to incorrect inferences and false conclusions. Using totals and averages requires data to be on an interval scale. Performing parametric analyses on ordinal data can produce biased estimates of variances, covariances and correlations, as well as spurious interaction effects.
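A small sketch shows how this can happen. The ratings below are invented for illustration, and the recoding is one arbitrary order-preserving relabelling of a five-point scale; since ordinal data carry only order, both codings describe exactly the same responses. The helper names are hypothetical.

```python
from statistics import mean, median

# Invented five-point ordinal ratings for two groups (illustration only).
group_a = [2, 2, 2, 5]
group_b = [3, 3, 3, 3]

# An order-preserving recoding: category 5 is relabelled 10.
# The rank order of the categories is unchanged.
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}


def relabel(scores, coding):
    """Apply a category-to-number coding to a list of ordinal scores."""
    return [coding[s] for s in scores]


def higher_group(statistic, a, b):
    """Report which group scores higher on the given summary statistic."""
    value_a, value_b = statistic(a), statistic(b)
    if value_a == value_b:
        return "tie"
    return "A" if value_a > value_b else "B"


recoded_a = relabel(group_a, recode)
recoded_b = relabel(group_b, recode)

# The comparison of means flips when the labels change...
print(higher_group(mean, group_a, group_b))      # B
print(higher_group(mean, recoded_a, recoded_b))  # A
# ...whereas the comparison of medians depends only on order and does not.
print(higher_group(median, group_a, group_b))      # B
print(higher_group(median, recoded_a, recoded_b))  # B
```

A conclusion that reverses under a relabelling which preserves every ordinal fact in the data is a conclusion about the labels, which is the sense in which averaging ordinal scores invites false inferences.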

Yet these practices are regular, everyday occurrences in Academic Psychology. I am not talking about first year undergraduate lab classes. I am talking about people at all levels, including illustrious professors at Harvard, Yale, Princeton, Oxford and Cambridge. They not only break the basic rules of measurement themselves on a wholesale basis, they also negligently train their students to do the same.

If the received wisdom about measurement in Academic Psychology is characterised as mass delusional, thought disordered and confused, we have a serious problem, a very serious problem. And the problem seems to be getting worse. We can quite justifiably call this syndrome: ‘Scientific Psychosis’.

Thurstone: Ratio Scaling

To be consistent with its claim to be a science, Psychology must use measures that satisfy the requirements of a ratio scale, namely, that there are meaningful ratios between measurements. For example, if you have a cold and took three paracetamol tablets today and four yesterday, you could say that the number taken today was ¾ or .75 of what it was yesterday. Measuring objects against a known scale and comparing the measurements works well for properties for which scales of measurement exist. L. L. Thurstone (1927) used the method of pair comparisons to derive scale values for any set of stimulus objects, using the Law of Comparative Judgement, which states:

[Image: equation for the Law of Comparative Judgement]
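In its usual textbook form, Thurstone’s law is written S_i − S_j = z_ij · √(σ_i² + σ_j² − 2·r_ij·σ_i·σ_j), where S_i and S_j are the scale values of two stimuli, z_ij is the normal deviate corresponding to the proportion of judgements in which i is judged greater than j, σ_i and σ_j are the discriminal dispersions, and r_ij is their correlation. The sketch below assumes the simplest special case (Case V: equal, uncorrelated dispersions), with the unit chosen so that σ√2 = 1 and the origin arbitrarily set at the lowest stimulus. The function name `case_v_scale` and the demonstration values are illustrative assumptions, not data from Thurstone or from this article.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def case_v_scale(proportions):
    """Estimate Case V scale values from a paired-comparison matrix.

    proportions[i][j] is the proportion of trials on which stimulus i
    was judged greater than stimulus j (must lie strictly between 0 and 1
    off the diagonal). Diagonal entries are ignored.
    """
    n = len(proportions)
    # Convert each proportion to a normal deviate; the diagonal is zero.
    z = [
        [0.0 if i == j else _STD_NORMAL.inv_cdf(proportions[i][j])
         for j in range(n)]
        for i in range(n)
    ]
    # Each scale value is the mean of that stimulus's row of deviates.
    row_means = [sum(row) / n for row in z]
    # The origin is arbitrary; anchor the lowest stimulus at zero.
    lowest = min(row_means)
    return [m - lowest for m in row_means]


# Demonstration: build error-free proportions from known scale values,
# then check that the procedure recovers them.
true_values = [0.0, 0.5, 1.0]
demo_proportions = [
    [_STD_NORMAL.cdf(a - b) for b in true_values] for a in true_values
]
recovered = case_v_scale(demo_proportions)
print([round(v, 3) for v in recovered])
```

Note that the recovered values are determined only up to the choice of origin and unit, so this sketch by itself yields differences between stimuli rather than ratios.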

In his ‘Analytic Hierarchy Process’, Saaty (2008) also uses direct comparisons between pairs of objects to establish measurements for intangible properties that have no scales of measurement. The value derived for each element depends on what other elements are in the set. Relative scales are derived by making pairwise comparisons using numerical judgments from an absolute scale of numbers (e.g. 1-9). The numbers used to represent these comparisons define a cardinal scale of absolute numbers that is stronger than a ratio scale.
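As a rough illustration of how relative scale values can be derived from pairwise judgments, the sketch below takes a reciprocal comparison matrix and estimates its principal eigenvector by power iteration, normalised to sum to one. The function name `ahp_priorities`, the iteration count and the example judgments are assumptions made for this illustration and are not taken from Saaty (2008).

```python
def ahp_priorities(matrix, iterations=100):
    """Estimate priority weights from a pairwise comparison matrix.

    matrix[i][j] expresses how many times more element i dominates
    element j; matrix[j][i] should be its reciprocal. The weights are
    the normalised principal eigenvector, found by power iteration.
    """
    n = len(matrix)
    weights = [1.0 / n] * n  # start from equal weights
    for _ in range(iterations):
        product = [
            sum(matrix[i][j] * weights[j] for j in range(n))
            for i in range(n)
        ]
        total = sum(product)
        weights = [value / total for value in product]
    return weights


# Hypothetical judgments for three elements: A is judged 2x B and 6x C,
# and B is judged 3x C. These judgments are perfectly consistent.
judgments = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
priorities = ahp_priorities(judgments)
print([round(w, 3) for w in priorities])  # [0.6, 0.3, 0.1]
```

Adding or removing an element changes the matrix and therefore every weight, which is one way to see why the derived value of each element depends on what else is in the set.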

Intuitive measurement is something that we take for granted in everyday life. However, the way intuitive measurement works may be far from intuitive. Consider how we are able to estimate and compare magnitudes of objects, even when we have never actually seen these objects. For example, how do we compare the sizes of animals such as lions and hippos and judge which is larger or which is smaller? One theory of this process that appears to be especially accurate is described below.

Reference Point Theory

One theory of the estimation and comparison of magnitudes assumes there are implicit minimal and maximal reference points at the extreme ends of the distribution. As a special case of the Law of Comparative Judgement, the theory assumes that stimulus objects are represented by distributions with variances that increase with distance from the reference point contained in the question (Marks, 1972).
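The qualitative idea can be sketched numerically. Everything specific below is an assumption made for illustration rather than part of the theory as stated: a 0 to 10 magnitude scale with reference points at its two ends, a standard deviation that grows linearly with distance from the reference point (so the variance also grows), the parameter values `base_sd` and `growth`, and a simple rule that turns the overlap of two normal distributions into a probability of a correct comparison.

```python
from math import sqrt
from statistics import NormalDist

MIN_REFERENCE = 0.0   # assumed reference point for "which is smaller?"
MAX_REFERENCE = 10.0  # assumed reference point for "which is larger?"


def spread(magnitude, reference, base_sd=0.5, growth=0.2):
    """Standard deviation of a stimulus representation (assumed form).

    The spread increases linearly with distance from the reference
    point named in the question.
    """
    return base_sd + growth * abs(magnitude - reference)


def p_correct(smaller, larger, reference):
    """Probability of correctly ordering two magnitudes.

    Each magnitude is represented by an independent normal distribution
    centred on its true value; the comparison is correct when the draw
    for `larger` exceeds the draw for `smaller`.
    """
    sd_small = spread(smaller, reference)
    sd_large = spread(larger, reference)
    d = (larger - smaller) / sqrt(sd_small ** 2 + sd_large ** 2)
    return NormalDist().cdf(d)


# A pair of large items is easier to discriminate when the question
# names the maximal reference point, and a pair of small items when it
# names the minimal one.
large_pair_larger_q = p_correct(8.0, 9.0, MAX_REFERENCE)
large_pair_smaller_q = p_correct(8.0, 9.0, MIN_REFERENCE)
small_pair_smaller_q = p_correct(1.0, 2.0, MIN_REFERENCE)
small_pair_larger_q = p_correct(1.0, 2.0, MAX_REFERENCE)
print(large_pair_larger_q > large_pair_smaller_q)  # True
print(small_pair_smaller_q > small_pair_larger_q)  # True
```

Under these assumptions, the same pair of stimuli is more or less discriminable depending only on which reference point the question invokes, with no change to the stimuli themselves.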

This photo from 1969 shows the author and ‘subject’ with the basic apparatus and stimuli from Experiments 7 and 8 of the author’s doctoral research at Sheffield University, ‘An Investigation of Subjective Probability Judgements’.

Keith J Holyoak

In 2014, Reference Point Theory received strong empirical support from a team at UCLA led by Keith J Holyoak. Keith is not only a Distinguished Professor but also Editor of Psychological Review. Chen, Lu and Holyoak (2014) present a model of how magnitudes can be acquired and compared, based on BARTlet, a simpler version of ‘Bayesian Analogy with Relational Transformations’ (BART; Lu, Chen, & Holyoak, 2012). The authors concluded that Reference Point Theory provided the best fit to their data:

“BARTlet provides a computational realization of a qualitative hypothesis proposed four decades ago by Marks (1972)…The reference-point hypothesis implies that the congruity effect results from differences in the discriminability of magnitudes represented in working memory, rather than a bias in encoding (e.g., Marschark & Paivio, 1979) or a linguistic influence (Banks et al., 1975). BARTlet provides a well-specified mechanism by which reference points can alter discriminability in direct judgments of discriminability (Holyoak & Mah, 1982) as well as speeded tasks (p. 46).”

As well as being a Distinguished Professor at UCLA, and editing Psychological Review, Keith J Holyoak is also a poet and translator of classical Chinese poetry.  Kudos!

“The greatest scientists are artists as well.” (Albert Einstein).
