Sir Francis Galton, a cousin of Charles Darwin, developed the first tests designed to measure intelligence. Galton, a naturalist and mathematician, was interested in individual differences. He invented the correlation coefficient (which plays such an important role in psychology) and developed the ideas behind fingerprinting and eugenics.

Galton administered a battery of tests (measuring such variables as head size, reaction time, visual acuity, memory for visual forms, breathing capacity, and strength of hand grip) to over 9,000 visitors to the London Exhibition in 1884.

His somewhat strange collection of tests reflected his belief that superior intelligence was accompanied by superior physical vigor. He undoubtedly was disappointed to discover that eminent British scientists could not be distinguished from ordinary citizens on the basis of their head size, and that strength of grip was not much related to other measures of intelligence. Galton’s tests did not prove very useful.

The intelligence test as we know it today was formulated by the French psychologist Alfred Binet (1857-1911). The French government asked Binet to devise a test that would detect those children too slow intellectually to profit from regular schooling.

Binet assumed that intelligence should be measured by tests requiring reasoning and problem solving, rather than perceptual-motor skills. In collaboration with Theodore Simon (1873-1961), another French psychologist, Binet published a scale in 1905, which he revised in 1908 and again in 1911. These Binet scales are the direct predecessors of contemporary intelligence tests.

1. Aptitude Tests:

What a person can do now and what he might do given appropriate training are not the same. We do not expect a premedical student to remove an appendix, or a preflight trainee to fly a jet. But we do expect each to have the potential for acquiring these skills.

The distinction between a capacity to learn and an accomplished skill is important in appraisal. Tests designed to measure capacities, that is, to predict what one can accomplish with training, are called aptitude tests; they include tests of general intelligence as well as tests of special abilities.

Aptitude tests designed to predict performance over a broad range of abilities are called intelligence tests. Other aptitude tests measure more specific abilities: mechanical aptitude tests measure various types of eye-hand coordination; musical aptitude tests measure discrimination of pitch, rhythm, and other aspects of musical sensitivity that are predictive of musical performance with training; and clerical aptitude tests measure efficiency at number-checking and other skills that have been found to be predictive of an individual’s later achievement as an office clerk.

Many aptitude tests have been constructed to predict success in specific jobs or vocations. Since the Second World War, the armed forces have devised tests to select pilots, radio technicians, submarine crews, and many other specialists.

Aptitude is usually measured by a combination of tests. Pilot aptitude tests include not only measures of mechanical knowledge but also tests of spatial orientation, eye-hand coordination, and other skills.

A combination of tests used for prediction is known as a test battery. Scores from individual tests are weighted to get the best possible prediction. Scores on the tests that predict well count more than scores on tests that predict less well.

If an eye-hand coordination test predicts pilot success better than a spatial orientation test, scores in eye-hand coordination will be weighted more heavily than scores in spatial orientation.
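
The weighting idea can be illustrated with a minimal sketch. The weights and scores below are hypothetical values chosen only for illustration; in practice the weights would be estimated from data on how well each test predicts the criterion, and the scores are assumed to be on a common standardized scale.

```python
def battery_score(scores, weights):
    """Combine test scores into one prediction as a weighted sum.

    Both arguments are dicts keyed by test name. Scores are assumed to be
    standardized so that they are comparable across tests.
    """
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical weights: the eye-hand test is assumed to predict pilot
# success better, so it counts more heavily than spatial orientation.
weights = {"eye_hand": 0.6, "spatial": 0.3, "mechanical": 0.1}

# Two hypothetical candidates, each one unit above average on one test.
candidate_a = {"eye_hand": 1.0, "spatial": 0.0, "mechanical": 0.0}
candidate_b = {"eye_hand": 0.0, "spatial": 1.0, "mechanical": 0.0}

print(battery_score(candidate_a, weights))
print(battery_score(candidate_b, weights))
```

With these assumed weights, the candidate who excels in eye-hand coordination receives the higher battery score, even though both candidates are equally far above average on one test.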

2. Achievement Tests:

Tests that tell what one can do now are achievement tests. An intelligence test that predicts how well you will do in college is an aptitude test; examinations given at the end of a course to see how much you have learned are achievement tests. Both are ability tests.

Although achievement tests are most commonly used in school and government examinations, they are also used to assess what has been learned in preparation for the practice of a specialty, such as law, medicine, or accounting.

The consequences of these achievement tests are very important to the person who takes them: the successful candidate will receive a degree, a license to practice, or an opportunity to enter a desired career; the one who fails may find many paths blocked.

If the tests are in any way inappropriate, their use may lead to social injustice. It is crucial that examinations be well conceived, so that they measure what they are intended to measure and their scores represent fairly the abilities of the candidate who takes the tests.

Psychologists are interested in the development of achievement tests for two reasons. First, there is much demand for such tests, especially in education and in government. Second, achievement tests furnish a standard against which to judge the predictive effectiveness of aptitude tests. To devise an aptitude test for pilot success, we first need a standard of excellent flying against which to measure the aptitude. Otherwise we have no way of checking predictions.

If professors assigned college grades whimsically instead of on the basis of students’ achievement in the course, it would be futile to try to predict grades from an aptitude battery. Thus, achievement tests furnish a standard, or criterion, for the prediction of aptitudes. With improved achievement examinations, predictions can be made more efficiently. Of course, other criteria, such as success in a job, can be used. Then the measure of success serves as a measure of achievement.

3. Binet’s Method. A Mental-Age Scale:

Binet assumed that a dull child was like a normal child but retarded in mental growth; he reasoned that the dull child would perform on tests like a normal child of younger age. Binet decided to scale intelligence as the kind of change that ordinarily comes with growing older.

Accordingly, he devised a scale of units of mental age. Average mental-age (MA) scores correspond to chronological age (CA), that is, to the age determined from the date of birth. A bright child’s MA is above his CA; a dull child has an MA below his CA. The mental-age scale is easily interpreted by teachers and others who deal with children differing in mental ability.

(a) Item Selection:

Because the intelligence test is designed to measure brightness rather than the results of special training, it must consist of items that do not assume any specific preparation. In other words, the intelligence test is designed to be an aptitude test rather than an achievement test, and it must be constructed accordingly.

There are two chief ways to find items for which success is uninfluenced by special training. One way is to choose novel items with which an untaught child has as good a chance to succeed as one who has been taught at home or in school.

The figure illustrates novel items; in this particular case the child is asked to choose figures that are alike, on the assumption that the designs are unfamiliar to all children.

The second way is to choose familiar items, with the assumption that all those for whom the test is designed have had the requisite prior experience to deal with the items. The following problem provides an example of a supposedly familiar item:

This item is “fair” only for children who know the English language, who can read, and who understand all the words in the sentence. For such children, detection of the fallacy in the statement becomes a valid test of intellectual ability.

Many of the items on an intelligence test of the Binet type assume general familiarity. A vocabulary test, for example, appears in almost all the scales. Familiarity with the standard language of the test is necessarily assumed.

The intelligence test is in some respects a crude instrument, for its assumptions can never be strictly met. The language environment of one home is never exactly that of another, the reading matter available to the subjects differs, and the stress upon cognitive goals varies.

Even the novel items depend upon perceptual discriminations that may be acquired in one culture or subculture and not in another. Despite the difficulties, items can be chosen that work reasonably well.

The items included in contemporary intelligence tests are those that have survived in practice after many others have been tried and found defective. It should be remembered, however, that intelligence tests have been validated by success in predicting school performance within a particular culture.

(b) Item Testing:

It is not enough to look at an item and to decide that it requires intelligence to answer it successfully. Some “tricky” or “clever” items turn out to be poor because of the successes or failures that occur through guessing. More pedestrian items, such as matters of common information, sometimes turn out to be most useful. These are “fair” items if all have had a chance to learn the answers.

How did Binet and those who came after him know when they had hit upon a good item? One method of testing an item is to study the changes in proportions of children answering it correctly at different ages. Unless older children are more successful than younger ones in answering the item, the item is unsatisfactory in a test based on the concept of mental growth.

A second method of testing an item is to find out whether the results for it correspond to the results on the test as a whole. This can be done by correlating success and failure on the item with the score made on the remaining items. If all items measure something in common, then every single item ought to contribute a score that correlates with the total score.

These two requirements for an acceptable item (increases in percentage passing with age and correlation with total score) reflect both validity and reliability. The first requirement is an indirect way of guaranteeing validity, being based on the inference that what we mean by intelligence should distinguish an older child from a younger one; the second requirement is a guarantee of reliability through internal consistency of the measures.
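
The two checks can be sketched in a few lines of code. This is only an illustration under stated assumptions: the six records are invented data, each record pairs a child's age with pass (1) or fail (0) results on four hypothetical items, and the function names are not taken from any published procedure.

```python
from collections import defaultdict
from math import sqrt


def proportion_passing_by_age(records, item):
    """Return {age: proportion of children of that age who passed the item}."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for age, responses in records:
        total[age] += 1
        passed[age] += responses[item]
    return {age: passed[age] / total[age] for age in sorted(total)}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


def item_rest_correlation(records, item):
    """Correlate success on one item with the score on the remaining items."""
    item_scores = [responses[item] for _, responses in records]
    rest_scores = [sum(responses) - responses[item] for _, responses in records]
    return pearson(item_scores, rest_scores)


# Invented data: (age in years, pass/fail on items 0..3).
records = [
    (6, [1, 0, 0, 1]),
    (6, [0, 0, 0, 0]),
    (7, [1, 1, 0, 0]),
    (7, [1, 0, 0, 1]),
    (8, [1, 1, 1, 0]),
    (8, [1, 1, 1, 1]),
]

for item in (0, 3):
    print(item, proportion_passing_by_age(records, item),
          round(item_rest_correlation(records, item), 2))
```

In this invented sample, item 0 is passed more often by older children and correlates positively with the rest of the test, whereas item 3 is passed by half the children at every age and is essentially uncorrelated with the remaining items, so it would be rejected on both grounds.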

(c) Contemporary Binet Tests:

The tests originally developed by Binet underwent several revisions in this country, the first by Goddard in 1911. For many years the best-known and most widely used revision was that made by Terman at Stanford University in 1916, commonly referred to as the Stanford-Binet. The test was revised in 1937, 1960, and 1972.

The procedure for testing is first to establish the child’s basal mental age, the mental-age level at which he passes all items. Two months of mental age are then added for each item passed at higher age levels. Consider, for example, the child who passes all items at the mental-age level of six years. If the child then passes two items at the seven-year level, four months are added; passing an additional item at the eight-year level adds two more months.

This particular child will have an earned mental age of six years and six months, regardless of chronological age. The test allows for some unevenness in development, so that two children can earn the same mental age by passing different items on the test.
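
The scoring arithmetic in this example can be written out as a short sketch. The function names are hypothetical; the two-month credit per item is the figure given in the text.

```python
MONTHS_PER_ITEM = 2  # credit per item passed above the basal level, per the text


def mental_age_months(basal_years, items_passed_above_basal):
    """Basal age in months plus two months per item passed at higher levels.

    items_passed_above_basal maps an age level (years) to the number of
    items passed at that level.
    """
    extra_items = sum(items_passed_above_basal.values())
    return basal_years * 12 + MONTHS_PER_ITEM * extra_items


def format_age(months):
    """Render an age in months as years and months."""
    years, remainder = divmod(months, 12)
    return f"{years} years, {remainder} months"


# The child from the text: basal age six, two items passed at the
# seven-year level and one at the eight-year level.
ma = mental_age_months(6, {7: 2, 8: 1})
print(format_age(ma))  # 6 years, 6 months
```

Because only the count of items passed above the basal level matters, a child who passed three items at the seven-year level and none at the eight-year level would earn the same mental age.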

4. Intelligence Quotient (IQ):

Terman adopted a convenient index of brightness that was suggested by the German psychologist William Stern (1871-1938). This index is the intelligence quotient, commonly known by its initials, IQ. It expresses intelligence as a ratio of mental age to chronological age:

$$\text{IQ} = \frac{\text{Mental age (MA)}}{\text{Chronological age (CA)}} \times 100$$

The 100 is used as a multiplier to remove the decimal point and to make the IQ have a value of 100 when MA equals CA. It is evident that, if the MA lags behind the CA, the resulting IQ will be less than 100; if the MA is above the CA, the IQ will be above 100.
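
A minimal sketch of the ratio, with both ages expressed in months; the function name and the sample ages are illustrative assumptions.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age_months / chronological_age_months


# A six-year-old (72 months) with three different mental ages.
print(ratio_iq(72, 72))            # MA equals CA
print(round(ratio_iq(78, 72), 1))  # MA above CA
print(round(ratio_iq(60, 72), 1))  # MA below CA
```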

How is the IQ to be interpreted? The distribution of IQs follows the form of curve found for many differences among individuals, such as differences in height; this is the bell-shaped “normal” distribution curve shown in the figure. In this curve most cases cluster around a mid-value, tapering off to a few at both extremes. The adjectives commonly used to describe the various IQ levels are given in.

In the 1960 and subsequent revisions of the Stanford-Binet, the authors introduced a method of computing the IQ from tables. The meaning of an IQ remains essentially the same as before, but the tables permit corrections to allow the IQ at any age to be interpreted somewhat more exactly. It is now arranged so that for each age the IQ averages 100 and has a standard deviation of 16.

A modern IQ is merely a test score adjusted for the age of the person being tested. It is, therefore, no longer a “quotient” at all, but the expression IQ persists because of its familiarity and convenience.
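
The idea of an age-adjusted score with a mean of 100 and a standard deviation of 16 can be sketched as follows. This is an assumption-laden illustration, not the published procedure: the actual Stanford-Binet IQs are read from tables, and the age-group mean and standard deviation used here are hypothetical.

```python
def deviation_iq(raw_score, age_group_mean, age_group_sd,
                 iq_mean=100, iq_sd=16):
    """Rescale a raw score relative to the person's own age group.

    The raw score is converted to a standard score within the age group,
    then mapped onto a scale with mean 100 and standard deviation 16.
    """
    if age_group_sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (raw_score - age_group_mean) / age_group_sd
    return iq_mean + iq_sd * z


# Hypothetical age group with a mean raw score of 50 and an SD of 10.
print(deviation_iq(50, 50, 10))  # at the age-group mean
print(deviation_iq(60, 50, 10))  # one SD above the mean
print(deviation_iq(35, 50, 10))  # one and a half SDs below the mean
```

Under this scheme no division by chronological age takes place, which is why the result is no longer a quotient.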

5. Tests with More than One Scale:

Tests following the pattern originated by Binet use a great assortment of items to test intelligence, and a pass or a fail on one kind of item is scored the same as a pass or a fail on another. But those who are skilled in the use and scoring of the tests learn much more than the final IQ.

They may note special strengths and weaknesses; tests of vocabulary, for example, may be passed at a higher level than tests of manipulating form boards. These observations lead to the conjecture that what is being measured is not one simple ability but a composite of abilities.

One way to obtain information on specific kinds of abilities, rather than a single mental-age score, is to separate the items into more than one group and to score the groups separately. The Wechsler Adult Intelligence Scale (described in the table) and the Wechsler Intelligence Scale for Children use items similar to those in the Binet tests, but they divide the total test into two parts, a verbal scale and a performance scale, according to the content of the items. A performance item is one that requires manipulation or arrangement of blocks, beads, pictures, or other materials in which both stimuli and responses are nonverbal. The separate scaling of the items within one test is convenient for diagnostic purposes.
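
Scoring groups of items separately can be sketched as below. The item results are invented, each item is simply labeled with the scale it belongs to, and the sketch does not reproduce the actual Wechsler subtests or their scoring rules.

```python
def scale_scores(item_results):
    """Sum the points earned on each scale separately.

    item_results is a list of (scale_name, points) pairs, one per item.
    """
    totals = {}
    for scale, points in item_results:
        totals[scale] = totals.get(scale, 0) + points
    return totals


# Invented results: three verbal items passed, one of two performance items.
results = [
    ("verbal", 1), ("verbal", 1), ("verbal", 1),
    ("performance", 1), ("performance", 0),
]

scores = scale_scores(results)
full_scale = sum(scores.values())
print(scores, full_scale)
```

Keeping the two totals apart shows a relative strength on verbal items that the full-scale total alone would conceal.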

In general, the full scale (verbal and performance) and the verbal scale of the Wechsler Scales yield scores corresponding closely to the scores of the Stanford-Binet. The verbal scale of the Wechsler correlates .77 with its performance scale.