What Are the Four Levels of Training Evaluation?



Perhaps the best-known training evaluation methodology is Kirkpatrick's Four Level Evaluation Model of reaction, learning, performance, and impact.

Level One - Reaction :

As the word implies, evaluation at this level measures how the learners react to the training. This level is often measured with attitude questionnaires that are passed out after most training classes. This level measures one thing: the learner's perception (reaction) of the course.

Learners are keenly aware of what they need to know to accomplish a task. If the training program fails to satisfy their needs, a determination should be made as to whether it's the fault of the program design or delivery.

This level is not indicative of the training's performance potential, as it does not measure what new skills the learners have acquired or what they have learned that will transfer back to the working environment. This has caused some evaluators to downplay its value. However, the interest, attention, and motivation of the participants are critical to the success of any training program. People learn better when they react positively to the learning environment.

When a learning package is first presented, whether it be e-learning, classroom training, CBT, etc., the learner has to make a decision as to whether he or she will pay attention to it. If the goal or task is judged as important and doable, then the learner is normally motivated to engage in it. However, if the task is presented as low-relevance or there is a low probability of success, then a negative effect is generated and motivation for task engagement is low.

This differs somewhat from Kirkpatrick. He writes, "Reaction may best be considered as how well the trainees liked a particular training program" (1996). However, the less relevant the learning package is to a learner, the more effort has to be put into the design and presentation of the learning package. That is, if it is not relevant to the learner, then the learning package has to "hook" the learner through slick design, humor, games, etc. This is not to say that design, humor, or games are not important.

However, their use in a learning package should be to promote the "learning process," not to promote the "learning package" itself. And if a learning package is built on sound design, then it should help the learners to fix a performance gap. Hence, they should be motivated to learn! If not, something went dreadfully wrong during the planning and building processes! So if you find yourself having to hook the learners through slick design, then you probably need to reevaluate the purpose of the learning program.

Level Two - Learning :

This is the extent to which participants change attitudes, improve knowledge, and increase skill as a result of attending the program. It addresses the question: Did the participants learn anything? The learning evaluation requires post-testing to ascertain what skills were learned during the training. In addition, the post-testing is only valid when combined with pre-testing, so that you can differentiate between what they already knew prior to training and what they actually learned during the training program.
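The pre-test and post-test comparison described above can be illustrated with a minimal sketch. The learner names, the scores, and the function name `learning_gains` are hypothetical and chosen only for illustration; real assessments would need a scoring scheme suited to the learning objectives.

```python
from statistics import mean


def learning_gains(pre_scores, post_scores):
    """Return each learner's gain (post-test minus pre-test).

    Only learners who took both tests are included.
    """
    return {
        name: post_scores[name] - pre_scores[name]
        for name in pre_scores
        if name in post_scores
    }


# Hypothetical scores out of 100 (illustrative data only).
pre = {"learner_a": 40, "learner_b": 75, "learner_c": 60}
post = {"learner_a": 80, "learner_b": 85, "learner_c": 60}

gains = learning_gains(pre, post)
average_gain = mean(gains.values())

for name, gain in gains.items():
    print(f"{name}: pre={pre[name]} post={post[name]} gain={gain}")
print(f"average gain: {average_gain:.1f}")
```

In this made-up data, learner_b has the highest post-test score but gained only 10 points, while learner_a gained 40. Looking at post-test scores alone would hide that difference, which is why the pre-test is needed.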

Measuring the learning that takes place in a training program is important in order to validate the learning objectives. Evaluating the learning that has taken place typically focuses on such questions as:

1. What knowledge was acquired?

2. What skills were developed or enhanced?

3. What attitudes were changed?

Learner assessments are created to allow a judgment to be made about the learner's capability for performance. There are two parts to this process: the gathering of information or evidence (testing the learner) and the judging of the information (what does the data represent?). This assessment should not be confused with evaluation. Assessment is about the progress and achievements of the individual learners, while evaluation is about the learning program as a whole.

Evaluation in this process comes through the learner assessment that was built in the design phase. Note that the assessment instrument normally has more benefits to the designer than to the learner. Why? For the designer, the building of the assessment helps to define what the learning must produce.

For the learner, assessments are statistical instruments that normally correlate poorly with the realities of performance on the job, and they rate learners low on the "assumed" correlatives of the job requirements. Thus, the next level is the preferred method of assuring that the learning transfers to the job, but sadly, it is quite rarely performed.

Level Three - Performance (behavior) :

In Kirkpatrick's original four levels of evaluation, he names this level "behavior." However, behavior is the action that is performed, while the final results of the behavior are the performance. Gilbert said that performance has two aspects - behavior being the means and its consequence being the end. If we were only worried about the behavioral aspect, then this could be done in the training environment. However, the consequence of the behavior (performance) is what we are really after - can the learner now perform in the working environment?

This evaluation involves testing the students' capabilities to perform learned skills while on the job, rather than in the classroom. Level three evaluations can be performed formally (testing) or informally (observation). It determines if the correct performance is now occurring by answering the question, "Do people use their newly acquired learning on the job?"

It is important to measure performance because the primary purpose of training is to improve results by having the students learn new skills and knowledge and then actually apply them to the job. Learning new skills and knowledge is no good to an organization unless the participants actually use them in their work activities. Since level three measurements must take place after the learners have returned to their jobs, the actual level three measurements will typically involve someone closely involved with the learner, such as a supervisor.

Although it takes a greater effort to collect this data than it does to collect data during training, its value is important to the training department and organization as the data provides insight into the transfer of learning from the classroom to the work environment and the barriers encountered when attempting to implement the new techniques learned in the program.

Level Four - Results :

These are the final results that occur. This level measures the training program's effectiveness, that is, "What impact has the training achieved?" These impacts can include such items as monetary returns, efficiency, morale, teamwork, etc.

While it is often difficult to isolate the results of a training program, it is usually possible to link training contributions to organizational improvements. Collecting, organizing, and analyzing level four information can be difficult, time-consuming, and more costly than the other three levels, but the results are often quite worthwhile when viewed in the full context of their value to the organization.

As we move from level one to level four, the evaluation process becomes more difficult and time-consuming; however, it provides information that is of increasingly significant value. Perhaps the most frequently used type of measurement is level one because it is the easiest to measure. However, it provides the least valuable data.

Measuring results that affect the organization is considerably more difficult, thus it is conducted less frequently, yet it yields the most valuable information.

Each evaluation level should be used to provide a cross set of data for measuring a training program.

The first three levels of Kirkpatrick's evaluation - Reaction, Learning, and Performance - are largely "soft" measurements; however, decision-makers who approve such training programs prefer results (returns or impacts). That does not mean the first three are useless. Indeed, their use is in tracking problems within the learning package:

1. Reaction informs you how relevant the training is to the work the learners perform (it measures how well the training requirement analysis processes worked).

2. Learning informs you of the degree to which the training package worked to transfer KSAs from the training material to the learners (it measures how well the design and development processes worked).

3. Performance informs you of the degree to which the learning can actually be transferred to the learner's job (it measures how well the implementation process worked).

Note the difference between "information" and "returns." That is, the first three levels give you "information" for improving the learning package, while the fourth level gives you "impacts." A hard result is generally given in dollars and cents. Soft results are more informational in nature, but instead of evaluating how well the training worked, they evaluate the impact that training has upon the organization. There are exceptions. For example, if the organizational vision is to provide learning opportunities (perhaps to increase retention), then a level-two or level-three evaluation could be used to provide a soft return.

This final measurement of the training program might be met with a more "balanced" approach or a "balanced scorecard", which looks at the impact or return from four perspectives:


Financial:

A measurement, such as an ROI, that shows a monetary return, or the impact itself, such as how the output is affected. Financial can be either soft or hard results.

Customer:

Improving an area in which the organization differentiates itself from competitors to attract, retain, and deepen relationships with its targeted customers.

Internal Processes:

Achieving excellence by improving such processes as supply-chain management, production process, or support process.

Innovation and Learning:

Ensuring the learning package supports a climate for organizational change, innovation, and the growth of individuals.
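
The financial perspective mentions ROI as one possible hard measurement. As a minimal sketch, the commonly used formula is net benefits divided by costs, expressed as a percentage. The function name `training_roi_percent` and the dollar figures below are hypothetical, and the hard part in practice (isolating the benefits that are actually due to training) is assumed to have been done already.

```python
def training_roi_percent(benefits, costs):
    """Return ROI as a percentage: (benefits - costs) / costs * 100.

    benefits: monetary value attributed to the training (assumed known).
    costs: total cost of designing and delivering the training.
    """
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) / costs * 100


# Hypothetical figures: $150,000 in attributed benefits, $100,000 in costs.
roi = training_roi_percent(benefits=150_000, costs=100_000)
print(f"ROI: {roi:.0f}%")  # prints "ROI: 50%"
```

An ROI of 50% here means that each dollar spent on the training returned the dollar plus fifty cents. A negative value would mean the attributed benefits did not cover the costs.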