In education, certification, counselling, and many other fields, a test or an exam (short for examination) is a tool or technique intended to measure students' expression of knowledge, skills and/or abilities. A test has more questions of greater difficulty and requires more time for completion than a quiz. It is usually divided into two or more sections, each covering a different area of the domain or taking a different approach to assessing the same aspects.
A standardized test is one that compares the performance of every individual subject with a norm. The norm may be established independently, or by statistical analysis of a large number of subjects.
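As a rough illustration of comparing an individual with a norm established by statistical analysis, the sketch below standardizes a raw score against a norm group. The helper names `z_score` and `percentile_rank` and the sample scores are invented for this example; real standardized tests use far larger norm groups and more elaborate scaling.

```python
from statistics import mean, pstdev


def z_score(raw_score, norm_scores):
    """Distance of raw_score from the norm-group mean, in standard deviations."""
    mu = mean(norm_scores)
    sigma = pstdev(norm_scores)  # population standard deviation of the norm group
    if sigma == 0:
        raise ValueError("norm group has no spread; z-score is undefined")
    return (raw_score - mu) / sigma


def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring strictly below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100.0 * below / len(norm_scores)


# Invented norm group, for illustration only.
norm_group = [40, 50, 60, 70, 80]

print(round(z_score(70, norm_group), 3))
print(percentile_rank(70, norm_group))
```

A positive z-score means the subject scored above the norm-group mean; the percentile rank expresses the same comparison as a share of the norm group.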
The number of planets in the solar system is:
a) 7 b) 8 c) 9 d) 10
Test authors generally create incorrect response options, often referred to as distracters, which correspond with likely errors. For example, distracters may represent common misconceptions that occur during the developmental process. Constructing effective distracters is a key challenge in writing multiple-choice items with strong psychometric properties. Well-designed distracters, considered in combination, can attract considerably more than 25% of the weakest students, thereby reducing the effects of guessing on total scores. Constructing such items can require skill and experience on the part of the item developer.
Figure 1 shows the functioning of a multiple-choice question. The x-axis represents an ability continuum and the y-axis the probability of any given choice. The grey line maps ability to the probability of a correct response according to the Rasch model, a psychometric model used to analyse test data. The correct response in the example is E. The proportion of students along the ability continuum who chose the correct response is highlighted in pink. The proportions of students opting for the other choices along the ability continuum are plotted as indicated in the legend. The proportion of students at about on the scale who responded correctly to this item is approximately 0.1, which is below the proportion expected if students were purely guessing.
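The curve described for the Rasch model can be sketched in a few lines. In its standard dichotomous form, the model gives the probability of a correct response as a logistic function of the difference between a person's ability and the item's difficulty. The function names below are hypothetical, and the five-option chance level is an assumption based only on the correct response being labelled E.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def chance_level(num_options):
    """Expected proportion correct if every student guessed at random."""
    return 1.0 / num_options


# Abilities and difficulty are on the same logit scale; values are illustrative.
for ability in (-3, -1, 0, 1, 3):
    print(ability, round(rasch_probability(ability, difficulty=0.0), 3))

# Assumed five options (A to E), so blind guessing would yield 0.2.
print(chance_level(5))
```

An observed proportion of about 0.1 at low ability lies below this assumed 0.2 chance level, which is consistent with the distracters drawing weaker students away from the correct option rather than those students guessing at random.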
An attractive feature of multiple-choice questions is that they are particularly easy to score. Scoring by machines such as the Scantron, and software grading of computer-based tests, can be performed automatically and instantly, which is particularly valuable where there are not enough graders available for a large class or a large-scale standardized test.
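The reason such scoring is easy to automate is that it reduces to comparing each response with an answer key. The following is a minimal sketch of that comparison, not a description of how any particular scanner or testing platform works; `score_responses` and the sample data are invented for illustration.

```python
def score_responses(answer_key, responses):
    """Count the responses that match the key; blank answers (None) score zero."""
    if len(answer_key) != len(responses):
        raise ValueError("answer key and responses must have the same length")
    return sum(
        1
        for key, given in zip(answer_key, responses)
        if given is not None and given.strip().lower() == key.lower()
    )


# Invented four-item test: the student answers two items correctly,
# one incorrectly, and leaves the last item blank.
answer_key = ["b", "a", "d", "c"]
student = ["B", "c", "d", None]

print(score_responses(answer_key, student))
```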
This format is not, however, appropriate for assessing all types of skills and abilities. Multiple-choice questions often overemphasize simple memorization and deemphasize processes and comprehension, and they leave no room for disagreement or alternative interpretation, making them particularly unsuitable for humanities such as literature and philosophy.
Free-response questions do not pose as much of a challenge to the test author, but evaluating the responses is a different matter. Effective scoring involves reading the answer carefully and looking for specific features, such as clarity and logic, which the item is designed to assess. Often, the best results are achieved by awarding scores according to explicit ordered categories which reflect an increasing quality of response. Doing so may involve the construction of marking criteria and support materials, such as training materials for markers and samples of work which exemplify categories of responses. Typically, these questions are scored according to a uniform grading rubric for greater consistency and reliability.
At the other end of the spectrum, scores may be awarded according to superficial qualities of the response, such as the presence of certain important terms. In this case, it is easy for test subjects to fool scorers by writing a stream of generalizations or non sequiturs that incorporate the terms the scorers are looking for.
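The weakness of term-based scoring is easy to demonstrate. The sketch below awards one point per key term present in a response, and a coherent answer and a string of disconnected sentences receive the same score. The function `keyword_score`, the key terms, and both sample responses are invented for this example, and the matching handles single-word terms only.

```python
import re


def keyword_score(response, key_terms):
    """Award one point for each key term that appears as a word in the response."""
    words = set(re.findall(r"[a-z']+", response.lower()))
    return sum(1 for term in key_terms if term.lower() in words)


key_terms = ["photosynthesis", "chlorophyll", "sunlight"]

coherent = (
    "Plants use chlorophyll to capture sunlight, and photosynthesis "
    "converts that energy into chemical form."
)
padded = (
    "Photosynthesis is important. Sunlight matters in many ways. "
    "Chlorophyll is a word."
)

print(keyword_score(coherent, key_terms))
print(keyword_score(padded, key_terms))
```

Because both responses contain every key term, a scorer relying on term presence alone cannot tell them apart, which is the failure the paragraph above describes.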
A practical examination may be administered by an examiner in person (in which case it may be called an audition or a tryout) or by means of an audio or video recording. It may be administered on its own or in combination with other types of questions; for instance, many driving tests in the United States include a practical examination as well as a multiple-choice section regarding traffic laws.
Tests of the sciences may include laboratory experiments (practicals/laboratory sessions) to make sure that the student has learned not only the body of knowledge comprising the science but also the experimental methods through which it has been developed. Again, the use of explicit criteria is generally beneficial in the marking of practical examinations or performances.
Despite such issues, tests are less susceptible to cheating than other tools of learning evaluation. Laboratory results can be fabricated, and homework can be done by one student and copied by rote by others. The presence of a responsible test administrator, in a controlled environment, helps to guard against cheating.
Additionally, in some cases, high-stakes testing induces examinees to rise to meet the exam's high expectations. Generally, the term high-stakes is reserved for tests that are used as a basis for competitive entry into future courses, including tests which are highly weighted within selection criteria that are used for entrance into university courses.
The SAT has also been criticized for an alleged racial bias; ethnic minorities supposedly fare worse on the exam than they should. As a result, it began to fall out of favor in the late 1990s, with increasing emphasis on standardized tests that measure actual knowledge. Some of these replacements have likewise come from the College Board, but many states have taken the initiative to design tests of their own. The ACT examination, introduced in 1959 as a competitor to the SAT, also features more knowledge-based questions, and is accepted as an alternative to the SAT for admission to many United States colleges. Many colleges are also placing more emphasis on measures of long-term performance such as the high-school grade point average, the difficulty of classes taken in high school, and teacher letters of recommendation.
There are also other high-stakes exams at higher educational levels, such as the Fundamentals of Engineering exam, administered by the National Council of Examiners for Engineering and Surveying (NCEES).
This article is licensed under the GNU Free Documentation License. It uses material from the article "Test (student assessment)".