Construct validity: A test has construct validity if it can be shown to measure the underlying ability (construct) it is supposed to measure.
Content validity: A test has content validity when its content tests all the language skills and/or structures it should be testing.
Criterion-related validity: A test has criterion-related validity if the results on the test agree with those provided by some independent and highly dependable assessment of the candidate’s ability.
Concurrent validity: A test has concurrent validity if the test and the criterion are administered at about the same time.
Predictive validity: A test has predictive validity if it can predict the future performance of candidates.
Face validity: A test has face validity if it looks as if it measures what it is supposed to measure.
Reliability coefficient: The reliability coefficient is the quantification of the reliability of a test; it allows us to compare the reliability of different tests.
Scorer reliability: A test has scorer reliability when it is possible to quantify the level of agreement given by the same or different scorers on different occasions by means of a scorer reliability coefficient.
Backwash is the effect that tests have on learning and teaching.
Practicality: A test is practical if it is easy, quick and cheap to administer, score, and interpret.
When a test has content validity, its content constitutes a representative sample of the language skills, relevant structures, etc. with which it is meant to be concerned. If we want to know whether a test has content validity, we need a specification of the skills and structures that it is meant to cover.
Two points should be considered about content validity. First, the greater a test’s content validity, the more likely it is to be an accurate measure of what it is supposed to measure. Second, the items in the test should be representative of all the structures and skills in question.
There are two kinds of criterion-related validity. First, concurrent validity: it is established when the test and the criterion are administered at about the same time; for example, it might reveal a strong relationship between students’ performance on the test and supervisors’ assessments of their reading ability. Second, predictive validity: it concerns the degree to which a test can predict candidates’ future performance. We have to bear in mind that prediction is difficult, since so many factors are involved, such as intelligence, knowledge, motivation, health, etc.
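The strength of the relationship between test scores and the criterion is usually expressed as a correlation. A minimal sketch of computing such a validity coefficient, assuming Pearson correlation as the measure and using invented scores purely for illustration:

```python
# Hypothetical data: six students' scores on a reading test, and the
# independent (criterion) assessments of the same students' reading
# ability made by their supervisors.
test_scores = [42, 55, 61, 48, 70, 66]
criterion   = [40, 57, 63, 45, 72, 64]

def pearson(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

validity_coefficient = pearson(test_scores, criterion)
print(round(validity_coefficient, 2))  # close to 1: test agrees strongly with the criterion
```

A coefficient near 1 would suggest strong concurrent validity; a coefficient near 0 would suggest the test and the independent assessment are measuring different things.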
One way to obtain evidence about the construct validity of our test is to investigate what test takers actually do when they respond to an item. We can use two methods to gather this information: think-aloud and retrospection. In think-aloud, test takers voice their thoughts as they respond to the item. In retrospection, they try to recollect, after the event, what their thinking was as they responded.
Validity in scoring
If we want our tests to have validity, we have to consider the way they are scored. For example, a reading test may call for short written responses; if the scoring of these responses takes spelling and grammar into account, then it is not valid, because measuring more than one ability makes the measurement less accurate.
To have face validity, a test has to look as if it measures what it is supposed to measure. If a test is meant to measure pronunciation but does not require the student to speak, then it might be thought to lack face validity. To make tests more valid we have to write explicit specifications for the test and, whenever feasible, use direct testing; we should also make sure that the scoring of responses relates directly to what is being tested and, finally, do everything possible to make the test reliable.
For a test to be reliable we should consider two important components: the performance of the candidates from occasion to occasion, and the reliability of the scoring. First we will consider the reliability coefficient; like a validity coefficient, it allows us to compare the reliability of different tests. The ideal reliability coefficient is 1. A test with a reliability coefficient of 1 is one which would give precisely the same results for a particular set of candidates regardless of when it happened to be administered.
It is possible to quantify the level of agreement given by the same or different scorers on different occasions by means of a scorer reliability coefficient. In the case of a multiple choice test, the scorer reliability coefficient would be 1. When scoring requires no judgement, and could in principle or in practice be carried out by a computer, the test is said to be objective. We must bear in mind that if the scoring of a test is not reliable, then the test cannot be reliable either, so there is a direct relationship between test reliability and scorer reliability.
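To illustrate why objective scoring gives a scorer reliability coefficient of 1: if two scorers mark the same multiple-choice answer sheets against the same key, their scores are identical, so agreement between them is perfect. A minimal sketch, with an invented answer key and responses:

```python
# Hypothetical answer key and three candidates' responses on a
# five-item multiple-choice test; scoring requires no judgement.
key = ["b", "a", "d", "c", "a"]
candidates = [
    ["b", "a", "d", "a", "a"],   # 4 correct
    ["b", "c", "d", "c", "a"],   # 4 correct
    ["a", "a", "b", "c", "b"],   # 2 correct
]

def score(responses, answer_key):
    """Objective scoring: one point per response matching the key."""
    return sum(r == k for r, k in zip(responses, answer_key))

# Both "scorers" apply the identical mechanical procedure,
# so their marks agree exactly on every candidate.
scorer_1 = [score(c, key) for c in candidates]
scorer_2 = [score(c, key) for c in candidates]
assert scorer_1 == scorer_2  # perfect agreement -> coefficient of 1
```

With subjectively scored tasks (essays, interviews) the two lists of marks would diverge, and the scorer reliability coefficient would fall below 1.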
If we want to make a test more reliable, we must consider the candidates’ performance. These suggestions can help: exclude items which do not discriminate well between weaker and stronger students; do not allow candidates too much freedom; write unambiguous items; provide clear and explicit instructions; use items that permit scoring which is as objective as possible; identify candidates by number, not name; etc.
Backwash is the effect that tests have on learning and teaching; backwash is seen as part of the impact a test may have on learners and teachers, on educational systems in general, and on society at large. We have to ensure that the test is known and understood by students and teachers, because however good the potential backwash effect of a test may be, the effect will not be fully realized if students and those responsible for teaching do not know and understand what the test demands of them. It is also important to give students specific comments (good and bad) and feedback during and after the test.
Counting the cost
One important and desirable quality of a test, which trips quite readily off the tongue of many testers after validity and reliability, is practicality. But what would be the cost of not achieving beneficial backwash? The production and distribution of sample tests and the training of teachers will also be costly, and some experts will argue, therefore, that such procedures are impractical. We should compare the costs and effort involved and avoid wasting time on activities that are not going to give us beneficial backwash effects; so we have to continue supplying the information that makes tests more reliable, in order to gain both practicality and benefits.
Validity: for me, validity is when the teacher puts into an exam all the knowledge that the students are supposed to know.
Reliability: is when the teacher makes an exam that is easy for students to understand. Also, if a teacher gives that exam to the same person on two different occasions, the student should get almost the same result as the first time.
Practicality: is when the teacher makes an exam that is neither so easy that the students will answer it in 5 to 10 minutes, nor so hard that it will take 10 hours. It is supposed to be an exam according to the needs of the students.
Achieving beneficial backwash
It rests on several principles. One is that teachers should give oral tests, but usually they don’t. Teachers should also not focus only on the test, because there is a lot of other material to assess; we as teachers need to focus on direct testing. Teachers need to be careful with the activities they put in the test, making sure that the activities are right for that level. We also need to make tests with an objective, for example, measuring what students have achieved in a certain period of time, and we need to make sure that the test is understandable, so that the students know what they are going to do. For me the most important principle is that teachers should write different tests as the years pass; I dislike it when teachers give the same tests to students year after year.