Listed below is my response to this week’s discussion question. I look forward to your comments and questions.
As odd as this may sound, one of my favorite methods for testing is test-retest. As I have said before, this allows me to learn where I did and did not convey concepts clearly when teaching. The same holds true for research. With test-retest, researchers can learn whether the measure captured the construct being studied (Statistics.com, 2025b). The test is given to the same group of participants at two different times, and the scores are compared (Statistics.com, 2025b). The end goal of test-retest is a high correlation, which shows that the relationship between what is being studied and the participants' understanding or ability to demonstrate it did not change over time (Statistics.com, 2025b). With regard to test-retest, this is how reliability is established (Statistics.com, 2025b). In statistics, reliability seeks to understand, if X occurs during this instance, what the likelihood is that X will occur again when the measure is re-administered (Statistics.com, 2025a).
With regard to alternate-forms reliability, two different versions of the same test are given to participants (Statistics How To.com, 2025). For this method to be effective and bias-free, each version has to have the same number of questions and cover the same topics (Statistics How To.com, 2025). The participants' scores are recorded and then compared (Statistics How To.com, 2025). I admit I chuckled when reading about this because this is what I used to do to prevent cheating when I taught large classes and the students sat close together. One test would be labeled A and the other test would be labeled B. When I had 60 students in a classroom, I also had a version C of the test. Every now and then I wrote, "This is the correct response for test C, and you have test B."
To test the reliability of raters, a rubric can be utilized. The purpose of the rubric is to ensure that everyone involved in the grading understands the criteria being measured and how those criteria are being measured. Once the raters complete the rubric, the rubrics are collected and the results are calculated. If the raters all selected the same criteria within the rubric, they are shown to be consistent and have a "high inter-rater reliability" (Covidence, n.d., para. 3), whereas a "low inter-rater reliability indicates that the raters have different interpretations or criteria for evaluating the same phenomenon" (Covidence, n.d., para. 3). An example would be evaluating a student presentation: on the criterion of being prepared and able to talk about their research without reading from a PowerPoint, 3 raters select good, 4 select very good, and 2 select excellent. The spread indicates that the raters, even with written criteria, still interpreted them differently.
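One simple way to put a number on the presentation example above is pairwise percent agreement: out of all pairs of raters, what fraction chose the same category? (More formal statistics such as Cohen's or Fleiss' kappa also correct for chance agreement, but the raw agreement rate already shows the problem.) This sketch uses the 3 / 4 / 2 ratings from the example.

```python
from collections import Counter
from math import comb

# Nine raters' ratings of one student presentation, matching the
# example above: 3 chose "good", 4 "very good", 2 "excellent".
ratings = ["good"] * 3 + ["very good"] * 4 + ["excellent"] * 2

# Pairwise percent agreement: the fraction of rater pairs that
# picked the same category.
counts = Counter(ratings)
agreeing_pairs = sum(comb(n, 2) for n in counts.values())  # pairs within each category
total_pairs = comb(len(ratings), 2)                        # all possible rater pairs
agreement = agreeing_pairs / total_pairs
print(f"pairwise agreement = {agreement:.2f}")  # 10/36, about 0.28
```

An agreement rate of roughly 0.28 is low, which matches the point above: the raters read the same written criteria but applied them differently.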
References
Covidence.com. (n.d.). What is inter-rater reliability? Retrieved April 1, 2025, from https://support.covidence.org/help/what-is-inter-rater-reliability#:~:text=Inter%2Drater%20reliability%20is%20a,a%20particular%20phenomenon%20or%20behaviour.
Statistics.com. (2025a). Reliability. Retrieved April 1, 2025, from
Statistics.com. (2025b). Test-Retest Reliability. Retrieved April