OpenAI’s ChatGPT has transformed natural language processing, and its advanced capabilities are now widely available. That availability brings both benefits and risks, including the potential for misuse, and educators and researchers must be aware of those risks. Students may rely on ChatGPT to complete assignments without understanding the underlying concepts, weakening their critical thinking skills, or use it to plagiarize, which undermines the educational process. Researchers may use it to manipulate or fabricate data, compromising research integrity and carrying serious consequences.
ChatGPT: The End of Online Exam Integrity?
We exceeded expectations just by turning up for the exams.
J.K. Rowling
As a teacher, I relied on online exams, particularly during the pandemic. With the advent of ChatGPT, however, preserving the integrity of online exams has become far more difficult. While the technology’s advanced language processing capabilities can provide useful tools for learning, it also carries the potential for abuse: students may use ChatGPT to generate exam answers, which undermines both the value of education and the learning experience.
Educators must remain vigilant and proactive in safeguarding the validity and reliability of online assessments. Ultimately, it is essential to preserve the integrity of online exams and ensure that students are evaluated based on their own knowledge and skills rather than their ability to manipulate technology.
As ChatGPT continues to develop, it has become increasingly proficient at answering complex questions and solving difficult math problems, posing a significant threat to the integrity of online exams. Here’s a list of exams the new technology has passed:
The Uniform Bar Exam

OpenAI reported that GPT-4 scored 298 out of 400, placing it in the 90th percentile. By comparison, GPT-3.5, the technology behind ChatGPT, scored in the 10th percentile on the bar exam. The passing score varies by state, but in New York a score of 266, approximately the 50th percentile, is required to pass, according to the New York State Board of Law Examiners.
The SAT

According to OpenAI, GPT-4 received a score of 710 out of 800 on the SAT Reading & Writing section, which is in the 93rd percentile. Meanwhile, GPT-3.5 received a score of 670 out of 800, ranking in the 87th percentile.
For the math section, GPT-4 earned a score of 700 out of 800, putting it in the 89th percentile, whereas GPT-3.5 scored in the 70th percentile, according to OpenAI.
Overall, GPT-4 scored 1410 out of 1600, while the average score for the SAT in 2021 was 1060 according to a report from the College Board.
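For readers who want to check the arithmetic, here is a minimal sketch in Python that combines the section scores quoted above into the composite figure. All numbers come from the OpenAI and College Board figures already cited; the script itself is purely illustrative.

```python
# Illustrative only: recompute GPT-4's SAT composite from the figures above.
reading_writing = 710  # out of 800, 93rd percentile (per OpenAI)
math_section = 700     # out of 800, 89th percentile (per OpenAI)

composite = reading_writing + math_section
print(f"GPT-4 composite: {composite} / 1600")  # 1410 / 1600

average_2021 = 1060    # 2021 national average (per the College Board)
print(f"Points above the 2021 average: {composite - average_2021}")  # 350
```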
GRE

GPT-4’s scores on the Graduate Record Examinations (GRE) varied significantly across different sections. OpenAI reported that it scored in the 99th percentile on the verbal section of the exam and in the 80th percentile on the quantitative section. However, it only scored in the 54th percentile on the writing test.
Similarly, GPT-3.5 scored in the 25th and 63rd percentiles on the quantitative and verbal sections, respectively, and in the 54th percentile on the writing test, according to OpenAI.
USA Biology Olympiad Semifinal Exam

The USA Biology Olympiad is a highly regarded national science competition that attracts some of the most talented biology students in the country. According to the USABO’s website, the first round is an open online exam that lasts 50 minutes and is taken by thousands of students nationwide.
The second round, called the Semifinal Exam, is a 120-minute test with three sections comprising multiple-choice, true/false, and short-answer questions, as explained by USABO. The 20 highest-scoring students on the Semifinal Exam advance to the National Finals, according to USABO’s guidelines.
OpenAI reported that GPT-4 scored in the 99th to 100th percentile on the 2020 Semifinal Exam for the USA Biology Olympiad.
AP Exams

GPT-4 has passed several Advanced Placement (AP) exams, the college-level exams administered by the College Board that high school students can take for college credit. The College Board assigns scores from 1 to 5, with a 3 or higher typically considered passing.
According to OpenAI, GPT-4 received a score of 5 on the AP exams for Art History, Biology, Environmental Science, Macroeconomics, Microeconomics, Psychology, Statistics, US Government, and US History. It also scored a 4 on the AP exams for Physics 2, Calculus BC, Chemistry, and World History.
AMC Exams

The Mathematical Association of America administers the AMC 10 and 12 exams, which are 75-minute tests consisting of 25 questions covering various mathematical topics such as algebra, geometry, and trigonometry. These exams are typically taken by high school students.
According to the MAA’s website, the average score on the fall 2022 AMC 10 was 58.33 out of 150 points, while the average on the AMC 12 was 59.9 out of 150. In comparison, GPT-4 scored a 30 on the AMC 10 and a 60 on the AMC 12, placing it between the 6th and 12th percentiles on the AMC 10 and between the 45th and 66th percentiles on the AMC 12, as reported by OpenAI.
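To make these point totals concrete, here is a minimal sketch of the standard AMC 10/12 scoring rubric: 6 points for each correct answer, 1.5 points for each question left blank, and 0 points for each wrong answer, across 25 questions for a 150-point maximum. The answer counts below are hypothetical, chosen only to show combinations consistent with the reported totals.

```python
# A sketch of AMC 10/12 scoring under the standard rubric:
# 6 points per correct answer, 1.5 points per blank, 0 per wrong answer.

def amc_score(correct: int, blank: int, questions: int = 25) -> float:
    """Score an AMC 10/12 paper from counts of correct and blank answers."""
    if correct + blank > questions:
        raise ValueError("More answers than questions on the exam")
    return 6.0 * correct + 1.5 * blank

# Hypothetical answer patterns consistent with GPT-4's reported scores:
print(amc_score(correct=5, blank=0))   # 30.0 -- matches the AMC 10 result
print(amc_score(correct=10, blank=0))  # 60.0 -- matches the AMC 12 result
```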
And many more.
Minimizing The Risks
As with any innovation, ChatGPT brings risks that need to be addressed. To minimize those risks and maximize the benefits, our education system must integrate the ethical and responsible use of these tools into its curricula, monitor their impact on student learning and development, and ensure they are used appropriately. It would also be wise to limit the use of online exams and return to in-person exams whenever possible. While online exams offer convenience, they also create opportunities for cheating and can undermine the value of education. A balanced approach to technology in education is therefore necessary to ensure its effectiveness and sustainability.