GPT-3.5 vs GPT-4

02/04/2023

Recently, I've been experimenting with the two available versions of ChatGPT. At the moment, developing and using external plugins is limited to a waitlist, and I have yet to receive an invitation. In the meantime, I'll compare the two models and highlight the differences I've found.

Variations in their responses

Having embarked on my fitness journey, I decided to pose three fitness-related questions to both versions of ChatGPT. One question was open-ended, while the other two offered a choice between two options. Interestingly, the two versions gave different answers to the two-option questions, and they used different calculations and perspectives when addressing the open-ended one.

Divergence in protein intake recommendations

I first asked about my daily protein intake. Here are the exact question and responses: GPT-3.5 - GPT-4

Both answers were backed by scientific evidence, and the recommended protein intake per kg of body weight differed only slightly between them. However, after doing my own research, I found GPT-4's suggestion of 1.8-2.0 grams per kg to be more accurate. I also noticed that GPT-4 consistently includes a disclaimer for such questions, advising users to consult an expert.
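To make the per-kg figure concrete, here is a minimal sketch of the arithmetic behind a daily target. The 80 kg body weight is a hypothetical value chosen only for illustration, and the function name is my own; only the 1.8-2.0 g/kg range comes from GPT-4's answer.

```python
def daily_protein_range(weight_kg, low_g_per_kg=1.8, high_g_per_kg=2.0):
    """Return the (low, high) daily protein target in grams.

    The default per-kg bounds are the 1.8-2.0 g/kg range quoted above.
    """
    return weight_kg * low_g_per_kg, weight_kg * high_g_per_kg


# Hypothetical body weight, used only as an example.
low, high = daily_protein_range(80)
print(f"{low:.0f}-{high:.0f} g of protein per day")  # prints "144-160 g of protein per day"
```

Swapping in a different weight or a different per-kg range shows how much a small change in the recommendation shifts the daily total.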

Upon seeking input from friends about the precision of the two responses, they unanimously agreed that GPT-4 provided a superior and more precise answer. Furthermore, GPT-4 remained focused on the query without making extraneous recommendations like "doing resistance training," which I had already mentioned in my question.
GPT-3.5 vs GPT-4
0 - 1

Addressing leg soreness and gym attendance

My second question was straightforward: I was experiencing severe leg soreness and wondered if I should go to the gym for a chest workout or not. The options were either to go to the gym or to skip it. Intriguingly, each version provided a different recommendation. Here are the exact question and responses: GPT-3.5 - GPT-4

I was taken aback, as it was the first time I felt GPT-3.5 had provided an incorrect answer. I conducted the same accuracy assessment with my friends, and we unanimously agreed that GPT-4 offered the correct advice. From a scientific standpoint, GPT-4's response aligns with my understanding as well (though, as a disclaimer, remember that you are reading an engineer's blog post, not a physician's).
GPT-3.5 vs GPT-4
0 - 2

Choosing between a 10km race and a half-marathon

For my final question, I asked whether I should participate in a 10km race or a half-marathon. Both versions provided their own distinct answers and approaches for the two options: GPT-3.5 - GPT-4

In this case, there was no clear winner, as I found both suggestions to be reasonable. However, I appreciated GPT-4's response for its motivational tone. When consulting my friends, their preferences were divided, with most favoring GPT-3.5's response. Ultimately, I decided to follow GPT-3.5's recommendation and opted for the 10km race, as it would be my first, and I wanted to take things one step at a time.
GPT-3.5 vs GPT-4
1 - 2

Conclusion

I'm curious whether development of each version has stopped or whether they are still being tweaked. If I posed the same questions again, would the responses differ? I might revisit them in a few months to find out. For now, I generally prefer GPT-4, and I eagerly await the release of GPT-5. What are your thoughts?

Tags: thoughts