A couple weeks ago my wife started probing ChatGPT’s abilities. An early foray suggested that it didn’t entirely appreciate climate bathtub dynamics. She decided to start with a less controversial topic:

If there was a hole that went through the center of the moon, and I jumped in, how long would it take for me to come out the other side?

Initially, it’s spectacularly wrong. It gets the time-to-distance formula for constant acceleration right, but it has misapplied it. The answer is wrong by orders of magnitude, so it must be making a unit error or something. To us, the error is obvious. The moon is thousands of kilometers across, so how could you possibly traverse it in seconds, with only the moon’s tiny gravity to accelerate you?
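That sanity check is easy to put into numbers. The sketch below applies the same constant-acceleration formula, d = a*t^2/2, to the full diameter of the moon. The diameter (about 3,475 km) and surface gravity (about 1.62 m/s^2) are standard reference values rather than figures from the chat, and the function name is just illustrative. Holding gravity at its surface value the whole way is not the right physics, but it is enough to show the scale of the answer.

```python
import math

# Assumed reference values, not taken from the ChatGPT transcript.
MOON_DIAMETER_M = 3.4748e6   # about 3,475 km
MOON_SURFACE_G = 1.62        # m/s^2 at the lunar surface


def constant_accel_time(distance_m, accel_m_s2):
    """Time to cover distance_m from rest under constant acceleration.

    Solves d = a * t**2 / 2 for t.
    """
    return math.sqrt(2.0 * distance_m / accel_m_s2)


t_crude = constant_accel_time(MOON_DIAMETER_M, MOON_SURFACE_G)
print(f"{t_crude:.0f} s, or roughly {t_crude / 60:.0f} minutes")
```

Even with this generous assumption the trip takes on the order of half an hour, so an answer measured in seconds cannot be right.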

At the end here, we ask for the moon’s diameter, because we started a race – I was building a Vensim model and my son was writing down the equations by hand, looking for a closed-form solution and, when the integral looked ugly, repeating the calculation in Matlab. ChatGPT proved to be a very quick way to look up things like the diameter of the moon – faster even than googling up the Wikipedia page.

Since it was clear that constant acceleration was wrong, we tried to get it to correct itself. We hoped it would come up with F = m(me)*a = G*m(moon)*m(me)/R^2 and solve that.

Ahh … so the gigantic scale error is from assuming a generic 100-meter hole, rather than a hole all the way through to the other side. Also, 9.8 m/s^2 is Earth’s surface gravity.

Finally, it has arrived at the key concept needed to solve the problem: nonconstant acceleration, a = G*M(moon)/R^2 (where R varies with the jumper’s position in the hole).
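One caveat worth flagging before solving this: inside the moon, only the mass closer to the center than the jumper pulls inward (the shell theorem). If we assume uniform density, which is a simplification and not something stated in the chat, the enclosed mass scales as r^3 and the acceleration becomes a = -G*M(moon)*r/R0^3, where R0 is the moon's radius and r is the signed distance from the center. That is simple harmonic motion, so the one-way trip takes half a period, T = pi*sqrt(R0^3/(G*M(moon))). A minimal sketch, using standard reference values for G, mass and radius:

```python
import math

# Assumed reference values (standard figures, not from the transcript).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_MOON = 7.342e22    # mass of the moon, kg
R_MOON = 1.7374e6    # mean radius of the moon, m


def fall_through_time(mass_kg, radius_m):
    """One-way transit time through a uniform-density sphere.

    Inside the sphere a(r) = -G*M*r/R**3, i.e. a harmonic oscillator
    with angular frequency omega = sqrt(G*M/R**3). Surface to surface
    is half a period, pi / omega.
    """
    omega = math.sqrt(G * mass_kg / radius_m**3)
    return math.pi / omega


t_transit = fall_through_time(M_MOON, R_MOON)
print(f"{t_transit:.0f} s, or about {t_transit / 60:.0f} minutes")
```

Under these assumptions the trip comes out a little under an hour. The real moon is denser toward the center, so this should be read as an estimate.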

Disappointingly, it crashed right at the crucial endpoint, but it’s already done most of the work to lay out the equations and collect the mass, radius and gravitational constant needed. It’s still stubbornly applying the constant acceleration formula at the end, but I must say that we were pretty impressed at this point.

In the same amount of time, the Vensim model was nearly done, with a bit of assistance on the input numbers from ChatGPT. There were initially a few glitches, like forgetting to reverse the sign of the gravitational force at the center of the moon. But once it worked, it was easily extensible to variations in planet size, starting above or below the surface, etc. Puzzlingly, the hand calculation was yielding a different answer (some kind of trivial hand computation error), but Matlab agreed with Vensim. Matlab was faster to code, but less interactive, and less safe because it didn’t permit checking units.
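For readers without Vensim or Matlab, the simulation approach can be sketched in a few lines of Python. This is not the model described above, just an illustration of the same idea: integrate position and velocity step by step until the jumper momentarily stops at the far side. It assumes a uniform-density moon (so gravity inside grows linearly with distance from the center), uses standard reference values for the constants, and the function names are made up for the example. Writing the acceleration in terms of the signed position x means the force reverses sign at the center automatically, which avoids the glitch mentioned above.

```python
import math

# Assumed reference values (standard figures, not from the post).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_MOON = 7.342e22    # mass of the moon, kg
R_MOON = 1.7374e6    # mean radius of the moon, m


def accel(x, mass=M_MOON, radius=R_MOON):
    """Acceleration along the hole at signed position x from the center.

    Inside the body (uniform density assumed): a = -G*M*x / R**3.
    Outside the body: the usual inverse-square law, pointing inward.
    """
    if abs(x) <= radius:
        return -G * mass * x / radius**3
    return -G * mass * math.copysign(1.0, x) / x**2


def traverse_time(x0=R_MOON, dt=0.1):
    """Drop from rest at x0 > 0 and integrate until the far turnaround.

    Uses semi-implicit Euler steps (update velocity, then position).
    Returns (elapsed seconds, final position in meters).
    """
    if x0 <= 0:
        raise ValueError("x0 must be positive (start on the +x side)")
    x, v, t = x0, 0.0, 0.0
    while True:
        v += accel(x) * dt   # velocity first, using the current position
        x += v * dt          # then position, using the updated velocity
        t += dt
        if v >= 0.0:         # velocity has returned to zero: far side
            return t, x


t_sim, x_end = traverse_time()
print(f"arrived after {t_sim:.0f} s at x = {x_end / 1000:.0f} km")
```

Changing `x0` to a value above or below `R_MOON` starts the jump above or below the surface, and passing a different mass and radius into `accel` would cover other planet sizes.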

I’d hesitate to call this a success for the AI. It was a useful adjunct to a modeler who knew what they were doing. It was impressively fast at laying out the structure of the problem. But it was even faster at blurting out the wrong answer with an air of confidence. I would *not* want to fly in a plane designed by ChatGPT yet. To be fair, the system isn’t really designed to do physics, but a lot of reasoning about things like the economy or COVID requires some skills that it apparently doesn’t yet have.