Why “vibe physics” is the ultimate example of AI slop

The most fundamental of all the sciences is physics, as it seeks to describe all of nature in the simplest, most irreducible terms possible.

Both the contents of the Universe and the laws that govern it lie at the heart of what physics is, allowing us not only to make concrete predictions about how reality will behave, but to describe the Universe quantitatively: to say how much of an effect any physical phenomenon or interaction will have on any physical system. Although physics has often been driven forward by wild, even heretical ideas, it’s the fact that there are both

  • fundamental physical entities and quantities,
  • and also fundamental physical laws,

that enables us, quite powerfully, to make accurate predictions about what will occur (and by how much) in any given physical system.

Over the centuries, many new rules, laws, and elementary particles have been discovered: the Standard Models of both particle physics and cosmology have been in place for all of the 21st century thus far. Back in 2022, the first mainstream AI-powered chatbots, built atop Large Language Models (or LLMs), arrived on the scene. Although many praised them for their versatility, their apparent ability to reason, and their often surprising ability to surface interesting pieces of information, they remain fundamentally limited when it comes to displaying an understanding of even basic ideas in the sciences.

Here in 2025, however, many are engaging in what’s quickly becoming known as vibe physics: holding deep physics conversations with these LLMs and (erroneously) believing that they’re collaborating to make meaningful breakthroughs with tremendous potential. Here’s why that’s completely delusional, and why, instead of a fruitful collaboration, they’re simply falling for the phenomenon of unfettered AI slop.

This feedforward network (without backpropagation) is an example of a restricted Boltzmann machine: there is at least one hidden layer between the input layer and the output layer, and nodes are connected only between different layers, never between nodes of the same layer. This architecture represented a tremendous step forward in creating today’s AI/LLM systems.
Credit: The Nobel Committee for Physics, 2024
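
As a rough sketch of that bipartite structure (my own illustration with arbitrarily chosen layer sizes, not anything taken from the figure), a restricted Boltzmann machine boils down to a single weight matrix linking a visible layer to a hidden layer, with no connections inside either layer:

```python
import numpy as np

# Illustrative sketch: the bipartite connectivity of a restricted Boltzmann
# machine. Weights only link visible units to hidden units; there are no
# visible-visible or hidden-hidden connections.

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3

W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # inter-layer weights only
b_visible = np.zeros(n_visible)                        # visible-unit biases
b_hidden = np.zeros(n_hidden)                          # hidden-unit biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_given_visible(v):
    """Probability that each hidden unit is 'on', given a visible vector."""
    return sigmoid(v @ W + b_hidden)

def visible_given_hidden(h):
    """Probability that each visible unit is 'on', given a hidden vector."""
    return sigmoid(h @ W.T + b_visible)

v = rng.integers(0, 2, size=n_visible).astype(float)   # a random binary input
print(hidden_given_visible(v))
```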

There are, to be sure, a tremendous number of things that AI in general, and LLMs in particular, are exceedingly good at. This is due to how they’re constructed, which is something that’s well-known but not generally appreciated. While a “classical” computer program involves:

  • a user giving an input or a series of inputs to a computer,
  • which then conducts computations that are prescribed by a pre-programmed algorithm,
  • and then returns an output or series of outputs to the user,

the big difference is that an AI-powered program doesn’t perform computations according to a pre-programmed algorithm. Instead, it’s the machine learning program itself that’s responsible for figuring out and executing the underlying algorithm.
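
To make that distinction concrete, here is a minimal sketch using a deliberately trivial example (temperature conversion; the numbers and model form are my own choices, not anything from the article): in the “classical” case the programmer fixes the rule in advance, while in the machine-learning case the program is only given example inputs and outputs and works out the rule’s coefficients itself.

```python
import numpy as np

# A hand-written ("classical") program: the rule is fixed in advance by the programmer.
def fahrenheit_from_celsius(c):
    return 1.8 * c + 32.0

# A machine-learning sketch: the program is only told the *form* of the model
# (a line) and must recover the coefficients from example input/output pairs.
rng = np.random.default_rng(1)
celsius = rng.uniform(-40, 40, size=100)
fahrenheit = 1.8 * celsius + 32.0 + rng.normal(scale=0.5, size=100)  # noisy "training data"

# Fit F = a*C + b by least squares: the parameters come from the data, not the programmer.
a, b = np.polyfit(celsius, fahrenheit, deg=1)
print(f"learned rule: F = {a:.2f}*C + {b:.2f}")  # close to 1.8 and 32, but learned, not programmed
```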

What most people fail to recognize about AI in general, and LLMs in particular, is that they are fundamentally limited in their scope of applicability. There’s a saying that “AI is only as good as its training data,” and what this generally means is that machine learning programs can be extremely powerful (and can often outperform even expert-level humans) at performing the narrow tasks that they are trained on. However, when confronted with questions about data that falls outside of what they’re trained on, that power and performance don’t generalize at all.

Based on the Kepler lightcurve of the transiting exoplanet Kepler-1625b, researchers were able to infer the existence of a potential exomoon. The fact that the transits didn’t occur with the exact same periodicity, but instead showed timing variations, was the major clue that led researchers in that direction. With large enough exoplanet data sets, machine learning algorithms can now find additional exoplanet and exomoon candidates that were unidentifiable with human-written algorithms.
Credit: NASA GSFC/SVS/Katrina Jackson
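
To illustrate the kind of clue the caption describes, here is a minimal sketch using entirely made-up transit times (not Kepler-1625b data, and with an arbitrary assumed period): fit a strictly periodic ephemeris to the observed transits and inspect the residuals, which is where timing variations show up.

```python
import numpy as np

# Hypothetical transit-timing-variation sketch: all numbers below are invented.
epochs = np.arange(10)                                   # transit number
period, t0 = 287.4, 100.0                                # assumed days (illustrative only)
rng = np.random.default_rng(2)
wobble = 0.02 * np.sin(2 * np.pi * epochs / 5.0)         # a moon-like timing wobble
observed = t0 + period * epochs + wobble + rng.normal(scale=0.002, size=epochs.size)

# Best-fit linear ephemeris: transit_time = P * epoch + T0
P_fit, T0_fit = np.polyfit(epochs, observed, deg=1)
residuals = observed - (P_fit * epochs + T0_fit)         # the timing variations, in days

print("TTV residuals (days):", np.round(residuals, 4))   # a periodic pattern hints at a companion
```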

As an example, if you train your AI on large data sets of human speech and conversation in a particular language, the AI will become very good at spotting patterns in that language and, with enough data, can become extremely effective at mimicking human speech patterns and conducting conversations in that language. Similarly, if you trained your AI on large data sets of:

  • images of Caucasian human faces,
  • images of spiral galaxies,
  • or on gravitational wave events generated by black hole mergers,

you could be confident that your artificial intelligence algorithm would be quite good at spotting patterns within those data.

Given another example of a similar piece of data, your AI could then classify and characterize it, or you could go an alternative route and simply describe a system that had similar properties, and the well-trained AI algorithm would do an excellent job of generating a “mock” system that possessed the exact properties that you described. This is a common use of generative AI, which succeeds spectacularly at such tasks.
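
As a bare-bones sketch of that classify-versus-generate idea (a toy illustration with fabricated numbers, standing in for a real generative model), one can fit a simple Gaussian model to “training” measurements, score how typical a new example is, and sample a mock example with the same learned properties:

```python
import numpy as np

rng = np.random.default_rng(3)
training = rng.normal(loc=5.0, scale=1.2, size=1000)   # some measured property of the training set

mu, sigma = training.mean(), training.std()            # the "learned" pattern

def how_typical(x):
    """Classify/characterize: how many standard deviations from the learned pattern?"""
    return abs(x - mu) / sigma

def generate_mock():
    """Generate: draw a new example with the same learned properties."""
    return rng.normal(loc=mu, scale=sigma)

print(how_typical(5.3))    # small: looks like the training data
print(how_typical(42.0))   # large: far outside anything the model has seen
print(generate_mock())     # a plausible "mock" sample
```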

With a large training data set, such as a large number of high-resolution faces, artificial intelligence and machine learning techniques can not only learn how to identify human faces, but can generate human faces with a variety of specific features. This crowd in Mauerpark, Berlin, would provide excellent training data for generating Caucasian faces, but a model trained on it would perform very poorly if asked to generate features common to African-American faces.
Credit: Loozrboy/flickr

However, that same well-trained AI will do a much worse job at identifying features in or generating images of inputs that fall outside of the training data set. The LLM that was trained on (and worked so well in) English would perform very poorly when presented with conversation in Tagalog; the AI program that was trained on Caucasian faces would perform poorly when asked to generate a Nigerian face; the model that was trained on spiral galaxies would perform poorly when given a red-and-dead elliptical galaxy; the gravitational wave program trained on binary black hole mergers would be of limited use when confronted with a white dwarf inspiraling into a supermassive black hole.

And yet, an LLM is programmed explicitly to be a chatbot, which means one of its goals is to coax the user into continuing the conversation. Rather than be honest with the user about the limitations of its ability to answer correctly given the scope of its training data, LLMs confidently and often dangerously misinform the humans in conversation with them, with “therapist chatbots” even encouraging or facilitating suicidal thoughts and plans.

Still, the success of LLMs in areas where they weren’t explicitly trained, such as in vibe coding, has led to people placing confidence in those same LLMs to perform tasks where their utility hasn’t been validated.

This graphical hierarchy of mathematical spaces goes from the most general type of space, a topological space, to the most specific: an inner product space. All metrics induce a topology, but not all topological spaces can be defined by a metric; all normed vector spaces induce a metric, but not all metrics arise from a norm; all inner product spaces induce a norm, but not all normed vector spaces are inner product spaces. Mathematical spaces play a vital role in the math powering artificial intelligence.
Credit: Jhausauer/public domain
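
In symbols, the chain the caption describes can be written out as follows (a standard mathematical statement, included for clarity rather than taken from the figure): an inner product induces a norm, the norm induces a metric, and the metric induces a topology, while none of the reverse implications hold in general.

```latex
% Each structure induces the next, but not conversely:
% inner product space -> normed vector space -> metric space -> topological space
\[
  \langle x, y\rangle
  \;\Rightarrow\;
  \lVert x\rVert := \sqrt{\langle x, x\rangle}
  \;\Rightarrow\;
  d(x,y) := \lVert x - y\rVert
  \;\Rightarrow\;
  \mathcal{T} := \{\,\text{unions of open balls } B_r(x)\,\}
\]
```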

To be sure, the concepts of artificial intelligence and machine learning do have their place in fields like physics and astrophysics. Machine learning algorithms, when trained on a sufficiently large amount of relevant, high-quality data, are outstanding at spotting and uncovering patterns within that data. When prompted, post-training, with a query that’s relevant to such a pattern found within that data set, the algorithm is excellent at reproducing the relevant pattern and utilizing it in a way that can match the user’s query. It’s why machine learning algorithms are so successful at finding exoplanets that humans missed, why they’re so good at classifying astronomical objects that are ambiguous to humans, and why they’re good at reproducing or simulating the physical phenomena that are found in nature.

But now we have to remember the extraordinary difference between describing and deriving in a field like physics. With a large library of training data, it’s easy for an LLM to identify patterns: patterns in speech and conversation, patterns that emerge within similar classes of problems, patterns that emerge in the data acquired concerning known objects, etc. But that doesn’t mean that LLMs are competent at uncovering the underlying laws of physics that govern a system, even with arbitrarily large data sets. It doesn’t mean that LLMs understand or can derive foundational relationships. And it doesn’t change the fact that LLMs are more likely to “continue a conversation” with a user than they are to identify factually correct, relevant statements that provide meaningful, conclusive answers to a user’s query.

Source: bigthink.com
