Vibecoding and LLM 'Pedagogics'

Posted by: Konstantin

LLMs require a human operator with deep understanding of a problem before they can generate useful code

I recently stumbled on this post - a good read about the pitfall of believing that LLMs offer their human operators a shortcut around deep learning.

Software development rarely begins with writing code. Usually, the first step is a loop (of sorts): understanding the problem, examining requirements, exploring edge cases, and weighing architecture trade-offs, business constraints, maintenance costs, and the human impact of each decision.

Even when tackling a small ticket fix, the choice of what to code still depends on project knowledge, domain understanding, technology experience, and lessons learned over time.

In the "old days", before search engines became LLMs in disguise, the hardest part was often figuring out how to word a query. Naming the problem in a way that gets you the right results was itself a step towards finding the solution. In a way that hasn’t changed much - only now, humans are responsible for checking what the output of the LLMs for the things they got right and the things they got wrong.

The responsibility of verifying what the LLM got wrong remains with its human operator.

xkcd: Wisdom of the Ancients - https://xkcd.com/979/

In my own vibe coding journeys, I've discovered that LLMs are most helpful when I have a deep understanding of both the problem and the solution. It's really quite simple: the more I understand the problem, the better I can describe the solution, edge cases and approach in my prompt, leading the bot to generate a better solution.

Coming to that level of understanding before diving into writing code remains a key step in the development process. It's also the reason why there is no magic substitute for deep knowledge of systems, patterns and tools. You, the human in the equation, remain in control and are responsible for guiding the bot towards a good solution.

AIs don't innovate all by themselves. Hand them something novel or non-trivial and they'll struggle. A friend of mine who spends a great deal of time with AVFoundation and other media APIs and I like to joke about how even Claude can't manage to correctly orient video frames from an iPad camera...


For those outside iOS: image and video orientation is hard because AVFoundation is a very old and complex API which has accumulated a lot of undocumented edge cases over the years. The iPad is a strange kind of device: it's not always easy to know whether it's being held in landscape or portrait, or whether the user has locked the orientation of the device from Control Centre. A lot of tiny details need to play together in order to achieve an intuitive implementation.
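
To give a flavour of just one of those tiny details, here is a minimal sketch of mapping the physical device orientation to a capture orientation. The helper name and usage are my own, and it targets the pre-iOS 17 AVCaptureVideoOrientation API rather than the newer videoRotationAngle:

```swift
import AVFoundation
import UIKit

// Hypothetical helper: map the physical device orientation to a capture
// orientation. Note that the two landscape cases are deliberately swapped -
// a classic AVFoundation gotcha - and faceUp/faceDown/unknown (e.g. an iPad
// lying flat, or with orientation locked) give you no usable reading at all.
func captureOrientation(for deviceOrientation: UIDeviceOrientation) -> AVCaptureVideoOrientation? {
    switch deviceOrientation {
    case .portrait:            return .portrait
    case .portraitUpsideDown:  return .portraitUpsideDown
    case .landscapeLeft:       return .landscapeRight
    case .landscapeRight:      return .landscapeLeft
    default:                   return nil  // keep whatever orientation you had before
    }
}

// Usage (sketch): update the capture connection when the device rotates.
// if let orientation = captureOrientation(for: UIDevice.current.orientation),
//    let connection = videoOutput.connection(with: .video),
//    connection.isVideoOrientationSupported {
//     connection.videoOrientation = orientation
// }
```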


In public repositories, we can see different approaches to solving this challenge, some more complete than others - everyone codes with particular constraints and goals in mind, and there is no need to over-engineer something beyond its use case.

So if you prompt Claude to “fix the video orientation,” it will do something, not necessarily the same thing every time and not necessarily the way you expect. It has no way of knowing or understanding which approach is correct, or even how far you want it to go.

To get a good solution, you'd need to include (at least) some of the following:

  • The types of devices your app will run on, explicitly mentioning the orientation modes you support.
  • That using the resolution of the device screen or video feed is not sufficient (it used to be OK when there were three models of iPhone, but today we have many flavours in different sizes).
  • Information on how the view will be updated with the video frames, at what rate and aspect ratio.
  • Exactly how the video frames will be resized - is the view going to fit the frames, or are the frames going to be adjusted to fit the view? (see the sketch after this list)
  • How the layout is going to be expressed in SwiftUI/AutoLayout.
  • If you're planning features around that (e.g. taking a picture), where in the video pipeline you expect to obtain, scale and compress the image.
  • ... you get the idea
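
On the resizing point, even the "fit or fill" decision comes down to a single, easy-to-miss line. A minimal sketch, assuming an AVCaptureVideoPreviewLayer-based preview (the session and layer setup are elided, and the function name is my own):

```swift
import AVFoundation

// Sketch: the preview layer's videoGravity decides how frames map onto the view.
// .resizeAspect letterboxes the frames inside the layer, .resizeAspectFill crops
// them to fill it, and .resize stretches and distorts. Which one is "correct"
// depends entirely on the product decision - an LLM can't know it unless you say so.
func configurePreview(_ previewLayer: AVCaptureVideoPreviewLayer, fillView: Bool) {
    previewLayer.videoGravity = fillView ? .resizeAspectFill : .resizeAspect
}
```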

It's still up to us, the humans with technical and domain knowledge, to provide the correct information in the prompt, to review the LLM's output and to guide it towards the next steps. Finally, it's also our responsibility (and there really is no way around that, sorry) to bugfix that output before shipping :).

In other words, blindly trusting an LLM's output is the equivalent of copying the first solution you find on the internet without questioning if it’s the right one.
