How I improved AI-generated Python code by setting clear instructions

A reflection on improving AI-generated Python code by giving coding agents clearer implementation guidance, stronger quality boundaries, and more deliberate structure before a prototype becomes a real tool.

The prototype

I recently built a Python prototype in a hurry.

My main focus was simple: get the correct output.

I gave the agent the input specification, the expected output, and the data manipulation rules. What I did not define clearly enough were the implementation expectations and code quality boundaries.

At the time, that felt reasonable.

The goal was not to build a polished production system. The goal was to validate an idea quickly and create a minimum working application. The result was around 1,500 lines of Python that did what it needed to do.

A few days later, I came back to the code after we decided the solution was worth using.

That is when the problems became obvious.

What I found

The code worked, but the quality was poor:

  • 1,500 lines in a single file, even though the code had clearly separable and reusable parts
  • Raw SQL embedded directly in the code, with values concatenated into queries instead of using parameter binding
  • Long functions, some over 300 lines, that were hard to read and even harder to test
  • String literals everywhere, with no clear grouping for related constants like table column names
  • Function and variable names that were either too vague or too short to explain intent
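The SQL problem is worth making concrete. The snippet below is not the prototype's actual code; it is a hypothetical sketch of the pattern and its safer alternative, using `sqlite3` and made-up table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

role = "admin"

# What the prototype did, schematically: values concatenated into the query.
# A value like "admin' OR '1'='1" would silently change the query's meaning.
unsafe_query = "SELECT name FROM users WHERE role = '" + role + "'"

# The safer pattern: let the driver bind the value as a parameter.
rows = conn.execute("SELECT name FROM users WHERE role = ?", (role,)).fetchall()
print(rows)  # [('alice',)]
```

Parameter binding also makes the query text a constant, which plays well with moving SQL out of the business logic later.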

It was a good reminder: a working prototype can still carry a lot of design debt.

So I treated it as a refactoring exercise.

I created a more directed context for Codex, explaining what needed to change and how I wanted the code to be shaped. I used GPT-5.5 with medium reasoning, and the result was much better than the original version.

The useful part was not only the refactoring itself.

The useful part was realizing what guidance I should have given before generating the first version.

Why I created an instruction file

After that refactoring, I created an instruction file for future Python work.

Its purpose is simple: give the agent direction and boundaries before it starts writing code.

Because I usually prefer object-oriented code over large collections of functions, I also added some OOP guidance. I based part of that guidance on one of my favorite OOP books: 99 Bottles of OOP by Sandi Metz.

Not because every Python script needs classes.

But because when a prototype starts becoming a real tool, structure starts to matter.

The repository

I also extracted these lessons into a small public repository:

python-agent-coding-guidelines

The goal of the repository is simple: collect Python-focused coding guidelines that can be given to an AI coding agent before it starts generating code.

I do not see this as a perfect or final rule set. It is more like a practical starting point based on one concrete experience: an AI-generated prototype that worked, but needed serious refactoring before it became maintainable.

What the instruction file does

The instruction file gives the agent a few clear expectations before writing Python code.

It focuses on things I should have made explicit earlier:

  • Keep code modular instead of growing one large file
  • Prefer readable names that explain intent
  • Avoid long functions that mix too many responsibilities
  • Separate SQL from business logic where it makes sense
  • Use safe parameter binding instead of building SQL with string concatenation
  • Group related constants instead of scattering string literals everywhere
  • Prefer testable structure over quick inline solutions
  • Use object-oriented design when it helps express the domain more clearly
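As a small sketch of the "group related constants" point, here is one way an agent could be nudged to shape things. The table and column names are hypothetical, not taken from the repository:

```python
class UserColumns:
    """Groups related column-name constants instead of scattering string literals."""

    NAME = "user_name"
    EMAIL = "user_email"
    CREATED_AT = "created_at"


def build_select(columns: list[str], table: str) -> str:
    """Builds a SELECT statement from named constants rather than inline strings."""
    return f"SELECT {', '.join(columns)} FROM {table}"


query = build_select([UserColumns.NAME, UserColumns.EMAIL], "users")
print(query)  # SELECT user_name, user_email FROM users
```

A rename then happens in one place, and the constants themselves document which strings belong together.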

I also added guidance inspired by 99 Bottles of OOP by Sandi Metz, especially around small objects, clear responsibilities, and naming things by the role they play rather than by incidental implementation details.

The point is not to force every script into an object-oriented shape.

The point is to give the agent enough direction so it does not optimize only for “make it work” when the code may later become something people need to read, change, and trust.
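To illustrate the naming-by-role idea with a tiny, hypothetical example (the names here are mine, not from the book or the repository): an object named for the role it plays can be swapped without its collaborators caring how it is implemented.

```python
from typing import Protocol


class ReportSource(Protocol):
    """Named for the role it plays for collaborators, not for its implementation."""

    def rows(self) -> list[dict]: ...


class InMemoryReportSource:
    """One small object with a single responsibility: supplying rows."""

    def __init__(self, data: list[dict]) -> None:
        self._data = data

    def rows(self) -> list[dict]:
        return list(self._data)


def count_rows(source: ReportSource) -> int:
    # The caller depends only on the role, so any conforming source works here.
    return len(source.rows())


print(count_rows(InMemoryReportSource([{"id": 1}, {"id": 2}])))  # 2
```

A `CsvReportSource` or a database-backed source could stand in for the in-memory one without touching `count_rows`, which is the kind of flexibility the role-based name advertises.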

Contributions are welcome

I published the repository because I think these instruction files get better when they are challenged by real usage.

If you have seen similar problems in AI-generated Python code, or you have rules that helped you get better results from coding agents, contributions are welcome.

This could be:

  • A new guideline
  • Better wording for an existing one
  • An example of a bad pattern and a better alternative
  • Suggestions for testing, packaging, SQL handling, or object design
  • Feedback on rules that are too strict or not practical enough

My goal is to keep the repository useful and grounded in real engineering work, not turn it into a long theoretical style guide.
