JW's accumulated tips for using LLMs as a dev

These came out of various workshops, talks, and conversations at the AI World's Fair, SF 2024. For example, the GitHub Copilot folks gave multiple sessions, there were other sessions dedicated just to developer productivity, etc.

Understand what coding LLMs are good at:

  • They are excellent at translation: given input material in form X, produce output in form Y (a sketch follows this list).
  • They are excellent at reproducing and remixing commonly seen coding tasks.
  • They are great at generating feasible starting points for solutions given well-described, constrained problems.
  • They are helpful for the higher-level design and architectural thinking behind software development: brainstorming, thinking things through out loud, raising points that are obvious in retrospect but that you didn't think of, etc. Talking through your task with the LLM ahead of time can save a lot of effort by surfacing issues up front, especially if you give it good prompts (see good prompts below).
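To make the translation point concrete, here's a minimal sketch of framing a task as form-X-to-form-Y translation rather than open-ended code generation; the schema and prompt wording below are illustrative placeholders, not from the original notes.

```python
# Frame the task as translation: JSON table description in, SQLAlchemy model out.
# The schema and wording are invented for illustration.
schema = """
{ "table": "invoices",
  "columns": [
    {"name": "id", "type": "integer", "primary_key": true},
    {"name": "annual_revenue", "type": "numeric"},
    {"name": "issued_at", "type": "timestamp"} ] }
"""

prompt = f"""You are translating a JSON table description into a SQLAlchemy
declarative model. Preserve column names and types exactly.

Input:
{schema}

Output: a single Python class, no commentary."""
```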

Understand what they are not good at:

  • They don't have strong analytical reasoning capability.
  • The longer the input (and the output, which becomes input for the next tokens), the more likely something critical in that context will be missed.
  • Unreliable reasoning and missed context details are a huge part of why "agents" (semi-autonomous AI actors that use LLMs for planned workflows and multi-step tasks) are extremely unreliable.
  • It also means they are bad at writing secure code. You can use LLMs to help brainstorm around security issues in code when directly prompted, but this is unreliable and they will very happily hallucinate.

This being said, how do you use them well?

Use good prompts (a worked example follows this list). Include:

  • Context: what the task is.
  • Intent: what goal and purpose you have in mind.
  • Clarity: ambiguous language that can be interpreted many ways will generate misses. Clearly define the desired result.
  • Specificity: be as specific as possible and state all expectations, known constraints, requirements, use cases, etc.
  • For translation tasks (e.g. take X and generate Y), examples help (aka one- or few-shot prompting).
  • Role statements sometimes help: "act as a Python programmer whose job is to do X, who thinks about Y," etc.
  • For code, treat it like a junior engineer who can reuse and remix what it has already seen, not a senior programmer with general-purpose creative reasoning. Don't confuse the ability to remix examples in the training data for seniority. (For novel coding challenges published after 2021 that aren't in ChatGPT's training data, research shows its performance drops massively. link)
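A sketch of a prompt assembled from the ingredients above; the domain, column names, and function name are made-up placeholders, not a recommended template.

```python
# All specifics below (billing domain, column names, function name) are invented
# for illustration only.
prompt = """Act as a senior Python developer maintaining a billing service.

Context: we parse vendor CSV exports into internal invoice records.
Intent: I need a parser I can drop into our ingestion pipeline unchanged.
Requirements:
- Input: CSV text with columns invoice_id, amount_cents, currency, issued_at (ISO 8601).
- Output: a list of dicts with those keys, amount converted to Decimal dollars.
- Reject rows with an unknown currency and collect them in a separate errors list.

Example (one-shot, since this is a translation-style task):
Input row:  42,1999,USD,2024-05-01T00:00:00Z
Output:     {"invoice_id": "42", "amount": Decimal("19.99"), "currency": "USD", ...}

Write the function parse_invoices(csv_text: str) and nothing else."""
```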

Regeneration / temperature:

  • For any nonzero temperature, regenerate at least a little to see how much variation you're getting. A second or third result might do the trick where the first didn't. It will also show you faults in your prompt and how you might sharpen it up.
  • Use low or zero temperature for most coding tasks, especially when you have a very thorough prompt, when doing language/domain translation tasks, etc.
  • Increase temperature for chat use cases and conversational utility.
  • Higher temperature is rare in dev work but can be useful for speculative, creative, wild brainstorming at the higher level, when thinking through a task or coming up with alternative solutions (see the sketch below).
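A minimal sketch of both points, assuming the OpenAI Python SDK (v1+); the model name and prompts are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate(prompt: str, temperature: float, n: int = 1) -> list[str]:
    """Return n candidate completions so regenerations can be compared."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat model works here
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        n=n,  # ask for several candidates at once
    )
    return [choice.message.content for choice in resp.choices]

# Coding / translation task: keep temperature at or near zero.
code = generate("Convert this JSON schema to SQL DDL: ...", temperature=0)

# Higher-level brainstorming: raise temperature and compare several candidates.
ideas = generate("List five alternative architectures for a rate limiter.",
                 temperature=1.0, n=3)
```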

Good human coding practices lead to better results from AI:

  • Use good names and natural language in the code. The variable name "annual_revenue" is better than "rev" because it invokes the LLM's finance domain knowledge.
  • Use functional programming. Smaller units of code that have no side effects are easier not just for humans but also for LLMs to reason about, write tests for, debug, etc., because they don't rely on, or mutate state in, distant unseen places.
  • Separate concerns, e.g. config from logic, presentation from control code, etc., allowing the LLM to focus on a single problem domain when possible.
  • Be consistent. Generated code tries to follow your existing code style.
  • Docs and comments, even if AI-generated, provide context to future prompts over that code, not just for other devs. If the code is fully or mostly generated from a prompt, include that prompt as a comment or docstring.
  • Code comments that give examples of input/output and use cases are very helpful (see the sketch after this list).
  • Generate test cases by asking the LLM to use an adversarial mindset. See "role statements" above. Have it act as an adversary and explicitly identify edge cases and write tests from that point of view.
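A sketch that pulls several of these practices together (the function and numbers are invented for illustration): descriptive domain names, a pure function, and a docstring with input/output examples that later prompts over this code can draw on.

```python
from decimal import Decimal

def annual_revenue_growth(prior_annual_revenue: Decimal,
                          current_annual_revenue: Decimal) -> Decimal:
    """Return year-over-year revenue growth as a fraction.

    Pure function: no I/O and no shared state, so it is easy to test and
    easy for an LLM to reason about in isolation.

    Examples:
        annual_revenue_growth(Decimal("100"), Decimal("125")) -> Decimal("0.25")
        annual_revenue_growth(Decimal("100"), Decimal("80"))  -> Decimal("-0.2")
    """
    if prior_annual_revenue == 0:
        raise ValueError("prior_annual_revenue must be nonzero")
    return (current_annual_revenue - prior_annual_revenue) / prior_annual_revenue
```

From here you can ask the model to act as an adversarial tester of annual_revenue_growth and write pytest cases for the edge cases it would try to break it with.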

LLMs don't have to focus on just code in order to be useful for dev:

  • Represent problems and code in intermediate forms like DSLs, config YAML, pseudocode, etc., so that LLM I/O happens on higher-level representations of your problem space (a sketch follows this list). Remember they are excellent translators, so how can you model a problem as one of language translation instead of code gen?
  • Have a bag of tricks for the types and formats of output you might ask for, and try a different format if you're not getting what you want - the results might usefully surprise you. For example, ask for a YAML file to represent a DSL over the problem, or code that generates code.
  • Ask the LLM to perform world simulation: you can ask it to act as your program or solution architecture itself and report on its internal activity, state changes, logic flows, etc given inputs and events.
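A hedged sketch of the intermediate-representation idea: describe the problem in a small YAML DSL and prompt for a translation of that spec, rather than prompting for code directly. The DSL fields are invented, and PyYAML is used only to validate the spec locally before prompting.

```python
import yaml  # pip install pyyaml; used only to sanity-check the spec

# An invented pipeline DSL; the fields are placeholders for your own domain.
pipeline_spec = """
pipeline: nightly_invoice_rollup
source: {type: s3, bucket: invoices-raw, format: csv}
steps:
  - filter: "currency == 'USD'"
  - aggregate: {group_by: customer_id, sum: amount_cents}
sink: {type: postgres, table: invoice_rollups}
"""

yaml.safe_load(pipeline_spec)  # fail fast if the spec itself is malformed

prompt = f"""Translate the following pipeline spec into a runnable Python script.
Treat this as a translation task: every field in the spec must be reflected
in the output, and nothing else should be added.

{pipeline_spec}"""
```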

Build small utilities. Collect what works into personal toolkits that generate compound interest:

  • If a task can reasonably be scripted, generate, run, and throw away lots of small scripts; keep and iterate on the ones that are useful more than once.
  • Roll prompts you use often into scripts that take command-line arguments.
  • Write scripts to extract context from your codebase to use in prompts (e.g. all your classes and function names, with docstrings, with arguments to filter down to particular domains if needed) so that the whole code structure or a particular domain can be included in prompts (a sketch of such a script follows this list). Code copilots are doing this now; Aider, for example, is an OSS copilot that uses tree-sitter to analyze the codebase and provide a "map". You can use Aider for that purpose alone.
  • Explore the expanding space of CLI LLM tools like Chatblade.
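A sketch of such a context-extraction script, using only the Python standard library; the signature handling is deliberately simplified (positional args only).

```python
#!/usr/bin/env python3
"""Print a compact "code map" (classes, functions, first docstring lines)
for every .py file under a directory, suitable for pasting into a prompt."""
import ast
import sys
from pathlib import Path

def summarize(path: Path) -> str:
    tree = ast.parse(path.read_text(), filename=str(path))
    lines = [f"# {path}"]
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)  # simplified: positional args only
            lines.append(f"  def {node.name}({args})")
            doc = ast.get_docstring(node)
            if doc:
                lines.append(f'    """{doc.splitlines()[0]}"""')
    return "\n".join(lines)

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for py_file in sorted(root.rglob("*.py")):
        print(summarize(py_file), end="\n\n")
```

Pipe the output into a prompt (or a file) to give the model a map of the whole codebase, or point it at a subdirectory to restrict it to one domain.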

Specific tips for GitHub Copilot:

  • Autocomplete is designed to generate quickly but has less context: just the current file and open tabs. Inline chat, by contrast, pulls from the whole workspace into a larger context window. They might use different models, and these change over time.
  • Learn the chat commands, including /explain, @github, and @workspace, and use them often.