# JW's accumulated tips for using LLMs as a dev

These mostly came out of various workshops, talks, and conversations at the AI World's Fair SF 2024. For example, the GitHub Copilot sessions, and other sessions dedicated to developer productivity such as Manuel Odenhal's ([link](https://github.com/go-go-golems/go-go-workshop/blob/main/2024-06-24%20-%20Workshop%20AI%20Programmer%20Handout.pdf)). I've aggregated that thinking and added more of my own as I've been incorporating more AI tools as a developer.
## What LLMs are good at

- They are excellent at translation. Given input material in form X, produce output in form Y.
- They are excellent at reproducing and remixing commonly seen coding tasks.
- They are great at generating feasible starting points for solutions given well-described, constrained problems.
- They are helpful for assisting in the higher-level design process and architectural thinking behind software development through brainstorming, thinking things through out loud, raising points that are obvious in retrospect that you didn't think of, etc. Talking through your task with the LLM ahead of time can save a lot of time by surfacing issues up front, especially if you give it good prompts (see good prompts below).
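For example (the JSON record and prompt wording below are made up for illustration), a "translate X to Y" framing turns a coding request into the kind of task LLMs handle best:

```python
# Illustrative only: framing a coding task as translation (form X -> form Y).
# The JSON sample and prompt wording here are hypothetical.
import json

record = {"annual_revenue": 1250000.0, "fiscal_year": 2024, "currency": "USD"}

prompt = f"""Translate the following JSON record into a Python dataclass.
Use type hints, and add a docstring with one example of constructing it.

{json.dumps(record, indent=2)}
"""
# Send `prompt` to your LLM of choice; treat the output as a starting point
# to review, not finished code.
```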
## What they are not good at

- They don't have strong analytical reasoning capability.
- They miss details, especially as the length of input increases (and output, which becomes input for the next tokens).
- Therefore they can't generate perfect, bug-free, secure code, which requires strong reasoning and the ability to reliably catch all details.
- This is also a huge part of why "agents" (semi-autonomous AI actors that use LLMs for planned workflows and multi-step tasks) are extremely unreliable.

Since they don't reason well and miss details, don't waste time expecting them to do that. Plan for it. How many devs are not using AI because they got a wrong answer? Aggressively generate and plan to throw away most of it; you still save time.

This being said, how to get the best results?
## Use good prompts

Include:

- Context: what's the task.
- Intent: what goal and purpose you have in mind.
- Clarity: ambiguous language that can be interpreted many ways will generate misses. Clearly define the desired result.
- Specificity: be as specific as possible and state all expectations, known constraints, requirements, use cases, etc.
- Examples help (aka one- or few-shot prompting) when possible.
- Role statements sometimes help: "act as a python programmer whose job is to do X, thinks about Y, etc."
- For code, treat it like a junior engineer who can reuse and remix what it has already seen, not a senior programmer with general-purpose creative reasoning. Don't confuse the ability to remix examples in the data set for seniority. (For novel coding challenges published after 2021 that aren't in ChatGPT's training dataset, research shows its performance drops massively. [link](https://spectrum.ieee.org/chatgpt-for-coding))
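One way to bake those ingredients into something reusable is a prompt template. This is a sketch; the field names and wording are my own, not a standard:

```python
# Sketch of a prompt template covering context, intent, constraints,
# an example, and a role statement. Field names are illustrative.
PROMPT_TEMPLATE = """\
Act as {role}.

Task (context): {context}
Goal (intent): {intent}

Requirements and constraints:
{constraints}

Example of the desired output:
{example}
"""

prompt = PROMPT_TEMPLATE.format(
    role="a Python developer maintaining a billing service",
    context="Write a function that validates ISO 4217 currency codes.",
    intent="Reject malformed user input before it reaches the payments API.",
    constraints="- Pure function, no I/O\n- Raise ValueError with a clear message\n- Include type hints",
    example='validate_currency("USD") -> "USD"',
)
print(prompt)
```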
## Use regeneration and temperature

- For any nonzero temperature, regenerate at least a little to see how much variation you're getting. A second or third result might do the trick the first didn't. It will also show you faults with your prompt and how you might sharpen it up.
- Use low or zero temperature for most coding tasks, especially for a very thorough prompt, when doing language domain translation tasks, etc.
- Increase temperature for chat use cases and conversational utility.
- Higher temp is rare in dev but can be useful for speculative, creative, wild brainstorming tasks at the higher level when thinking through a task or coming up with alternative solutions.
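If you're calling a model directly, both knobs are a parameter away. A minimal sketch assuming the OpenAI Python SDK (most chat-completion APIs work similarly); the model name and prompt are placeholders:

```python
# Sketch: low temperature for deterministic-ish code generation, plus n > 1
# to sample several candidates in one call. Model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a Python function that ..."}],
    temperature=0.2,  # low for coding; raise it for brainstorming
    n=3,              # regenerate a few candidates to compare
)

for i, choice in enumerate(response.choices, 1):
    print(f"--- candidate {i} ---")
    print(choice.message.content)
```

Comparing the three candidates side by side is also a cheap way to spot where your prompt is underspecified.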
## Good human coding practices lead to better results from AI

- Use good names and natural language in the code. The variable name "annual_revenue" is better than "rev" because it invokes the LLM's finance domain knowledge (see the sketch after this list).
- Use functional programming. Smaller units of code that have no side effects are easier not just for humans but also for LLMs to reason about, write tests for, debug, etc., because they don't rely on, or mutate state in, distant unseen places.
- Separate concerns, e.g. config from logic, presentation from control code, etc., allowing the LLM to focus on a single problem domain when possible.
- Be consistent. Generated code tries to follow your existing code style.
- Docs and comments, even if AI-generated, provide context to future prompts over that code, not just for other devs. If the code is fully or mostly generated from a prompt, include that as a comment or docstring.
- Code comments that give examples of input/output and use cases are very helpful.
- Generate test cases by asking the LLM to use an adversarial mindset. See "role statements" above. Have it act as an adversary and explicitly identify edge cases and write tests from that point of view.
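A small illustration of the first few points together (illustrative code, not from the original talks): descriptive names, a pure function with no side effects, and an input/output example in the docstring all give both humans and LLMs more to work with.

```python
# Illustrative sketch: descriptive names, no side effects, and a docstring
# with a concrete input/output example.

def annual_revenue_growth(annual_revenue: float, prior_annual_revenue: float) -> float:
    """Return year-over-year revenue growth as a fraction.

    Example:
        annual_revenue_growth(1_100_000.0, 1_000_000.0) -> 0.1
    """
    if prior_annual_revenue == 0:
        raise ValueError("prior_annual_revenue must be nonzero")
    return (annual_revenue - prior_annual_revenue) / prior_annual_revenue
```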
## LLMs don't have to focus on just code in order to be useful for development

- Represent problems and code in intermediate states like DSLs, config YAML, pseudocode, etc., so that LLM i/o is on higher-level representations of your problem space. Remember they are excellent translators, so how can you model a problem as one of language translation instead of code gen? (See the sketch after this list.)
- Have a bag of tricks for types and formats of the output you might ask for, and try a different format if you're not getting what you want - the results might usefully surprise you. For example, ask for a YAML file to represent a DSL over the problem, or code that generates code, or have it role play someone in your position or an end user of the code.
- Ask the LLM to perform world simulation: you can ask it to act as your program or system architecture itself and report on its internal activity, state changes, logic flows, etc. given inputs and events.
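For instance, a tiny YAML DSL can stand in for the code you eventually want, and asking the model to operate on the DSL keeps the conversation at the level of the problem. The DSL shape below is invented purely for illustration:

```python
# Sketch: describe a data pipeline in a tiny (invented) YAML DSL, then ask
# the LLM to translate it to code, or to critique/extend the DSL itself.
import textwrap

pipeline_dsl = textwrap.dedent("""\
    pipeline: monthly_revenue_report
    steps:
      - read: {source: postgres, table: invoices}
      - filter: {column: status, equals: paid}
      - aggregate: {group_by: month, sum: amount}
      - write: {sink: csv, path: reports/monthly_revenue.csv}
""")

prompt = (
    "Translate this pipeline description into idiomatic Python using pandas. "
    "Keep each step a separate pure function.\n\n" + pipeline_dsl
)
# Send `prompt` to the LLM and review the generated code as a starting point.
```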
## Collect and iterate on small utilities/scripts

Build a personal toolkit that you can invest in and that generates compound interest.

- If a task can reasonably be scripted, try generating it. Generate and throw away lots of small scripts; keep and iterate on the ones that are useful more than once.
- Roll prompts you use often into scripts that take command line arguments.
- Write scripts to extract context from your codebase to use in prompts (e.g. all your classes and function names, with docstrings, and with arguments to filter to particular domains if needed) so that the whole code structure or a particular domain can be included in prompts. Code copilots are doing this now; Aider, for example, is an OSS copilot that uses treesitter to analyze the codebase and provide a "map". You can use Aider for that purpose alone. (See the sketch after this list.)
- Explore the expanding space of CLI LLM tools like Chatblade.
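Here's one possible shape for such a utility (a sketch of the idea, not Aider's actual implementation): a small CLI that walks Python files and emits class/function signatures plus the first docstring line, ready to paste into a prompt.

```python
#!/usr/bin/env python3
"""Sketch of a codebase-context extractor for prompts (illustrative, not Aider)."""
import argparse
import ast
from pathlib import Path


def outline(path: Path) -> list[str]:
    """Return one summary line per class/function in a Python file."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = (ast.get_docstring(node) or "").splitlines()
            summary = doc[0] if doc else ""
            kind = "class" if isinstance(node, ast.ClassDef) else "def"
            lines.append(f"{path}:{node.lineno} {kind} {node.name}  # {summary}")
    return lines


def main() -> None:
    parser = argparse.ArgumentParser(description="Dump code structure for LLM prompts")
    parser.add_argument("root", type=Path, help="directory to scan")
    parser.add_argument("--filter", default="", help="only lines containing this substring")
    args = parser.parse_args()
    for file in sorted(args.root.rglob("*.py")):
        for line in outline(file):
            if args.filter in line:
                print(line)


if __name__ == "__main__":
    main()
```

Pipe the output into a prompt file, or feed it to CLI tools like Chatblade.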
## Allocate 10-20% to trying new, uncomfortable things

Or more at first... we're all busy and have tickets to close, but you'll never see the benefit of new tools if you don't deviate from your regularly scheduled program and try new ideas. We're in a time of extreme acceleration in productivity with very little guidance on how to realize it, but the ROI is worth it both immediately and in the long term.

There are a lot of tools emerging all the time, and they are getting better and better. Go beyond the big corporate products (Github Copilot etc.) and look into smaller projects and tools as well, which can be really useful. Just get started.
## Specific tips for Github copilot

- Autocomplete is designed to be fast to generate but has less context: just the current file and open tabs. Inline chat, by contrast, pulls from the whole workspace into a larger context window. They might use different models, and these change over time.
- Learn the chat commands including /explain, @github, @workspace and use them often.