- Sandboxing for agents (Tavis)
- How people are safely sandboxing the tools they are using and shipping
- Security
- Managing many agents (Dan)
- How are people running lots of agents at once
- Orchestration
- Complex workflows (Alex)
- When to use more complicated workflows
- Do they actually work (e.g. multi-agent ones)?
- Keeping agents on the rails (Agent harness session)
- How to keep a single coding session around
- Multi-persona prompting (Agent harness session)
- Dropping names/different personas into the LLM
- Debates about the architecture
- Critique the architecture with different personas
- Found that to be very helpful
- Sharing context with LLMs (Agent harness session)
- How to gather code and other context with LLMs while you're in a session
- Context and tool retrieval
- How do you make sure that it's actually using what it can
- Context retrieval, tools retrieval etc.
- Software verification (Ryan, Cameron)
- Awesome testing with LLMs
- AI psychosis
- More code than can be checked (QA is the bottleneck)
- How do you break out of the loop
- How to build good
- Merge with preventing slop spread
- Always read the code, then again and again
- Do we still need to read it?
- Spec-driven, self-healing?
- Collective history/mining history (Scott)
- Way more code than can be reviewed
- Understanding which sessions generated which code
- Review the process to trust the code
- How can you audit the histories, are you able to extract them etc. and process them
- What about a git for histories/conversations
- Claude has something
- /insights to get analysis from the last month
- How do you feel? (Caleb)
- Has been interviewing people about how they feel as LLMs arrive
- Preventing skill atrophy/getting better (Josh)
- AI psychosis
- Addiction
- Also want to learn and understand things
- How do you enforce discipline and know what you're doing
- How to make sure you don't lose your job while still understanding
- How do you still level up
- OSS strategy for companies (Denis)
- What are expected changes for OSS
- Side projects?
- Companies that were planning on building something in the open
- Build vs. buy (Daniel)
- When to build and maintain software vs. just buying it
- This week: Status monitor whether something is down
- It sounds easy to build small things but what is the downside (maintenance)
- When do you just buy something? Or use a library?
- Super-specific local software (Matthew)
- He just wants to be a user
- Specific scope
- Only for one individual, not worth publishing
- Faucets vs. pipes (Caleb)
- Pipes are the "boring" stuff
- Consumers don't think about them (showers, toilets etc.)
- Are we investing in "pipes" (invisible) or "faucets" (visible)
- Incremental investments
- Hybrid deterministic vs. non-deterministic (Neil)
- Works on the billing side
- Billing in medical context
- LLM-trained transcription with a doctor
- Find out what is billable
- Some parts are highly deterministic (e.g. you can bill for one thing or another but not both)
- How do you extract this from complex conversations
- How do you combine LLMs with rules that must be followed but that LLMs can't reliably follow
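One way to picture the hybrid approach raised in this topic is to let the LLM propose billable codes and then run a plain deterministic check over its output. The following is a minimal sketch under stated assumptions, not anything presented in the session: the codes `A100`, `A200`, `B300`, the `MUTUALLY_EXCLUSIVE` table, and `validate_codes` are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical rule table: pairs of billing codes that may not be billed together.
MUTUALLY_EXCLUSIVE = {frozenset({"A100", "A200"})}


@dataclass
class Validation:
    accepted: list[str]               # codes that violate no rule
    conflicts: list[tuple[str, str]]  # pairs that need a human or a re-prompt


def validate_codes(proposed: list[str]) -> Validation:
    """Deterministically check LLM-proposed codes against the exclusion rules."""
    unique = list(dict.fromkeys(proposed))  # de-duplicate while keeping order
    conflicts = []
    for i, first in enumerate(unique):
        for second in unique[i + 1:]:
            if frozenset({first, second}) in MUTUALLY_EXCLUSIVE:
                conflicts.append((first, second))
    conflicted = {code for pair in conflicts for code in pair}
    accepted = [code for code in unique if code not in conflicted]
    return Validation(accepted, conflicts)


# Pretend these codes came back from an LLM reading a transcript.
result = validate_codes(["A100", "B300", "A200", "B300"])
print(result.accepted)   # codes safe to bill
print(result.conflicts)  # pairs the rules forbid together
```

The design choice here is that the LLM never gets the final say: anything it proposes passes through rules that are ordinary code, and conflicts are surfaced instead of silently resolved.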
- Bare minimum for AI-powered software (Sal)
- Before LLMs it was databases, memcache etc. to scale
- What's the equivalent of this for LLMs
- Speeding up hybrid retrieval
- Balancing data structures like trees
- How can you balance the environment that agents are running in continuously
- Operating software with AI (Riley)
- Lots of discussion about building
- Little about operations/"ops"
- AI to do the managing of running software in production
- AI in high-risk domains (John)
- EE company, HV work (230kV)
- High risk things — AI is completely untrustworthy
- Real physical things can go wrong
- Using LLMs for troubleshooting
- Coming up with solutions outside of it
- If things were isolated and sandboxed it might be different
- Starting to get into it and learning what's happening
- Evals (Caleb)
- How do you know you get decent quality
- How do you know they are improving rather than just changing
- Amplifying quality
- When moving from deterministic to probabilistic features
- QA?
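The "improving rather than just changing" question can be made concrete by running two versions against the same fixed set of eval cases and looking at which cases flipped, not only at the overall pass rate. This is a rough sketch under assumptions: `compare_runs` and the case IDs are hypothetical, and each run is assumed to be a simple mapping from case ID to pass/fail.

```python
def compare_runs(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict:
    """Compare two eval runs case by case on the cases they share."""
    shared = sorted(baseline.keys() & candidate.keys())
    fixed = [c for c in shared if not baseline[c] and candidate[c]]
    regressed = [c for c in shared if baseline[c] and not candidate[c]]
    total = len(shared)
    return {
        "baseline_pass_rate": sum(baseline[c] for c in shared) / total if total else 0.0,
        "candidate_pass_rate": sum(candidate[c] for c in shared) / total if total else 0.0,
        "fixed": fixed,          # failed before, pass now
        "regressed": regressed,  # passed before, fail now
    }


# Hypothetical results for two prompt or model versions on the same three cases.
old_run = {"case_a": True, "case_b": False, "case_c": True}
new_run = {"case_a": True, "case_b": True, "case_c": False}
report = compare_runs(old_run, new_run)
print(report)
```

In this example the pass rate is identical for both runs while one case was fixed and another regressed, which is the "just changing" situation that a single aggregate number hides.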
- Building a company hivemind/using it as a team
- Privacy and sharing of information are at odds
- Monoliths vs. microservices
- How companies and teams are using things as a team
- Second brain
- Notetaking pattern
- Markdown repo
- MCP
- Useful in some contexts
- Hugo codebase written in an old codebase
- MCP helps get more value out of it
- Research skill design (Carl)
- How do you get LLMs to do research
- Squad tools / PAW
- Short demo of new tools
- PAW (phase agentic workflow) project
- 5 different agents who can write a goal
- BuildStream (Fel)
- Reverse engineering retro games (Sadie)
- 5 minute heads up