Here are some principles and techniques I currently use with my teams. I hope they are helpful for Technical Leads or Product Owners.
Software engineers enjoy solving problems with code. They value understanding why the thing they are working on is important.
For almost all non-trivial tasks, the dependencies, technical limitations and edge cases only reveal themselves during the development process. For this reason, I try to make the initial planning light touch and focus on the problem to solve. Accepting that we will start development before knowing everything can make estimating difficult. Forcing engineers to commit to specific estimates makes them feel they cannot modify their approach if they learn something through the process of doing. Here are some reasons engineers can struggle with estimating, and what can be done to help.
Customers have been complaining that the website is slow to load, and the CEO has asked the product manager to estimate how much it would cost to make the website run faster.
It would be close to impossible for engineers to estimate how long it would take to improve the speed, or by how much. There are several possible approaches, and knowing their impact is impossible until you test them in production.
Rather than estimating how long something will take, set some goals for what you think you can deliver within a given timeframe. This gives the team flexibility to modify their approach as they learn through doing and keeps the team focused on the problem, rather than the solution.
In this scenario, using Objectives and Key Results (OKRs) could be an option.
Example OKR
Objective: Reduce the time users wait for the website to load
Key Results:
- 1.0 - Load time reduced by 2 seconds (the stretch goal)
- 0.7 - Load time reduced by 1 second (what we think we can achieve)
- 0.1 - Load time reduced by 300 milliseconds (what we are certain we can achieve)
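To score a key result like this, the team needs an agreed way of measuring load time. As a minimal sketch, assuming measurement happens in the browser (the `/metrics` endpoint and metric name are hypothetical; you might equally use a Core Web Vitals library), the Navigation Timing API can capture it:

```typescript
// Minimal sketch: measure page load time with the Navigation Timing API.
// Run this after the window "load" event so loadEventEnd is populated.
const [nav] = performance.getEntriesByType(
  "navigation"
) as PerformanceNavigationTiming[];

if (nav) {
  const loadTimeMs = nav.loadEventEnd - nav.startTime;
  // sendBeacon survives page unload, making it a safe way to report metrics.
  // The "/metrics" endpoint and metric name are hypothetical examples.
  navigator.sendBeacon(
    "/metrics",
    JSON.stringify({ metric: "page_load_ms", value: loadTimeMs })
  );
}
```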
In my experience, it is quite rare that a user story maps 1:1 with the engineering tasks required to achieve it. I have found that Behaviour Driven Development (BDD) can be a great technique to help engineers, product and design speak the same language.
User Story: As a Netflix customer, I want to see more details about devices connected to my account, so that I can make sure only authorized devices are accessing my subscription.
The user story above is fine, but we will need to break it down so the engineering team can reason about it.
BDD Scenarios
Scenario: User can view connected devices
GIVEN user has a Netflix Account
AND user is logged in
WHEN user views connected devices screen
THEN display all connected devices
Scenario: User can disconnect connected devices
GIVEN user has a Netflix Account
AND user is logged in
WHEN user views connected devices screen
THEN display all connected devices
AND each device has a "Disconnect" option
WHEN user clicks "Disconnect" for a specific device
THEN a confirmation dialog appears
WHEN user confirms the disconnection
THEN the selected device is removed from the list
AND a success message is displayed
AND the device list is updated in real-time
Scenario: System fails to get user's connected devices
GIVEN user has a Netflix Account
AND user is logged in
AND the connected devices service is currently unavailable
WHEN user attempts to view connected devices screen
THEN an error message is displayed
AND the error message states "We're having trouble accessing your device information. Please try again later."
NB: For a large, complex system like Netflix, even the above BDD scenarios are too high level.
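A nice side effect of writing scenarios in this Given/When/Then form is that they can double as automated acceptance tests. Below is a minimal sketch using cucumber-js; the createTestAccount, loginAs, openConnectedDevices and getDeviceList helpers are hypothetical stand-ins for your own page objects or API client:

```typescript
import assert from "node:assert";
import { Given, When, Then } from "@cucumber/cucumber";
// Hypothetical helpers: replace with your own page objects or API client.
import {
  createTestAccount,
  loginAs,
  openConnectedDevices,
  getDeviceList,
} from "./helpers";

// Use regular functions (not arrows) so cucumber binds its World to `this`.
Given("user has a Netflix Account", async function () {
  this.account = await createTestAccount();
});

Given("user is logged in", async function () {
  this.session = await loginAs(this.account);
});

When("user views connected devices screen", async function () {
  await openConnectedDevices(this.session);
});

Then("display all connected devices", async function () {
  const devices = await getDeviceList();
  assert.ok(devices.length > 0, "expected at least one connected device");
});
```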
Sometimes the work may be so complex that we need to develop the architecture of the work before making decisions about effort or cost. For these tasks, I develop an Architecture Decision Record (ADR). This is usually developed by an Architect or Technical Lead, with feedback provided by the engineering team. The output of the ADR helps answer engineering questions about implementation.
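There is no single mandated ADR format; a minimal template, loosely based on Michael Nygard's widely used structure, might look like this (the numbering scheme is just an example):

```
# ADR-001: Short title of the decision

## Status
Proposed | Accepted | Superseded

## Context
The problem we are solving, the forces at play, and the constraints we
must work within (e.g. performance targets, team skills, existing systems).

## Decision
The approach we have chosen, plus the main alternatives we considered
and rejected, with the reasoning for each.

## Consequences
What becomes easier or harder as a result, including new risks,
operational costs and any technical debt we are knowingly taking on.
```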
The urge to estimate work too early can cause teams to prematurely commit to solutions, simply to allow them to more easily estimate the effort or cost. This is an anti-pattern, as well as being anti-agile. Agile is about continuous learning and adaptation throughout product development. The engineering process is an integral part of this journey, not just an implementation phase.
Focusing too much on estimates can promote an unhealthy culture in the team. Engineers can choose suboptimal solutions, or ship technical debt, just so they can meet an estimate. While occasionally compromising quality for deadlines may be necessary, this should not become standard practice.
A focus on measuring engineer output solely through burndown charts can discourage engineers from collaborating on tasks and helping each other. This is because the completed work is usually only credited to the assigned engineer.
Technical debt happens. It's important to allocate time to address it.
Not all technical debt is the same. It's important to understand which debt needs fixing now and which can wait. For this, I use a simple mapping exercise with the team.
|
| +no tests
|
|
| +out-of-date library
|
|
I |
M | +slow api
P |
O |
R |
T |
A |
N |
C | +flaky tests
E |
|
|
| +messy code
|
|
|
|
|
|
|
-----------------------------------------------------------------
EFFORT
This makes it easy to see what technical debt we should tackle first: start with the high-importance, low-effort items, and leave the low-importance, high-effort items for last.
When developing new features, it is hard to know the best way to design the code, because we are learning about the new system while we build it. Rather than optimising code early, based on use cases we think might exist, it is best to just make the thing work.
This undoubtedly leads to some technical debt: the code may be poorly optimised, repetitive, over-complicated or too simplistic. We bake in a 2-week clean-up after every big new feature, where we can optimise the design of the code with the benefit of all the knowledge acquired through building the service.
All products will eventually have an incident that negatively impacts users. The reliability of a service is usually just as important as the user experience design, if not more so. When issues happen, the engineering team is usually responsible for fixing them. Product Managers and Technical Leaders can learn from the root causes of incidents to make their product better.
The causes of incidents are not:
- Engineer Jane Doe made a mistake so it was her fault
- The database just crashed
The root causes of incidents are issues with processes or flaws in the design of the system.
After incidents, I run a Post Incident Review (PIR) with the team and use the 5 Whys method to identify root causes, and then actions we can take to mitigate the issue in the future (and make our product better).
+------------------------------+
| |
| The site crashed |
| |
+------------------------------+
|
|
+------------------------------+
| |
| The database crashed |
| |
+------------------------------+
|
|
+------------------------------+
| |
| Bug in the database code |
| released |
| |
+------------------------------+
|
|
+------------------------------+
| |
| We did not identify the bug |
| before releasing |
| |
+------------------------------+
Root Cause: We do not test the
database functionality before
releasing
Action: Run automated tests
on the database before each release
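The action should be concrete enough to implement straight away. As an illustration only (the connection string and the query are hypothetical), a pre-release database smoke test run in CI might look something like this:

```typescript
import { test } from "node:test";
import assert from "node:assert";
import { Client } from "pg"; // node-postgres

// Hypothetical pre-release smoke test: fails the release pipeline
// if the database cannot answer a basic query.
test("database answers a basic read query", async () => {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    const result = await client.query("SELECT 1 AS ok");
    assert.strictEqual(result.rows[0].ok, 1);
  } finally {
    await client.end();
  }
});
```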
The engineer who writes the code is also responsible for verifying it meets the requirements and has adequate tests. Other engineers help in this process through pair programming and reviewing code. Sometimes we may have a change that has impacts across systems, or is such a large change that it requires significant manual testing to verify. In these scenarios, I use a QA Blitz.
In a QA Blitz, we crowdsource colleagues to test the change. Anyone can set one up: start a document with some details about the change and the testing steps, then assign different people to different test cases. Aim to recruit people who have different devices, come from different geographical areas, and have different levels of familiarity with the product.
Example QA Blitz
Aim: This feature adds an autocomplete dropdown to the website's search box.
Test instructions:
- Visit the website
- Select the 'Search' search box
- Expect to see an autocomplete list of 7 results
- Type a query
- For queries with results, the autocomplete list should update to the relevant list of results
- For queries with no results, no autocomplete list should be displayed
- Navigate the list using the keyboard arrow keys
- Select a result using keyboard 'enter' or mouse 'click'
- Expect the result to now appear in 'Recent search history'
- Select the 'Search' search box to open the autocomplete again
- Remove all items from the 'Recent search history'
- Expect the autocomplete list to only contain autocomplete suggestions
| Tester | Task | Result |
|---|---|---|
| Simon | Test on Chrome | Pass |
| Georgie | Test on Android phone | Pass |
| Jack | Test on iOS | Fail - Keyboard covers search input |