Reflecting on the article: “Why Scrum is awful for data science”. Rather than going through the article line by line and arguing with the author, I will write down how we ran DS, how we integrated with the rest of the company and why that worked pretty well for us.
TL;DR: You need some structure above a certain number of employees, but thoughtless adherence to Scrum rules is counterproductive. Pick a workflow, then optimise everything about it.
Our company was a market intelligence company providing market-moving news items for traders at Tier 1 investment banks and hedge funds. The staff was about 35 people: roughly 15 engineers, 5 financial analysts, 5 data quality analysts and 5 data scientists, each team with a team leader. That is large enough that you cannot all sit around one table, but small enough that “big place” operational methodologies such as QBOs (quarterly business objectives) or OKRs (objectives and key results) would be overkill.
To manage our goals, we needed to build a structure that was relatively easy to administer but enabled a good cadence of results and still let us change course when a change was needed.
The physical layout of the office was three small desks for the three smaller teams, with the engineers sitting together on one side as well. This enabled intra-team communication during the day. We also used Slack, JIRA and Confluence. There was a lot of seat shuffling around the tables to make sure that new joiners sat next to someone who could help them and that people who worked together sat next to each other. Well, at least we tried: it’s cheap to swap two sets of monitors, and it makes a difference in terms of collaboration.
The DS team’s goal is to support the financial analysts’ (FAs’) effort to find market-moving pieces of information in a vast ocean of unfiltered data and deliver them to the clients (traders), who then exploit the resulting advantage in the markets. To achieve this, the data scientists work closely with the data quality analysts (DQAs) and the FAs. The technical goal is to conduct very large-scale analysis on raw text crawled from the internet and to write deep learning models that implement various NLP (natural language processing) tasks on it.
As an agile organisation, our company evolved its product continuously. To achieve this, it collected feedback and insights about what matters from numerous sources, aligned these with business objectives and resources, and adapted midterm plans to achieve those objectives.
The founders and investors set big-picture business goals and progression plans. In parallel, team leads collect information from their own area of concern: engineering from client issues, bugs and feature requests; DS from recent analyses and data problems; financial analysts from their feel for the quality of the data they are observing. If an issue can be resolved internally in the team, it is up to the team lead to decide whether to inform the leadership. Think of a simple RACI matrix.
The cross-functional leadership decides together on what to prioritise and how, usually taking into account the best way to synchronise teams that work according to entirely different paradigms. Engineers build things and are concerned with CI/CD and correctness. Financial analysts work with news and economic theory, which is very fluid and hard to computerise but also very time-sensitive. The data scientists' work is experimental and hard to predict. The DQAs are “completionist”; their work can only be used when they are 100% done.
The synchronisation is an ongoing guessing game of what will be done and when, what can be done quickly and how to fill in the gaps with other relevant tasks, so that no one sits idle.
The Scrum (which might not even be Scrum)
Rather than focusing on a dogmatic implementation of the paradigm, we kept the ad-hoc pieces of it that worked, and even those parts should be taken with a pinch of salt. Estimation is too much of a guessing game when the priorities change all the time. A task might fall off the radar because it was deprioritised based on an estimate.
The focus is mainly on analysing the chain of dependencies leading to the final goal and the critical path that it entails. We work backwards from a deadline and assign priorities to tasks accordingly. This also exposes resource problems much better, as allocating more resources to a critical task might shorten the overall project. Think, for example, of the above-mentioned DQA team, which must finish its data quality checks before it can report a dataset as “clear” for use. If a change is required or a team falls behind, we review the affected tasks and their effect on the critical path.
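To make the critical path idea concrete, here is a minimal Python sketch. The task names, durations (in days) and dependencies are invented for illustration and are not our real project data; this is only one simple way to compute it, not a description of our actual tooling.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, depends_on):
    """Return (total_days, path) for the longest dependency chain."""
    # Map every task to its prerequisites so isolated tasks are included too.
    graph = {task: depends_on.get(task, ()) for task in durations}
    finish = {}  # earliest finish time of each task
    prev = {}    # the prerequisite that determines each task's start
    for task in TopologicalSorter(graph).static_order():
        start, prev[task] = 0, None
        for dep in graph[task]:
            if finish[dep] > start:
                start, prev[task] = finish[dep], dep
        finish[task] = start + durations[task]

    # Walk back from the task that finishes last.
    task = max(finish, key=finish.get)
    total = finish[task]
    path = []
    while task is not None:
        path.append(task)
        task = prev[task]
    return total, path[::-1]


# Hypothetical project: all names and numbers are assumptions.
durations = {"crawl": 2, "dqa_check": 3, "train_model": 4,
             "fa_review": 1, "release": 1}
depends_on = {
    "dqa_check": ["crawl"],
    "train_model": ["dqa_check"],
    "fa_review": ["crawl"],
    "release": ["train_model", "fa_review"],
}

total, path = critical_path(durations, depends_on)
print(total, path)
# 10 ['crawl', 'dqa_check', 'train_model', 'release']
```

In this toy example, putting more people on dqa_check so it takes 2 days instead of 3 shortens the whole project to 9 days, while speeding up fa_review changes nothing because it is not on the critical path.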
Multiple projects run at the same time to make sure that while one team is waiting for its task on the critical path to be unblocked, it doesn’t run out of todos. A significant amount of time is reserved for “internal” jobs that don’t concern the whole of the company. It’s up to the team leader to judge how important these are and to make sure they don’t impact any project’s critical path.
As a project’s implementation phase reaches the finish line, the resources required to run it decrease. As we were a product-driven organisation, projects were never “done”. The product teams kept working on them, but with less effort than during the implementation phase.
Leadership scopes out new projects at the product meetings (more on this later). They assess the tasks that must happen before a go/no-go decision in order to map out new features, products or projects. These also include lower-priority filler tasks which must be done at some point to enable the high-priority processes above. Several of these scoping efforts run at the same time as well.
Because of the fluid nature of our operation, we adopted a company-wide one-week sprint schedule. Its primary purpose is not to provide uninterrupted time for the teams but to synchronise the flow of work across the company and give a “beat” to the whole organisation.
Sprint Planning Meeting
Each sprint starts on Tuesday morning with a team-specific sprint planning, where the tasks are distributed and chopped into smaller tasks if necessary. The team also schedules internal jobs if there is some free capacity. They review missed tasks and major problems, so that next time the team lead can give the critical path analysis a better picture of the team's capabilities.
We found this Tuesday-Tuesday schedule better than a traditional Monday-Friday one: the extra day at the beginning of the new week, after the weekend, gave a fresh boost to finish the sprint. We also had (pre-COVID) a working-from-home day on Wednesday (can’t believe this was a big deal even six months ago), when everyone could give some good thought to the new tasks they had just got.
The Tuesday-Tuesday schedule also means that the sprint start and sprint end meetings are one and the same. In addition, major projects are launched midweek rather than on Friday, so potential wrinkles can be ironed out during the regular schedule rather than by some poor soul at the weekend.
Sprint Planning Planning Meeting
On Monday afternoon, the company leadership holds the “Sprint Planning Planning” (SPP) meeting. During the day, each team member across the company updates a progress report (written at the previous sprint planning on Tuesday). From these, the team leaders update the company-wide sprint document and discuss potential delays and problems. At the meeting, they review whether the issues affect the critical path and think through what the implications are. Some other project might be fast-tracked if the blockers delay the overall project.
Sprint Planning Planning meetings are very tactical and purely about the next one to three weeks: record what was done, what should be done and what the problems are, then move on.
The strategic thinking happens at the Thursday afternoon Product meetings, attended by the same people as the SPP. Here we review and analyse larger-scale problems, usually based on reports and plans that a team made as part of its sprint. If new reports are needed, tasks are created that can be prioritised at the next SPP. The project premises are reviewed as well, since the ever-changing nature of market intelligence can shift priorities at any time. If a project’s critical path is affected (usually because one team didn’t finish a sprint task), we review the causes and build up in-house know-how on what went wrong and what should have been done to avoid it. This might mean that we underestimated the difficulty of something and must allocate more time or resources to it next time. Tasks are not scheduled at the product meeting; they are just added to the company-wide sprint planning document, ready for next time.
Each day starts, for each team, with a standup (which is longer than at other places) where everyone reports on their current progress and problems. This enables new team members to make their voice heard in a public setting and to learn what others are doing at the moment, and it helps disseminate relevant knowledge. A designated notetaker records the meeting briefly. The notes are shared publicly so others can see them, and they also serve as a diary of when certain topics were mentioned. Taking the notes also helps that person get familiar with all the terminology the company and their team use, which is very useful in onboarding, for example.
During the three years of operating the above system, we found it beneficial in multiple ways. It is structured enough to create a framework that we can align with, but it doesn’t ruin the pace of the company if something goes wrong. This is due to the critical path analysis and its regular updates. Rather than being guessed arbitrarily, priorities fall naturally out of this system. It’s far easier to judge whether something is important when you know that an entire team will be sitting on their hands if it doesn’t happen. Also, a lot of tasks are “embarrassingly parallel”: if we allocate more resources, we can make them happen faster, and this can be incorporated into the analysis above.
Strict record-keeping is a relatively low-effort but enormously helpful activity. Everyone can operate with the confidence that if we forget something, we can always go back and review what our thoughts were around that meeting.
Now that I am reading the entire text back, I realise that this doesn’t have much to do with Data Science, but I promise I will write about that in the next issue…