By: Gregory S. Karlovits P.E. P.H. C.F.M., DevOps Steering Committee Facilitator

Introduction

The Hydrologic Engineering Center (CEIWR-HEC) has been in the business of software development for decades. Technologies for developing, delivering, and supporting software have changed profoundly in that time, and HEC's practices have evolved. The collection of HEC tools has grown, as has our team of developers and engineers who support those tools. Although HEC tools are widely considered industry-standard tools in water resources engineering, we know we can always improve the quality of our code, the way we develop our tools, the experience for our users and developers, and support for the demands of the U.S. Army Corps of Engineers' (USACE) mission and the greater profession. As adoption of HEC software continues to increase, so too does the demand for new features, bug fixes, technical support, and training on the use of those tools. This creates a positive feedback loop: more users means more requests for features, and new features and more training bring new users. However, keeping up with the current level of demand, given finite resources and staff, is unsustainable unless we change how we do business.

Christopher Dunn, our recently retired Director, issued the DevOps moonshot challenge to HEC at a town hall in August 2020. The vision was for a modernization of software development practices across HEC, driven by Center-wide collaboration towards a set of goals for adoption of DevOps principles. DevOps is a philosophy of software development that emphasizes automation, collaboration, and continuous feedback. The name "DevOps" is a portmanteau of "Development" and "Operations" and represents a tighter feedback loop between the software development and IT operations aspects of the software delivery process. Although there can be a heavy up-front investment in new tools and techniques to meet these goals, everyone sees the benefits: the development teams, our customers in the field, and the greater water resources community. The most noticeable impact is on development timelines: good adoption of DevOps principles means that changes to the code make it into the end-user's hands quickly, without sacrificing any quality. It is not just the features and bug fixes that we need to get out the door quickly; it is also the associated documentation, which means that more than just our code writing and software building strategy had to change. It was an entire philosophical shift: automate what you can, simplify the process of getting products to the field, and create value for our customers more quickly. There was also an emphasis on Center-wide agreement on the goals for this journey so that teams shared common expectations and, more importantly, could share knowledge and experience to help each other. The DevOps Steering Committee (DOSC) was formed to serve as the integrated, collaborative environment where HEC's software teams could set their goals and help each other meet them.

Although we celebrated "mission accomplished" for our moonshot goal in September 2022, it did not mark the end of the DevOps journey. DevOps is about continuous improvement. Our first, successful step was to build a solid foundation and develop expertise, tooling, and a common vision that supported all the HEC software teams in their adoption of DevOps principles. We are not adopting DevOps just for the sake of it. Each of the incremental improvements our software teams invest in has a payoff, and using DevOps as a playbook for how to do software development better means all the software teams at HEC are aiming at the same target.

So how did we get to where we are today? And where are we planning to go?

The Wild West Days

HEC-HMS Transformation, 2017-2019

Each of HEC’s software teams took a different path to our collective moon landing in 2022, but I can offer some perspective on the HEC-HMS team’s motivations for adopting these technologies before the challenge was issued. I was fortunate to get involved in the DevOps journey early on. In February 2017, I joined the HEC-HMS team. Although I had several years of programming experience, I did not have software development experience, nor had I written code as part of a team before. Like most of the HEC team, I was an engineer who had learned how to write code, not a computer scientist. The HMS team was growing quickly, and the processes in place for developing code, fixing bugs, building the software, delivering it to the field, asking for feedback, and documenting the software were not suitable for even our modest team size.

This became clear to me when I was tasked with investigating and repairing my first bugs in HMS. I had to figure out how to get access to the right version of the source code, get the project set up in an IDE (i.e., a code development environment), run HMS from the IDE so I could see the impact of my changes, test to make sure the fix was correct and I didn't break something else, then get my changes back into the master code. There was a pain point at every step, and it was clear to me that if more developers were going to be working in the codebase, all of these actions needed to be easier than they were. I knew there had to be a better way, but my lack of experience in real software development meant I could only throw stones, unless I learned what that better way was. I spent time on personal programming projects to learn and practice how these things were done in the Real World™ and try to figure out what we could do as a development team to make our own lives easier. It turns out that the developer experience is only one aspect of DevOps I was considering at that time.

The actual priority issue was getting bug fixes to the field quickly. A full release cycle for HMS was typically much longer than a year, and in the meantime, bugs reported by users were repaired and one-off builds of the software were sent for testing. The publicly available software still contained the bug. It was not until we posted a full release of HMS to the HEC website that the problem was solved for most users. This process needed to change. What could we do to get these fixes to the field faster? We repeatedly ran into several process bottlenecks that could not scale to a larger team working on more things in parallel and made getting a release out the door a stressful ordeal.

We tinkered and tested new processes and technologies to remove these barriers. Over a period of about a year, the HMS team made several changes to its development and release cycle. The biggest efforts follow:

  • The team switched to a more modern IDE that made it easier to adopt several new technologies.
  • Any HMS team member could use an automated build tool on their computer to generate a build of the software.
  • We started using unit testing to validate that updated and new computational code produced the expected results.
  • We moved the HEC-HMS source code to a more modern repository system that HEC hosted, and we switched to an industry-standard version control system to track changes to the code.
  • The HMS User’s Manual and other standard documentation were posted online using a tool that let us update them continuously, which made it much easier to collaborate.
  • Once the rest of the infrastructure was in place, we stood up a server-based automated build tool that generated software builds with no user intervention.
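
To make the unit-testing item in the list above concrete, here is a minimal sketch of what a test for a piece of computational code can look like. It is an illustration only, not code from HEC-HMS: the function name `scs_runoff_depth` and the test class are hypothetical, and the calculation is the textbook SCS curve number runoff equation in inches, with the conventional initial abstraction ratio of 0.2 assumed.

```python
import unittest


def scs_runoff_depth(precip_in: float, curve_number: float) -> float:
    """Direct runoff depth in inches from the SCS curve number equation.

    Hypothetical example function; assumes the conventional Ia = 0.2 * S.
    """
    if not 0 < curve_number <= 100:
        raise ValueError("curve_number must be in (0, 100]")
    storage = 1000.0 / curve_number - 10.0  # potential maximum retention S
    initial_abstraction = 0.2 * storage     # rainfall lost before runoff begins
    if precip_in <= initial_abstraction:
        return 0.0
    excess = precip_in - initial_abstraction
    return excess ** 2 / (excess + storage)


class ScsRunoffDepthTest(unittest.TestCase):
    def test_known_value(self):
        # CN = 80 gives S = 2.5 in and Ia = 0.5 in,
        # so 3 in of rain yields 2.5^2 / (2.5 + 2.5) = 1.25 in of runoff.
        self.assertAlmostEqual(scs_runoff_depth(3.0, 80.0), 1.25)

    def test_no_runoff_below_initial_abstraction(self):
        self.assertEqual(scs_runoff_depth(0.4, 80.0), 0.0)

    def test_rejects_invalid_curve_number(self):
        with self.assertRaises(ValueError):
            scs_runoff_depth(1.0, 0.0)
```

Saved in a file, tests like these can be run with `python -m unittest`. The value of the pattern is that a hand-checked answer is written down once and then rechecked automatically every time the code changes.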

On the programming side, our changes lowered the barrier for new programmers to contribute to the HMS codebase, improved our beta testing program by letting us make more frequent beta releases with more of the features under development available, and reduced stress at release time by having better-managed code and an easier path to a final build of the software. However, the most impactful change we made was moving our documentation online (HEC-HMS Documentation). The process of updating those documents prior to release was an ordeal, and now we were able to continuously integrate our changes into the documentation. Furthermore, this change also let us move all the workshop materials from our training classes online for anyone to use (see HEC-HMS Tutorials and Guides). Now when we get ready for training classes, we can make small updates to a living document online and point students to the most up-to-date version, which we can change even as the class is underway. Online documentation has truly revolutionized how we deliver our reference and training materials to the field, and I cannot imagine a world without it.

The HMS team kept forging forward, making changes to the way we delivered our software to the field. We were not the only team making these strides, but teams were not working together on a common strategy for improving their processes. We were all in different places, considering different technologies, tackling different priorities, and, worst of all, not collaborating as thoroughly as we should have.

One question we (and likely other teams) kept facing was, "what happens if ‘it’ doesn't work?" "It" (a wide range of changes we were already making) was already working, but we were deviating from the norm in a big way, and there is always risk in early adoption. Several of us on the team spent personal time researching technologies and techniques and struggling through their implementation. We were personally invested in making our continuous integration/continuous delivery (CI/CD) experiments a success for HEC-HMS, and other teams were on the same journey. How could we come together to elevate the state of the practice at HEC?

Going Center-Wide, 2019-2020

In 2019, the first seeds of the Center-wide DevOps movement were planted when Chris Dunn asked the former Water Management Systems Division Chief, Chan Modini, to draft a white paper on the state of software development at HEC. I was fortunate to be asked to take part in the small team supporting the draft paper. The aim was to get a baseline before we started identifying goals and plans for the future. The "Software Modernization Team", with representation across the Divisions at HEC, started meeting in August 2019 and used surveys to collate the practices and technologies in use across the Center. The baseline clearly showed that the software teams were all in different places in the adoption of DevOps technologies and processes, and the landscape was continuously changing. HEC-HMS was not the only team adopting new technology and improving its processes; the decision for each team to move towards DevOps was entirely organic and grassroots. Some teams were better positioned to advance more quickly. They had more staff and funding to pursue it, while smaller teams lacked the resources to make big changes without help. The work of the Software Modernization Team identified that CI/CD technologies were available and being used in day-to-day work, and that there was in-house expertise to move the whole Center forward. Beginning in mid-2020, the HEC management team took a deep dive into The DevOps Handbook to get everyone on the same page about the question, "Why DevOps?" At the August 2020 HEC town hall, Chris stated his goal was for the Center to move to a DevOps environment within 2 years. If the entire Center was to incorporate DevOps into its software development processes, then collaboration, open communication, and tech transfer were the way that teams could lift each other up to meet these goals. Here we stood, at the bottom of the mountain, looking up.

Landing On the Moon, 2021-2022

Center-wide adoption of DevOps practices was an unfunded mandate. However, the vision was to improve customer service, improve HEC products, and reduce stress on HEC developers, and we could all agree these were the right things to do. Five key strategic goals for DevOps adoption set out the roadmap for software development teams to meet the mandate.

  1. Make software builds routine and stress-free.
  2. Invest in HEC team development skills in project management, programming, building software, and software testing.
  3. Improve timeliness and quality of HEC products by incorporating continuous integration/continuous delivery (CI/CD) practices into our workflows.
  4. Improve practices related to contractor development of code.
  5. Use data to drive decision making.

To help teams meet these goals, a cross-Center steering committee was formed. This relatively small leadership group was tasked with setting and monitoring implementation goals, fostering communication and collaboration, and most importantly facilitating tech transfer between the teams. The group's plan was to rely on internal knowledge and capability and share it freely with software teams to get everyone across the finish line. Chris Dunn signed the charter for the DevOps Steering Committee (DOSC) on March 17, 2021, and the kickoff meeting was held on April 14, 2021. The above strategic goals were a good outline of "what" needed to happen, but the software teams and the DOSC needed to figure out the "how". Then, it was up to the teams to implement these practices and technologies in their workflow.

The DOSC identified several measurable goals to give the software teams some milestones in meeting the strategic vision.

  • Host all source code locally using a common repository.
  • Increase the number of contributors to the software’s code.
  • Employ an automated build tool so anyone can build the software; use a CI/CD tool for delivery.
  • Improve code quality with repository documentation and code review processes.
  • Develop and implement automated tests.
  • Expand public beta testing programs.
  • Improve project management by using standardized issue tracking.
  • Make software documentation more accessible.

The first strategic goal of automating software builds had a lot of dependencies. A software build involves taking the source code and all the libraries and other resources it depends on and getting it into a state where the software is ready for the end-user. For most programming languages this involves activities like compiling the code, linking to libraries, generating an executable file, and so on. The goal was to automate this process so that changes to the code could reach our users faster, with minimal intervention, and done the same way every time.

Automating software builds and other DevOps goals required teams to adopt modern software development technologies. HEC chose a collection of tools by Atlassian that met the requirements, integrated well with each other, and were cost effective. Teams began moving their source code over to Bitbucket or GitHub, tracking issues and software development with Jira, and moving documentation online using Confluence. Most development teams chose JetBrains’ TeamCity as their build management and CI/CD tool, and implemented automated build tools such as Gradle or Maven so that software builds were triggered automatically by new code commits, which were being tracked using the Git version control system. This stack of technology had an upfront cost, but it was a team effort to support each other in their adoption. Teams pitched in through the DOSC to help each other, had ad-hoc meetings to iron out wrinkles, and our HEC IT staff person (Darren Nezamfar) was there every step of the way to get teams off and running. Teams that made this migration could trigger an automated build of their software with every new commit to the codebase via Git, run all the unit tests and produce a pass/fail report, and, if the tests passed, produce a new build. Anyone on the team could also manually set off a build at any time.
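
The commit-to-build flow described above can be sketched in a few lines. This is a conceptual illustration under stated assumptions, not how TeamCity, Gradle, or Maven are configured or implemented: `PipelineResult`, `run_pipeline`, and the callables passed to it are all hypothetical names. The sketch shows only the gating logic, in which every test runs, each result is recorded in a pass/fail report, and a build is produced only if everything passed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class PipelineResult:
    """Outcome of one pipeline run (hypothetical structure for illustration)."""
    commit_id: str
    report: Dict[str, bool] = field(default_factory=dict)  # test name -> passed?
    artifact: Optional[str] = None  # set only when every test passes

    @property
    def passed(self) -> bool:
        # An empty report counts as passing, since all() of nothing is True.
        return all(self.report.values())


def run_pipeline(
    commit_id: str,
    tests: Dict[str, Callable[[], None]],
    build: Callable[[str], str],
) -> PipelineResult:
    """Run every test for a commit, then build only if all of them passed."""
    result = PipelineResult(commit_id)
    for name, test in tests.items():
        try:
            test()
            result.report[name] = True
        except Exception:
            # Any failed assertion or unexpected error marks the test as failed.
            result.report[name] = False
    if result.passed:
        result.artifact = build(commit_id)
    return result


# Usage with stand-in tests and a stand-in build step.
def test_addition() -> None:
    assert 1 + 1 == 2


def fake_build(commit_id: str) -> str:
    return f"build-{commit_id}"


outcome = run_pipeline("abc123", {"addition": test_addition}, fake_build)
print(outcome.report, outcome.artifact)
```

Running the same steps in the same order for every commit is the point of the design: a failed test stops a build before it reaches anyone, and no manual steps are left to forget.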

To increase the number of contributors to our code, we had to make it easier to on-board new team members and set up better environments for developing and testing code. It also meant providing better standards and guidelines for contributing, a more robust code review process, and more testing, both in the automated sense and through public beta testing programs. This cultural shift resulted in more people at HEC contributing to HEC software code than ever before, and we are now better positioned to work with other developers inside and outside of USACE.

Software documentation, tech transfer, and field support got a lift from our DevOps efforts as well. Teams began migrating their documentation online, which had an up-front cost of importing everything from the existing Microsoft Word® documents and then fixing the formatting, but the payoffs were immense: HEC software documentation receives hundreds of thousands of pageviews per year and is never out of date. Training materials were moved online as well, reducing preparation time for our training classes and enabling the general public to work through the same training materials they would get from an in-person course at HEC. We stood up Discourse, an online forum and knowledge base for gathering tech support questions from the field and making answers to past questions searchable. The amount of HEC software knowledge available to USACE districts and to modelers across the world has never been higher.

Regular check-ins at ongoing DOSC meetings on the DevOps goals served two main purposes: to track progress and to ensure that no teams were being left behind. Teams facing a headwind had an opportunity to ask questions and get help, to keep them on track to meet the 2-year implementation goal. By August 2022, each of HEC's software teams had met the goals, was on the way to completing them, or had a solid plan and support to finish soon. A little bit like the dog that finally caught the car, we were left with the question of where to go next.

The Spirit of Jed Bartlet: What's Next?(1)

What is the DOSC today? It has evolved from its original incarnation as a smaller leadership group into a broad software developers' community within HEC. As our software development capabilities continue to improve and our in-house expertise grows, we have the ability to tackle bigger issues facing our teams and set bigger goals. I was appointed the new DevOps Steering Committee (DOSC) facilitator in November 2023, taking the baton from Richard Nugent, who served admirably in the role previously. As facilitator, my main job is to structure productive conversations among the HEC software teams to generate achievable goals, technology transfer, and a sense of community in the practice of software development. The DOSC meets once a month, and all HEC staff are invited and welcome to attend. Representatives from each HEC software team serve as "voting" members, should there be a need for a formal vote based on requirements in the DOSC charter.

The Committee serves as an open forum for teams wrestling with questions about software development, but our most important function is to help every team meet our DevOps goals. Every DOSC meeting begins with an open call for questions, and depending on how big those questions are, they are either discussed during the current meeting or added to the agenda for the next one. The greatest success comes from HEC software teams sharing what they know and helping each other reach their goals. Often, the prickliest questions become an agenda item at the next meeting and the group has a facilitated discussion to sort them out. Many of these questions are larger than a 45-minute discussion can solve, and the group decides on a path forward to continue working towards a resolution. The key is that it is a Center-wide discussion about tricky technical topics, and the DOSC is the brain trust to solve them. Recent discussions have covered topics like data interchange between HEC software, the state of HEC open-source projects, and the next generation of HEC tools.

Soon, we will be setting our FY 2025 goals. We still look to the five strategic goals as our compass, but we get to take a moment to take stock of where we are and be honest with ourselves about what we can do better. HEC would not exist without its customers, so their needs must be a priority. We also need to consider the health and welfare of the HEC team, and the goals we set must improve the experience for our customers in a sustainable way. In the field there might not be as much excitement about the process of software development as there is about shiny new features in our tools, but for HEC to continue to lead the way in water resources tool development, we must invest in our people and processes and support each other as a team to make the most of the resources we have. The DOSC is here to ensure that happens.

(1) President Bartlet - "What's Next?"