DevOps experience from realPad

Below is a list of DevOps-style problems we ran into at realPad, along with the solution to each of them.

  • Problem: we need to test new functionality before we deploy it to our customers.
  • Solution: two almost identical environments, production and testing, set up together from the start.
  • Problem: developers need to know when something crashes on the server.
  • Solution: Log4j2 configured to send errors and exceptions by mail. In 98% of cases, the log message and stack trace are enough to identify the root cause of the problem. We plan to deploy the same thing for JavaScript on the frontend.
  • Problem: BitBucket is down.
  • Solution: never mind, Git is a distributed source control system, meaning every developer has a “full” copy of the repository. It is enough that the build system can be run locally.
  • Problem: we forgot to renew a domain. Yes, this happened :) It was not a domain we used directly, but for historical reasons our DNS was routed through this other domain.
  • Solution: keep the DNS routing scheme clear, and put every expiration date in a team calendar.
  • Problem: sysadmin goes on vacation.
  • Solution: on a vanilla Linux system I can do 80% of the configuration necessary to get realPad running (I once got it running on a Raspberry Pi!). With the help of my colleagues, this goes up to 95%. Devs should be DevOps! :)
  • Problem: a new version of the system broke several API endpoints, so the iOS, Android and web clients could not work correctly.
  • Solution: just as key parts of the code should have automated tests, the whole API should be checked (at least superficially) by a tool like Runscope.
  • Problem: hosting goes down. As in, we lose our virtual servers and cannot access them anymore.
  • Solution: after the necessary recovery (we lost a few days' worth of data), we started taking hourly database backups, and we now rsync all the files uploaded to our system to a NAS in… well, the same city, but you get the point.
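
To make the "superficial" API check from the list above a bit more concrete, here is a minimal sketch in Python. It is not how Runscope works and not the setup we actually run; the names Check, run_checks and http_fetch, as well as the URL and paths in the example comment, are made up for illustration. The idea is only to request each endpoint, compare the status code, and look for a piece of text that should always be in the response.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# A fetch function takes a path and returns (status_code, body).
# It is passed in so the checks can be exercised without a network.
Fetch = Callable[[str], Tuple[int, str]]


@dataclass(frozen=True)
class Check:
    """One superficial expectation about an endpoint (hypothetical shape)."""
    path: str
    expected_status: int = 200
    must_contain: str = ""  # empty string means "do not inspect the body"


def run_checks(fetch: Fetch, checks: Iterable[Check]) -> List[str]:
    """Run every check and return human-readable failures (empty list = all good)."""
    failures = []
    for check in checks:
        try:
            status, body = fetch(check.path)
        except Exception as exc:  # connection refused, timeout, DNS failure...
            failures.append(f"{check.path}: request failed ({exc})")
            continue
        if status != check.expected_status:
            failures.append(
                f"{check.path}: expected {check.expected_status}, got {status}"
            )
        elif check.must_contain not in body:
            failures.append(
                f"{check.path}: response is missing {check.must_contain!r}"
            )
    return failures


def http_fetch(base_url: str, timeout: float = 10.0) -> Fetch:
    """Build a fetch function that performs real GET requests against base_url."""
    def fetch(path: str) -> Tuple[int, str]:
        try:
            with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
                return resp.status, resp.read().decode("utf-8", errors="replace")
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses arrive as exceptions; turn them back into a status.
            return err.code, err.read().decode("utf-8", errors="replace")
    return fetch


# Example (hypothetical URL and path, adjust to the real API):
# failures = run_checks(
#     http_fetch("https://testing.example.com"),
#     [Check("/api/projects", must_contain="projects")],
# )
```

Something like this could run against the testing environment after every deploy, with any non-empty list of failures mailed to the team the same way server errors already are.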