DevOps Engineer for Ably's Distributed Platform (remote / London)

Job description

What makes Ably special?

Ably helps power next-generation digital experiences through its distributed, global, cloud-based messaging platform: experiences that are live rather than static, where data is in motion rather than at rest. Think live chat, realtime location tracking, live document collaboration, gaming and e-learning. Read a recent blog post on the actual distributed systems problems we think about and work on each day.


What we can offer you

Working at Ably means working on a cutting-edge distributed platform that spans 20+ data centres (and soon multiple clouds), delivering billions of messages for developers. You will learn with the best. You will have autonomy and freedom to experiment and improve. You will be part of a dynamic team and a business that is growing rapidly.

 

Job description

You'll be responsible for maintaining and improving our global distributed infrastructure and services. You will work alongside our deeply technical engineering team, who collectively bring a wealth of experience and broad technology skills, and in time you will build out the infrastructure management team internally. We are strong believers in using the right tools for the job when they exist. Where they don't, we've built a whole host of orchestration tools and shared services to help us deliver our global platform. Within our infrastructure, everything is automated, covered by tests, completely replicable and ephemeral in design. The calibre of our infrastructure automation and services code, like that of our realtime service, is what excites and motivates us each day.


If you enjoy solving hard architecture and infrastructure problems at tremendous scale, then you'll love working at Ably. Our team currently has a strong remote contingent, but our base is in London and it is growing. This role can be based either in our London office or remotely from somewhere in Europe (time zone and proximity for face-to-face meetings are important).

 

Our infrastructure stack:

  • Mostly AWS-based, though this will likely extend to other clouds in future.
  • Infrastructure languages: Ruby, Bash.
  • Service languages: Go, Elixir, Node.js and some C.
  • Architecture: Exclusively Docker containers for all services, servers are effectively ephemeral and disposed of frequently, code is packaged as slugs, data centres (circa 20) are isolated and autonomous, critical shared services always have redundancy baked in, and manual configuration of any infrastructure is disallowed (all changes are rolled out using source control, environment-based configs and CLI commands).
  • Data services: Cassandra (our realtime datastore, 3 regions, 6 data centres), Influx, Elastic, Kibana, Grafana, etc.
  • Website: We use Rails & Heroku for simplicity. The website is not part of our "core product" and thus has lower uptime requirements.

See https://goo.gl/cDUirr and https://goo.gl/XDpmBi for a taste of the lengths we go to at each layer in the stack to ensure 100% service uptime.

 

Day to day you can expect to be working on:

  • Writing Ruby code for our infrastructure automation, orchestration, configuration and continuous integration testing of our infrastructure.
  • Writing Go code for our core routing, worker and other shared services.  
  • Making extensive use of a wide range of AWS services. Whilst we primarily use AWS for our infrastructure, in time we expect that to change as we expand to other cloud providers.
  • Managing and developing our continuous integration services, which test every aspect of the service, from infrastructure tools to our health servers, routers, realtime services, protocol adaptors and client libraries. Our CI environment is mature, but we want to keep evolving it to improve the robustness of the platform and reduce the risk of regressions.
  • Being exposed to our other development environments such as Node.js and Elixir, both used extensively in our realtime services.
  • Working with the realtime engineering team to ensure our infrastructure supports ever-changing networking, security and processing requirements.
  • Collaborating with the team to design, discuss and implement new features and services.
  • Diagnosing and fixing bugs in all areas of our platform. You will often be working at very low levels in the network stack to help diagnose difficult-to-identify distributed problems.
  • Working with the engineering team to enable them to take responsibility for the complete lifecycle of the features and code they deliver, i.e. pull requests, reviews, testing, and deployment to staging and sandbox environments, then into production. We are strong believers in all developers being responsible for deploying their own code.
  • Contributing to open source projects that we support or use in our products.  All of our client libraries are open source as well and may require your support at times.
  • Helping customers solve problems they are experiencing that may help us find bugs in the platform.
  • Supporting the wider team with documentation and customer support.
  • Suggesting new features or improvements to our protocol and API specifications.

 

Benefits

  • Salary range: €40k to €70k.
  • Employee options: Yes, negotiable.
  • Holidays: 25+ days excluding national holidays.
  • This role can be remote or on-site in our London office. However, if you are working remotely, you will need to be in a European timezone so that we can communicate effectively during business hours, and you will need to be close enough to visit our office in London occasionally.  Our preference is to have a team member near enough to commute to our London office when necessary. You will benefit from a flexible working environment in which working from home and managing your own working hours sensibly is the norm. 
  • Work in an environment where code quality, technical challenges and delivery are what we all care about.
  • Skills development is intrinsic to the job. We're largely working on unsolved problems each day, and as such, there is plenty of scope to widen your knowledge and skillset.
  • Work with genuinely nice people who care.


**** NO AGENCIES PLEASE ****

Requirements

  • Experience: A minimum of three years of professional experience with Ruby and Go. Our infrastructure automation and orchestration layer requires you to be proficient in Ruby. Our shared services and routing layers require you to be proficient in Go. You should have experience using both statically and dynamically typed languages. Experience with Node.js and Elixir/Erlang is beneficial. You must have solid experience managing infrastructure and CI environments; experience with distributed or large-scale infrastructure management is preferred. An understanding of distributed systems is beneficial.
  • Pragmatic: A problem solver excited by the prospect of working autonomously to solve problems and bring solutions to the team.
  • Fast Learner: We're looking for software engineers who thrive on applying their knowledge and learning new technologies. Our stack is diverse, and we expect it to continue to grow.
  • Testing: Experience using testing frameworks and adopting test-driven development where applicable.
  • Communication: We use tools such as Slack throughout the day to communicate, but we believe in voice conversations to discuss and solve problems. You must be proficient in spoken and written English, be eager to collaborate with the engineering team and welcome code reviews constructively.
  • Customers: Comfortable talking to customers and assisting them with their technical issues and integration.
  • Open source: We prefer developers who have contributed back to the open source community, even if those contributions are small. 

