Site Reliability Engineer (SRE)
Our client, a leading-edge technology company, is looking to hire an experienced Site Reliability Engineer (SRE) to join their SaaS Operations team in Limerick. The successful candidate will support the delivery of the Gx SaaS solution, which primarily focuses on helping customers digitize and manage their validation, commissioning, and qualification processes.
The SRE will be responsible for ensuring the optimum performance and availability of the hosted SaaS platform. They will work in conjunction with the DBA team to ensure the smooth operation of the solution, provide third-level support to the Customer Support team, and work closely with the Cloud Automation team to support the ongoing automation initiative.
The successful candidate will report to the SaaS Operations Lead.
The primary responsibilities of this role include:
Work as a member of the Reliability Engineering team to build, administer and maintain the 24/7 production environment, staging/development environments and supporting infrastructure.
Act as a point of contact for technical issues on a variety of components, e.g. production systems, staging environments, and development and deployment tools.
Continuously endeavour to improve the stability, performance and scalability of the platform that hosts the SaaS solution.
Collaborate with application engineering teams to meet business needs using the cloud services provided.
Work with the Security function to ensure security best practices are followed across systems and that reporting metrics are available.
Required Skills/ Experience:
Minimum of 3 years of experience in a Systems Administration/SRE/DevOps capacity.
Experience administering Windows and/or Linux-based infrastructure in highly available environments.
Solid understanding and experience with AWS services and cloud operation concepts.
Web Server administration experience (IIS preferred).
Automation - Strong scripting experience (preferably PowerShell & Python, but other scripting languages also considered).
Infrastructure-as-Code experience, preferably with HashiCorp Terraform and Packer.
Working experience with configuration management tools (Ansible).
Containerisation and Kubernetes experience, preferably with Amazon EKS and ECR.
Understanding of TCP/IP, DNS, routing, VPN, load-balancing, SMTP.
Working experience supporting patch management.
Good understanding of disaster recovery (DR) testing.
Working knowledge of monitoring tools such as Prometheus, Grafana and ELK.
Degree or equivalent in a computing or engineering discipline.
Strong team player with a results-oriented track record.
Excellent written and verbal communication skills.
Self-motivated and enthusiastic with a continuous learning mindset.
Familiarity with DevOps technologies - CI/CD stacks, Git, Azure DevOps.
SQL Server administration experience.
Knowledge of CIS security standards.
Previous experience of maintaining infrastructure in a regulated environment.
Experience of Agile / Kanban methodology.
Ability to have a positive impact on team members and communicate openly and directly to individuals or groups at all levels.
The Site Reliability Engineer will be expected to participate in the on-call support rota to provide 24/7 support for P1 issues in customer environments.
For further information on this role please contact Amanda Duffy on (phone number removed) or email
Check out all our open jobs on our HERO Recruitment website