Site Reliability Engineer (Platform Tribe)

Product
Health insurance provided
Equity offered
Startup
CTO / Architect
Back-End
DevOps / UnixAdmin
Java
Python
Ruby
Scala

03.05.23

Who are we?

Preply is a fast-growing product company at an early stage of development, backed by Europe’s most prominent investors. More than 270 people, based in Kyiv and Barcelona, are currently building a global human-to-human online tutoring marketplace.

People of more than 37 nationalities work here in small cross-functional teams to continuously improve and scale the user experience. We offer remote working days and the possibility to use our platform for self-development. We challenge each other to learn more and faster, while promoting creative power and free will. Personal growth and a friendly, hierarchy-free atmosphere are guaranteed!

Openness, effectiveness and a global mindset are the main characteristics of our people. Our goal is to make Preply a leading platform in online tutoring that will help people from all over the world achieve their life goals faster!

Join our Platform Tribe!

The SRE role at Preply combines software development, operations and business skills to run a large-scale, fault-tolerant, global language education platform.

SREs ensure that Preply systems have reliability and uptime appropriate to the business's needs, along with a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on the capacity and performance of our system. The person in this role is expected to work on core parts of our platform and help us meet the challenges of growing the organization in terms of both traffic and the number of developers.

While our DevOps team is responsible for infrastructure in general, the SRE team is responsible for system observability and alerting, managing and improving incident response processes, and managing on-call rotations across the company.

Visit our Tech Radar to learn more about the technologies we use at Preply.

Our Engineering Blog: medium.com/preply-engineering

Open source: github.com/...reply/graphene-federation

As a Site Reliability Engineer, you will:

Be responsible for Preply's uptime record
Improve system scalability
Own the availability and performance of mission-critical services and build automation to prevent problems from recurring
Improve system observability and alerting
Manage on-call rotations across the company
Improve incident response processes
Establish credibility through the quality of the team's technical execution
Practice sustainable incident response and blameless postmortems
Collaborate with product teams to help them tackle technical issues and design new systems

What we are looking for:

Expertise in problem solving and in analyzing high-load systems
Proficiency in production troubleshooting (a must)
A business-oriented, data-driven mindset
Experience with k8s, Docker and Helm
Strong knowledge of at least one of these languages: Python, JS, PHP, Java, Scala, Erlang, with 3+ years of production experience
Hands-on experience with a modern framework such as Django, Flask, Ruby on Rails, Magento or Spring

Nice to haves:

Python, Celery, PostgreSQL, Elasticsearch, AWS, Kafka, Redis
Experience with Datadog
Experience with release engineering

What we offer: