bigshyft
Arcesium
Site Reliability Engineer
FinTech
SaaS
B2B
Seed
Start-up
MNC
Software
1000-5000 Employees
8y - 14y

(Competitive pay)

Hyderabad, Bengaluru/Bangalore
Python, Kubernetes, observability, monitoring, Dynatrace

Job Description

What you'll do:

  • Design, develop, and implement scalable and reliable monitoring solutions for large-scale distributed systems.
  • Define and implement monitoring requirements in collaboration with cross-functional teams.
  • Lead the development of monitoring architectures and strategies.
  • Integrate monitoring tools into existing infrastructure.
  • Maintain and support monitoring systems.
  • Demonstrate strong technical breadth and depth by driving innovation, evaluating new technologies, and articulating the technical vision for engineering teams.
  • Own key contributions to technical design and architecture decisions: weigh the trade-offs of each choice, manage risk, make decisions independently where appropriate, and present reasoned options when others make the decision.
  • Lead the way by writing exemplary code, documentation, and RFCs.
  • Identify, propose, develop, deploy, and own R&D projects in accordance with the technical vision and needs of the team, turning problem statements into solutions, and operating independently as needed.


What makes you a great fit:

  • 10+ years of experience in SRE or a related field.
  • Proven experience in designing, developing, and implementing monitoring solutions.
  • Deep understanding of monitoring technologies and tools, including Prometheus, Grafana, Loki, and Tempo.
  • Experience with cloud-based monitoring systems, such as New Relic, Datadog, and Grafana Cloud.
  • Experience with log analysis tools, such as Splunk, Logstash, Fluent, and Sumo Logic.
  • Experience implementing distributed tracing using OpenTelemetry and Jaeger.
  • Strong understanding of SRE principles and practices.
  • Experience with incident response and management.
  • Reliability: Exposure to chaos engineering and other reliability practices, including disaster recovery, is good to have.
  • Experience with cloud computing platforms such as AWS.
  • Experience with Kubernetes.
  • Experience with Agile practices (Scrum).
  • Excellent analytical, problem-solving, and troubleshooting skills.
  • Excellent communication and presentation skills.

All about us
Arcesium
  • Arcesium is a global financial technology and professional services firm, delivering solutions to some of the world’s most sophisticated financial institutions, including hedge funds, banks, and institutional asset managers. Its solutions are designed to provide a single source of truth throughout a client’s ecosystem.
  • Arcesium’s cloud-native technology is built to systematize the most complex tasks.
Employee count
1000-5000 Employees
Employment Type
Full Time Job
Company Type
Start-up, MNC
Headquarters
New York, New York, United States

Apply to Similar Jobs

  • Rubrik
    Site Reliability Engineer (SRE) - Jarvis
    Series F
    Start-up
    1001-5000 employees
    3y - 9y
    ₹20 - ₹50 LPA
    Bengaluru/Bangalore
    Python, Java, Unix, C++, DevOps
  • HighRadius
    SRE Architect
    FinTech
    AI/ML
    SaaS
    B2B
    Series C
    9y - 16y

    Competitive pay

    Hyderabad
    Linux, AWS, Google Cloud Platform, Azure