ID 1288 - Sr Data Platform Engineer (100% remote)

CONEXIONHR
    Job Overview
    • Remote: Yes

    About the Company
    This company is the world’s leading roadside assistance platform. We expand mobility and transportation options for consumers, automotive, logistics, and technology companies.

    Responsibilities
    First 3 months:
    o Understand our platform development environment and philosophy
    o Understand our cloud platform and applications’ infrastructure
    o Understand our engineering teams’ work culture

    First 6 months:
    o Employ various cloud-agnostic tools to connect our internal and external systems with third-party APIs
    o Develop data platform services
    o Build monitoring infrastructure / services to give visibility into the pipeline’s status
    o Interface with different teams to make data available for reporting and analytics
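The posting does not prescribe a monitoring stack, but a minimal sketch of the kind of pipeline-status check such a monitoring service might run is shown below; all names and the staleness threshold are illustrative assumptions, not part of the role description.

```python
import time

# Hypothetical staleness threshold in seconds; a real service would make this configurable.
STALE_AFTER = 3600

def find_stale_stages(last_success, now=None, stale_after=STALE_AFTER):
    """Return pipeline stages whose last successful run is older than the threshold.

    `last_success` maps a stage name to the UNIX timestamp of its last successful run.
    """
    now = time.time() if now is None else now
    return sorted(stage for stage, ts in last_success.items()
                  if now - ts > stale_after)

# Example: "ingest" ran a minute ago, "transform" has not succeeded in two hours.
status = {"ingest": 1_000_000, "transform": 1_000_000 - 7200}
print(find_stale_stages(status, now=1_000_000 + 60))  # ['transform']
```

A check like this would typically feed an alerting system rather than print to stdout.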

    Ongoing:
    o Continue to optimize our data platform
    o Gather data requirements from other teams and implement solutions for them
    o Ensure integrity between our various systems and champion the flow of data across them, ensuring consistency
    o Work with structured and unstructured data at scale from a variety of data sources (key-value, document, columnar, etc.) as well as traditional RDBMSs
    o Constantly monitor and support our complete data ecosystem
    o Maintain the data platform security and integrity
    o Operate and manage the services in production

    Requirements

    • Strong programming skills; must be proficient in at least one of Python, Scala, or Java
    • Must have working knowledge of PySpark, pandas DataFrames, Spark SQL, etc.
    • Working knowledge of messaging and data pipeline tools like Apache Kafka and Amazon Kinesis
    • Must have experience developing APIs using frameworks like Flask or Django
    • Experience with stream-processing systems: Apache Spark Streaming, Apache Storm, etc.
    • Experience with open table and in-memory/file formats for large analytics datasets: Iceberg, Parquet, Arrow, Avro, etc.
    • Experience writing and understanding complex SQL queries
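As a rough illustration of the "complex SQL" expectation, the snippet below runs a CTE-plus-join "latest record per key" query, a pattern common in reporting workloads. It uses Python's built-in sqlite3 with a toy schema; the table and data are invented for the example.

```python
import sqlite3

# In-memory database with a toy events table; schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (source TEXT, ts INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("kafka", 1, "a"), ("kafka", 3, "b"), ("kinesis", 2, "c")],
)

# The CTE finds each source's latest timestamp; the join recovers the full row.
rows = conn.execute("""
    WITH latest AS (
        SELECT source, MAX(ts) AS max_ts
        FROM events
        GROUP BY source
    )
    SELECT e.source, e.payload
    FROM events AS e
    JOIN latest AS l ON e.source = l.source AND e.ts = l.max_ts
    ORDER BY e.source
""").fetchall()
print(rows)  # [('kafka', 'b'), ('kinesis', 'c')]
```

The same pattern is often written with window functions (ROW_NUMBER() OVER a partition) on engines that support them.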

    Bonus points

    • Experience with AWS cloud services: EMR, Glue, Athena, RDS, Redshift
    • Have worked with data pipeline orchestration tools: Apache Airflow, Azkaban, Luigi, etc.
    • Experience working with NoSQL databases like Apache Solr, Amazon DynamoDB, MongoDB
    • Have knowledge of HDFS, Flume, Hive, MapReduce
    • Nice to have: experience with a data warehouse such as AWS Redshift or Snowflake