About the Company
This company is the world’s leading roadside assistance platform. We expand mobility and transportation options for consumers and for automotive, logistics, and technology companies.
First 3 months:
o Understand our platform development environment and philosophy
o Understand our cloud platform and applications’ infrastructure
o Understand our engineering teams’ work culture
First 6 months:
o Employ various cloud-agnostic tools to connect our internal and external systems with third-party APIs
o Develop data platform services
o Build monitoring infrastructure and services that give visibility into the pipeline’s status
o Interface with different teams to make data available for reporting and
o Continue to optimize our data platform
o Gather data requirements from other teams and implement solutions for them
o Ensure integrity between our various systems and champion the flow of data across them, keeping data consistent
o Work with structured and unstructured data at scale from a variety of
different data sources (key-value, document, columnar, etc.) as well as
o Constantly monitor and support our complete data ecosystem
o Maintain the data platform’s security and integrity
o Operate and manage the services in production
- Strong programming skills. Must be proficient in at least one of the following languages: Python, Scala, or Java
- Must have working knowledge of PySpark, pandas DataFrames, Spark SQL, etc.
- Working knowledge of messaging and data pipeline tools like Apache Kafka, Amazon
- Must have experience developing APIs using frameworks such as Flask or Django
- Experience with stream-processing systems: Apache Spark Streaming, Apache Storm, etc.
- Experience working with open table, file, and in-memory formats for large analytic datasets: Apache Iceberg, Parquet, Arrow, Avro, etc.
- Experience writing and understanding complex SQL queries
- Experience with AWS cloud services: EMR, Glue, Athena, RDS, Redshift
- Experience with data pipeline and workflow orchestration tools: Airflow, Azkaban, Luigi, etc.
- Experience working with NoSQL databases such as Apache Solr, DynamoDB, and MongoDB
- Knowledge of HDFS, Flume, Hive, and MapReduce
- Nice to have: experience with a data warehouse such as Amazon Redshift or Snowflake