Location: Pittsburgh, PA
Job Type: Full Time / Permanent
The Data Engineer is a key member of a platform team and contributes to software design, development, and the overall product lifecycle for a product that delights our users and adds value to the organization. The engineering process is highly collaborative: the Data Engineer is expected to pair daily while working through user stories and supporting products as they evolve. The Data Engineer may also be involved in product configuration, performance tuning, and testing, as well as production monitoring. As a Data Engineer, you will work alongside more experienced engineers who will help you build and grow your skills while you create, support, and deploy production applications.
Responsibilities:
- Build the infrastructure to support coding, testing, processing, and maintaining data resources for the Data Science, analytics, and reporting organizations using SQL, Sqoop, Python, Google BigQuery, Kafka, and other Big Data technologies.
- Collaborate with Data Scientists in the development of predictive models using machine learning, natural language processing, and statistical analysis methods.
- Design and implement internal process improvements (automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.).
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
- Work with stakeholders to assist with data-related technical issues and support their data infrastructure needs.
- Develop, refine and oversee data management standards, including establishing and enforcing governance procedures and ensuring data integrity across multiple functions.
- Own data quality metrics and meet defined data accuracy goals according to industry best practices.
Education & Experience:
- Bachelor’s degree in Computer Science, Software Engineering, Information Systems, Information Technology, or a related field, or equivalent experience
- Experience with object-oriented/object function scripting languages: Python, Java, C++, Scala, etc.
- Experience with authoring complex SQL queries
- Experience with NoSQL database technologies (MongoDB, Cassandra, etc.)
- Knowledge of Agile development and Agile deployment tools, and versioning using Git or similar tools
- Experience with Hadoop and other Big Data technologies such as Spark, PySpark and Kafka
- Knowledge of data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience with message queuing, stream processing, and highly scalable ‘big data’ data stores.
- Experience building data pipelines utilizing Google Cloud Platform.
- Experience with Git or other code repository tools.
- Experience with Concourse or other CI/CD tools.
- Google Cloud Platform (GCS, BQ, etc.), Apache Kafka, Python