Senior Data Engineer
Location: West Pittsburgh, PA
Job Type: Full Time / Permanent
The Senior Data Engineer is a key member of our Data Engineering team and contributes to the success of our organization by enabling our Data Scientists to create and deploy models efficiently and effectively. As a Senior Data Engineer, you will design, build, and test machine learning pipelines and platforms that streamline model development, along with tools for model versioning, testing, and validation. The Senior Data Engineer is expected to be knowledgeable in Kubernetes and containerization technologies and to have hands-on experience with Google Cloud Platform. The Senior Data Engineer is also expected to mentor junior engineers, helping the team build and improve their skills as their careers grow.
Responsibilities:
- Build new machine learning platforms and support ML pipelines that facilitate model training, evaluation, deployment, and monitoring.
- Work with stakeholders to assist with data science-related issues and support their data infrastructure needs.
- Maintain Data Engineering cloud infrastructure via Infrastructure as Code.
- Refine and improve our continuous integration/continuous delivery (CI/CD) pipelines to streamline deployment and product release cycles.
- Identify, design, and implement internal process improvements (automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.).
- Create data tools that help analytics and data science team members build and optimize our data systems.
Education & Experience:
- Bachelor’s degree in Computer Science, Software Engineering, Information Systems, Information Technology, or a related field (or equivalent experience) required
- Three to five years of experience with cloud technologies, Kubernetes, and machine learning platforms
- Experience with Kubernetes and Docker
- Experience building data pipelines using Google Cloud Platform’s Cloud Dataflow
- Experience building large-scale machine learning pipelines using Kubeflow or other machine learning platforms
- Experience with Linux/Unix environments
- Experience with object-oriented and scripting languages (Python, Java, etc.)
- Experience with continuous integration/continuous delivery (CI/CD) pipelines
- Experience with distributed data processing technologies (Spark, Dask, Apache Beam, etc.)
- Experience with NoSQL database technologies (MongoDB, Bigtable, Cassandra, etc.)
- Experience with Google Cloud Platform networking and Cloud Identity and Access Management
- Experience with message queuing, stream processing, and highly scalable ‘big data’ data stores (Kafka, Pub/Sub)
- Familiarity with ML frameworks such as TensorFlow or PyTorch is a plus
- Experience with Agile development practices and version control using Git or similar tools
- Experience with Infrastructure as Code and deployment management tools (Terraform, Ansible, GCP Deployment Manager, etc.)