
Data Engineer (180670)

This job posting was marked as filled by the employer and is probably no longer available.
June 5, 2018
Various, CA/TX/NC
Cloudera is looking for a Data Engineer to work on its internal Enterprise Data Hub. A Data Engineer is responsible for the design and implementation of data pipelines, drawing on a range of Hadoop technologies to support data ingest and transformation workflows with varying complexity, requirements, and SLAs.

Cloudera expects a Data Engineer to leverage both architectural and software development skills.

A Data Engineer at Cloudera will build and develop fluency in Big Data ingestion and transformation, and will be responsible for data quality and integrity across the pipelines.

A Data Engineer is also expected to contribute to and collaborate across the entire stack, from Data Stewards to Data Scientists to Application Developers. Data Engineers will work closely with other team members on system configuration and tools identification, and with Data Scientists on applications and algorithms.

Data Engineers are expected to deliver clean, scalable data profiles and open data models that facilitate reusability and flexibility for change over time in an agile development environment. These data profiles and open data models will inform the daily actions of business groups across the Company.

A successful candidate will contribute both to the management of Cloudera’s internal data asset and to methodologies and tools for use by Cloudera solutions architects at customer sites.

Requirements (Entry Level)

  • Linux expertise a MUST

  • Strong grasp of SQL

  • Ability to code in Python and Java

  • Solid understanding of network and disk I/O

  • At least 1-2 years of experience with ETL work

  • Fundamental understanding of data storage

  • Experience with Agile Development Methodology

  • Experience with Git Source Control

  • Understanding of basic principles of data governance

  • Excellent teamwork and communication skills required

  • Experience or fundamental understanding of PySpark and Scala a plus

Requirements (Senior)

  • At least 3 years of experience with Hadoop technologies, specifically Impala, Hive, MapReduce, Oozie, and Spark

  • At least 5-10 years of experience coding in SQL, scripting languages, and Java

  • At least 3-5 years of experience with ETL, Business Intelligence, and data processing

  • Proven ability to architect and implement high-throughput data pipelines

  • Hands-on experience with Hadoop, ingest tools such as Kafka, Flume, Sqoop and Spark

  • Experience generalizing high-throughput Hadoop workflows (e.g. MapReduce, Spark)

  • Experience optimizing data storage in HDFS/Parquet/Avro, Kudu, and HBase

  • Firsthand experience handling petabytes of data

  • Past exposure to advanced numerical methods a plus

  • Experience with third-party tools like StreamSets and Trifacta a plus

  • Experience with Cloudera Labs’ Envelope Spark Framework a big plus

Contact WiBD: a WiBD member who works for the company may be able to advise you.