Big Data Engineer
Ref No.: 17-00069
Location: New York, New York
Join our client in building the next-generation data-processing platform that helps Fortune 500 companies optimize video ads to directly increase sales. The platform is built to handle over 10 billion events daily, process all events in real time using streaming technology, and run predictive modeling to optimize our clients' ad performance.

You'll help build a scalable system to handle all our data and create production-level, parallelized big-data solutions. We strive to write robust, scalable code using a combination of Java, Scala, and Python. The frameworks and databases we use include AWS, Amazon Kinesis, Apache Spark, Databricks, Amazon Redshift, DynamoDB, and Aerospike, among others.

We are looking for a Big Data Engineer who will work on collecting, storing, processing, and analyzing huge data sets. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them. You will also be responsible for integrating them with the architecture used across the company.

Responsibilities

Selecting and integrating any Big Data tools and frameworks required to provide requested capabilities
Implementing ETL processes
Monitoring performance and advising on any necessary infrastructure or process changes
Requirements

5+ years of hands-on programming experience in data engineering (the client is not looking for a senior-level candidate)
Solid understanding of CS fundamentals
Proficiency in Java
Proficiency in another modern programming language, such as Scala or Python
Solid understanding of distributed computing principles
Ability to solve any ongoing issues with operating the cluster/instances
Proficiency with the AWS ecosystem or the Hadoop v2 ecosystem (Cloudera/MapR/Hortonworks), MapReduce, etc.
Experience building stream-processing systems using solutions such as Spark Streaming, Storm, or Kinesis
Experience integrating data from multiple data sources
Degree in Computer Science, Computer Engineering, or a similar field

Nice to Have
Experience with Spark
Experience with NoSQL databases, such as HBase, Cassandra, or MongoDB
Knowledge of various ETL techniques and frameworks, such as Flume
Experience with various messaging systems, such as Kafka or RabbitMQ
Experience with big-data machine-learning toolkits, such as SparkML, scikit-learn, H2O, etc.
Good understanding of the Lambda Architecture, along with its advantages and drawbacks