Data Lake Staff Engineer (668670)
Ref No.: 18-13651
Location: Bellevue, Washington
Start Date / End Date: 10/29/2018 to 04/26/2019
Job Description:
As a core member of the team, you will focus on the next-generation Client Data Lake Cloud Services.
Job Location: Bellevue, WA
Your primary responsibilities include designing and developing:
A distributed storage and query engine for heterogeneous data sources, including graph databases and key/value stores, covering high-performance parallel computing, data consistency and durability, memory and I/O optimization of the data access path, and query execution.
A data lake storage subsystem that provides low latency and high throughput for bulk data ingestion and point queries, particularly for large-scale graph data.
Algorithms that analyze the query workload and intelligently manage the storage subsystem to optimize the various system components. The goal is a fully autonomous system.
Big data tools that help various consumers (users or programs) optimize or automate the system.
The metadata module, data governance, and security for the data lake cloud services.
The position requires strong problem-solving and analytical skills, along with excellent verbal and written communication skills.
Requirements:
MS or higher degree in Computer Science or related field
5+ years of software development experience
Extensive knowledge of and development experience with database-like kernel systems (e.g., traditional RDBMS, NoSQL, NewSQL) and storage subsystems or layers (e.g., LMDB, ScyllaDB, Kudu, CarbonData)
Understanding of transactions and distributed systems (such as the CAP theorem and the Raft consensus algorithm), including best practices and trade-offs
Query processing, optimization, execution, and relevant performance troubleshooting knowledge
Experience in developing big data services for cloud platforms
Excellent coding skills in Java or C++
Hands-on experience with big data components and frameworks, such as Hadoop, Spark, HBase, HDFS, Hive, NoSQL, and relational databases
Strongly preferred experience:
Experience with or knowledge of high-performance computing
Knowledge of OS development, particularly the I/O layer (file system and network)
Knowledge of or experience with graph database engines
Metadata, data governance, and data security
Experience with other distributed system components, such as Solr, Elasticsearch, Kafka, Flink, REST APIs, and ZooKeeper