NewYork board - Python Data Engineer (repost)
[ The following text is reposted from the NewJersey board ]
Posted by: kittyhello (HelloKitty), Board: NewJersey
Subject: Python Data Engineer
Posted from: BBS 未名空间站 (Fri Nov 19 12:11:28 2021, US Eastern)
Seeking a Senior Data Engineer consultant who enjoys data and building data
storage platforms from the ground up. The ideal candidate has a passion for
data analysis, technology, and helping people leverage technology to
transform their business processes and analytics. As a Data Engineer, you
will be part of a team responsible for supporting a wide range of internal
customers. You will draw on all the skills in your toolkit to analyze,
design, and develop data storage and data analytics solutions using data lake
patterns that help our customers run more effective operations and make
better business decisions.
Your Work Falls Into Two Primary Categories:
Strategy Development and Implementation
• Develop data filtering, transformation, and loading requirements
• Define and execute ETLs using Apache Spark on Hadoop, among other
data technologies
• Determine appropriate translations and validations between source
data and target databases
• Implement business logic to cleanse & transform data
• Design and implement appropriate error handling procedures
• Develop project, documentation and storage standards in conjunction
with data architects
• Monitor performance, troubleshoot, and tune ETL processes as
appropriate using tools in the AWS ecosystem
• Create and automate ETL mappings that move loan-level data from source
applications to target applications
• Execute the end-to-end implementation of the underlying data ingestion
workflow
Operations and Technology
• Leverage and align work to appropriate resources across the team to
ensure work is completed in the most efficient and impactful way
• Understand the capabilities of and current trends in the data
engineering domain
Qualifications
• At least 5 years of experience developing in Python
• At least 4 years of experience in developing Apache Spark
applications
• Bachelor’s degree with equivalent work experience in statistics,
data science or a related field.
• Experience working with different databases and understanding of
data concepts (including data warehousing, data lake patterns, structured
and unstructured data)
• 3+ years of experience with data storage/Hadoop platform
implementation, including 3+ years of hands-on experience implementing and
performance-tuning Hadoop/Spark implementations
• Implementation and tuning experience specifically using Amazon
Elastic MapReduce (EMR)
• Experience implementing AWS services in a variety of distributed
computing and enterprise environments
• Experience writing automated unit, integration, regression,
performance and acceptance tests
• Solid understanding of software design principles
Key to success in this role
• Strong consultation and communication skills
• Ability to collaborate across the team, including where silos
exist
• Deep curiosity to learn about new trends and how to do things better
• Ability to use data to help inform strategy and direction
Top Personal Competencies to possess
• Seek and Embrace Change – Continuously improve work processes
rather than accepting the status quo
• Growth and Development – Know or learn what is needed to deliver
results and successfully compete
Preferred Skills
• Understanding of Apache Hadoop and the Hadoop ecosystem. Experience
with one or more relevant tools (Sqoop, Flume, Kafka, Oozie, Hue, Zookeeper,
HCatalog, Solr, Avro).
• Deep knowledge of Extract, Transform, Load (ETL) and distributed
processing techniques such as MapReduce
• Experience with columnar databases such as Snowflake and Redshift
• Experience in building and deploying applications in AWS (EC2, S3,
Hive, Glue, EMR, RDS, ELB, Lambda, etc.)
• Experience with building production web services
• Experience with cloud computing and storage services
Send your resume to [email protected]