The Canadian National Railway (reporting mark CN) is a Canadian Class I freight railway headquartered in Montreal, Quebec, that serves Canada and the Midwestern and Southern United States.
CN has a project to build a data hub that centralizes data for many business applications. A draft architecture of this data hub had already been prepared by an outsourced company.
In this context, I was asked to complete the draft architecture and define the security access levels for the data hub.
The platform is based on the Cloudera distribution with NiFi, Kafka, Hadoop, Spark, Spark Streaming, Hive, MongoDB and PostgreSQL.
My responsibilities were to:
- Design the target architecture of the data hub platform.
- Define and design the security strategy and user access control levels (based on roles and tags: Apache Atlas, Ranger, Knox and Kerberos).
- Define the strategy and roadmap to secure the Kafka cluster.
- Act as a Big Data technology advisor.
- Implement a POC on AWS to validate the security policy.