National Bank of Canada

Solutions Architect

The National Bank of Canada (NBC) is an integrated group that provides comprehensive financial services to individuals, small and medium-sized businesses, and large corporations in its home market, as well as specialized services internationally.

NBC Financial Markets has two projects:

  • The first is migrating an on-premises 18-node Cloudera Hadoop cluster to Amazon Web Services (AWS) cloud infrastructure.
  • The second is building a proof of concept for an analytical models factory (a machine learning black box) that selects the best algorithm to predict stock and market prices.

In this context, I was asked to:

  • Define the Cloudera Hadoop cluster migration strategy and roadmap to the AWS cloud.
  • Define and implement AWS data security policies for the cluster.
  • Define and implement the AWS / EMR / Athena cluster monitoring strategy.
  • Review and support developers on Spark / MapReduce batch jobs during the migration period.
  • Act as a Big Data trusted advisor (architectures and technologies).
  • Design a real-time analysis platform to ingest and forecast market data in real time.
  • Analyze Murex application logs in order to create a predictive maintenance model.
  • Work on the POC of the analytical models factory and implement it on AWS using SageMaker (see the sketch after this list).
  • Contribute as a technology advisor to a fraud detection POC on Google Cloud Platform.
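
As an illustration of the models factory idea, the minimal sketch below tries several candidate regressors on the same series and keeps the one with the lowest cross-validated error. It is a local scikit-learn illustration with assumed feature and target arrays; the actual POC ran on Amazon SageMaker with the bank's own algorithms and data.

```python
# Minimal sketch of the "models factory" idea: evaluate several candidate
# regressors on the same price series and keep the best one by
# cross-validated RMSE. Illustrative only; the production POC used
# Amazon SageMaker, not a local scikit-learn loop.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

def select_best_model(X: np.ndarray, y: np.ndarray):
    """Return the name and fitted instance of the best candidate."""
    candidates = {
        "ridge": Ridge(alpha=1.0),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
        "gradient_boosting": GradientBoostingRegressor(random_state=0),
    }
    # TimeSeriesSplit preserves temporal order, which matters for market data.
    cv = TimeSeriesSplit(n_splits=5)
    rmse = {
        name: -cross_val_score(
            model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
        ).mean()
        for name, model in candidates.items()
    }
    best = min(rmse, key=rmse.get)
    return best, candidates[best].fit(X, y)
```
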
Methodology
  • Take an inventory of all data and compute functions to migrate to the Amazon Web Services cloud.
  • Define a migration strategy.
  • Define the security and monitoring strategies.
  • Create the infrastructure on AWS.
  • Start by moving the less critical data, tables, and compute functions.
  • Validate with the users and owners.
  • Continue until all data, tables, and compute functions have been migrated.
  • Set user access levels.
  • Set up monitoring tools and KPIs (see the sketch after this list).
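
As a hedged illustration of the monitoring and KPI step, the sketch below publishes a hypothetical migration KPI to Amazon CloudWatch and alarms when progress stalls. The namespace, metric name, and thresholds are assumptions for illustration, not the bank's actual metrics.

```python
# Sketch: publish a hypothetical migration KPI to CloudWatch and alarm when
# validation progress stalls. Namespace, metric, and thresholds are invented.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="ca-central-1")

# Example data point: percentage of tables validated on AWS so far.
cloudwatch.put_metric_data(
    Namespace="Migration/Hadoop",
    MetricData=[{
        "MetricName": "TablesValidatedPercent",
        "Value": 42.0,
        "Unit": "Percent",
    }],
)

# Alarm if the hourly average stays below the threshold for 24 hours.
cloudwatch.put_metric_alarm(
    AlarmName="migration-validation-stalled",
    Namespace="Migration/Hadoop",
    MetricName="TablesValidatedPercent",
    Statistic="Average",
    Period=3600,
    EvaluationPeriods=24,
    Threshold=50.0,
    ComparisonOperator="LessThanThreshold",
)
```
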
Key Details

Role: Big Data Solutions / Enterprise Architect

Date: 2018

Project duration: 7 months

Location: Montréal, Canada

Technologies: Amazon Web Services, Spark (Scala, PySpark), Kafka, Nifi, Airflow, MySQL, Atlas, Ranger, Hadoop, Hive, Impala, Athena, Amazon SageMaker.

Main Steps to Create the POC

Defining the access and security strategy

Definition of the data access strategy, using AWS IAM to define role-based access control and tag-based access control.
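
As a hedged sketch of the tag-based part of this strategy, the snippet below creates an IAM policy that allows reading S3 objects only when the object's department tag matches the calling principal's department tag. The bucket name, tag key, and policy name are hypothetical.

```python
# Sketch: tag-based access control in IAM. Grants s3:GetObject only when the
# object's "department" tag matches the IAM principal's "department" tag.
# Bucket, tag key, and policy name are illustrative, not the bank's.
import json
import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::nbc-market-data/*",
        "Condition": {
            "StringEquals": {
                "s3:ExistingObjectTag/department": "${aws:PrincipalTag/department}"
            }
        },
    }],
}

iam.create_policy(
    PolicyName="market-data-tag-based-read",
    PolicyDocument=json.dumps(policy_document),
)
```
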

Preparing the environment on AWS
  • Definition of the target architecture on AWS.
  • Creation of the target environments on Amazon, using S3 for storage and Athena to query the data.
  • Preparation of the data to be migrated according to its usage by each department and service (roughly 700 TB of data in Parquet format).
  • Creation of Amazon S3 buckets to store the data.
  • Validation of the migrated data and its structures / formats (see the sketch after this list).
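
To illustrate this setup, the hedged sketch below creates an S3 bucket, declares an Athena external table over Parquet files, and submits the DDL through the Athena API. The bucket, database, table, and column names are hypothetical.

```python
# Sketch: expose migrated Parquet data through Athena. All names are
# illustrative; the real environment used the bank's own buckets and schemas.
import boto3

region = "ca-central-1"

# Bucket to hold the migrated Parquet files (hypothetical name).
s3 = boto3.client("s3", region_name=region)
s3.create_bucket(
    Bucket="nbc-migrated-market-data",
    CreateBucketConfiguration={"LocationConstraint": region},
)

# Declare an external table over the Parquet files so Athena can query them.
athena = boto3.client("athena", region_name=region)
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS trades (
    trade_id string,
    symbol   string,
    price    double,
    ts       timestamp
)
STORED AS PARQUET
LOCATION 's3://nbc-migrated-market-data/trades/'
"""
athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "market_data"},
    ResultConfiguration={
        "OutputLocation": "s3://nbc-migrated-market-data/athena-results/"
    },
)
```
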
Initiation of the migration
  • Take an inventory of the on-premises platform to migrate to the AWS cloud.
  • Select a subset of the data for a first migration test (see the sketch after this list).
  • Prioritize the data to migrate.
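
A minimal sketch of such a first migration test, assuming Hadoop DistCp is available on an edge node of the cluster; the HDFS paths and target bucket are hypothetical.

```python
# Sketch: copy a prioritized subset of HDFS paths to S3 with Hadoop DistCp,
# smallest and least critical datasets first. Paths and bucket are invented.
import subprocess

subset = [
    "/warehouse/reference_data",  # small, low-risk tables first
    "/warehouse/eod_prices",
]

for path in subset:
    subprocess.run(
        [
            "hadoop", "distcp",
            "-update",  # on reruns, only copy files that changed
            f"hdfs://namenode:8020{path}",
            f"s3a://nbc-migrated-market-data{path}",
        ],
        check=True,
    )
```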