AWS Database Migration Service
Step-by-Step Migration Guide
API Version 2016-01-01
AWS Database Migration Service: Step-by-Step Migration Guide
Copyright © 2021 Amazon Web Services, Inc. and/or its affiliates. All rights reserved.
Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.
Table of Contents
AWS Database Migration Service Step-by-Step Walkthroughs ... 1
Migrating Databases to AWS Managed Databases ... 2
Migrating a MySQL Database to Amazon RDS for MySQL or Amazon Aurora MySQL ... 2
Full load ... 3
Performance Comparison ... 7
AWS DMS Ongoing Replication ... 7
Resources ... 8
Migrating a SQL Server Always On Database to AWS ... 8
Prerequisites ... 8
SQL Server Always On Availability Groups ... 9
Migrating from Amazon RDS for Oracle to Amazon RDS for PostgreSQL and Aurora PostgreSQL ... 12
Can My Oracle Database Migrate? ... 12
Migration Strategies ... 13
The 12 Step Migration Process ... 14
Automation ... 16
Future State Architecture Design ... 16
Database Schema Conversion ... 18
Application Conversion or Remediation ... 19
Script/ETL/Report Conversion ... 20
Integration with Third-Party Applications ... 21
Data Migration Mechanism ... 21
Testing and Bug Fixing ... 22
Performance Tuning ... 24
Setup, DevOps, Integration, Deployment, and Security ... 24
Documentation and Knowledge Transfer ... 27
Project Management and Version Control ... 27
Post-Production Support ... 28
Platform Differences ... 28
Migrating from SAP ASE to Amazon Aurora MySQL ... 29
Prerequisites ... 31
Preparation and Assessment ... 31
Database Migration ... 33
Best Practices ... 38
Migrating Databases to the AWS Cloud Using AWS DMS ... 39
Migrating an On-Premises Oracle Database to Amazon Aurora MySQL ... 40
Costs ... 40
Migration High-Level Outline ... 41
Migration Step-by-Step Guide ... 44
Working with the Sample Database for Migration ... 58
Migrating an Amazon RDS for Oracle Database to Amazon Aurora MySQL ... 59
Costs ... 59
Prerequisites ... 60
Migration Architecture ... 61
Step-by-Step Migration ... 62
Next Steps ... 92
Migrating a SQL Server Database to Amazon Aurora MySQL ... 92
Prerequisites ... 92
Step-by-Step Migration ... 93
Troubleshooting ... 108
Migrating an Amazon RDS for SQL Server Database to an Amazon S3 Data Lake ... 109
Why Amazon S3? ... 109
Why AWS DMS? ... 110
Solution Overview ... 110
Prerequisites ... 112
Step-by-Step Migration ... 112
Migrating an Oracle Database to PostgreSQL ... 129
Prerequisites ... 129
Step-by-Step Migration ... 130
Rolling Back the Migration ... 146
Troubleshooting ... 146
Migrating an Amazon RDS for Oracle Database to Amazon Redshift ... 147
Prerequisites ... 147
Migration Architecture ... 148
Step-by-Step Migration ... 149
Next Steps ... 178
Migrating MySQL-Compatible Databases to AWS ... 178
Migrating a MySQL-Compatible Database to Amazon Aurora MySQL ... 179
Migrating Data from an External MySQL Database to an Amazon Aurora MySQL Using Amazon S3 ... 179
Migrating MySQL to Amazon Aurora MySQL by Using mysqldump ... 189
Migrating Data from an Amazon RDS MySQL DB Instance to an Amazon Aurora MySQL DB Cluster ... 189
Migrating a MariaDB Database to Amazon RDS for MySQL or Amazon Aurora MySQL ... 196
Set up MariaDB as a source database ... 197
Set up Aurora MySQL as a target database ... 199
Set up an AWS DMS replication instance ... 200
Test the endpoints ... 201
Create a migration task ... 201
Validate the migration ... 202
Cut over ... 203
Migrating from MongoDB to Amazon DocumentDB ... 204
Launch an Amazon EC2 instance ... 204
Install and configure MongoDB community edition ... 205
Create an AWS DMS replication instance ... 206
Create source and target endpoints ... 207
Create and run a migration task ... 208
AWS Database Migration Service Step-by-Step Walkthroughs
You can use AWS Database Migration Service (AWS DMS) to migrate your data to and from most widely used commercial and open-source databases such as Oracle, PostgreSQL, Microsoft SQL Server, Amazon Redshift, Amazon Aurora, MariaDB, and MySQL. The service supports homogeneous migrations such as Oracle to Oracle, and also heterogeneous migrations between different database platforms, such as Oracle to MySQL or MySQL to Amazon Aurora MySQL-Compatible Edition. The source or target database must be on an AWS service.
In this guide, you can find step-by-step walkthroughs that go through the process of migrating sample data to AWS:
• Migrating an On-Premises Oracle Database to Amazon Aurora MySQL (p. 40)
• Migrating an Amazon RDS Oracle Database to Amazon Aurora MySQL (p. 59)
• Migrating a SQL Server Database to Amazon Aurora MySQL (p. 92)
• Migrating an Amazon RDS for SQL Server Database to an Amazon S3 Data Lake (p. 109)
• Migrating an Oracle Database to PostgreSQL (p. 129)
• Migrating an Amazon RDS for Oracle Database to Amazon Redshift (p. 147)
• Migrating MySQL-Compatible Databases to AWS (p. 178)
• Migrating a MySQL-Compatible Database to Amazon Aurora MySQL (p. 179)
• Migrating a MariaDB Database to Amazon RDS for MySQL or Amazon Aurora MySQL (p. 196)
• Migrating from MongoDB to Amazon DocumentDB (p. 204)
• Migrating a MySQL Database to Amazon RDS for MySQL or Amazon Aurora MySQL (p. 2)
• Migrating a SQL Server Always On Database to AWS (p. 8)
• Migrating from Amazon RDS for Oracle to Amazon RDS for PostgreSQL and Aurora PostgreSQL (p. 12)
• Migrating from SAP ASE to Amazon Aurora MySQL (p. 29)
In the DMS User Guide, you can find additional resources:
• Migrating Large Data Stores Using AWS Database Migration Service and AWS Snowball Edge
RDS for MySQL or Amazon Aurora MySQL
Migrating Databases to AWS Managed Databases
You can move from existing, self-managed, open-source, and commercial databases to fully managed AWS databases of the same engine. The following walkthroughs show how to move your databases to Amazon Relational Database Service (Amazon RDS) and Amazon Aurora.
Topics
• Migrating a MySQL Database to Amazon RDS for MySQL or Amazon Aurora MySQL (p. 2)
• Migrating a SQL Server Always On Database to AWS (p. 8)
• Migrating from Amazon RDS for Oracle to Amazon RDS for PostgreSQL and Aurora PostgreSQL (p. 12)
• Migrating from SAP ASE to Amazon Aurora MySQL (p. 29)
Migrating a MySQL Database to Amazon RDS for MySQL or Amazon Aurora MySQL
You can use these two main approaches for migrating a self-managed MySQL database to an Amazon RDS for MySQL or Amazon Aurora MySQL database.
• Use a native or third-party database migration tool such as mysqldump to perform the full load and MySQL replication to perform ongoing replication. Typically this is the simplest option.
• Use a managed migration service such as the AWS Database Migration Service (AWS DMS). AWS DMS provides migration-specific services such as data validation that are not available in the native or third- party tools.
The following diagram displays these two approaches.
You can use a hybrid strategy that combines native or third-party tools for full load and AWS DMS for ongoing replication. The following diagram displays the hybrid migration approach.
The hybrid option delivers the simplicity of the native or third-party tools along with the additional services that AWS DMS provides. For example, in AWS DMS, you can automatically validate your migrated data, row by row and column by column, to ensure the data quality in the target database. Or, if you are only migrating a subset of the tables, it will be simpler to use AWS DMS to filter your tables than the equivalent configuration in the native or third-party tools.
Topics
• Full load (p. 3)
• Performance Comparison (p. 7)
• AWS DMS Ongoing Replication (p. 7)
• Resources (p. 8)
Full load
You can use one of these three tools to move data from your MySQL database to Amazon RDS for MySQL or Amazon Aurora MySQL. Follow the steps described in this document to perform the full data load.
mysqldump
This native MySQL client utility installs by default with the engine that performs logical backups, producing a set of SQL statements that you can execute to reproduce the original database object definitions and table data. mysqldump dumps one or more MySQL databases for backup or transfer to another MySQL server. For more information, see the mysqldump documentation.
mysqldump is appropriate when the following conditions are met:
• The data set is smaller than 10 GB.
• The network connection between source and target databases is fast and stable.
• Migration time is not critical, and the cost of re-trying the migration is very low.
• You don’t need to do any intermediate schema or data transformations.
You can decide not to use this tool if any of the following conditions are true:
• You migrate from an Amazon RDS for MySQL DB instance or a self-managed MySQL 5.5 or 5.6 database. In that case, you can get better performance results with Percona XtraBackup.
• It is impossible to establish a network connection from a single client instance to source and target databases due to network architecture or security considerations.
• The network connection between the source and target databases is unstable or very slow.
• The data set is larger than 10 GB.
• An intermediate dump file is required to perform schema or data manipulations before you can import the schema or data.
For details and step-by-step instructions, see Importing data to an Amazon RDS for MySQL or MariaDB DB instance with reduced downtime in the Amazon RDS User Guide.
Follow these three steps to perform full data load using mysqldump.
1. Produce a dump file containing source data.
2. Restore this dump file on the target database.
3. Retrieve the binlog position for ongoing replication.
For example, the following command creates the dump file. The --master-data=2 parameter creates a backup file, which you can use to start the replication in AWS DMS.
sudo mysqldump \
--databases <database_name> \ --master-data=2 \
--single-transaction \ --order-by-primary \ -r <backup_file>.sql \ -u local_user \ -p <local_password>
For example, the following command restores the dump file on the target host.
mysql -h host_name -P 3306 -u db_master_user -p < backup_file.sql
For example, the following command retreives the binlog file name and position from the dump file.
Save this information for later when you configure AWS DMS for ongoing replication.
head mysqldump.sql -n80 | grep "MASTER_LOG_POS"
-- Will Get output similar to
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000125', MASTER_LOG_POS=150;
Percona XtraBackup
Amazon RDS for MySQL and Amazon Aurora MySQL support migration from Percona XtraBackup files that are stored in an Amazon S3 bucket. Percona XtraBackup produces a binary backup files which can be significantly faster than migrating from logical schema and data dumps using tools such as mysqldump.The tool can be used for small-scale to large-scale migrations.
Percona XtraBackup is appropriate when the following conditions are met:
• You have administrative, system-level access to the source database.
• You migrate database servers in a 1-to-1 fashion: one source MySQL server becomes one new Amazon RDS for MySQL or Aurora DB cluster.
You can decide not to use this tool if any of the following conditions are true:
• You can’t use third-party software because of operating system limitations.
• You migrate into existing Aurora DB clusters.
• You migrate multiple source MySQL servers into a single Aurora DB cluster.
• For more information, see Limitations and recommendations for importing backup files from Amazon S3 to Amazon RDS.
For details and step-by-step instructions, see Migrating data from MySQL by using an Amazon S3 Bucket in the Amazon RDS User Guide.
Follow these three steps to perform full data load using Percona XtraBackup.
1. Produce a backup file containing source data.
2. Restore this backup file from Amazon S3 while launching a new target database.
3. Retrieve the binlog position for ongoing replication.
For example, the following command creates the backup file and streams it directly to Amazon S3.
xtrabackup --user=<myuser> --backup --parallel=4 \ --stream=xbstream --compress | \
aws s3 cp - s3://<bucket_name>/<backup_file>.xbstream
Use the Amazon RDS console to restore the backup files from the Amazon S3 bucket and create a new Amazon Aurora MySQL DB cluster. For more information, see Restoring an Amazon Aurora MySQL DB cluster from an Amazon S3 bucket.
For example, the following command prints the binlog information after you finish the creation of a compressed backup.
MySQL binlog position: filename 'mysql-bin.000001', position '481'
For example, the following command retreives the binlog file name and position from the from the xtrabackup_binlog_info file. This file is located in the main backup directory of an uncompressed backup.
$ cat </on-premises/backup>/xtrabackup_binlog_info // Output
mysql-bin.000001 481
mydumper
mydumper and myloader are third-party utilities that perform a multithreaded schema and data migration without the need to manually invoke any SQL commands or design custom migration scripts.
mydumper functions similarly to mysqldump, but offers many improvements such as parallel backups, consistent reads, and built-in compression. Another benefit to mydumper is that each individual table gets dumped into a separate file. The tools are highly flexible and have reasonable configuration defaults. You can adjust the default configuration to satisfy the requirements of both small-scale and large-scale migrations.
mydumper is appropriate when the following conditions are met:
• Migration time is critical.
• You can’t use Percona XtraBackup.
You can decide not to use this tool if any of the following conditions are true:
• You migrate from an Amazon RDS for MySQL DB instance or a self-managed MySQL 5.5 or 5.6 database. In that case, you might get better results Percona XtraBackup.
• You can’t use third-party software because of operating system limitations.
• Your data transformation processes require intermediate dump files in a flat-file format and not an SQL format.
For details and step-by-step instructions, see the mydumper project.
Follow these three steps to perform full data load using mydumper.
1. Produce a dump file containing source data.
2. Restore this dump file on the target database using myloader.
3. Retrieve the binlog position for ongoing replication.
For example, the following command creates the backup of DbName1 and DbName2 databases using mydumper.
mydumper \
--host=<db-server-address> \
--user=<mydumper-username> --password=<mydumper-password> \ --outputdir=/db-dump/mydumper-files/ \
-G -E -R --compress --build-empty-files \ --threads=4 --compress-protocol \
--regex '^(DbName1\.|DbName2\.)' \
-L /<mydumper-logs-dir>/mydumper-logs.txt
For example, the following command restores the backup to the Amazon RDS instance using myloader.
myloader \
--host=<rds-instance-endpoint> \
--user=<db-username> --password=<db-password> \ --directory=<mydumper-output-dir> \
--queries-per-transaction=50000 --threads=4 \
--compress-protocol --verbose=3 -e 2><myload-output-logs-path>
For example, the following command retreives the binlog information from the mydumper metadata file.
cat <mydumper-output-dir>/metadata
# It should display data similar to the following:
SHOW MASTER STATUS:SHOW MASTER STATUS:
Log: mysql-bin.000129 Pos: 150
GTID:
Note
1. To ensure a valid dump file of logical backups in mysqldump and mydumper, don’t run data definition language (DDL) statements while the dump process is running. It is recommended to schedule a maintenance window for these operations. For details, see the single-
transaction documentation.
2. While exporting the data with logical backups, it is recommended to exclude MySQL default schemas (mysql, performance_schema, and information_schema), functions, stored procedures, and triggers.
3. Remove definers from schema files before uploading extracted data to Amazon RDS. For more information, see How can I resolve definer errors.
4. Any backup operation acquires a global read lock on all tables (using FLUSH TABLES WITH READ LOCK). As soon as this lock has been acquired, the binary log coordinates are read and the lock is released. For more information, see Establishing a Backup Policy. For logical
backups this step done at the beginning of the logical dump, however for physical backup (Percona XtraBackup) this step done at the end of backup.
Performance Comparison
We tested these three full load options using a Mysql 5.7 database on EC2 as the source and Aurora MySQL 5.7 as the target. The source database contained the AWS DMS sample database with a total of 9 GB of data. The following image shows the performance results.
Percona XtraBackup performed 4x faster than mysqldump and 2x faster than mydumper backups. We tested larger datasets, for example with a total of 400 GB of data, and found that the performance scaled proportionally to the dataset size.
Percona XtraBackup creates a physical backup of the database files whereas the other tools create logical backups. Percona XtraBackup is the best option for full load if your use case conforms to the restrictions listed in the Percona XtraBackup section above. If Percona XtraBackup isn’t compatible with your use case, mydumper is the next best option. For more information about physical and logical backups, see Backup and Recovery Types.
AWS DMS Ongoing Replication
To configure the ongoing replication in AWS DMS, enter the native start point for MySQL, which you have retrieved at the end of the full load process as described for each tool. The native start point will be similar to mysql-bin-changelog.000024:373.
In the Create database migration task page, follow these three steps to create the migration task.
1. For Migration type, choose Replicate ongoing changes.
2. Under CDC start mode for source transactions, choose Enable custom CDC start mode.
3. Under Custom CDC start point, paste the native start point you saved earlier.
For more information, see Creating tasks for ongoing replication using AWS DMS and Migrate from MySQL to Amazon RDS with AWS DMS.
NoteThe AWS DMS CDC replication uses plain SQL statements from binlog to apply data changes in the target database. Therefore, it is slower and more resource-intensive than the native Primary/
Replica binary log replication in MySQL. For more information, see Replication with a MySQL or MariaDB instance running external to Amazon RDS.
You should always remove triggers from the target during the AWS DMS CDC replication. For example, the following command generates the script to remove triggers.
# In case required to generate drop triggers script
SELECT Concat('DROP TRIGGER ', Trigger_Name, ';') FROM information_schema.TRIGGERS WHERE TRIGGER_SCHEMA not in ('sys','mysql');
Resources
For more information, see the following references:
1.Amazon Aurora Migration Handbook 2.What is AWS Database Migration Service?
3.Best practices for AWS Database Migration Service
4.Using a MySQL-compatible database as a source for AWS DMS 5.Using a MySQL-compatible database as a target for AWS DMS
Migrating a SQL Server Always On Database to AWS
Microsoft SQL Server Always On is a high-availability feature for Microsoft SQL Server databases.With the synchronous-commit secondary replica, your application remains transparent to a failover. If the primary node in the Always On Availability Group (AAG) fails due to unforeseen circumstances or due to maintenance, your applications remain unaware of the failure, and automatically redirect to a functional node. You can use AWS Database Migration Service (AWS DMS) to migrate a SQL Server Always On database to all supported target engines. AWS DMS has the flexibility to adapt to your Always On configuration, but it may be unclear how to set up the optimal AWS DMS configuration.
Using this guide, you can learn how to configure AWS DMS to migrate a SQL Server Always On database to AWS. This guide also describes specific configuration and troubleshooting issues and best practices to resolve them.
The guide includes a customer use case and covers the issues that the customer encountered when configuring AWS DMS, along with the solutions employed.
Topics
• Prerequisites (p. 8)
• SQL Server Always On Availability Groups (p. 9)
Prerequisites
The following prerequisites are required to complete this walkthrough:
• Understand how to work with Microsoft SQL Server as a source for AWS DMS. For information about working with SQL Server as a source, see Using a SQL Server Database as a Source for AWS Database Migration Service.
• Understand how to work with SQL Server Always On availability groups. For more information about working with SQL Server Always On availability groups, see Working with SQL Server Always On availability groups
• Understand how to run prerequisite tasks for AWS DMS, such as setting up your source and target databases. For information about prerequisites for AWS DMS, see Prerequisites for AWS Database Migration Service.
• Understand the supported features and limitations of AWS DMS. For information about AWS DMS, see What Is AWS Database Migration Service?.
For more information about AWS DMS, see the AWS DMS User Guide.
SQL Server Always On Availability Groups
Always On availability groups provide high availability, disaster recovery, and read-scale balancing. These availability groups require a cluster manager. The Always On availability groups feature provides an enterprise-level alternative to database mirroring. Introduced in SQL Server 2012 (11.x), Always On availability groups maximizes the availability of a set of user databases for an enterprise. An availability group supports a fail-over environment for a discrete set of user databases, known as availability databases, that fail over together. An availability group supports a set of read-write primary databases and sets of corresponding secondary databases. Optionally, secondary databases can be made available for read-only access and/or some backup operations.
AWS DMS Use Case
A customer used AWS DMS to migrate data from a SQL Server 2017 source database. This database was clustered in a 4-node Always On Availability Group (AAG) configuration. The customer configured the AWS DMS source endpoint to connect directly to the IP address of the primary node of the AAG by using an IP address. With this setup, the customer used the AAG HA/DR functionality for internal applications. In this case, AWS DMS can’t use the secondary database if a failover happens. The customer used the target endpoint to populate an Operational Data Store (ODS) of the Amazon RDS for SQL Server database and an Amazon Simple Storage Service (Amazon S3) data lake.
The following diagram displays the customer’s existing architecture.
Issues with This Approach
Maintenance activities (operating system patching, RDBMS patching) can cause a server failover and AWS DMS will not be able to connect to the source.
Activity and transactions continue to occur on the failover database as shown in the preceding image.
Because of this, the change data capture task becomes out of sync when the cluster fails back to the primary node.
At the start of the task, AWS DMS polls all the nodes in Always On cluster for transaction backups. The AWS DMS task can also fail if transaction backup happens from any other node than the primary.
The Solution Recommended by AWS DMS
To address connectivity design deficiencies, AWS DMS recommended to configure the AWS DMS source endpoint to connect to the AAG listener IP address or a canonical name record instead of connecting directly to the IP address of the primary node. In case of a failover, AWS DMS will interact with the secondary databases, like any other application. Without using the AAG listener IP address, AWS DMS will not be aware of the secondary replica to connect in case of a failover.
The following diagram displays the proposed architecture.
AWS DMS recommended to set the extra connection attribute MultiSubnetFailover=True in the customer’s AWS DMS endpoint. This ODBC driver attribute helps AWS DMS connect to the new primary in case of an Availability Group failover. This attribute is designed for situations when the connection is broken. In these situations, AWS DMS attempts to connect to all IP addresses associated with the AAG listener. For more information, see Multi-subnet failovers.
Also, AWS DMS recommended to set the extra connection attribute
alwaysOnSharedSynchedBackupIsEnabled=false to poll all the nodes in Always On cluster for transaction backups.
For more information on extra connection attributes for SQL Server as source, see Extra connection attributes when using SQL Server as a source for AWS DMS.
RDS for PostgreSQL and Aurora PostgreSQL
Migrating from Amazon RDS for Oracle to Amazon RDS for PostgreSQL and Aurora PostgreSQL
Amazon Relational Database Service (Amazon RDS) for PostgreSQL and Amazon Aurora PostgreSQL- Compatible Edition have evolved as a strong and cost-effective alternatives to Oracle without the need for a software license or a server to manage. The journey from Amazon RDS for Oracle to Amazon RDS for PostgreSQL and Aurora PostgreSQL has never been easier. This guide provides a quick overview of the process and considerations to be made when moving existing workloads to Amazon RDS for PostgreSQL or Aurora PostgreSQL and some of the tools that can assist in the process. It complements a large body of detailed online reference guidance on every aspects of a migration, and serves to provide a birds eye view of the process.
This document focuses on migrating custom applications where you control the source code. If you operate a packaged vendor application on Oracle, you must determine if the vendor supports the new platform.
Topics
• Can My Oracle Database Migrate? (p. 12)
• Migration Strategies (p. 13)
• The 12 Step Migration Process (p. 14)
• Future State Architecture Design (p. 16)
• Database Schema Conversion (p. 18)
• Application Conversion or Remediation (p. 19)
• Script/ETL/Report Conversion (p. 20)
• Integration with Third-Party Applications (p. 21)
• Data Migration Mechanism (p. 21)
• Testing and Bug Fixing (p. 22)
• Performance Tuning (p. 24)
• Setup, DevOps, Integration, Deployment, and Security (p. 24)
• Documentation and Knowledge Transfer (p. 27)
• Project Management and Version Control (p. 27)
• Post-Production Support (p. 28)
• Automation (p. 16)
• Platform Differences (p. 28)
Can My Oracle Database Migrate?
To quickly see if your workload qualifies as a migration candidate, please use the DMA Connect Application and Database Questionnaire to sort out migration obstacles specific to your application.
Consider the following questions. The more you answer No, the easier the migration to Amazon RDS for PostgreSQL or Aurora PostgreSQL will be.
Application Questions Comments
Are there Oracle dependent parts of the
application that you can’t modify by yourself? If you don’t control all of the code it can be difficult to change the underlying database.
Is the application commercial off the shelf, and
not available for PostgreSQL? Unless the COTS application also supports Oracle, it will not be able to migrate.
Application Questions Comments Does the application use specific methods to
connect to an Oracle database such as Oracle Call Interface (OCI)?
Refactoring OCI calls to ODBC is not impossible, but typically an involved process.
Does the application use Oracle specific libraries? It could be challenging finding PostreSQL replacements for Oracle specific libraries.
Database Questions Comments
Does the database use any third party packages? It could be challenging finding PostreSQL replacements for Oracle specific packages.
Does the database use any data cartridges? It could be challenging finding PostreSQL replacements for Oracle specific cartridges.
Does the application use Oracle Forms or
Application Express (APEX)? Completely replacing Forms or APEX with a non- Oracle solution is substantial.
Does the database use SQLJ or .NET stored
procedures? You can refactor external stored procedure code
for use with PostgreSQL, but it adds development work.
Does the database use Oracle Streams? Some refactoring is required to replace Oracle Streams with a PostgreSQL-compatible solution.
Does the database use Oracle Multi Media? Some refactoring is required to replace Oracle Multi Media with a PostgreSQL-compatible solution.
Does the database use Oracle Locator? Depending on feature use, such a solution may be refactored to work with PostGIS 3.1.
Does the database use Oracle Java Virtual
Machine (JVM)? Detaching a Java application from Oracle JVM can
be involved development work.
Does the database use Oracle Machine Learning or
formerly Advanced Analytics? The solution will have to be refactored to use similar functionality on AWS.
Migration Strategies
The options for dealing with a legacy application have often been described as the 6 R’s. For more information, see 6 Strategies for Migrating Applications to the Cloud on the AWS Cloud Enterprise Strategy Blog.
• Re-host
• Re-platform
• Repurchase
• Refactor/Re-architect
• Retire
• Retain
This document describes the steps to migrate database instances running on Amazon RDS for Oracle to Aurora PostgreSQL or Amazon RDS for PostgreSQL. This also details out the steps to Re-platform and Refactor the application(s) running on these databases.
Re-platforming and re-architecting a database application ranges from modifying the code to work with a different cloud-native database to also adopting other cloud-native operations such as serverless application architectures like Kubernetes. This document deal with the changes necessary to migrate to a new database with pointers to other available documentation.
The 12 Step Migration Process
You may have an Oracle database in Amazon RDS for both production or non-production purposes, and it may just be convenience and familiarity that steered you to Oracle even though there is a licensing cost to this choice. It is certainly easier to continue with the database you know than something new, but sometimes there are few remaining reasons do so.
Everyone’s Oracle application is special, and nobody has the same setup and needs for the future. To provide a single framework for migration, this guide organizes the work in 12 topics in order to cover what is in scope for most applications. These topics will be used in sequence for multiple purposes and should not be seen as a strictly linear process, but rather an overall arch of a migration project where individual topics and activities can be overlapped or swapped to fit specific project conditions. The following image shows the 12 steps with an approximate share of effort in a typical project.
Each step will be described at a high level in order to allow the reader to skip to relevant topics in the following chapters.
1.Future State Architecture Design
The understanding of the current design and its requirements together with those of the future state are addressed here with deployment diagrams and feature or component mappings for things that will change as a result of the migration. This step defines the scope and architectural view of the migration.
2.Database Schema Conversion
Because we are migrating a database application from Oracle to Amazon RDS or Aurora, the database schema needs to change. Subtle differences in functionality and syntax need to accommodate the new platform and comprehensive tooling exists to automate this step. In this step we include replacements for any Oracle specific database feature which works differently on PostgreSQL.
3.Application Conversion or Remediation
The Oracle application may be implemented in any programming language like Java or C#, and often abstracts from the nature of the underlying database through an object relational model (ORM). But it is also common to have some reliance on the database syntax directly in the application code, and this step covers the necessary changes to the application code to work with Amazon RDS for PostgreSQL.
In this step we also include operating system dependencies like direct file access which may need to change on the new platform.
4.Script/ETL/Report Conversion
An Oracle application may move data in and out sideways in addition to the application for the purpose of reporting or data import/export. This can happen by executing a stored procedure or through external scripting and SQL*Loader. Such scripts and PL/SQL statements and the operational framework will need to be modified to work with PostgreSQL.
5.Integration with Third-Party Applications
Few applications are islands and often connect to other applications and monitoring. These dependencies may be affected by the move to the PostgreSQL database platform. The monitoring of the database may need to use native AWS tools or the third-party applications use Oracle specific means of communication. These dependencies may already support PostgreSQL or suitable replacements will need to be found.
6.Data Migration Mechanism
As we move from one database platform to another the data needs to move as well. This will happen several times through the migration, first for testing purposes and later for production cutover. If there are multiple customers of the database application they may need to be migrated at different times once the application has been migrated.
7.Testing and Bug Fixing
Migration touches all the stored procedures and functions and may affect substantial parts of the application code. For this reason good testing is required both at the unit and system functional level.
8.Performance Tuning
Due to database platform differences and syntax, certain constructs or combinations of data objects may perform differently on the new platform. The performance tuning part of the migration resolves any bottlenecks that might have been created.
9.Setup, DevOps, Integration, Deployment, and Security
How the application is put together and deployed may be impacted by the migration, and many customers take the opportunity to embrace infrastructure as code for the first time in the context of a migration. In this step we also focus on the impact to application security. In this step we also address cutover planning.
10.Documentation and Knowledge Transfer
In order to support the application going forward it may be necessary to document the changes that happened to the application and the operational environment. Maintenance of the application will have been impacted by the change of database and certain application behavior may have changed.
This is especially important if the migration is done by a different team from those maintaining the application.
11.Project Management and Version Control
A migration certainly involves people with different skills and often entirely different teams, and maybe an outside party. A successful project needs to be well planned and coordinated to execute on a predictable schedule. Version control is a crucial foundation for a migration since database code may not be managed in the same way as application code.
12.Post-Production Support
After the application is live, the migration team may need to stay around for a while to address any emerging problems on the new platform that were not caught by testing.
Automation
This document references the freely available AWS Schema Conversion Tool (AWS SCT) for code conversion and the AWS Database Migration Service for data migration. For more information, see Installing, verifying, and updating AWS SCT and AWS Database Migration Service.
Future State Architecture Design
When you migrate an Oracle application to use a different database like PostgreSQL you must capture the architecture of the existing application to ensure that all considerations are covered, we call that the current state architecture. The current state architecture describes the part of the application that matters to the migration from an architectural point of view. The same is true for the future state architecture which takes the new database platform into account. We don’t need to describe everything, but some things, like external dependencies, are very important, and help us determine what work to do.
You may already have some favorite drawing tools for architecture diagrams such as Lucidchart, Visio or the freely available Diagrams.net which are all great choices as they supports AWS infrastructure symbols along with many others to describe the current and future environment. But the tool is less important than what is captured in the diagrams.
The architecture diagrams also serve the important role of defining the context of what is inside and outside the scope of work as a team collaborates on the task.
Current State Architecture
There may be existing documentation on the database application which should be examined for currency and relevance. Let us review what is important for the migration work before we decide if more documentation is needed.
A network diagram is useful because It typically connects servers to each other, and servers to databases. It may also show the division into multiple availability or disaster recovery zones. This is useful because it shows potential server and network dependencies that must be addressed in the new architecture. A network diagram may also highlight important security considerations like multiple networks and internet connectivity.
A component diagram is useful if the application is comprised of multiple parts using different technologies which each may present the migration with their own challenges to address.
A class diagram is useful if it shows a specific persistence layer or a query factory where the migration can focus while leaving the rest of the application untouched.
A data flow diagram is useful because it directly shows parts of the value chain of information flowing inside and outside the application highlighting what additional code may needs to be changed.
The following image shows a simple network diagram that can help easily communicate current architecture.
Future State Architecture
The future state architecture envisions the application using the new database, and potentially other services in the environment. It’s a new version of the current state architecture diagrams with certain parts replaced with the new components. This document will focus mainly on replacing Amazon RDS for Oracle with Amazon RDS for PostgreSQL or Aurora PostgreSQL.
Transition Architecture
Depending on how involved your migration is, you may need a transition architecture by which we mean, infrastructure that is there only for migration purposes. Examples of transition architecture includes AWS DMS servers and other mediating or transformation platforms. Such infrastructure has to be provisioned, secured and removed after the migration to avoid additional vulnerability and cost.
The following image shows a transition architecture diagram.
For more information, see AWS Well-Architected Framework.
Database Schema Conversion
Relational databases contain a tabular structure for data using basic data types and procedural code in the form of triggers, functions and stored procedures. Oracle uses the PL/SQL dialect which is different from the PL/pgSQL dialect of PostgreSQL and while some table definitions and queries may look the same, others will require modification. Doing so manually would be a substantial task, but fortunately there are freely available AWS tools to automate this job (See Automation section in the Introduction).
The AWS Schema Conversion Tool (AWS SCT) is capable of connecting to Oracle and reading all PL/SQL code directly from the source Oracle database and converting it to PostgreSQL PL/pgSQL. AWS SCT will retrieve the DDL for tables, views, triggers, stored procedures and functions from the database, parse it and generate the equivalent PostgreSQL code.
Based on experience, AWS SCT fully converts 90+% of the database code which leaves less than 10% for the database expert to improve.
Process
At a high level, the database conversion process works like this:
• Download and install AWS SCT (Linux or Windows).
• Download and install Oracle database drivers (you probably have those already).
• Download and install the PostgreSQL database drivers for Amazon RDS or Aurora PostgreSQL.
• Run AWS SCT and create a migration assessment report. For more information, see Creating migration assessment reports with AWS SCT in the AWS Schema Conversion Tool User Guide.
• Run AWS SCT and automatically convert the database code. For more information, see Converting database schemas using the AWS SCT in the AWS Schema Conversion Tool User Guide.
• Fix any warnings and error in the database code conversion.
AWS SCT operates with default assumptions about mappings between Oracle and PostgreSQL which may or may not be optimal for your particular application due to the data you have in the database.
Certain data type mappings may need to be changed to ensure good performance. As an example, a NUMBER datatype in Oracle is an extremely versatile container which without further qualification may be too expensive for the application. In this case you would look at the type of data contained in the NUMERIC column and its requirements for precision and scale, and then determine the best match for that in PostgreSQL with the appropriate precision and scale.
Once AWS SCT has automatically converted the DDL code, the developer needs to investigate any warnings and errors which need manual remediation. Warnings and Errors can happen for many reasons.
AWS SCT does not have 100% coverage of all syntactical situations, and code inside the database can be corrupted or encrypted preventing AWS SCT from reading it. In these situations, the output DDL code is marked up with comments about the problem AWS SCT had with conversion, and ask the developer for help.
Exceptions
There are exceptions to the automatic code conversion by AWS SCT like SQLJ, .NET Stored Procedures, Spatial data, RDF Graphs. But in each case there are good candidate replacement features like Lambda functions, PostGIS and Neptune.
Interactive and Batch Modes
AWS SCT offers both an interactive GUI and a command line interface (CLI) which are useful in different situations. The user interface is good in a more interactive situations where the user needs to explore the schema and perhaps select only part of it for conversion. The CLI is good for automation in situations where DDL code might be coming from a different source such as reports. For more information, see Script/ETL/Report Conversion (p. 20).
Schema Drift
If the original database schema changes during the timeframe of migration, this can be detected in AWS SCT which can compare the old and the new database schema and highlight the object that need to be updated. If an object was converted 100% or with few manual changes, that object can be converted again and remediated.
For more information, see AWS Schema Conversion Tool User Guide, Oracle Database 19c To Amazon Aurora with PostgreSQL Compatibility (12.4) Migration Playbook, and AWS Schema Conversion Tool CLI and Interactive Mode Reference.
Application Conversion or Remediation
The Oracle application may be written in a variety of languages like C++, C# and Java, each with their own patterns for calling Oracle. A common case is the use of an object relational model (ORM) between the application code and the database which reduce the amount of PL/SQL that needs to be changed.
Examples include Entity Framework and Hibernate which are also supported on PostgreSQL.
Oracle uses the PL/SQL dialect which is different from the PL/pgSQL dialect of PostgreSQL and while some table definitions and queries may look the same, others will require modification. Doing so manually would be a substantial task, but the freely available AWS Schema Conversion Tool (AWS SCT).
AWS SCT is capable of identifying and replacing embedded PL/SQL in the application code with the equivalent PostgreSQL code. For more information, see Automation (p. 16).
In addition to using AWS SCT, you must also examine the source code for possible issues like:
• Specific ORM or other data access framework and versions or in use and confirm its compatibility with the target engine.
• Modify database connection as appropriate for the new engine.
• Modify any table/entity mapping configuration or code as appropriate for the converted schema.
• Identify and refactor any vendor-specific driver functionality in use in the code.
Process
At a high level, the application conversion process works like this:
1. Perform the database conversion. This is necessary because the PL/SQL conversion needs to know the schema of the database. For more information, see Database Schema Conversion (p. 18).
2. Run AWS SCT and automatically convert the application code. For more information, see Converting application SQL using the AWS SCT.
3. Fix any warnings and errors in the application code conversion.
Exceptions
There are exceptions to the automated application code conversion process If the application uses the native Oracle Call Interface (OCI). In this case the developer must refactor the code to use ODBC or JDBC.
Script/ETL/Report Conversion
ETL is an acronym that stands for Extract, Transform and Load. The ETL process plays a central role in data integration strategies. ETL allows businesses to gather data from multiple sources and consolidate it into a single, centralized location. ETL also makes it possible for different types of data to work together.
ELT is similar to ETL. However, the primary difference between them is that the data transformation processes occur after the Raw data from the source have been extracted and loaded into a staging area. The transformation of the data may occur in the destination database or in the middle tier or via serverless tools that might reduce the cost of the data processing.
Transforming the data is a critical process that may provide significant value to the data. It’s also the stage where the data could be cleansed, standardized, deduplicated, verified, sorted, shared, and much more.
The role of ELT or ETL in database migration projects is critical for any successful migration.
For the remainder of this document, ETL will also refer to ELT patterns.
ETL can be implemented in the database itself, in external scripts or in third-party tools such as Informatica, Talend, and so on. If the ETL is done using Oracle stored procedure, the freely available AWS Schema Conversion Tool (AWS SCT) is capable of converting the ETL code to AWS Glue. For more information, see Automation (p. 16).
Process for Conversion to AWS Glue
If Python/Glue is a desired future state architecture for ETL code, and the ETL is implemented in the database, the conversion process works like this:
1. Perform the database conversion. This is necessary because the PL/SQL conversion needs to know the schema of the database. For more information, see Database Schema Conversion (p. 18).
2. Run AWS SCT, select the code involved in ETL and automatically convert the ETL code to AWS Glue.
For more information, see Converting ETL processes to AWS Glue.
3. Fix any warnings and errors in the ETL code conversion.
Process for Conversion of Stored Procedures
If ETL or report process is implemented in the database, then the database conversion takes care of converting the code, and only the method to call the stored procedures need to change.
Process for Conversion of Scripts, Reports, and Third-Party ETL
If the ETL or Report code is available in scripts or hosted in third-party tools and those tools will be used in the future, then a custom process will have to be implemented:
1. Perform the database conversion. This is necessary because the PL/SQL conversion needs to know the schema of the database. For more information, see Database Schema Conversion (p. 18).
2. Extract PL/SQL statements from the third-party ETL or reporting tool into flat files, unless already available.
3. Write YAML configuration files for AWS SCT CLI to convert external files.
4. Run AWS SCT CLI on the external scripts using the YAML configuration files.For more information, see AWS Schema Conversion Tool CLI and Interactive Mode Reference.
5. Fix any warnings and errors in the ETL or report code conversion.
6. Insert the converted PL/pgSQL code back into the third-party ETL or reporting tool, unless they stay as flat files.
Integration with Third-Party Applications
Few applications are islands and your Oracle application is likely to integrate with other applications that are not themselves going to be migrated. Examples include ETL, reporting, and monitoring applications for alerts and logs. For more information, see Script/ETL/Report Conversion (p. 20).
If these third-party applications connect directly to the Oracle database, they are going to be affected by the migration. If they are packaged applications, the vendor may offer support for Amazon RDS and Aurora PostgreSQL and if they are custom, you may need to modify them to work with the migrated application. There are a wealth of resources on the partner network which complement any solution from AWS.
AWS native tools such as Amazon Simple Notification Service, Amazon RDS Performance Insights, Amazon CloudWatch, and Amazon Relational Database Service are already integrated with the Amazon RDS and Aurora PostgreSQL database platform and are recommended for a full picture of the ongoing performance.
For more information, see Engage with AWS Partners.
Data Migration Mechanism
For testing purposes and for production cutover, data needs to be migrated from the old Amazon RDS for Oracle instance to the new Amazon RDS or Aurora PostgreSQL instance. Such a data migration requires knowledge of data type mapping and possibly incremental loading, depending on the size of the data and migration window.
For this purpose AWS Database Migration Service (AWS DMS) can be used to connect source and target databases to replicate the contents of the data in the most optimal way.
Process
1. Create a replication server.
2. Create source and target endpoints that have connection information about your data stores.
3. Create one or more migration tasks to migrate data between the source and target data stores.
After you configured AWS DMS, you can perform the following operations:
• A full data migration from Oracle to PostgreSQL.
• An ongoing replication from Oracle to PostgreSQL.
Depending on the type of data in the database, you may need to optimize AWS DMS for handling certain data types like LOBS which you can read more about in the product guidance.
Reverse Migration
Normally you just fall back to the old system if a migration fails during smoke testing, and in most cases you may decide to fix forward after cutover, in which case you fix any unforeseen bugs in the migrated system. But in some cases you may decide to have the option of migrating production data back from the new system to the original system after having been in production for a time. In those cases, a reverse data migration mechanism must be configured.
For more information, see What is AWS Database Migration Service? and Migrating Oracle databases with near-zero downtime using AWS DMS.
Testing and Bug Fixing
Testing can be manual or automated. We recommend that you use an automated framework for testing. During migration, you will need to run the test multiple times, so having an automated testing framework helps speed up the bug fixing and optimization cycles.
Unit Testing
Unit testing at the data level after migration can range from comparing every last bit of data in source and target by comparing extracted CSV files, but more realistically, custom aggregation queries should be constructed to incorporate large amounts of the migrated data and compare the results.
Unit tests validate individual units of your code, independent from any other components. Unit tests check the smallest unit of functionality and should have few reasons to fail.
Database objects need to be validated after migrating the DDL of an Oracle database to PostgreSQL.
Database objects includes packages, tables, views, sequences, triggers, primary and foreign keys, indexes, constraints.
A typical way to perform unit testing on the converted database is to script out calls to stored
procedures and functions and compare the returned data with external tools such as standard Linux/Unix tooling of diff.
The application needs to be validated with new and existing test case scenarios based on documented changes on database objects such as field names, types, minimum and maximum values, length, mandatory fields, field level validations etc.
Functional Testing
Functional testing of the application is done by exercising user stories and comparing the results on the source and target system. This is typically a manual process, but third-party tools do exist to make automated regression tests of the UI (e.g. Selenium).
Functional testing of the database is largely done through the application, but there may be additional direct database use cases that can only be done directly on the database such as ETL for imports and extracts. In these cases, the data can be compared automatically before and after using standard Linux/
Unix tooling like diff on extracted CSV files for example.
Functional testing of reports involve visual inspection to see that all fields are correctly displayed and comparison of the semantic values between the old and the new reports.
Load Testing
In order to stress the migrated system and test its performance you may perform load testing which is typically done on a system that is scaled the same as production and requires a means of simulating load on the system. It is sometimes limited to running specific well-known expensive operations rather than user traffic.
Standard Operating Procedures
Standard Operating Procedures (SOP) may be affected by a migration of database application. Database management procedures change when going to PostgreSQL and some procedures may be unnecessary when going to the highly managed Aurora PostgreSQL.
In any case, all existing operational procedures need to be tested and their language updated to reflect the new environment. For more information, see Managing Amazon Aurora PostgreSQL.
Monitoring
Monitoring of the database will be affected by the migration and some metrics may change which could affect how SLA is monitored. The way operational staff go from detecting a problem to diving into the underlying details may be affected. For more information, see Monitor Amazon RDS for PostgreSQL and Amazon Aurora for PostgreSQL database log errors and set up notifications using Amazon CloudWatch.
Cutover
Cutover procedures are the planned event where everything goes the way you want, but it still needs to be tested.
Fallback
Fallback is when you have both old and new systems in sync with the new one operating as primary and you decide to switch back to the original which is still in sync.
Rollback
Rollback is usually the scenario when you don’t have an ongoing replication mechanism to keep old and new in sync, so in the event of a no-go decision during the cutover, you abandon the new system and go back to the original.
Migrate Back
In some rare cases, you may decide to include the option of migrating production from the new system back to the old system after having the cutover. If you include this scenario, it must be tested.
For more information, see Testing Amazon Aurora PostgreSQL by using fault injection queries, Automate benchmark tests for Amazon Aurora PostgreSQL, Validating database objects after migration using AWS SCT and AWS DMS, and Validate database objects after migrating from Oracle to Amazon Aurora PostgreSQL.
Performance Tuning
Any migration is likely to slightly change the performance of individual queries in the application and in stored procedures and functions. Depending on the context, those small differences may not matter in reality. But it is a good idea to deliberately compare the performance of operations that are known to be slow in the original system because any difference in performance is likely to be greater. Such testing is usually confined to specific long running ETL jobs and reports. Other performance issues may show up during functional or load testing and will be addressed as bugs.
Setup, DevOps, Integration, Deployment, and Security
Deployment to production is the culmination of the migration activity and is a high stakes effort which requires good planning and benefits from well tested automation.
With DevOps, you can create a process that helps easily deploy and update your virtual architecture in a scalable and repeatable way. This reduces the risk of human error. Furthermore, DevOps allow us to deploy much faster than humans which may be a factor in large deployments.
Wave Planning
For any application or cluster of applications there is an important question of sequencing because every cutover window, for example, a weekend can only accommodate so much work. This means that larger portfolio may need to be migrated in multiple waves, and this makes wave planning necessary.
Wave planning considers that some parts of the application will move while other stay behind under different network, connectivity and security conditions. Different parts of the application may also be under different ownership, so wave planning becomes the place where all stakeholders coordinate their efforts. Wave planning is a matter of minimizing risk during the overall migration.
Infrastructure Automation
Infrastructure automation is a code layer that wraps API calls to a cloud provider with commands to provision infrastructure. Such code typically in YAML scripts are easily learned with any coding experience and are scalable and powerful. This layer will allow you to spin up one or one hundred web nodes nearly
simultaneously. This layer is not designed to configure files on a server, install software on a server, or run commands on a server - That comes in the next section.
Terraform represents a cross cloud incarnation of this idea. The downside of Terraform is that its cross- cloud and open source nature makes it slower to adopt new features and provide detailed provisioning, often months after a new feature or configuration is released.
AWS CloudFormation is a native AWS language in json or YAML format. Use AWS CloudFormation to write infrastructure code more specifically to specific features because it doesn’t have to work with other clouds.
Configuration Management
Configuration management systems manages configuration of software and state of files on a server or group of servers. These systems however are capable of much more than that, they also allow functionality such as installing software, running local commands, starting services and more in the same scalable way.
Ansible is a lightweight and easily installed tool that is configured using YAML files. It doesn’t require a local daemon to be installed on the instances that Ansible is managing. The way that ansible does all this is through open source functions that essentially wrap cli commands that are run on remote hosts over Secure Shell Protocol (SSH). This allows for a plethora of functionality from database manipulation to package instillation through simple changes in pre-written functions at the YAML level. Beyond a large library of open source function one can easily write custom functions (in python) or simply use the cli function to run any cli command through ansible remotely and in a scalable fashion. Some environments could be prevented from using Ansible due to limited or highly restricted SSH access to resources due to security protocols and standards.
Puppet works in a primary and secondary system that communicates over https (443) and is configured using its own language called puppet. It’s often found in enterprise level deployments, configuration management platforms based on a locally deployed daemon called a node. Puppet differs from other similar platforms like Chief is its methodology in regards to how a resource acquires a desired state.
Puppet takes a declarative approach, which is to say it defines the end state it requires, but makes not design on how it is achieved. Due to its fairly high level of technical investment in regards to its programming language, puppet is generally not recommended for smaller deployments as the investment.
Chef has a lot in common with puppet like a similar primary and secondary model, they both communicate over https, and are configured using a programming language. Where they differ is in terms of how they handles state. Chef takes an imperative approach, which is to say you as the end user have nearly full control on how a resource acquires a desired state which it achieved through Ruby as its configuration system. This type of deployment provides more flexibility as well as being easy to adopt if you are already using Ruby.
Code Repository
A code repository offers safe storage of code and a change capture log which facilitates parallel development of a codebase by many developers simultaneously with the use of code branches, and integration with CI/CD pipelines. Other files than application source code might be stored in a git repo such as infrastructure as code (IaC), database reports; recorded state changes or even short logs.
Importantly you should never store credentials in the code repository. Credentials should be handled in other ways discussed below to avoid sensitive files being deleted or scrubbed, remnants left behind of the original state. The two most widely used code repositories are GitHub and GitLab with similar features and functionality.
Secrets Management
A vault is for credentials. There are several options when it comes to Vault, many of which are baked into a lot of the technologies already described. Ansible has ansible vault, Jenkins has a functionality
for storing and parametrize credentials which can be used at a smaller scale as a vault. However most of these baked in vaults don’t generally have the effectiveness and capabilities of a dedicated vault.
HashiCorp Vault is a dedicated vault that differs from a secrets manager that comes packaged with another product is its varied capabilities around secret management. HashiCorp is an industry leader in this regards with capabilities such as dynamic secrets that can be generated on the fly for database or application credentials, data encryption, leasing and renewal which always credentials to be expired and rotated as well as the ability to revoke credentials remotely. In general, if an enterprise requires a wide range of credentials stored across a range a technologies, Vault is generally a good option to start.
Orchestration
Orchestration is the glue that binds DevOps together. Many of the described technologies can be executed manually or from a scheduled script, however building on this idea of removing one of the largest points of failures such as humans from the actual deployment and management of infrastructure we can eliminate many of the pitfalls that can arise in the process of executing deployment scripts.
Orchestration allows you to create a repeatable timeline, with logic gates, to deploy your infrastructure in exactly the order with the configurations chosen. This also allows deployments themselves to be tested before a change or deployment to production. Each of the configuration management platforms discussed above generally have an orchestration platform, they are generally focused on scheduling jobs within their particular vertical. For example, AWX for Ansible is mostly limited to scheduling ansible jobs.
Jenkins is managed through a GUI running on a primary node that is deployed on a resource within the company. There is also functionality to allow secondary nodes deployed on micro services for larger scale.
The process is arranged in a series of Groovy files called Jenkinsfile that dictate what action will be taken during each step in the pipeline. These jobs then can be scheduled jobs that periodically run and kick any job, from a ansible jobs that runs periodically to prevent configuration drift, to reporting jobs that execute a series of database calls. Generally Jenkins can connect any modern codebase and technology together.
An example pipeline you might run on Jenkins: use git to pull configuration scripts for the pipeline, then use a Terraform job from the cloned codebase to deploy a server, use Ansible to install and configure a database, then push a status file with information on the pipeline run back to git all while using Vault to manage the secrets for both access right for Jenkins and configuration of users on the database.
Documentation and Knowledge Transfer
As a result of the migration, the operation of the database and the future development of the application and database will have been affected due to infrastructure and technology changes. If the migration is done by a separate team, it is vital that these changes be documented.
There may be a need for additional PostgreSQL training to operate the database and develop for PostgreSQL going forward.
Project Management and Version Control
Experience tells us that the steps take different amounts of efforts across a typical project.
When planning a migration project, tooling like AWS SCT can provide an important data point in the form of how much manual work needs to be done to fix database and application code that was not automatically converted. Using the above rules of thumb shares of the overall project, an initial plan can be created.
Post-Production Support
After production cutover, there are a few possibilities for what can happen. You will have either decided to fix forward as the old system is being decommissioned, or you have decided to way a certain amount of time, a bake-in time, with production on the new system, during which a decision to abandon the new system can be made. Abandoning the new system has the following flavors:
• Roll back, all new data is lost.
• Roll back and reapply all new transactions.
• Roll back and migrate new production data back.
• Maintain a live replication back to the old system until bake-in period is over.
During this time, defects are tracked and triaged for possibly triggering the rollback or being fixed forward. Help desk will have been trained in the new system differences and will be able to detect if an end user inquiry just requires training or it may be a defect.
Beyond acceptance migration criteria, the application may have well defined KPIs defined already which can be observed when in production on the new system and compared to historical KPIs.
For more information, see How to Migrate Your Oracle Database to PostgreSQL on the AWS Database Blog.
Platform Differences
This section discusses some of the differences between Oracle and PostgreSQL to illustrate opportunities and challenges with migrating an Oracle application. This overview is by no means an exhaustive, however these are common challenges you may encounter when administering PostgreSQL after a background with Oracle.
Range and List Partitions
Along with possible performance difference, architecturally partitions on Oracle and PostgreSQL act quite differently. On Oracle you can define a Range, for example each month or year being a partition, or a list, where every occurrence of say the letter “N” or “Y” in a char field is partitioned at the table definition level and PostgreSQL handles these operations differently. PostgreSQL operates with a
“parent” table that holds no data and a “child” table that defines the partitions themselves and holds the data. The parent table is created first, the child tables is then defined with corresponding constraints to create the partition. You must supply a trigger to insert into the parent table and have the data be routed to the correct partition. For more information, see Strategy for Migrating Partitioned Tables from Oracle to Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL Compatibility on the AWS Database Blog.
Data Types
Some PostgreSQL data types are much easier to work with than their corresponding Oracle types. For example, the Text type can store up to 1 GB of text and can be handled in SQL just like the char and varchar fields. They don’t require special large object functions like character large objects (CLOBs) do.
However, there are some important differences to note. The Numeric field in PostgreSQL can be used to map any Number data types. But when it’s used for joins (such as for a foreign key), it is less performant than using an int or bigint. This is a typical area where custom data type mapping should be considered.
The PostgreSQL Timestamp with time zone field is slightly different from and corresponds to the Oracle Timestamp with local time zone. These small differences can cause either performance issues or subtle application bugs that require thorough testing.