In this phase, enterprise architects should attempt to understand these important aspects within a migration:
When building on the AWS platform, there are trade-offs among many dimensions: cost, durability, query-ability, availability, latency, performance (response time), relational (SQL joins), size of object stored (large, small), accessibility, read-heavy vs. write-heavy, update frequency, cache-ability, consistency (strict, eventual) and transience (short-lived). Weigh these trade-offs carefully, and decide which ones are right for your application.
Amazon S3 + CloudFront:
Good for: Storing large write-once, read-many types of objects, static content distribution, media files (audio, video, images)
Use Cases: Backups, archives, versioning
Not Good for: Querying, Searching
Not used for: Database
EC2 Ephemeral Store:
Good for: Storing non-persistent objects, transient updates
Use Cases: Config data, scratch files, TempDb
Not good for: Storing database logs, customer data
Not used for: Shared drives, sensitive data
Amazon RDS:
Good for: Relational data, Querying, Indexing, Structured data
Use Cases: Complex apps, Web apps, OLTP
Not good for: Clusters
Not used for: Clustered DB, Simple lookups
Amazon EBS or SSD:
Good for: Off-instance storage, persistence
Use Cases: Clusters, Boot data, Logs, RDBMS data
Not good for: Static data, web facing content, key-value data
Not used for: Content Distribution
Amazon SimpleDB:
Good for: Lightweight queries
Use Cases: Query, Indexing, Tagging, Metadata, Logs
Not good for: Complex joins, BLOBs, Relational data, Typed data
Not used for: OLTP, DW, OLAP
(Data Storage Options in AWS cloud)
We can also add Amazon Redshift, a managed data warehousing solution for heavy workloads, to the above table. Redshift is well suited as a platform for business intelligence, predictive analytics, or complex querying of large data sources. It can also be used as a staging platform to clean, transform and move data into a SaaS data model such as Salesforce.com: data from many different source databases is imported, filtered and organized within Redshift and then exported to the target platform.
Post-it Note: If your existing infrastructure consists of file servers, log servers, Storage Area Networks (SANs) and systems that back up data to tape drives on a periodic basis, you should consider storing this data in Amazon S3.
Existing applications can use Amazon S3 without major changes. If your system is generating data every day, the recommended migration flow is to point your “pipe” to Amazon S3 so that new data is stored in the cloud right away. Then, you can have an independent batch process move old data to Amazon S3. Most enterprises take advantage of their existing encryption tools (256-bit AES for data at rest, 128-bit SSL for data in transit) to encrypt the data before storing it on Amazon S3.
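The independent batch process for old data could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `backfill_old_files`, the 30-day cutoff and the key layout are hypothetical, and `s3_client` is assumed to be a boto3 S3 client (or any object exposing the same `upload_file` method). Encryption with your existing tools would happen before this step.

```python
import os
import time


def backfill_old_files(s3_client, root_dir, bucket, older_than_days=30, now=None):
    """Upload files not modified within `older_than_days`; return the uploaded keys.

    Hypothetical helper: newer files are skipped on the assumption that the
    live "pipe" already writes them to Amazon S3 directly.
    """
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400  # seconds per day
    uploaded = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) >= cutoff:
                continue  # recent data is handled by the live pipe
            # Use the path relative to root_dir as the object key (assumed layout).
            key = os.path.relpath(path, root_dir).replace(os.sep, "/")
            s3_client.upload_file(Filename=path, Bucket=bucket, Key=key)
            uploaded.append(key)
    return uploaded
```

With boto3 installed and credentials configured, a call might be `backfill_old_files(boto3.client("s3"), "/var/log/archive", "my-archive-bucket")`, where the directory and bucket names are placeholders.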
If you use a standard deployment of MySQL, moving to Amazon RDS will be a trivial task. Using all the standard tools, you will be able to move and restore all the data into an Amazon RDS DB instance. After you move the data to a DB instance, make sure you are monitoring the key metrics of usage and load. It is also highly recommended that you set your retention period so AWS can automatically create periodic backups.
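As a rough sketch of setting the retention period programmatically, the helper below wraps the `modify_db_instance` call of a boto3 RDS client. The helper name `enable_automated_backups`, the 7-day default and the instance identifier in the usage note are assumptions for illustration; `rds_client` is assumed to be a boto3 RDS client or a compatible object.

```python
def enable_automated_backups(rds_client, instance_id, retention_days=7):
    """Set the automated backup retention period on an RDS DB instance.

    Hypothetical helper. RDS accepts 0 to 35 days; 0 disables automated
    backups, so this sketch requires at least 1.
    """
    if not 1 <= retention_days <= 35:
        raise ValueError("retention_days must be between 1 and 35")
    return rds_client.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        BackupRetentionPeriod=retention_days,
        ApplyImmediately=True,  # apply now instead of waiting for the maintenance window
    )
```

With boto3 installed, this might be called as `enable_automated_backups(boto3.client("rds"), "my-mysql-instance")`, where the instance name is a placeholder.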
If you require transactional semantics (commit, rollback) and are running an OLTP system, simply use the traditional migration tools available with Oracle, MS SQL Server, DB2 and Informix. All of the major databases are available as Amazon Machine Images and are supported in the cloud by the vendors. Migrating your data from an on-premises installation to an Amazon EC2 cloud instance is no different from migrating data from one machine to another.
When transferring data across the Internet becomes cost or time prohibitive, you may want to consider the AWS Import/Export service. With AWS Import/Export, you load your data on USB 2.0 or eSATA storage devices and ship them via a carrier to AWS. AWS then uploads the data into your designated buckets in Amazon S3.
For example, if you have multiple terabytes of log files that need to be analyzed, you can copy the files to a supported device and ship the device to AWS. AWS will restore all the log files in your designated bucket in Amazon S3, which can then be fetched by your cloud-hosted business intelligence application or Amazon Elastic MapReduce services for analysis.
If you have a 100TB Oracle database with 50GB of changes per day in your data center that you would like to migrate to AWS, you might consider taking a full backup of the database to disk, then copying the backup to USB 2.0 devices and shipping them. Until you are ready to switch the production DBMS to AWS, you take differential backups. The full backup is restored by the import service, and your incremental backups are transferred over the Internet and applied to the DB instance in the cloud. Once the last incremental backup is applied, you can begin using the new database server.
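A back-of-the-envelope calculation shows why the full backup is shipped while the daily changes go over the wire. The sketch below assumes a 100 Mbps Internet link at 80% sustained utilization and decimal units (1 GB = 10^9 bytes); both figures are illustrative assumptions, not values from the scenario above.

```python
def transfer_days(size_gb, link_mbps, utilization=0.8):
    """Days needed to move `size_gb` over a link of `link_mbps` megabits per second."""
    bits = size_gb * 8 * 1000**3  # decimal gigabytes to bits
    effective_bps = link_mbps * 1_000_000 * utilization
    return bits / effective_bps / 86400  # 86400 seconds per day


full_backup_days = transfer_days(100_000, 100)  # 100 TB full backup
daily_delta_hours = transfer_days(50, 100) * 24  # 50 GB of daily changes

print(f"Full backup over the link: {full_backup_days:.1f} days")   # about 115.7 days
print(f"Daily changes over the link: {daily_delta_hours:.1f} hours")  # about 1.4 hours
```

Under these assumptions the full backup would occupy the link for close to four months, while each day's changes fit comfortably within a nightly transfer window.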