
Transferring Hundreds of TBs of Data from AWS S3 to Google Cloud Storage: A Cloud Architect’s Guide

  • Karan Chhatwani
  • Aug 25
  • 7 min read




Migrating petabyte-scale datasets between clouds is no small feat. As a Google Certified Professional Cloud Architect, I’ve often seen customers struggle when planning large-scale transfers, especially when dealing with hundreds of terabytes spread across multiple AWS S3 buckets.

In this blog, I’ll walk you through how to strategically transfer hundreds of TBs from AWS S3 to Google Cloud Storage buckets, covering:

  • Pre-migration considerations & AWS-side prerequisites

  • Tools and approaches available on Google Cloud

  • Cost, performance, and reliability trade-offs

  • Best practices from the field


Step 1: Pre-Migration Considerations

Before jumping into tools and scripts, align on the following:


Data Assessment

  • How many buckets?

  • How many objects (millions vs billions)?

  • Object sizes (lots of small files vs large files)?

Business Needs

  • Is downtime acceptable, or do you need continuous sync?

  • Is there compliance (PII, HIPAA, GDPR) to consider?

  • Do you need version history preserved?

Costs

  • Egress charges from AWS S3 → data transfer out of AWS.

  • Network interconnect or VPN costs.

  • Storage charges on Google Cloud.


Step 2: Preparing on the AWS Side

On the AWS side, there are a few must-dos:


  • IAM Setup: Create a dedicated IAM role with limited access (for example, read-only permissions to the S3 buckets).

  • Inventory Your Data: Use AWS S3 Inventory reports to understand the number of objects, sizes, and distribution across buckets.

  • Enable Versioning Checks: If versioning is enabled in S3, decide if you want to migrate all versions or just the latest.


A successful migration starts with AWS IAM and bucket setup. Storage Transfer Service (STS) needs specific permissions to access your S3 buckets. Assign these to a dedicated IAM role with least-privilege access.

Required AWS S3 IAM permissions for Storage Transfer Service (STS):

- s3:ListBucket: allows STS to list objects in the bucket. Always required.

- s3:GetObject: allows STS to read objects in the bucket. Required if transferring the current version of objects.

- s3:GetObjectVersion: allows STS to read specific object versions. Required if your manifest specifies versions; otherwise, use `s3:GetObject`.

- s3:DeleteObject: allows STS to delete objects in the bucket. Required if `deleteObjectsFromSourceAfterTransfer = true`.


Pro tip: Create a dedicated IAM role (e.g., STS-Migration-Role) with only these permissions and grant STS cross-account access.


Example below:  

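As a sketch (bucket and role names are placeholders; the role’s trust policy, which grants STS cross-account access, is configured separately), the read-only permissions policy could be attached like this:

```bash
# Write a least-privilege policy for the source bucket.
cat > sts-read-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-source-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectVersion"],
      "Resource": "arn:aws:s3:::my-source-bucket/*"
    }
  ]
}
EOF

# Attach it to the dedicated migration role.
aws iam put-role-policy \
  --role-name STS-Migration-Role \
  --policy-name sts-s3-read-only \
  --policy-document file://sts-read-policy.json
```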

Inventory Your Data

  • Use AWS S3 Inventory to generate reports on object counts, sizes, and metadata. This will help estimate transfer times and costs.
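For example, a daily inventory report can be enabled on a source bucket like this (a hedged sketch; the bucket names and report ID are placeholders):

```bash
# Describe the inventory report: a daily CSV listing with size and storage class.
cat > inventory-config.json <<'EOF'
{
  "Id": "migration-inventory",
  "IsEnabled": true,
  "IncludedObjectVersions": "Current",
  "Schedule": { "Frequency": "Daily" },
  "Destination": {
    "S3BucketDestination": {
      "Bucket": "arn:aws:s3:::my-inventory-reports",
      "Format": "CSV"
    }
  },
  "OptionalFields": ["Size", "LastModifiedDate", "StorageClass", "ETag"]
}
EOF

# Register the configuration on the source bucket.
aws s3api put-bucket-inventory-configuration \
  --bucket my-source-bucket \
  --id migration-inventory \
  --inventory-configuration file://inventory-config.json
```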


Pro Tip: If your AWS S3 data is in Glacier, first run an AWS S3 Batch Operations restore job to bring the data back to Standard or Infrequent Access before the GCP transfer. Run it twice: once with a faster retrieval tier (for recent critical data) and once with standard retrieval (for bulk archival). This way, your migration doesn’t stall waiting for Glacier restores.
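For a single archived object, the restore call looks like the sketch below (bucket, key, and retention days are placeholders); for millions of objects, wrap the same RestoreObject action in an S3 Batch Operations job:

```bash
# Restore one Glacier object so it is readable for 14 days, using the Standard tier.
aws s3api restore-object \
  --bucket my-source-bucket \
  --key archives/2023/backup-001.tar \
  --restore-request '{"Days": 14, "GlacierJobParameters": {"Tier": "Standard"}}'
```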


Handling Files Larger Than 5 GB: When transferring very large objects (files greater than 5 GB) from AWS S3 to Google Cloud Storage, you need to take special care. While smaller files can be copied in a single operation, files above 5 GB require multipart uploads and downloads to ensure reliability and performance.


Why this matters:

  • A single transfer of files larger than 5 GB may fail or timeout.

  • Multipart transfers break the file into smaller chunks and reassemble them at the destination.

  • This allows parallelism, faster transfer speeds, and automatic retry on failed parts.

  • You can either use S3 Batch Operations (when many buckets contain large objects) or, for a single bucket with a handful of large files, use a simple bash script like the sketch below.


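Here’s a minimal bash sketch of such a splitting helper, assuming a local staging directory; the path, threshold, and log file name are placeholders:

```bash
#!/usr/bin/env bash
# Split any file larger than 5 GB into 5 GB parts before the bulk transfer.
SRC_DIR="./s3-staging"                 # placeholder: local staging copy of the bucket
LOG_FILE="split-large-files.log"
THRESHOLD=$((5 * 1024 * 1024 * 1024))  # 5 GB in bytes

for file in "$SRC_DIR"/*; do
  [ -f "$file" ] || continue
  size=$(stat -c%s "$file")            # file size in bytes
  if [ "$size" -gt "$THRESHOLD" ]; then
    split -b 5G -d "$file" "${file}.part-" \
      && echo "$(date -u) SPLIT OK   $file ($size bytes)" >> "$LOG_FILE" \
      || echo "$(date -u) SPLIT FAIL $file" >> "$LOG_FILE"
  fi
done
```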

  • `stat -c%s "$file"` gets the file size in bytes.

  • `split -b 5G -d` breaks large files into 5 GB chunks (part-00, part-01, …).

  • `&&` ensures the success message runs only if the previous command succeeded.

  • `||` ensures an error message is logged if the previous command failed.

  • Keeps a log file for transparency.


Pro Tip: Always preprocess and split large files (>5 GB) before starting a bulk transfer job. This ensures the transfer pipeline doesn’t fail midway. For AWS S3 specifically, `aws s3 cp` performs multipart uploads automatically (pass `--expected-size` when uploading from a stream), but preprocessing at the source saves retries and bandwidth costs.


Versioning Decisions

  • If S3 versioning is enabled, decide whether to migrate all versions or just the latest. This significantly affects both the required permissions and the total transfer volume.


Step 3: Choosing the Right Google Cloud Tool

Google Cloud offers multiple ways to migrate data. The right choice depends on scale, bandwidth, and use case. The best tool in our scenario is:


Storage Transfer Service (STS) — Recommended for Most Migrations

  • Fully managed, serverless service.

  • Can pull data directly from AWS S3 into GCP Cloud Storage Buckets.

  • Supports scheduling, incremental syncs, and metadata preservation.

  • Handles retries and parallelism automatically.

Best choice for multi-TB multi-bucket transfers with minimal operational overhead.



Managed Private Network (MPN) with STS — Optimize Cost and Performance

One of the biggest challenges in large migrations is AWS egress cost. Transferring 500 TB from S3 over the internet could cost ~$25,000–$45,000 in AWS data transfer fees.
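(For rough context: 500 TB is about 512,000 GB, and AWS’s published internet egress tiers run roughly $0.05–$0.09 per GB, which is how the $25,000–$45,000 range above is derived, before request charges.)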


The solution? Managed Private Network (MPN).


What is MPN?

  • A Google-managed private transfer network between AWS and GCP. Instead of paying AWS egress fees, you pay a per-GiB transfer rate to Google Cloud. You may still incur AWS request charges (e.g., LIST, GET), but no bulk egress charges. This managed private network is designed to simplify the cost structure for large-scale migrations by eliminating the variable and often high AWS egress fees in favor of a flat, predictable fee from Google.



Benefits:

  1. Massive Cost Savings: No AWS egress charges; predictable per-GiB billing from Google Cloud.

  2. Performance: High-throughput private paths for large datasets.

  3. Security: Traffic never touches the public internet.

  4. Fully Managed: No need for Direct Connect or Interconnect setup.


Limitations:

  1. Shared Bandwidth: Transfers over MPN share capacity across projects. Speeds may vary during peak usage.

  2. Large Files Impact: Large-object transfers may slow more than smaller-object workloads.

  3. Limited Support: Available only via the Google Cloud Console (the Managed Private Network option) and the REST API (`managedPrivateNetwork` field); not supported via the gcloud CLI or client libraries.


Supported AWS Regions for MPN:

Currently, MPN supports the following AWS regions:

  • Asia Pacific: ap-east-1, ap-northeast-1, ap-northeast-2, ap-northeast-3, ap-south-1, ap-south-2, ap-southeast-1

  • Canada: ca-central-1, ca-west-1

  • Europe: eu-central-1, eu-central-2, eu-north-1, eu-south-1, eu-south-2, eu-west-1, eu-west-2, eu-west-3

  • United States: us-east-1, us-east-2, us-west-1, us-west-2


Pro tip: If your source S3 buckets are in non-supported regions, replicate them to a supported region first and then use MPN; in most cases you will still come out ahead on cost.

For example: transferring just 5 TB of data from AWS S3 to a Cloud Storage bucket without MPN costs roughly $300–$350 more than with MPN (a rough estimate; it varies by region).


Using Google Cloud Platform’s STS with best practices:


Step 1: Store AWS Credentials in Secret Manager

Go to Secret Manager in Google Cloud Console and create a new secret with the following format:


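A hedged sketch of creating that secret (the key names follow the JSON format STS expects for AWS credentials; verify against the current STS documentation, and treat the values as placeholders):

```bash
# Assumed secret payload: JSON holding the AWS access key pair.
cat > aws-creds.json <<'EOF'
{
  "accessKeyId": "AKIAXXXXXXXXXXXXXXXX",
  "secretAccessKey": "YOUR_SECRET_ACCESS_KEY"
}
EOF

# Store it in Secret Manager.
gcloud secrets create aws-s3-creds \
  --replication-policy="automatic" \
  --data-file=aws-creds.json
```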

Once created, note down the resource name (e.g., projects/1234567890/secrets/aws-s3-creds).


Pro Tip: Using Secret Manager avoids hardcoding AWS credentials in STS jobs. You can also set rotation policies to auto-expire credentials after migration.


Step 2: Enable Storage Transfer Service

Enable the Storage Transfer API in your project and ensure you have sufficient IAM permissions (roles/storagetransfer.admin).
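Enabling the API is a one-liner (the project ID is a placeholder):

```bash
gcloud services enable storagetransfer.googleapis.com --project=my-gcp-project
```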


Step 3: Configure the STS Job

  • Source: AWS S3 bucket

  • Destination: GCP Cloud Storage bucket

  • Authentication: Select Secret Manager and point to the secret created in Step 1.

  • Transfer type: One-time or recurring schedule, depending on your use case.


Step 4: Enable Managed Private Network

When configuring your STS job, enable the Managed Private Network feature. This ensures data moves via Google’s private backbone, reducing reliance on costly public internet egress.
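Putting Steps 3 and 4 together, a one-time transfer job with Secret Manager authentication and Managed Private Network enabled can be created via the REST API. The sketch below uses placeholder project, bucket, and secret names, and models a one-time run by setting the start and end dates to the same day:

```bash
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://storagetransfer.googleapis.com/v1/transferJobs \
  -d '{
    "description": "s3-to-gcs-migration",
    "status": "ENABLED",
    "projectId": "my-gcp-project",
    "transferSpec": {
      "awsS3DataSource": {
        "bucketName": "my-source-bucket",
        "credentialsSecret": "projects/1234567890/secrets/aws-s3-creds",
        "managedPrivateNetwork": true
      },
      "gcsDataSink": { "bucketName": "my-destination-bucket" }
    },
    "schedule": {
      "scheduleStartDate": { "year": 2025, "month": 1, "day": 15 },
      "scheduleEndDate":   { "year": 2025, "month": 1, "day": 15 }
    }
  }'
```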


Pro Tip: If you’re moving petabytes of data, Managed Private Network can save tens of thousands of dollars in egress fees compared to regular transfers. Always benchmark costs before large migrations. You can use both AWS calculator (for egress costs) and GCP calculator for storage and transfer costs.


Step 5: Specifying a User-Managed Service Account for STS

By default, STS uses a Google-managed service agent to access buckets, but you can delegate access to a user-managed service account. This lets you enforce finer-grained permissions and improve security across multiple transfer jobs.



Why Use a User-Managed Service Account?

  • The default service agent needs broad bucket-level permissions across your project.

  • With a user-managed service account (UMSA), you can limit access to specific S3 or GCS buckets.

  • Permissions are scoped per transfer job or per user, reducing risk and improving auditability.


Setup Steps

1. Create or identify a user-managed service account (UMSA)

Make or select an existing service account, e.g.:

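For example (the project and account names are placeholders):

```bash
gcloud iam service-accounts create sts-migration-sa \
  --project=my-gcp-project \
  --display-name="STS Migration UMSA"
```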

2. Grant your user access to that UMSA

Let authorized users (e.g., engineers or CI/CD systems) use this UMSA:

  • Google Cloud Console → IAM → select the UMSA → Add Principal → grant roles/iam.serviceAccountUser to the user. Or, using gcloud:

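A sketch with placeholder emails:

```bash
gcloud iam service-accounts add-iam-policy-binding \
  sts-migration-sa@my-gcp-project.iam.gserviceaccount.com \
  --member="user:engineer@example.com" \
  --role="roles/iam.serviceAccountUser"
```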

3. Grant the STS service agent the ability to impersonate your UMSA

Find STS’s own service agent email:


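One way to look it up is the Storage Transfer API itself (the project ID is a placeholder):

```bash
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://storagetransfer.googleapis.com/v1/googleServiceAccounts/my-gcp-project"
# The accountEmail in the response typically looks like:
#   project-<PROJECT_NUMBER>@storage-transfer-service.iam.gserviceaccount.com
```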

In the Console IAM for your UMSA, Add Principal → that email → Grant roles/iam.serviceAccountTokenCreator.

Or, using gcloud:

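A sketch with placeholder emails:

```bash
gcloud iam service-accounts add-iam-policy-binding \
  sts-migration-sa@my-gcp-project.iam.gserviceaccount.com \
  --member="serviceAccount:project-1234567890@storage-transfer-service.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountTokenCreator"
```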

4. Grant the UMSA access to your buckets

Assign appropriate IAM roles to the UMSA on your source and destination buckets, such as:

  • roles/storage.objectViewer – to read from source.

  • roles/storage.objectCreator or roles/storage.objectAdmin – to write to destination.

These roles should only apply to specific buckets, not project-wide.
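Bucket-scoped grants can be applied like this (bucket and account names are placeholders):

```bash
gcloud storage buckets add-iam-policy-binding gs://my-destination-bucket \
  --member="serviceAccount:sts-migration-sa@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
```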


5. Create an STS transfer using your UMSA:

  • When building your transfer job, under Service account type, choose User-managed service account and enter the UMSA’s email.

  • REST API (serviceAccount field):

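As a loosely hedged sketch only (the exact placement of the serviceAccount field should be confirmed against the STS REST reference; all names are placeholders):

```bash
# Assumption: the UMSA email is supplied in the serviceAccount field of the job body.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://storagetransfer.googleapis.com/v1/transferJobs \
  -d '{
    "projectId": "my-gcp-project",
    "serviceAccount": "sts-migration-sa@my-gcp-project.iam.gserviceaccount.com",
    "transferSpec": {
      "awsS3DataSource": { "bucketName": "my-source-bucket" },
      "gcsDataSink": { "bucketName": "my-destination-bucket" }
    }
  }'
```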

Pro Tip: Use this approach to segment transfer responsibilities: for example, developers work only with dev-data buckets using their own UMSA, while ops handles prod-data with separate UMSAs. This separation aligns with least-privilege governance and audit traceability.


6. Monitoring & Optimization

  1. Use Cloud Monitoring metrics for STS jobs to track throughput, failures, and retries.

  2. Set up budget alerts to monitor unexpected costs during transfer.
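A budget alert sketch (the billing account ID, amount, and thresholds are placeholder assumptions; confirm the flag syntax against the gcloud billing budgets reference):

```bash
gcloud billing budgets create \
  --billing-account=0X0X0X-0X0X0X-0X0X0X \
  --display-name="s3-to-gcs-migration-budget" \
  --budget-amount=20000USD \
  --threshold-rule=percent=0.75 \
  --threshold-rule=percent=0.9
```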


Pro Tip: If transfers fail repeatedly, check object-level ACLs in S3. Sometimes restrictive ACLs block reads even if IAM permissions look correct.

This approach ensures:

  • No accidental credential leaks (thanks to Secret Manager).

  • Egress costs are minimized.

  • Data is transferred over Google’s private backbone for better performance.


7. Best Practices for Smooth Migration

  • Start Small → Pilot with a 1–2 TB dataset before scaling up.

  • Preserve Metadata → STS can copy timestamps, ACLs, and storage class metadata.

  • Plan Cutover → If apps actively use S3, consider dual-write during migration and a final sync at cutover.

  • Automate → Define transfer jobs via Terraform or REST API for repeatability.

  • Document Everything → Track which buckets were migrated, when, and by which method.


Conclusion:

Transferring 400–500 TB from AWS S3 to Google Cloud Storage is absolutely feasible with the right planning.

  • Use Storage Transfer Service for reliability and scalability.

  • Enable Managed Private Network (MPN) wherever possible to eliminate AWS egress fees and achieve predictable costs.

  • Plan AWS IAM permissions carefully, and validate with a small pilot run before migrating production data.


With the right architecture and tooling, you can complete even petabyte-scale cross-cloud transfers securely, cost-effectively, and with minimal operational burden. Whether you’re running a one-time migration or setting up recurring syncs, these steps and the insider Pro Tips will help you avoid pitfalls and optimize your cloud-to-cloud transfer strategy.


This blog is my first in a series on cross-cloud migrations, where I’ll share hands-on architectural insights to help you move workloads to Google Cloud with confidence.


Stay tuned!



 
 
 
