How Long Will It Take for Dataload to Complete?

Akamai supports the import of up to 10,000 records per minute when using dataload.py, which equates to 600,000 records per hour. These upper limits can be helpful in planning, but your actual run time can vary depending on the complexity of the records being imported. For example, records with a lot of plural data take longer to process. If your records-per-minute average is below 10,000, you can try to improve performance by changing the following arguments:

-b BATCH_SIZE
-w WORKERS
-r RATE_LIMIT

Be careful not to allow too many API calls per second (-r). Your Akamai Identity Cloud APIs are limited both in the number of calls that can be made in one minute and in the number of concurrent calls that can be in progress at any given time. Note that these limits apply to all traffic to your Akamai Identity Cloud instance, not just API calls from dataload. Because of that, you might want to plan your migrations to coincide with periods of non-peak traffic.

A rough calculation that can be used is this:

BATCH_SIZE x RATE_LIMIT x 60 = Number of records per minute.

This calculation assumes that the API response time from entity.bulkCreate, in seconds, is less than WORKERS/RATE_LIMIT. A larger BATCH_SIZE generally means a higher API response time. More attributes per record and the inclusion of complex structures like plurals will also increase API response time.
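
The rough calculation above can be sketched in a few lines of Python. This is an illustration only, not part of dataload.py: the function name and the sample values are hypothetical. The fallback used when the response time exceeds WORKERS/RATE_LIMIT (treating the workers as the bottleneck) is an assumption made for this sketch, not documented behavior.

```python
def estimated_records_per_minute(batch_size, rate_limit, workers, response_time_s):
    """Rough throughput estimate (hypothetical helper, not part of dataload.py).

    batch_size      -- records sent per API call (-b)
    rate_limit      -- maximum API calls per second (-r)
    workers         -- number of concurrent workers (-w)
    response_time_s -- observed entity.bulkCreate response time, in seconds
    """
    # Assumption: each worker completes at most one call per response_time_s,
    # so the workers together cannot exceed this many calls per second.
    worker_bound_calls_per_s = workers / response_time_s

    # When response_time_s < workers / rate_limit, the rate limit is the
    # binding constraint and this reduces to BATCH_SIZE x RATE_LIMIT x 60.
    effective_calls_per_s = min(rate_limit, worker_bound_calls_per_s)
    return batch_size * effective_calls_per_s * 60


# Sample values chosen only for illustration.
fast = estimated_records_per_minute(batch_size=100, rate_limit=2, workers=4, response_time_s=1.0)
slow = estimated_records_per_minute(batch_size=100, rate_limit=2, workers=4, response_time_s=4.0)
print(fast, slow)
```

In the first call the response time (1 second) is below WORKERS/RATE_LIMIT (2 seconds), so the estimate matches the formula. In the second call the response time is above that threshold, so the estimate drops even though the arguments are unchanged.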

The best strategy for improving your dataload performance is to start by migrating a small sample of test records that closely match the format of your actual records, and to keep track of how long the migration takes. (It’s recommended that you perform test migrations in a non-production environment.) After the first test, adjust the arguments noted above and repeat as necessary.
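
One way to use the timing from a test migration is to extrapolate it to the full data set. The sketch below is a minimal illustration with a hypothetical function name and made-up numbers. It assumes that throughput on the sample is representative of the full migration, which might not hold if record complexity or instance traffic differs.

```python
def estimate_full_run(sample_records, sample_seconds, total_records):
    """Extrapolate a timed test migration to the full data set.

    Hypothetical helper, not part of dataload.py. Returns a tuple of
    (observed records per minute, estimated minutes for the full run).
    """
    records_per_minute = sample_records * 60 / sample_seconds
    estimated_minutes = total_records / records_per_minute
    return records_per_minute, estimated_minutes


# Made-up example: 5,000 test records took 60 seconds; 600,000 records in total.
rate, minutes = estimate_full_run(sample_records=5000, sample_seconds=60, total_records=600000)
print(f"{rate:.0f} records/minute, about {minutes:.0f} minutes for the full run")
```

Rerunning the same estimate after each adjustment to the arguments makes it easier to compare test runs.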