Working with Data Transformations

Data transformations are defined in the Python script transformations.py; if you open that file you’ll see a series of functions similar to this:

import json  # needed by json.loads below

def transform_plural(value):
   """
   Transform the plural data represented in the CSV as a JSON string into a
   Python object.
   """
   return json.loads(value)
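
To see what this function does, here is a small check you can run on its own. The sample string is made up for illustration (your CSV will contain your own profile data), but it shows how a JSON string from a CSV cell becomes a Python list of dictionaries:

```python
import json


def transform_plural(value):
    """Parse a JSON string from the CSV into a Python object."""
    return json.loads(value)


# Hypothetical CSV cell value; real profile data will look different.
cell = '[{"domain": "example.com", "identifier": "12345"}]'

profiles = transform_plural(cell)
print(type(profiles).__name__)   # list
print(profiles[0]["domain"])     # example.com
```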

As you might have guessed, functions like this one are used to transform your data during migration (as needed, of course).

This also means that, if you do need to use a data transformation as part of the migration process, you’ll need a function in transformations.py that can perform this feat. That function can be:

  • One of the predefined functions included in transformations.py and used as-is.
  • A predefined function you modified to better fit your needs.
  • A custom function you’ve written yourself. 

Regardless, it needs to be a function.

For example, suppose your customers previously had the option of selecting one of three values for their country of residence: United States, Canada, or Mexico. Let’s further suppose that, for their new Akamai user profiles, you’d prefer that country codes (US, CA, MX) be used instead. Without going into the details of writing Python functions (something that goes beyond the scope of this documentation), here’s a custom function (transform_country) that converts country names into country codes:

def transform_country(value):
   """
   Transform country string values into country codes.
   """
   if value == "United States":
       return "US"
   elif value == "Canada":
       return "CA"
   elif value == "Mexico":
       return "MX"
   else:
       return None
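
If you’d like to see how this function behaves before adding it, a quick standalone check along these lines can help. The sample values are illustrative only. Note that the comparison is exact, so a value with different capitalization (or any country not in the list) falls through to None:

```python
def transform_country(value):
    """Transform country string values into country codes."""
    if value == "United States":
        return "US"
    elif value == "Canada":
        return "CA"
    elif value == "Mexico":
        return "MX"
    else:
        return None


# Illustrative inputs: three exact matches, one case mismatch, one unknown.
samples = ["United States", "Canada", "Mexico", "canada", "France"]
results = {name: transform_country(name) for name in samples}

for name, code in results.items():
    print(f"{name!r} -> {code!r}")
```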

All you have to do is add that function to transformations.py and you’re good to go.

Well, OK: almost good to go. In addition to creating (or modifying) a function in transformations.py, you also need to call that function from dataload.py. Note that some transformations are enabled (and called) by default:

reader.add_transformation("password", transform_password)
reader.add_transformation("birthday", transform_date)
reader.add_transformation("profiles", transform_plural)

For other transformations, you’ll need to insert a similar line of code, specifying the full path of the user profile attribute being transformed as well as the name of the function. For example, if data for the primaryAddress.country attribute needs to be transformed by calling the function transform_country, you would add this line:

reader.add_transformation("primaryAddress.country", transform_country)
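
If it helps to picture what that line does, here is a minimal sketch of a reader that stores transformations by attribute path and applies them to a row of data. This is not the actual reader used by dataload.py: SketchReader, its apply method, and the sample row are all hypothetical, and the real implementation may work differently.

```python
def transform_country(value):
    """Transform country string values into country codes."""
    codes = {"United States": "US", "Canada": "CA", "Mexico": "MX"}
    return codes.get(value)


class SketchReader:
    """Hypothetical stand-in for the reader object; for illustration only."""

    def __init__(self):
        # Maps an attribute path (such as "primaryAddress.country")
        # to the function that should transform that attribute's value.
        self.transformations = {}

    def add_transformation(self, attribute_path, function):
        self.transformations[attribute_path] = function

    def apply(self, row):
        """Return a copy of row with registered transformations applied."""
        result = {}
        for path, value in row.items():
            function = self.transformations.get(path)
            result[path] = function(value) if function else value
        return result


reader = SketchReader()
reader.add_transformation("primaryAddress.country", transform_country)

# Made-up row: only the registered attribute is changed.
row = {"email": "user@example.com", "primaryAddress.country": "Mexico"}
converted = reader.apply(row)
print(converted)
```

In this sketch, attributes without a registered transformation pass through unchanged, which is why the email value is left alone.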

At that point you are good to go. When you run dataload.py, you should end up with users whose country entries contain country codes (for example, US) instead of country names.

By the way, if you don’t want to call one of the default functions, either delete that line of code from dataload.py or “comment out” the function call by prefacing the line with a hash mark (#). For example, if you comment out the birthday transform, your function calls will look like this:

reader.add_transformation("password", transform_password)
# reader.add_transformation("birthday", transform_date)
reader.add_transformation("profiles", transform_plural)

Because the script line has been commented out, the transform_date function will not be called when your data migration script processes the birthday attribute.

That raises a good question: what if you’d like to call the same function more than once? For example, suppose you have two additional date values (custom attributes named subscriptionDate and subscriptionExpirationDate) that require the same date transformation that the birthday attribute does. That’s fine; just add transform_date three times, once for each attribute:

reader.add_transformation("birthday", transform_date)
reader.add_transformation("subscriptionDate", transform_date)
reader.add_transformation("subscriptionExpirationDate", transform_date)
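
To illustrate the idea of one function serving several attributes, here is a small standalone sketch. Everything in it is an assumption made for illustration: transform_date_sketch is not the real transform_date (which isn’t shown here), the MM/DD/YYYY input format and YYYY-MM-DD output format are guesses, and a plain dictionary stands in for the reader.

```python
from datetime import datetime


def transform_date_sketch(value):
    """Hypothetical date transform: MM/DD/YYYY in, YYYY-MM-DD out."""
    # Assumed formats; the real transform_date may expect something else.
    return datetime.strptime(value, "%m/%d/%Y").strftime("%Y-%m-%d")


DATE_ATTRIBUTES = ("birthday", "subscriptionDate", "subscriptionExpirationDate")

# One entry per attribute, all pointing at the same function.
transformations = {name: transform_date_sketch for name in DATE_ATTRIBUTES}

# Made-up sample row.
row = {
    "birthday": "07/04/1990",
    "subscriptionDate": "01/15/2023",
    "subscriptionExpirationDate": "01/15/2024",
}

converted = {name: transformations[name](value) for name, value in row.items()}
print(converted)
```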