How to… Split data fields

One of the most common actions you will take when preparing data for analysis will be to split a column in to it's sub-parts. This happens as data is picked up from operational systems, outputted as a unique ID for that record of data, or, squeezed together to fit in to a poorly designed database table. The human brain is fantastic at spotting patterns in data (that's why we create visual analytics) so you will often spot the need to split data fields (columns) by just looking at the original data.

25_1 Result of splitting Product Code.png

Probably needs to become split apart to form the three separate columns that would help with the analysis of this data set.

Basic Splits

Splitting data in most data tools can be very easy to achieve; Tableau Prep is no different. Simply choose the data field you want to split, go and find the '...' icon, select, 'Split Values' and then 'Automatic Split' and Prep will decide what is the best logic to split the field. 

25_2 Selecting an Automatic Split from a data field’s menu.png

In this case, the automatic split has worked as desired:

25_3 Result of the automatic split from Figure 25-2.png

Prep actually write three calculated fields to perform this task which you can always edit if you want a slightly different result. 

25_4 Calculations in the Change pane resulting from an automatic split.png

You will find these calculations in the Changes pane in Prep. You can also edit them from here, or alternatively learn how Prep completed the task you set it:

25_5 one of the calculations formed during a split opened in the editor.png

This formula is splitting the 'Product Code' field on the '-' separator and pulling back the third part (ie what comes after the second '-'. The resulting values are then 'trimmed' which means any leading or trailing spaces are removed as these can play havoc when matching text values. 

When that doesn't work as intended - Advanced Splits

Let's tweak the data a little bit to give a pattern of delimiters (the character we split the data field by) but hide that a little more. 

25_6 Revised data set with unclear delimiter for Prep Builder.png

The Product Type (first part of the Product Code) has had the word 'Soap' removed and spaces added either side of the hyphen. The results from Prep showed this more irregular pattern doesn't give the results we are looking for when using an Automatic Split.

25_7 Result of automated split of the data set in Figure 25-6.png

The Product Type has disappeared from the data resulting from the split completely. Notice all those blank records? The blanks are the result of the word 'Soap' being removed from the bars as there are no blanks resulting from the 'Liquid Soap' rows. Let's look at the calculation Prep is writing:

25_8 Using the calculator editor to understand the split issue.png

Well that's not clear, is it? 

In these situations, this is where your human eye can take over and ignore the Prep automatic and set your own 'Custom' split. By setting up the Custom Split to work on the hyphen then the results are back to what we'd want from this data set.

25_9 Setting up a Custom split.png

In the Custom Split, not only do you set the delimiter, but you also set what you want returned from the split. Here, let's select 'All' to ensure all the data is returned.

25_10 Result of the Custom split.png

Custom Split can be a very powerful way to save you from having to write a number of complex calculations. Here the third calculation Prep has built is quite complicated but you could simplify if you choose to but no need unless your flow is struggling for performance. 

Where to not use Split

Although Split is a powerful function, there are times when you might want to check it is the right technique for the task you have been set. 

Address Data

25_11 Address data separated by commas.png

Here the address details are separated by columns. However, if you were to split them, due to some 'street' address lines being separated by commas in two parts and some all in one, that section of the address will reside in different columns. Addresses are typically difficult to pick apart and likely take more complex calculations / logic to align the correct parts to the correct columns. If you want to solve this problem have a go at 2019 Week 46 Challenge as split can be part of the solution, but won't be the only part.

No clear delimiter

If there is no clear and logical delimiter to split by, then the split technique is definitely not the one to follow. In this circumstance, more advanced string functions like FIND() or FINDNTH() might be a good approach. Otherwise, regular expressions (also known as regex) functions would be useful. These will covered in a future 'How to...' post. 

Ultimately, split is a great first technique to investigate to break-up data columns to aid your analysis. In these situations Prep has a great couple of options but might not always be the solution and you'll need to form the logic yourself.  

Previous
Previous

How to… deal with Nulls

Next
Next

How to… clean by Grouping data