When on the Transform & Map stage of setting up a Dataflow, you also have the option of applying a Column Filter to it. This will allow you to directly select what datapoints the system is allowed to use, and with two different types of filters available, you have a broad set of options to draw on - be it to narrowly target a dataflow, or simply to cleanse it of invalid data.
1. Static Filters
When adding a Filter to your Dataflow, one option - the simplest and most straightforward - is the Static Filter, selected via the Add static filter button. When this is selected, you get three fields - first, you select the Column you wish to filter. This can be one of the original, downloaded columns, or a Custom Column created via the Transform & Convert feature. Next, you select the Operator - the method by which it will be filtered. Your options there are dependent on the column's data-type, and it's worth keeping in mind that any newly-downloaded column will be of the String type. String-type columns do offer a wide variety of filtering options, which will be explained in more detail later in this guide - however, there are others. For example, only columns of the type DateTime, Decimal or Integer can filter based on Greater Than, Less Than, Greater Than Or Equal, or Less Than Or Equal. Meanwhile, several other datatypes will only offer the options of Equal or Not Equal.
Third is the entry-field, where you can enter a static value or text, which will be used for the comparison. This may allow you to, for instance, filter out any Price that is Less Than 10, or any In-Stock Boolean that is equal to False. When you have made your selection, simply select Apply. Those columns that do not match your requirements will be filtered out when the Dataflow runs - you will be able to see the number that were removed, and the number that were let through, when you perform your first test-run, or later by selecting View Details on an active Dataflow.
The most common usecase for the Static Filter is to target a particular event-type when streaming from a website - enabling the data to be sorted into Pageviews, Add To Basket, Buys, and so on. Another option is to filter out various types of invalid data, preventing issues and inaccuracies down the road - having issues with people inputting GUIDs as Customer IDs when the system is only set up to handle phone-numbers in that role? You can simply filter the input-data based on whether or not the number in that column contains a dash...
2. Relational Filters
When adding a Filter to your Dataflow, a more advanced option is the Relational Filter, selected via the Add relational filter button. When this is selected, you get three fields - first, you select the Column you wish to filter. This can be one of the original, downloaded columns, or a Custom Column created via the Transform & Convert feature. Next, you select the Operator - the method by which it will be filtered. Your options there are dependent on the column's data-type, and it's worth keeping in mind that any newly-downloaded column will be of the String type. String-type columns do offer a wide variety of filtering options, which will be explained in more detail later in this guide - however, there are others. For example, only columns of the type DateTime, Decimal/Currency or Integer can filter based on Greater Than, Less Than, Greater Than Or Equal, or Less Than Or Equal. Meanwhile, several other datatypes will only offer the options of Equal or Not Equal.
The third field allows you to pick a second Column. This must be of the same data-type as the first, and will be compared directly to the first. This allows for a separate column to be used purely for double-checking all your input-data, filtering out flawed or erroneous datapoints - or using existing columns for the same purpose. Cleaner data-input results in better recommendations and more reliable results! When you have made your selection, simply select Apply. Those columns that do not match your requirements will be filtered out when the Dataflow runs - you will be able to see the number that were removed, and the number that were let through, when you perform your first test-run, or later by selecting View Details on an active Dataflow.
By far the most common usecase for this feature is to filter items that are on sale - either to get only on-sale items, or to remove on-sale items. In either case, a comparative filter based on the sale-price and original price will do the trick.
3. Filter Operators
The Static and Relational Filters share the same pool of Operators - which are also used for filtering in many other sections of Raptor's systems, so you may well be familiar with them already. Some, however, are unique to this feature, and for the sake of convenience, all will be covered here.
Equals - available for all Datatypes: The two values must be exactly identical.
Does not equal - available for all Datatypes: The two values must not be exactly identical.
Contains - available for String only: The first value must contain the second value.
Does not contain - available for String only: The first value must not contain the second value.
Starts with - available for String only: The first value must start with the second value, be it a number, letter or special symbol.
Ends with - available for String only: The first value must end with the second value, be it a number, letter or special symbol.
Does not start with - available for String only: The first value must not start with the second value, be it a number, letter or special symbol.
Does not end with - available for String only: The first value must not end with the second value, be it a number, letter of special symbol.
Is one of - available for String only: The first value must match one of a list of comma-separated values, like an expanded version of Equals. Mostly applicable to Static Filters.
Is not one of - available for String only: The first value must not match any of those in the comma-separated list of values, like an expanded version of Does Not Equal. Mostly applicable to Static Filters.
Has length - available for String only: The first value, be it a word or a number, must have a length equal to the second value. A number is required for the second value. Mostly applicable to Static Filters.
Is greater than - available for DateTime, Decimal, Currency and Integer only: The first value must be higher than the second value.
Is less than - available for DateTime, Decimal, Currency and Integer only: The first value must be lower than the second value.
Is greater than or equal to - available for DateTime, Decimal, Currency and Integer only: The first value must be higher than or equal to the second value.
Is less than or equal to - available for DateTime, Decimal, Currency and Integer only: The first value most be lower than or equal to the second value.