Not So Wild Wildcard

Michael Kearney earned “Ace” standing for solving a vexing mystery. We first suspected a problem when Admins reported variable results when targeting specific counties. Typically we use the Action Network wildcard “%” to cover inconsistencies in how data was entered. For example, 18940 and 18940-2418 are both correct ZIP codes for my home in Newtown, PA. But if I target 18940, I will miss residents who provided ZIP plus 4.


Using “%” as a wildcard solves that problem. But it doesn’t work when applied to searches on a text field value like a county name. For example, “Bucks”, “Bucks ” (a space after the s), and “Bucks County” are not all captured by using the wildcard. The trailing space fools the wildcard, and our targeting would omit those records.

Michael delved deep into the mystery and found some very peculiar properties of the logic used by the AN system. He found a way to make the logic work. So then we had a discussion about whether to clean up the input data to remove the offending hidden problems, or to modify the logic statements that make ladders and targeting work. Michael’s thinking is so clear, I can’t resist simply quoting him:

Before we proceed, we should decide how stringent we want to be about the County field values, i.e. how varied can they be before we send them to the ERROR pile. Because I understand how all this works now, we have considerable flexibility. Some possible choices:

1. Very strict: the County field value must match the county name exactly, with no leading or trailing characters of any kind allowed, including blanks or other white space (tab, etc.)

2. Strict: leading and trailing blanks allowed, but name must match exactly, e.g. 

3. Modest: the field value must contain the county name somewhere

4. Variations of the above that are case insensitive (Radnor matches RADNOR)

The tradeoff is that the stricter you make the rules, the longer the delay in getting people into the right network group while errors are being corrected. Nonetheless, I’m inclined to recommend no. 1. Allowing even minor variations in the data makes querying it considerably more nuanced than I had originally realized. I’m attaching a table that begins to document this. For example, AN’s “%” wildcard is not what it may seem.

There are specific regular expression sequences that can be used to enforce each of the options above. Once we have the decision, I’ll provide the spec. 

There are also additional subtleties regarding the OR (|) function that we need to address. More on that later.  

MK

However, if values remain with leading and trailing blanks, then we cannot avoid the \ kinds of query parameters, as my examples in the attachment show, if we want admins to be able to pull all cases. I expect this sort of thing is partially responsible for the “there are people missing from my query” kinds of questions we get.

Given your concerns, I suggest we take the following approach:

1. Use rule 2. The regular expression to handle this is what we used in the test: add \s* to the beginning and end of the county name in BOTH the name groups connected by ORs and in the individual county branches.

2. Recommend selection by county group rather than county, with the caveat that it may return folks who don’t live in the county (e.g., I subscribe to Monto and Chesco, too).

3. Periodically go in and clean up the County field, sort of like I did for phone, and at least take care of the easy cases.

4. Tag error exit points of the ladder uniquely, so we know where it failed.

Finally, VERY annoying factoid about OR:

A|B is not the same as B|A

The last string in the sequence must be an exact match, while all the rest are not. This accounts for the behavior you saw where the process enters a branch but fails on the second test. Always adding the \s* at both ends resolves the ambiguity by making clear WHAT we want rather than relying on ambiguous and inconsistent defaults.

~ Michael Kearney, May 2019
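
To make Michael's strictness options concrete, here is a minimal sketch in Python. It is an illustration only: it uses Python's re module as a stand-in, and Action Network's own pattern dialect and defaults may behave differently (that is the whole point of his note about OR). The function names county_patterns, matches, and group_pattern are hypothetical, and each pattern is tested against the whole field value so that nothing depends on implicit anchoring.

```python
import re


def county_patterns(name):
    """Build one compiled pattern per strictness option for a county name."""
    n = re.escape(name)  # treat the county name as literal text
    return {
        # Option 1: exact match, no leading or trailing characters at all.
        "very_strict": re.compile(n),
        # Option 2: leading and trailing white space allowed (the \s* padding).
        "strict": re.compile(r"\s*" + n + r"\s*"),
        # Option 3: the county name appears somewhere in the value.
        "modest": re.compile(r".*" + n + r".*", re.DOTALL),
        # Option 4 (one variation): option 2, but case insensitive.
        "strict_ci": re.compile(r"\s*" + n + r"\s*", re.IGNORECASE),
    }


def matches(pattern, value):
    """True only if the pattern accounts for the entire field value."""
    return pattern.fullmatch(value) is not None


def group_pattern(names):
    """Option 2 for several counties joined by OR, padding every branch."""
    branches = [r"\s*" + re.escape(n) + r"\s*" for n in names]
    return re.compile("|".join(branches))


bucks = county_patterns("Bucks")
print(matches(bucks["very_strict"], "Bucks "))   # trailing blank
print(matches(bucks["strict"], "Bucks "))        # trailing blank
print(matches(bucks["modest"], "Bucks County"))  # name contained in value
```

Padding every branch of the OR, rather than only the first or last, mirrors the recommendation above: each alternative states exactly what it accepts, so the order of the alternatives no longer matters.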

I will edit the logic of the PPO ladder to reflect Michael’s recommendations. I’ll also publish the suggestion to AN admins — use the county group list to target subscribers in that county because it’s likely to be easier and more complete than constructing a query to include both Action Network county and FDPA inferred county.
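
The periodic cleanup Michael suggests for the County field could look something like the sketch below. It assumes the field values have been exported for offline processing; the county list and the clean_county helper are hypothetical, and it only handles the "easy cases" (stray white space, case differences, a trailing "County"). Anything else is returned as None so it can go to the error pile for a human to review.

```python
KNOWN_COUNTIES = ["Bucks", "Chester", "Montgomery"]  # hypothetical sample list


def clean_county(raw, known=KNOWN_COUNTIES):
    """Return the canonical county name for easy cases, or None if unsure."""
    # Trim both ends and collapse internal runs of white space to one blank.
    collapsed = " ".join(raw.split())
    lookup = {c.lower(): c for c in known}
    key = collapsed.lower()
    if key in lookup:
        return lookup[key]
    # Accept "Bucks County" as a variant of "Bucks".
    suffix = " county"
    if key.endswith(suffix) and key[: -len(suffix)] in lookup:
        return lookup[key[: -len(suffix)]]
    return None  # not an easy case; leave for manual review


for value in ["Bucks ", "  BUCKS\t", "Bucks County", "Buks"]:
    print(repr(value), "->", clean_county(value))
```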
