Mac Numbers Categorize based on partial match

I presently use the following formula in Excel to categorize bank transactions by doing a partial match on the transaction descriptions. Unfortunately, this formula does not directly import from Excel to Numbers and being so, I was wondering if there is a way to either make it work in Numbers or achieve the same result by using a different formula. I understand that I can use the filter option in Numbers, however having a formula categorize into a cell would be preferred.


=INDEX(Categories!$B$2:$B$76,MATCH(TRUE,ISNUMBER(SEARCH(Categories!$A$2:$A$76,Transactions!G2)),0))


Categories!$B$2:$B$76 is the reference category range.

Categories!$A$2:$A$76 is the range of text to partial match for the applicable category.

Transactions!G2 is the transaction description the partial match matches against.



Posted on Oct 25, 2022 2:44 AM

Reply
Question marked as Top-ranking reply

Posted on Oct 26, 2022 5:50 AM

I just saw that! I put it in as a feature request yesterday but apparently it was already finished and ready to distribute. It makes my method (and a number of other problems) a lot easier. Here it is now:



Last row of Table 2 is a footer row. It can be hidden after everything is set up.

Formula in that row is =TEXTJOIN("|",1,A)


Formula in cells of Table 1::E is =IFERROR(XLOOKUP(REGEX.EXTRACT(LOWER(C),LOWER(Table 2::B$19)),Table 2::A,Table 2::B),"")


For those who want to know how it works,

The textjoined string is a "regular expression" with the "OR" character between each substring. When used with REGEX.EXTRACT on the longer string in Table 1::C, REGEX.EXTRACT will try to "extract" one of those substrings from the longer string. The result of REGEX.EXTRACT will be one of the substrings (or an error if none are found). That substring is then used in XLOOKUP to search Table 2::A and return the category from Table 2::B.

The LOWER function is used in the formula because REGEX.EXTRACT is case sensitive. It ensures all characters being compared are lowercase. IFERROR is for when no substring can be "extracted" which can happen if there is no entry in Table 1::C or if there is no matching substring in Table 2::A.


I tested it with 2400 entries in Table 1 and 200 entries in Table 2 with no noticeable slowdown.


19 replies
Question marked as Top-ranking reply

Oct 26, 2022 5:50 AM in response to Skylas

I just saw that! I put it in as a feature request yesterday but apparently it was already finished and ready to distribute. It makes my method (and a number of other problems) a lot easier. Here it is now:



Last row of Table 2 is a footer row. It can be hidden after everything is set up.

Formula in that row is =TEXTJOIN("|",1,A)


Formula in cells of Table 1::E is =IFERROR(XLOOKUP(REGEX.EXTRACT(LOWER(C),LOWER(Table 2::B$19)),Table 2::A,Table 2::B),"")


For those who want to know how it works,

The textjoined string is a "regular expression" with the "OR" character between each substring. When used with REGEX.EXTRACT on the longer string in Table 1::C, REGEX.EXTRACT will try to "extract" one of those substrings from the longer string. The result of REGEX.EXTRACT will be one of the substrings (or an error if none are found). That substring is then used in XLOOKUP to search Table 2::A and return the category from Table 2::B.

The LOWER function is used in the formula because REGEX.EXTRACT is case sensitive. It ensures all characters being compared are lowercase. IFERROR is for when no substring can be "extracted" which can happen if there is no entry in Table 1::C or if there is no matching substring in Table 2::A.


I tested it with 2400 entries in Table 1 and 200 entries in Table 2 with no noticeable slowdown.


Oct 25, 2022 5:56 AM in response to Skylas

Skylas wrote:

a way to either make it work in Numbers or achieve the same result by using a different formula.


Yes, you can do this in Numbers by building your reference categories and search text for partial matches right into the formula, something like this.




I used option-return in the formula editor to separate each search text-category pair onto its own separate line. That makes the formula easier to edit.


Each pair takes this form:


COUNTMATCHES(reference to cell with the description text,"the search text"),"the associated category",


To add a new pair just add a line before the line with TRUE at at the end. Be sure to include the trailing comma followed by option-return.


The formula above is like this (unfortunately the forum software doesn't like option-returns but you can add them yourself at the appropriate places as shown in the screenshot):


=IFS(COUNTMATCHES(A2,"John Food"),"Food",COUNTMATCHES(A2,"Home Depot"),"Hardware",COUNTMATCHES(A2,"Storage Depot"),"Storage",COUNTMATCHES(A2,"Burger King"),"Food",COUNTMATCHES(A2,"Amazon"),"Other",TRUE,"")


COUNTMATCHES here acts like a "contains" operator.


I haven't run into a practical limit to the number of lines of pairs in the formula. This is not quite as easy as simply typing search terms and category names into ranges of cells. But simple enough to be practical.


SG

Oct 25, 2022 6:01 AM in response to Skylas

There was a recent thread like this. The Excel formula you posted is an "array" formula. Array formulas can do in one cell what requires a column (array) of calculations in Numbers. In Numbers, while it is easy to find a substring in column of longer strings, it is not so easy to match a long string to a column of substrings. You can find "ford" in a column of strings like "xyz ford pdq" but not the other way around.


Below is the link to the other thread. See if it makes any sense. It is the exact same problem, just different columns/tables. If not I'll rework it for you.

Numbers - IF Contains Specific Text then … - Apple Community


EDIT: I see SGIII posted while I was writing. His solution was also in that other thread.


Oct 25, 2022 4:19 AM in response to Skylas

Hi Skylas,


Some thoughts;

  • click in any spare cell and type = to bring up the Function Browser. Type search to see this:


wild cards are allowed. Scroll down to see some examples.


  • Take a look at Template Chooser > Personal Finance > Personal Budget for examples of using Categories.

We can adapt that Personal Budget to allow it to choose partial matches.


  • Please show us some of your examples where you want partial matches from transactions.


Regards,

Ian.





Oct 25, 2022 6:22 AM in response to Badunit

Hi Badunit,


Thank you for the reference thread.

This is indeed the same issue.


I had previously arrived at the conclusion "You can find "ford" in a column of strings like "xyz ford pdq" but not the other way around." when using Numbers but thought it was me that was missing something.

Considering this, using SGIII's solution would indeed be the most effective option.

Oct 25, 2022 4:59 AM in response to Yellowbox

Hi Ian,


Thank you for the quick follow up.


I have tinkered with the Search function in the past but it always seems like the "search-string" value must be a single specific value and not a range.


In regards to the Personal Budget Template option, I do actually use a modified version of this template. However, instead for adding each bank transaction manually and then manually assigning a category, I import a table of transactions from the bank which, via Excel with the formula above, become auto categorized when imported. I would like to eliminate the need for Excel and have the transactions become auto categorized in Numbers.


Example Below from Excel

Oct 25, 2022 5:56 AM in response to SGIII

Hi SG,


Thank you for taking the time and providing an alternate solution.

However, I too have tried this method but it becomes rather cumbersome having to edit the nested IF formulas each time either a category or a reference text is added or changed. With the Excel formula method, I just add or delete the reference text and/or category on the fields on the right in the example and everything just updates accordingly.



Oct 25, 2022 6:06 AM in response to Skylas

"becomes quite cumbersome having to edit the nested if formulas"


Please note that there are NO "nested IF" functions here.


This approaches use the IFS function. The IFS function eliminates nesting. It enables simple addiction, subtraction, or modification of search term-category pairs, just as you would do with the cell ranges.


It's not quite as convenient as having ranges of cells as in Numbers. But not hard at all. I suggest trying it before rejecting it. 😀 It works very well here.


And no clutter of extra tables and such.


SG

Oct 25, 2022 6:11 AM in response to SGIII

Hi SG,


Sorry, I shouldn't have said nested IF functions since indeed it is not the case in your formula.

However, the premise is comparably the same in the sense that the formula must be physically edited and propagated throughout all transactions lines when ever a new condition is applicable.


Using the example above with only a couple transactions your solution would indeed be suitable but, for a larger set of transactions updating the formulas each time, although possible, it is not a very straightforward and clean solution.

Because I have tried in the past is the reason why I am not open to it as a suitable solution.


This is actually the reason why I ended up having Excel be an in between process prior moving the data into Numbers.



Oct 25, 2022 6:22 AM in response to Skylas

Using IFS is MUCH easier than nested IFs, which are a nightmare.


Yes, as you say, the formula must be filled down again after a modification. For me that takes a couple of seconds at most. (And of course the filling often has to be done anyway after a new import.)


I use this approach with about a dozen, occasionally changing, categories and search terms. Really not much trouble.


On the Mac, with the formula editor that allows you to keep the pairs on separate lines, this approach is only marginally less convenient than setting up an Excel array formula and maintaining search terms and category names in separate ranges.


Give it a try!😀


SG



Oct 25, 2022 6:34 AM in response to SGIII

Hi SG,


As mentioned, I have tried and can from experience tell you that for me the quickest solution at the moment is having an excel template file where I simply drop the transactions and then the category column just becomes populated.

If there is a cell that doesn't then I just add a line to the reference list.; no adjusting formulas and no repopulating of calculation fields.


Also, by having the reference text and the categories on their own set of columns it is simply just easier to review than examining the internal text of a formula especially if you have over 20 conditions. Unless of course, you start organizing the conditions in some sort of order, such as alphabetical, in which case here too the column arrangement is much less cumbersome since you can simply sort the columns.

Oct 25, 2022 6:51 AM in response to Skylas

Whichever works best for you.


The IFS formula might get unwieldy if you have a lot of entries. I know on my iMac it gets difficult to edit the formula when the list gets long. It gets long delays. It was getting pretty bad with 80 entries. My solution may also slow down, though, I didn't test it for that.


I hope one day we get Excel's TEXTJOIN function which would eliminate the need for that column of concatenations in my solution. TEXTJOIN can concatenate an entire range, with a delimiter between each item. Of course, you could do CONCATENATE(C2,"|",C3,"|",C4,"|",C5,"|",...,C100) but that is painful and would get mixed up if you sorted the table.

Oct 25, 2022 7:05 AM in response to Badunit

Hi Badunit,


Performance is indeed another aspect to be considered in the solution.

I personally did not test the IFS solution with as many conditions I presently have on the Excel template, currently 92, but I can imagine it would get pretty sluggish running the IFS solution over say 500 transactions...


Over the years I have stumbled on other Number's limitations and always wonder if it is just me that is missing something but I guess it is just the way it is...


Hopefully, someone who is able to modify things reads some of these threads.

Oct 25, 2022 9:33 AM in response to Skylas

Skylas wrote:

can imagine it would get pretty sluggish running the IFS solution over say 500 transactions...


Each IFS refers to only one transaction. The number of transactions is not relevant.


If you have a huge number of search term-category name pairs then there might be some sluggishness. On my machine it is not sluggish.


Might want to actually give it a try!😀


SG





This thread has been closed by the system or the community team. You may vote for any posts you find helpful, or search the Community for additional answers.

Mac Numbers Categorize based on partial match

Welcome to Apple Support Community
A forum where Apple customers help each other with their products. Get started with your Apple Account.