Tom’s Tutorials For Excel: Fuzzy Matching With Partial Strings For Sum or Average

Tom’s Tutorials For Excel: Fuzzy Matching With Partial Strings For Sum or Average

When you have a mish-mash of alphanumeric strings in a list, here’s how you can sum or average their corresponding numbers based on a partial string criterion.



The picture shows two examples with the criterion in cell B4.

The example on the left evaluates each item in column B for the presence of “354” and sums for column C. The cells that match are B13, B16, and B19, so their corresponding numbers in cells C13, C16, and C19 sum to 168 in blue-colored cell C4, with array formula
=SUM(IF(ISNUMBER(FIND(B4,B7:B19,1)),C7:C19)).

The example on the right evaluates each item in column B for the presence of case insensitive “ATLAS” and averages for column C. The cells that match are B7, B8, B14, B15, and B19, with numbers in C7, C8, C14, C15, C19 averaged in pink-colored cell C4, by array formula
=AVERAGE(IF(ISNUMBER(FIND(B4,B7:B19,1)),C7:C19)).

Recall, array formulas are applied to a cell by simultaneously pressing the Ctrl+Shift+Enter keys, not just Enter. The curly braces are not typed in by you; Excel places them automatically when the array formula is properly installed.

Share Button
Posted in Tom's Tutorials for Excel
Tags: , , , , , , , , , , , , , ,
6 comments on “Tom’s Tutorials For Excel: Fuzzy Matching With Partial Strings For Sum or Average
  1. Hi Tom,
    For fun, I tried to modify the formula so that the partial string value of 354 can be separated with other text and/or numbers. It seems to be possible but it’s a long formula! I’ll test it and share tomorrow.
    Cheers,
    Kevin Lehrbass
    https://www.youtube.com/user/MySpreadsheetLab

  2. Zoran Stanojevic says:

    Hi guys,
    Is the challenge still open? Didn’t sound as hard to me – solved for SUM & AVERAGE by using 4 & 5 functions respectively: either below 70 chars, and both array entered.
    Regards,
    Zoran

    Way to contact me: https://goo.gl/O2cSY8

  3. Vikas says:

    Dear Sir,
    Why did you not use wildcard here ?

    Regards,
    Vikas.

    • Tom Urtis says:

      What wildcard would help? You need to evaluate for the presence of a specific string (in a larger string) such as in the criteria cell B4. If I am looking for “XYZ” and in the string, there exists KJSHXYZLRGY, and I used the “?” wildcard to find ?XYZ? then my returned result would be HXYZL. That can be bypassed with just the specific presence of XYZ using the FIND method for an exact match of that (in this example) 3-character string inside a larger string.

      • Dan Lanning says:

        If I’m not mistaken, the asterick wild card would work (as long as cells are text):
        SUMIF(B7:B19,”*354*”,C7:C19) OR “*ATLAS*”
        AVERAGEIF can be used for averages.

        • Tom Urtis says:

          Hi Dan, thanks. Also I see I missed a previous, similar comment from Vikas, and yes, you are right.

          I can’t remember exactly, so I need to go back to my notes from when I wrote this post, but something back then struck me about running into unreliable results with wildcards as opposed to FIND. I’ll go back and see, with an example to show (if it turns out to be the case), a scenario I seem to recall coming across that fails with wildcards. In the meantime I wanted to reply and thank you (and Vikas belatedly) for your comments, much appreciated.

Leave a Reply

Your email address will not be published. Required fields are marked *

*