
Python: fuzzy searching

While building this patch management application, I struggled to find a reliable way to compare a dictionary of package names against another dictionary of software title names.

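After a bit of searching, I learned about a Python library called ‘FuzzyWuzzy’ that uses ‘Levenshtein distance’ to measure how far apart two strings are, that is, the number of single-character insertions, deletions, or substitutions needed to turn one string into the other.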
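To make the idea of Levenshtein distance concrete, here is a minimal pure-Python sketch of the classic dynamic-programming approach. This is not FuzzyWuzzy's own code; the function names `levenshtein` and `similarity` are my own, and the 0-100 scaling in `similarity` is just one plausible way to turn a distance into a percentage.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the prefix of a handled so far
    # (initially empty) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning the first i chars of a into '' takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (free if the chars match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0-100 (illustrative only, not FuzzyWuzzy's formula)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are identical
    return 100.0 * (1 - levenshtein(a, b) / longest)
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.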

With FuzzyWuzzy, I can now pass in a software title and find all the packages whose names match it above a threshold that I set.

The basic layout is below, with a match threshold of 90%:

from fuzzywuzzy import fuzz

def search(values, searchFor):
    # Get the values from the dict
    for k, v in values.items():
        # make the value a string (some were int)
        v = str(v)
        # Skip empty packages (None becomes the string 'None' after str())
        if v != 'None':
            # lowercase both strings so case differences don't lower the score
            partial_ratio = fuzz.partial_ratio(searchFor.lower(), v.lower())
            # over 90% and good to go
            if partial_ratio > 90:
                return v
    # no value scored above the threshold
    return None

def main():
    # Function that gets package id & names from API
    pkgs = get_all_packages()

    # Function that does same for software titles
    sw_titles = get_all_software_titles()

    # Loop through the sw titles
    for sw in sw_titles:

        # assign the name to var
        sw_name = sw['name']

        # print for testing
        print('SW Title: ' + sw_name)

        # Loop through packages
        for pkg in pkgs:

            match = search(pkg, sw_name)

            # print if there's a match
            if match is not None:
                print('Match: ' + match)
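
The reason a partial ratio suits this job is that package names tend to be the software title plus extra text such as a version number or file extension. As a rough illustration of that idea, here is a simplified sliding-window sketch built on the standard library's `difflib`. It is an approximation I wrote for explanation, not FuzzyWuzzy's actual implementation, and `simple_partial_ratio` is a made-up name.

```python
from difflib import SequenceMatcher


def simple_partial_ratio(a: str, b: str) -> int:
    """Best 0-100 score of the shorter string against any same-length
    slice of the longer string (a simplified stand-in for partial matching)."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0  # nothing to compare against
    window = len(shorter)
    best = 0.0
    # Slide a window the size of the shorter string across the longer one
    for start in range(len(longer) - window + 1):
        chunk = longer[start:start + window]
        score = SequenceMatcher(None, shorter, chunk).ratio()
        best = max(best, score)
    return round(100 * best)


# A title that appears verbatim inside a longer package name scores 100
print(simple_partial_ratio("google chrome", "google chrome 96.0.4664.pkg"))
```

Because the title is contained in the package name, the extra version text does not drag the score down the way it would with a whole-string comparison.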
