Do We Need Fuzzy Logic In Our Databases?
I was reading Bart Kosko’s book Fuzzy Thinking, in which he argues that much of our science, math, logic, and culture assumes an unchanging world of blacks and whites. In other words, everything is either true or false. Even our computers are built on this black-and-white concept, since they operate on binary strings of 0s (black) and 1s (white).
This belief in a bivalent world is not new. Aristotle wrote down what he considered the black-and-white (binary) laws of logic, laws that scientists and mathematicians still use to describe a gray universe. Aristotle’s binary logic reduces to one law:
A OR not A
This law has defined what was considered philosophically correct for more than 2000 years!
But then came Albert Einstein, who punched through this binary world when he said:
So far as the laws of mathematics refer to reality, they are not certain. And so far as they are certain, they don't refer to reality.
If we think about Einstein’s quote, we realize that the world of binary logic described by Aristotle does not seem to fit the world it is meant to describe (the real world). One world is artificial, the other is real. Bart Kosko refers to this phenomenon as the mismatch problem:
The world is gray but science is black and white.
We thus have a fuzzy world that we describe in nonfuzzy terms. In programming, we use statements that are either true (1) or false (0), while statements about the real world are rarely so clear-cut. We are imposing a gross simplification on a real world in which truth can lie in the many shades of gray between black and white.
The real shift toward what Einstein described came with Lotfi A. Zadeh (the father of fuzzy logic) of the University of California, Berkeley, who brought fuzzy set theory and fuzzy logic into existence by introducing the notion of fuzzy sets in his seminal 1965 paper.
Fuzzy logic can be described as follows:
Fuzzy logic is an approach to computing based on "degrees of truth" rather than the usual "true or false" (1 or 0) Boolean logic on which the modern computer is based. It includes 0 and 1 as extreme cases of truth (or "the state of matters" or "fact") but also includes the various states of truth in between so that, for example, the result of a comparison between two things could be not "tall" or "short" but "0.76 of tallness." Fuzzy logic seems closer to the way our brains work. We aggregate data and form a number of partial truths which we aggregate further into higher truths which in turn, when certain thresholds are exceeded, cause certain further results such as motor reaction. It may help to see fuzzy logic as the way reasoning really works and binary or Boolean logic is simply a special case of it.
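To make the idea of "degrees of truth" concrete, here is a minimal Python sketch. The tallness function, its linear ramp, and the 150 cm and 200 cm cutoffs are my own assumptions for illustration; they do not come from Kosko or Zadeh. The min, max, and 1 - x operators are the standard ones Zadeh proposed for fuzzy AND, OR, and NOT.

```python
def tallness(height_cm, short_cm=150.0, tall_cm=200.0):
    """Degree (0.0 to 1.0) to which a height counts as 'tall'.

    Assumption: a simple linear ramp between two illustrative cutoffs.
    """
    if height_cm <= short_cm:
        return 0.0
    if height_cm >= tall_cm:
        return 1.0
    return (height_cm - short_cm) / (tall_cm - short_cm)


def fuzzy_and(a, b):
    # Zadeh's intersection: the weaker of the two truths
    return min(a, b)


def fuzzy_or(a, b):
    # Zadeh's union: the stronger of the two truths
    return max(a, b)


def fuzzy_not(a):
    # Zadeh's complement
    return 1.0 - a


t = tallness(188)                        # partially tall
law_of_excluded_middle = fuzzy_or(t, fuzzy_not(t))
print(t, law_of_excluded_middle)
```

Under these assumed cutoffs, a height of 188 cm is tall to degree 0.76, and "A OR not A" also evaluates to 0.76 rather than 1. With crisp inputs (exactly 0 or 1) the same operators behave like ordinary Boolean logic, which is the sense in which binary logic is a special case.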
We don’t hear the term fuzzy database very often, do we? We normally just say database, and by that we most likely mean a regular database in which only perfectly described (crisp) data are stored. This, I believe, is what most of the databases we work with look like.
But let’s think about it once more. A lot of real-world data is vague, uncertain, or imprecise. Wouldn’t storing such data in a database that treats inherently imprecise data as precise raise a problem? Ignoring the imperfection of real-world data distorts how we perceive the real world through our database systems, and it discards substantial information that could be critical in some data-intensive applications.
This is where fuzzy databases come into play. A fuzzy database aims to deal with uncertain real-world data using fuzzy logic. It strives to handle the imperfect information we have in our databases.
A question that may arise here: are there situations where a fuzzy database could handle a problem that a traditional database would struggle with? The answer is simply yes, and I believe there are many.
Let’s take an example: FuzzBox™. This is a fuzzy database that names three main problems it can solve and a traditional database cannot: (1) a database containing dirty data (the database doesn’t refer to the same item the same way everywhere); (2) a database containing slightly different duplicates, as opposed to the easy-to-find exact duplicates; and (3) searching the database with a misspelled item, which happens very naturally.
In the situations above, a traditional database would suffer, and so would we as users. When we have dirty data, our website or application will not work properly, costing us productivity and business. Removing duplicates lets the website or application run more efficiently, which increases productivity. Failing to match an item that actually exists in the database, just because of a misspelling, also costs productivity and business.
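The first two problems, dirty data and slightly different duplicates, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the record names, the normalize helper, and the 0.85 threshold are hypothetical, and the similarity measure is the one in Python's standard difflib module, not whatever FuzzBox uses internally.

```python
import difflib
from itertools import combinations


def normalize(name):
    """Lowercase, drop punctuation, and collapse repeated whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def near_duplicates(records, threshold=0.85):
    """Return pairs of records whose normalized forms are nearly identical.

    Compares every pair, so it is only suitable for small lists.
    """
    pairs = []
    for a, b in combinations(records, 2):
        score = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b))
    return pairs


# Hypothetical records: the same person entered three slightly different ways
records = ["John Smith", "Jon Smith", "john  smith", "Jane Doe"]
pairs = near_duplicates(records)
for a, b in pairs:
    print(repr(a), "looks like a duplicate of", repr(b))
```

An exact-match query would treat all four records as distinct, while this comparison groups the three spellings of the same name and leaves "Jane Doe" alone.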
FuzzBox™ demonstrates this with a misspelled country name. Let’s try different scenarios with the country name Jordan. If you write the country as Jordna (swapping the last two letters), a traditional database would just tell you that it has no such country. Searching for Jordna in FuzzBox™, however, returns Jordan with a score of 2, which is the highest score and thus our match. Now try typing the following forms of the country name and see how Jordan still comes back with a high score: jodran, jaodran, ordan, jorrdan, jjordann.
Doesn’t that look interesting? Even for severe misspellings, our query still returns a result, whereas a traditional database returns nothing if we misspell even one letter.
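To get a feel for the difference between an exact lookup and a fuzzy one, here is a small Python sketch. It is not FuzzBox's algorithm, and its scores are not on FuzzBox's scale: it uses the similarity ratio (between 0 and 1) from Python's standard difflib module. The country list, the function names, and the 0.6 cutoff are assumptions I made for the example.

```python
import difflib

# Hypothetical table of country names
COUNTRIES = ["Jordan", "Japan", "Sudan", "Norway", "Jamaica"]


def exact_lookup(query, names=COUNTRIES):
    """What a plain equality match does: all or nothing."""
    return query if query in names else None


def fuzzy_lookup(query, names=COUNTRIES, cutoff=0.6):
    """Return (best_name, score) for the closest name, or None if too far off."""
    scored = [
        (difflib.SequenceMatcher(None, query.lower(), name.lower()).ratio(), name)
        for name in names
    ]
    score, best = max(scored)
    return (best, score) if score >= cutoff else None


misspellings = ["Jordna", "jodran", "jaodran", "ordan", "jorrdan", "jjordann"]
for query in misspellings:
    print(query, "exact:", exact_lookup(query), "fuzzy:", fuzzy_lookup(query))
```

For every misspelling in the list, the exact lookup finds nothing, while the fuzzy lookup picks Jordan as the closest name. The cutoff keeps unrelated queries from matching anything at all; where to set it is a judgment call that depends on the data.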
This was just a simple example of how weak traditional databases can be, not to mention their weaknesses when it comes to relating ambiguous data (i.e., relations).
Thus, I believe that fuzzy logic should be an essential part of every database, as it will save us a lot of pain.