Approximate Joins in MySQL Server

There are two main ways to reconcile payments against charges:

  • Open Item: match payments against individual charges, typically by carrying the charge number in the payments table
  • Statement: list and sum all charges and all payments, and show the difference as the outstanding balance.
The Open Item method needs a foolproof way to match payments to charges, but what if the customer neglected to return a copy of the invoice, or to write the invoice number on the cheque. Reconciliation staff spend much of their time resolving such problems.

First of all we have to create a new database 'approx'.

Syntax :

CREATE SCHEMA approx;
USE approx;

Then Create the two tables 'charges' and 'payments'.

Syntax :

CREATE TABLE charges (
  ID INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  custID INT UNSIGNED,
  amount DECIMAL(10,2) NOT NULL
);

CREATE TABLE payments (
  ID INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  custID INT UNSIGNED,
  amount DECIMAL( 10,2) NOT NULL
);

img 1.jpg

Both the tables have a custID column to identify whose charge or payment it is, but there is no foreign key linking payments to specific charges; that is the link we are going to approximate.

Now populate the tables with a few rows of sample charges and payments for customer #1, ensuring that we have a variety of payments – some that match the charge exactly, some that are close but not enough, and some that are slight over-payments.

Syntax :

INSERT INTO approx.charges VALUES
(NULL,1,100),(NULL,1,12),(NULL,1,56),(NULL,1,43),(NULL,1,59),(NULL,1,998);
INSERT INTO approx.payments VALUES
(NULL,1,99),(NULL,1,62),(NULL,1,40),(NULL,1,50),(NULL,1,12),(NULL,1,1000);


img 2.jpg

The first thing to do is define an approximation threshold: how close must the amount paid be to the amount charged before we conclude that the amounts are related?

For example : we define the proximity threshold as 2. In a real-world example, it might be 10, or 50, or perhaps percentage of the charge. It all depends on the nature of the organization and the typical total purchase. A house builder may make frequent purchases valued at $1000 and more.
We scale the threshold to the typical situation.

Since the amount paid might be more or less or even equal to the amount charged, to link a payment to a charge we need not an equi-join but a theta-join that tests a range both below and above the charge amount. That might suggest a BETWEEN clause. Here we use the ABS() function:

Syntax :

SET  @proximity = 2;   -- change this value to suit your situation
SELECT
  c.ID AS ChargeNo,
  c.Amount AS Charge,
  p.ID AS PaymentNo,
  p.Amount AS Payment
FROM charges c
JOIN payments p
  ON c.custID = p.custID
  AND ABS(c.amount - p.amount) <= @proximity
WHERE c.custID = 1;

Before running this query, look at the data to anticipate the result.

img 3.jpg

The solution is correct, as far as it goes, but it doesn't go far enough. We correctly identified the three situations: underpayment, exact payment and overpayment, but we suppressed all charges that don't have a matching payment. Reconciliation staff are probably interested in a bigger picture of the situation. Fix this by changing the INNER JOIN to a LEFT JOIN.

Syntax :

SET @proximity = 2;
SELECT
  c.ID AS ChargeNo,
  c.amount AS Charge,
  p.ID AS PaymentNo,
  p.amount AS Payment
FROM
  charges c
LEFT JOIN payments p
  ON c.custID = p.custID
  AND ABS(c.amount - p.amount) <= @proximity
WHERE c.custID = 1;

img 4.jpg

The reconciliation people now know that three charges have no matching payment.
If the customer mistakenly pays for something twice? Add a row to the Payments table with an amount of 1000, then re-run the last query.

img 5.jpg

We can see at once that charge number 6 was paid for twice.

Resources 

Here are some useful related resources: