freeCodeCamp/curriculum/challenges/english/08-coding-interview-prep/rosetta-code/jaro-distance.english.md

4.6 KiB

title id challengeType
Jaro distance 5a23c84252665b21eecc7ec2 5

Description

The Jaro distance is a measure of similarity between two strings. The higher the Jaro distance for two strings is, the more similar the strings are. The score is normalized such that 0 equates to no similarity and 1 is an exact match. Definition The Jaro distance \( d_j \) of two given strings \(s_1\) and \(s_2\) is \begin{align}d_j = \begin{cases}0& & \text{if }m=0 \\\\{\frac {1}{3}}\left({\frac {m}{|s_{1}|}}+{\frac {m}{|s_{2}|}}+{\frac {m-t}{m}}\right)& & \text{otherwise}\end{cases}\end{align} Where:
  • \(m\) is the number of matching characters;
  • \(t\) is half the number of transpositions.
Two characters from \(s_1\) and \(s_2\) respectively, are considered matching only if they are the same and not farther than \(\left\lfloor\frac{\max(|s_1|,|s_2|)}{2}\right\rfloor-1\). Each character of \(s_1\) is compared with all its matching characters in \(s_2\) . The number of matching (but different sequence order) characters divided by 2 defines the number of transpositions. Example Given the strings \(s_1\) DWAYNE and \(s_2\) DUANE we find:
  • \(m = 4\)
  • \(|s_1| = 6\)
  • \(|s_2| = 5\)
  • \(t = 0\)
We find a Jaro score of: \(d_j = \frac{1}{3}\left(\frac{4}{6} + \frac{4}{5} + \frac{4-0}{4}\right) = 0.822\). Write a function a that takes two strings as parameters and returns the associated Jaro distance.

Instructions

Tests

tests:
  - text: <code>jaro</code> should be a function.
    testString: 'assert(typeof jaro=="function","<code>jaro</code> should be a function.");'
  - text: <code>jaro(""+tests[0][0]+"",""+tests[0][1]+"")</code> should return a number.
    testString: 'assert(typeof jaro(tests[0][0],tests[0][1])=="number","<code>jaro()</code> should return a number.");'
  - text: <code>jaro(""+tests[0][0]+"",""+tests[0][1]+"")</code> should return <code>"+results[0]+"</code>.
    testString: 'assert.equal(jaro(tests[0][0],tests[0][1]),results[0],"<code>jaro(""+tests[0][0]+"",""+tests[0][1]+"")</code> should return <code>"+results[0]+"</code>.");'
  - text: <code>jaro(""+tests[1][0]+"",""+tests[1][1]+"")</code> should return <code>"+results[1]+"</code>.
    testString: 'assert.equal(jaro(tests[1][0],tests[1][1]),results[1],"<code>jaro(""+tests[1][0]+"",""+tests[1][1]+"")</code> should return <code>"+results[1]+"</code>.");'
  - text: <code>jaro(""+tests[2][0]+"",""+tests[2][1]+"")</code> should return <code>"+results[2]+"</code>.
    testString: 'assert.equal(jaro(tests[2][0],tests[2][1]),results[2],"<code>jaro(""+tests[2][0]+"",""+tests[2][1]+"")</code> should return <code>"+results[2]+"</code>.");'
  - text: <code>jaro(""+tests[3][0]+"",""+tests[3][1]+"")</code> should return <code>"+results[3]+"</code>.
    testString: 'assert.equal(jaro(tests[3][0],tests[3][1]),results[3],"<code>jaro(""+tests[3][0]+"",""+tests[3][1]+"")</code> should return <code>"+results[3]+"</code>.");'
  - text: <code>jaro(""+tests[4][0]+"",""+tests[4][1]+"")</code> should return <code>"+results[4]+"</code>.
    testString: 'assert.equal(jaro(tests[4][0],tests[4][1]),results[4],"<code>jaro(""+tests[4][0]+"",""+tests[4][1]+"")</code> should return <code>"+results[4]+"</code>.");'

Challenge Seed

function jaro (s, t) {
  // Good luck!
}

After Test

console.info('after the test');

Solution

function jaro (s, t) {
  var s_len = s.length;
  var t_len = t.length;

  if (s_len == 0 && t_len == 0) return 1;

  var match_distance = Math.max(s_len, t_len) / 2 - 1;

  var s_matches = new Array(s_len);
  var t_matches = new Array(t_len);

  var matches = 0;
  var transpositions = 0;

  for (var i = 0; i < s_len; i++) {
    var start = Math.max(0, i - match_distance);
    var end = Math.min(i + match_distance + 1, t_len);

    for (var j = start; j < end; j++) {
      if (t_matches[j]) continue;
      if (s.charAt(i) != t.charAt(j)) continue;
      s_matches[i] = true;
      t_matches[j] = true;
      matches++;
      break;
    }
  }

  if (matches == 0) return 0;

  var k = 0;
  for (var i = 0; i < s_len; i++) {
    if (!s_matches[i]) continue;
    while (!t_matches[k]) k++;
    if (s.charAt(i) != t.charAt(k)) transpositions++;
    k++;
  }

  return ((matches / s_len) +
    (matches / t_len) +
    ((matches - transpositions / 2.0) / matches)) / 3.0;
}