---
title: Hash join
id: 5956795bc9e2c415eb244de1
challengeType: 5
---
## Description
<section id='description'>
<p>An <a href="https://en.wikipedia.org/wiki/Join_(SQL)#Inner_join" title="wp: Join_(SQL)#Inner_join">inner join</a> is an operation that combines two data tables into one table, based on matching column values. The simplest way of implementing this operation is the <a href="https://en.wikipedia.org/wiki/Nested_loop_join" title="wp: Nested loop join">nested loop join</a> algorithm, which compares every row of one table with every row of the other. A more scalable alternative is the <a href="https://en.wikipedia.org/wiki/Hash_join" title="wp: hash join">hash join</a> algorithm.</p>
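For comparison, the nested loop join tests every row of one table against every row of the other, so its running time grows with the product of the two table sizes. The following is a minimal sketch, assuming rows are plain arrays and join columns are given by index; `nestedLoopJoin`, `jA` and `jB` are illustrative names, not part of the challenge.

```javascript
// Nested loop join: compare every row of tblA with every row of tblB.
// Rows are arrays; jA and jB are the indices of the join columns (illustrative names).
function nestedLoopJoin(tblA, jA, tblB, jB) {
  const out = [];
  for (const a of tblA) {
    for (const b of tblB) {
      if (a[jA] === b[jB]) {
        out.push([...a, ...b]); // concatenate the two matching rows
      }
    }
  }
  return out;
}

const A = [[27, 'Jonah'], [18, 'Alan'], [28, 'Glory'], [18, 'Popeye'], [28, 'Alan']];
const B = [['Jonah', 'Whales'], ['Jonah', 'Spiders'], ['Alan', 'Ghosts'], ['Alan', 'Zombies'], ['Glory', 'Buffy']];

console.log(nestedLoopJoin(A, 1, B, 0).length); // 7
```

Every pair of rows is examined, including the 25 pairs here that mostly do not match; the hash join avoids that by indexing one table first.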
<p>Implement the "hash join" algorithm, and demonstrate that it passes the test case listed below.</p><p>You should represent the tables as data structures that feel natural in your programming language.</p>
<p>The "hash join" algorithm consists of two steps:</p>
<ul>
<li>Hash phase: Create a <a href="https://en.wikipedia.org/wiki/Multimap" title="wp: Multimap">multimap</a> from one of the two tables, mapping each join column value to all the rows that contain it.
<ul>
<li>The multimap must support hash-based lookup, which scales better than a simple linear search; fast lookup is the whole point of this algorithm.</li>
<li>Ideally, the multimap should be built from the smaller table, which minimizes its creation time and memory size.</li>
</ul>
</li>
<li>Join phase: Scan the other table, and find matching rows by looking them up in the multimap created during the hash phase.</li>
</ul>
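The hash phase can be sketched in JavaScript with a `Map`, whose lookups the language specification requires to be sublinear on average (engines typically use hash tables). This sketch assumes rows are arrays and the join column is given by index; `buildMultimap` is a hypothetical helper name.

```javascript
// Hash phase sketch: group the rows of a table by the value in column j.
// Map lookups are sublinear on average, so finding all rows for a key
// does not require scanning the whole table.
function buildMultimap(table, j) {
  const multimap = new Map();
  for (const row of table) {
    const key = row[j];
    if (!multimap.has(key)) {
      multimap.set(key, []); // first row seen for this key
    }
    multimap.get(key).push(row);
  }
  return multimap;
}

const B = [['Jonah', 'Whales'], ['Jonah', 'Spiders'], ['Alan', 'Ghosts'], ['Alan', 'Zombies'], ['Glory', 'Buffy']];
const MB = buildMultimap(B, 0);

console.log(MB.get('Jonah'));  // the two 'Jonah' rows of B
console.log(MB.get('Popeye')); // undefined: no matching rows
```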
<p>In pseudo-code, the algorithm could be expressed as follows:</p>
<pre>
let A = the first input table (or ideally, the larger one)
let B = the second input table (or ideally, the smaller one)
let j<sub>A</sub> = the join column ID of table A
let j<sub>B</sub> = the join column ID of table B
let M<sub>B</sub> = a multimap for mapping from single values to multiple rows of table B (starts out empty)
let C = the output table (starts out empty)
for each row b in table B:
    place b in multimap M<sub>B</sub> under key b(j<sub>B</sub>)
for each row a in table A:
    for each row b in multimap M<sub>B</sub> under key a(j<sub>A</sub>):
        let c = the concatenation of row a and row b
        place row c in table C
</pre>
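A fairly literal JavaScript translation of the pseudo-code, assuming array rows and numeric column IDs as in the test case (jA = 1, jB = 0). `hashJoinArrays` is a hypothetical name chosen so it does not clash with the challenge's `hashJoin`, which works on objects instead.

```javascript
// Translation of the pseudo-code: tblA / tblB are arrays of rows (arrays),
// jA / jB are the indices of the join columns.
function hashJoinArrays(tblA, jA, tblB, jB) {
  // Hash phase: multimap M_B from each join value to all rows of B with that value.
  const MB = new Map();
  for (const b of tblB) {
    const key = b[jB];
    if (MB.has(key)) {
      MB.get(key).push(b);
    } else {
      MB.set(key, [b]);
    }
  }

  // Join phase: scan A once and look up matching B rows in the multimap.
  const C = [];
  for (const a of tblA) {
    for (const b of MB.get(a[jA]) || []) {
      C.push([...a, ...b]); // c = concatenation of row a and row b
    }
  }
  return C;
}

const A = [[27, 'Jonah'], [18, 'Alan'], [28, 'Glory'], [18, 'Popeye'], [28, 'Alan']];
const B = [['Jonah', 'Whales'], ['Jonah', 'Spiders'], ['Alan', 'Ghosts'], ['Alan', 'Zombies'], ['Glory', 'Buffy']];

for (const row of hashJoinArrays(A, 1, B, 0)) {
  console.log(row.join(' | ')); // first line: 27 | Jonah | Jonah | Whales
}
```

Each row of B is inserted once and each row of A triggers one lookup, so the work grows roughly with the sizes of the two tables plus the number of output rows, rather than with their product.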
<p>Test case</p>
<p>Input</p>
<table>
<tr>
<td style="padding: 4px; margin: 5px;">
<table style="border:none; border-collapse:collapse;">
<tr>
<td style="border:none"> <i>A =</i>
</td>
<td style="border:none">
<table>
<tr>
<th style="padding: 4px; margin: 5px;"> Age </th>
<th style="padding: 4px; margin: 5px;"> Name
</th></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> 27 </td>
<td style="padding: 4px; margin: 5px;"> Jonah
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> 18 </td>
<td style="padding: 4px; margin: 5px;"> Alan
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> 28 </td>
<td style="padding: 4px; margin: 5px;"> Glory
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> 18 </td>
<td style="padding: 4px; margin: 5px;"> Popeye
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> 28 </td>
<td style="padding: 4px; margin: 5px;"> Alan
</td></tr></table>
</td>
<td style="border:none; padding-left:1.5em;" rowspan="2">
</td>
<td style="border:none"> <i>B =</i>
</td>
<td style="border:none">
<table>
<tr>
<th style="padding: 4px; margin: 5px;"> Character </th>
<th style="padding: 4px; margin: 5px;"> Nemesis
</th></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> Jonah </td>
<td style="padding: 4px; margin: 5px;"> Whales
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> Jonah </td>
<td style="padding: 4px; margin: 5px;"> Spiders
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> Alan </td>
<td style="padding: 4px; margin: 5px;"> Ghosts
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> Alan </td>
<td style="padding: 4px; margin: 5px;"> Zombies
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> Glory </td>
<td style="padding: 4px; margin: 5px;"> Buffy
</td></tr></table>
</td></tr>
<tr>
<td style="border:none"> <i>j<sub>A</sub> =</i>
</td>
<td style="border:none"> <i><code>Name</code> (i.e. column 1)</i>
</td>
<td style="border:none"> <i>j<sub>B</sub> =</i>
</td>
<td style="border:none"> <i><code>Character</code> (i.e. column 0)</i>
</td></tr></table>
</td>
<td style="padding: 4px; margin: 5px;">
</td></tr></table>
<p>Output</p>
<table>
<tr>
<th style="padding: 4px; margin: 5px;"> A.Age </th>
<th style="padding: 4px; margin: 5px;"> A.Name </th>
<th style="padding: 4px; margin: 5px;"> B.Character </th>
<th style="padding: 4px; margin: 5px;"> B.Nemesis
</th></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> 27 </td>
<td style="padding: 4px; margin: 5px;"> Jonah </td>
<td style="padding: 4px; margin: 5px;"> Jonah </td>
<td style="padding: 4px; margin: 5px;"> Whales
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> 27 </td>
<td style="padding: 4px; margin: 5px;"> Jonah </td>
<td style="padding: 4px; margin: 5px;"> Jonah </td>
<td style="padding: 4px; margin: 5px;"> Spiders
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> 18 </td>
<td style="padding: 4px; margin: 5px;"> Alan </td>
<td style="padding: 4px; margin: 5px;"> Alan </td>
<td style="padding: 4px; margin: 5px;"> Ghosts
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> 18 </td>
<td style="padding: 4px; margin: 5px;"> Alan </td>
<td style="padding: 4px; margin: 5px;"> Alan </td>
<td style="padding: 4px; margin: 5px;"> Zombies
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> 28 </td>
<td style="padding: 4px; margin: 5px;"> Glory </td>
<td style="padding: 4px; margin: 5px;"> Glory </td>
<td style="padding: 4px; margin: 5px;"> Buffy
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> 28 </td>
<td style="padding: 4px; margin: 5px;"> Alan </td>
<td style="padding: 4px; margin: 5px;"> Alan </td>
<td style="padding: 4px; margin: 5px;"> Ghosts
</td></tr>
<tr>
<td style="padding: 4px; margin: 5px;"> 28 </td>
<td style="padding: 4px; margin: 5px;"> Alan </td>
<td style="padding: 4px; margin: 5px;"> Alan </td>
<td style="padding: 4px; margin: 5px;"> Zombies
</td></tr></table>
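<p>In general, the order of the rows in the output table is not significant. Note, however, that the tests below compare the result with <code>assert.deepEqual</code>, so rows are expected in the order produced by scanning table A from top to bottom.</p>
<p>If you're using numerically indexed arrays to represent table rows (rather than referring to columns by name), you could represent the output rows in the form <code style="white-space:nowrap">[[27, "Jonah"], ["Jonah", "Whales"]]</code>. The tests below use objects instead: each input row is an object such as <code>{ age: 27, name: "Jonah" }</code>, the tables are joined on <code>name</code> = <code>character</code>, and each output row is a single object whose keys are the column names prefixed with <code>A_</code> or <code>B_</code>, for example <code style="white-space:nowrap">{ A_age: 27, A_name: "Jonah", B_character: "Jonah", B_nemesis: "Whales" }</code>.</p><hr>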
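If you prefer the pair form, one possible sketch builds the multimap on whichever input is smaller, as the description recommends, and still emits each pair with A's row first. `hashJoinPairs` and its parameters are illustrative assumptions, and rows are again assumed to be arrays.

```javascript
// Hash join returning each match as a pair [rowA, rowB]. The multimap is
// built on whichever table is smaller; pairs are always emitted in A, B order.
function hashJoinPairs(tblA, jA, tblB, jB) {
  const buildIsA = tblA.length < tblB.length;
  const [build, jBuild] = buildIsA ? [tblA, jA] : [tblB, jB];
  const [probe, jProbe] = buildIsA ? [tblB, jB] : [tblA, jA];

  // Hash phase on the smaller table.
  const multimap = new Map();
  for (const row of build) {
    const key = row[jBuild];
    if (!multimap.has(key)) multimap.set(key, []);
    multimap.get(key).push(row);
  }

  // Join phase: scan the other (larger or equal) table.
  const out = [];
  for (const row of probe) {
    for (const match of multimap.get(row[jProbe]) || []) {
      // Keep A's row first regardless of which side was hashed.
      out.push(buildIsA ? [match, row] : [row, match]);
    }
  }
  return out;
}

const A = [[27, 'Jonah'], [18, 'Alan'], [28, 'Glory'], [18, 'Popeye'], [28, 'Alan']];
const B = [['Jonah', 'Whales'], ['Jonah', 'Spiders'], ['Alan', 'Ghosts'], ['Alan', 'Zombies'], ['Glory', 'Buffy']];

console.log(JSON.stringify(hashJoinPairs(A, 1, B, 0)[0])); // [[27,"Jonah"],["Jonah","Whales"]]
```

When A is the smaller table, pairs come out in B's scan order instead of A's. That is harmless when row order is not significant, but it would not match an order-sensitive comparison such as the tests in this challenge.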
</section>
## Instructions
<section id='instructions'>
</section>
## Tests
<section id='tests'>
```yml
tests:
- text: <code>hashJoin</code> is a function.
testString: assert(typeof hashJoin === 'function', '<code>hashJoin</code> is a function.');
- text: '<code>hashJoin([{ age: 27, name: "Jonah" }, { age: 18, name: "Alan" }, { age: 28, name: "Glory" }, { age: 18, name: "Popeye" }, { age: 28, name: "Alan" }], [{ character: "Jonah", nemesis: "Whales" }, { character: "Jonah", nemesis: "Spiders" }, { character: "Alan", nemesis: "Ghosts" }, { character:"Alan", nemesis: "Zombies" }, { character: "Glory", nemesis: "Buffy" }, { character: "Bob", nemesis: "foo" }])</code> should return <code>[{"A_age": 27,"A_name": "Jonah", "B_character": "Jonah", "B_nemesis": "Whales"}, {"A_age": 27,"A_name": "Jonah", "B_character": "Jonah", "B_nemesis": "Spiders"}, {"A_age": 18,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Ghosts"}, {"A_age": 18,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Zombies"}, {"A_age": 28,"A_name": "Glory", "B_character": "Glory", "B_nemesis": "Buffy"}, {"A_age": 28,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Ghosts"}, {"A_age": 28,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Zombies"}]</code>'
testString: 'assert.deepEqual(hashJoin(hash1, hash2), res, ''<code>hashJoin([{ age: 27, name: "Jonah" }, { age: 18, name: "Alan" }, { age: 28, name: "Glory" }, { age: 18, name: "Popeye" }, { age: 28, name: "Alan" }], [{ character: "Jonah", nemesis: "Whales" }, { character: "Jonah", nemesis: "Spiders" }, { character: "Alan", nemesis: "Ghosts" }, { character:"Alan", nemesis: "Zombies" }, { character: "Glory", nemesis: "Buffy" }, { character: "Bob", nemesis: "foo" }])</code> should return <code>[{"A_age": 27,"A_name": "Jonah", "B_character": "Jonah", "B_nemesis": "Whales"}, {"A_age": 27,"A_name": "Jonah", "B_character": "Jonah", "B_nemesis": "Spiders"}, {"A_age": 18,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Ghosts"}, {"A_age": 18,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Zombies"}, {"A_age": 28,"A_name": "Glory", "B_character": "Glory", "B_nemesis": "Buffy"}, {"A_age": 28,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Ghosts"}, {"A_age": 28,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Zombies"}]</code>'');'
```
</section>
## Challenge Seed
<section id='challengeSeed'>
<div id='js-seed'>
```js
function hashJoin (hash1, hash2) {
  // Good luck!
  return [];
}
```
</div>
### After Test
<div id='js-teardown'>
```js
const hash1 = [
  { age: 27, name: 'Jonah' },
  { age: 18, name: 'Alan' },
  { age: 28, name: 'Glory' },
  { age: 18, name: 'Popeye' },
  { age: 28, name: 'Alan' }
];
const hash2 = [
  { character: 'Jonah', nemesis: 'Whales' },
  { character: 'Jonah', nemesis: 'Spiders' },
  { character: 'Alan', nemesis: 'Ghosts' },
  { character: 'Alan', nemesis: 'Zombies' },
  { character: 'Glory', nemesis: 'Buffy' },
  { character: 'Bob', nemesis: 'foo' }
];
const res = [
  { A_age: 27, A_name: 'Jonah', B_character: 'Jonah', B_nemesis: 'Whales' },
  { A_age: 27, A_name: 'Jonah', B_character: 'Jonah', B_nemesis: 'Spiders' },
  { A_age: 18, A_name: 'Alan', B_character: 'Alan', B_nemesis: 'Ghosts' },
  { A_age: 18, A_name: 'Alan', B_character: 'Alan', B_nemesis: 'Zombies' },
  { A_age: 28, A_name: 'Glory', B_character: 'Glory', B_nemesis: 'Buffy' },
  { A_age: 28, A_name: 'Alan', B_character: 'Alan', B_nemesis: 'Ghosts' },
  { A_age: 28, A_name: 'Alan', B_character: 'Alan', B_nemesis: 'Zombies' }
];
const bench1 = [{ name: 'u2v7v', num: 1 }, { name: 'n53c8', num: 10 }, { name: 'oysce', num: 9 }, { name: '0mto2s', num: 1 }, { name: 'vkh5id', num: 4 }, { name: '5od0cf', num: 8 }, { name: 'uuulue', num: 10 }, { name: '3rgsbi', num: 9 }, { name: 'kccv35r', num: 4 }, { name: '80un74', num: 9 }, { name: 'h4pp3', num: 6 }, { name: '51bit', num: 7 }, { name: 'j9ndf', num: 8 }, { name: 'vf3u1', num: 10 }, { name: 'g0bw0om', num: 10 }, { name: 'j031x', num: 7 }, { name: 'ij3asc', num: 9 }, { name: 'byv83y', num: 8 }, { name: 'bjzp4k', num: 4 }, { name: 'f3kbnm', num: 10 }];
const bench2 = [{ friend: 'o8b', num: 8 }, { friend: 'ye', num: 2 }, { friend: '32i', num: 5 }, { friend: 'uz', num: 3 }, { friend: 'a5k', num: 4 }, { friend: 'uad', num: 7 }, { friend: '3w5', num: 10 }, { friend: 'vw', num: 10 }, { friend: 'ah', num: 4 }, { friend: 'qv', num: 7 }, { friend: 'ozv', num: 2 }, { friend: '9ri', num: 10 }, { friend: '7nu', num: 4 }, { friend: 'w3', num: 9 }, { friend: 'tgp', num: 8 }, { friend: 'ibs', num: 1 }, { friend: 'ss7', num: 6 }, { friend: 'g44', num: 9 }, { friend: 'tab', num: 9 }, { friend: 'zem', num: 10 }];
```
</div>
</section>
## Solution
<section id='solution'>
```js
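function hashJoin (hash1, hash2) {
  // Join tblA and tblB on the columns named in strJoin ("colA=colB").
  const hJoin = (tblA, tblB, strJoin) => {
    const [jA, jB] = strJoin.split('=');
    // Hash phase: multimap from each join value to all rows of tblB with that value.
    // A Map keeps keys in their original type and cannot collide with
    // Object.prototype properties such as "constructor".
    const M = new Map();
    for (const row of tblB) {
      const id = row[jB];
      if (M.has(id)) {
        M.get(id).push(row);
      } else {
        M.set(id, [row]);
      }
    }
    // Join phase: scan tblA and emit one combined row per match.
    const result = [];
    for (const row of tblA) {
      const match = M.get(row[jA]) || [];
      for (const other of match) {
        result.push(dictConcat(row, other));
      }
    }
    return result;
  };
  // Merge two rows into one object, prefixing keys with A_ and B_ so columns
  // cannot clash. Plain assignments work for any value, including 0 and ''.
  const dictConcat = (dctA, dctB) => {
    const out = {};
    for (const k of Object.keys(dctA)) out[`A_${k}`] = dctA[k];
    for (const k of Object.keys(dctB)) out[`B_${k}`] = dctB[k];
    return out;
  };
  return hJoin(hash1, hash2, 'name=character');
}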
```
</section>