---
id: 5956795bc9e2c415eb244de1
title: Hash join
challengeType: 5
forumTopicId: 302284
dashedName: hash-join
---

# --description--
An [inner join](https://en.wikipedia.org/wiki/Join_(SQL)#Inner_join "wp: Join\_(SQL)#Inner_join") is an operation that combines two data tables into one table, based on matching column values. The simplest way of implementing this operation is the [nested loop join](<https://en.wikipedia.org/wiki/Nested loop join> "wp: Nested loop join") algorithm, but a more scalable alternative is the [hash join](<https://en.wikipedia.org/wiki/hash join> "wp: hash join") algorithm.

The "hash join" algorithm consists of two steps:
<ol>
  <li><strong>Hash phase:</strong> Create a <a href='https://en.wikipedia.org/wiki/Multimap' title='wp: Multimap' target='_blank'>multimap</a> from one of the two tables, mapping from each join column value to all the rows that contain it.</li>
  <ul>
    <li>The multimap must support hash-based lookup, which scales better than a simple linear search, because that's the whole point of this algorithm.</li>
    <li>Ideally we should create the multimap for the smaller table, thus minimizing its creation time and memory size.</li>
  </ul>
  <li><strong>Join phase:</strong> Scan the other table, and find matching rows by looking in the multimap created before.</li>
</ol>
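As a rough illustration of the hash phase only, the sketch below builds a multimap with a JavaScript `Map` whose values are arrays of rows. The helper name `buildMultimap` and the sample rows are assumptions made up for this example; they are not part of the challenge.

```js
// Hypothetical helper: group rows by the value found in one column.
// A Map offers hash-based lookup, so fetching all rows for a key
// does not require scanning the whole table.
function buildMultimap(rows, column) {
  const multimap = new Map();
  for (const row of rows) {
    const key = row[column];
    if (!multimap.has(key)) {
      multimap.set(key, []);
    }
    multimap.get(key).push(row);
  }
  return multimap;
}

// Sample data invented for this sketch.
const pets = [
  { owner: 'Ann', pet: 'cat' },
  { owner: 'Bob', pet: 'dog' },
  { owner: 'Ann', pet: 'fish' }
];

const petsByOwner = buildMultimap(pets, 'owner');
console.log(petsByOwner.get('Ann').length); // 2
console.log(petsByOwner.has('Cy')); // false
```

A plain object could serve the same purpose for string keys; `Map` is used here because it accepts keys of any type without colliding with inherited property names.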
In pseudo-code, the algorithm could be expressed as follows:
<pre><strong>let</strong> <i>A</i> = the first input table (or ideally, the larger one)
<strong>let</strong> <i>B</i> = the second input table (or ideally, the smaller one)
<strong>let</strong> <i>j<sub>A</sub></i> = the join column ID of table <i>A</i>
<strong>let</strong> <i>j<sub>B</sub></i> = the join column ID of table <i>B</i>
<strong>let</strong> <i>M<sub>B</sub></i> = a multimap for mapping from single values to multiple rows of table <i>B</i> (starts out empty)
<strong>let</strong> <i>C</i> = the output table (starts out empty)
<strong>for each</strong> row <i>b</i> in table <i>B</i>:
  <strong>place</strong> <i>b</i> in multimap <i>M<sub>B</sub></i> under key <i>b(j<sub>B</sub>)</i>
<strong>for each</strong> row <i>a</i> in table <i>A</i>:
  <strong>for each</strong> row <i>b</i> in multimap <i>M<sub>B</sub></i> under key <i>a(j<sub>A</sub>)</i>:
    <strong>let</strong> <i>c</i> = the concatenation of row <i>a</i> and row <i>b</i>
    <strong>place</strong> row <i>c</i> in table <i>C</i>
</pre>
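The pseudo-code maps fairly directly onto JavaScript. The following is a minimal sketch, assuming each table is an array of row arrays and the join columns are given as zero-based indexes. The name `hashJoinByIndex` and the sample tables are made up for this illustration; this is not the `hashJoin` function the challenge asks for, which works on arrays of objects.

```js
// Sketch only: A and B are arrays of row arrays, jA and jB are column indexes.
function hashJoinByIndex(A, jA, B, jB) {
  const MB = new Map(); // multimap: join value -> rows of B
  const C = []; // output table

  // Hash phase: place every row b under key b[jB].
  for (const b of B) {
    const bucket = MB.get(b[jB]);
    if (bucket) {
      bucket.push(b);
    } else {
      MB.set(b[jB], [b]);
    }
  }

  // Join phase: for every row a, visit the rows stored under key a[jA].
  for (const a of A) {
    for (const b of MB.get(a[jA]) || []) {
      C.push([...a, ...b]); // concatenation of row a and row b
    }
  }
  return C;
}

// Made-up sample tables, joined on column 1 of the first and column 0 of the second.
const people = [[27, 'Jonah'], [18, 'Popeye']];
const foes = [['Jonah', 'Whales'], ['Jonah', 'Spiders']];
const joined = hashJoinByIndex(people, 1, foes, 0);
// joined holds two rows:
//   [27, 'Jonah', 'Jonah', 'Whales'] and [27, 'Jonah', 'Jonah', 'Spiders']
console.log(joined.length); // 2
```

Rows of the first table with no partner in the multimap, such as the `Popeye` row here, simply contribute nothing to the output, which is what makes this an inner join.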
# --instructions--

Implement the "hash join" algorithm as a function and demonstrate that it passes the test-case listed below. The function should accept two arrays of objects and return an array of combined objects.

**Input**
<table>
  <tr>
    <td style="padding: 4px; margin: 5px;">
      <table style="border:none; border-collapse:collapse;">
        <tr>
          <td style="border:none"><i>A =</i></td>
          <td style="border:none">
            <table>
              <tr>
                <th style="padding: 4px; margin: 5px;">Age</th>
                <th style="padding: 4px; margin: 5px;">Name</th>
              </tr>
              <tr>
                <td style="padding: 4px; margin: 5px;">27</td>
                <td style="padding: 4px; margin: 5px;">Jonah</td>
              </tr>
              <tr>
                <td style="padding: 4px; margin: 5px;">18</td>
                <td style="padding: 4px; margin: 5px;">Alan</td>
              </tr>
              <tr>
                <td style="padding: 4px; margin: 5px;">28</td>
                <td style="padding: 4px; margin: 5px;">Glory</td>
              </tr>
              <tr>
                <td style="padding: 4px; margin: 5px;">18</td>
                <td style="padding: 4px; margin: 5px;">Popeye</td>
              </tr>
              <tr>
                <td style="padding: 4px; margin: 5px;">28</td>
                <td style="padding: 4px; margin: 5px;">Alan</td>
              </tr>
            </table>
          </td>
          <td style="border:none; padding-left:1.5em;" rowspan="2"></td>
          <td style="border:none"><i>B =</i></td>
          <td style="border:none">
            <table>
              <tr>
                <th style="padding: 4px; margin: 5px;">Character</th>
                <th style="padding: 4px; margin: 5px;">Nemesis</th>
              </tr>
              <tr>
                <td style="padding: 4px; margin: 5px;">Jonah</td>
                <td style="padding: 4px; margin: 5px;">Whales</td>
              </tr>
              <tr>
                <td style="padding: 4px; margin: 5px;">Jonah</td>
                <td style="padding: 4px; margin: 5px;">Spiders</td>
              </tr>
              <tr>
                <td style="padding: 4px; margin: 5px;">Alan</td>
                <td style="padding: 4px; margin: 5px;">Ghosts</td>
              </tr>
              <tr>
                <td style="padding: 4px; margin: 5px;">Alan</td>
                <td style="padding: 4px; margin: 5px;">Zombies</td>
              </tr>
              <tr>
                <td style="padding: 4px; margin: 5px;">Glory</td>
                <td style="padding: 4px; margin: 5px;">Buffy</td>
              </tr>
            </table>
          </td>
        </tr>
        <tr>
          <td style="border:none">
            <i>j<sub>A</sub> =</i>
          </td>
          <td style="border:none">
            <i><code>Name</code> (i.e. column 1)</i>
          </td>
          <td style="border:none">
            <i>j<sub>B</sub> =</i>
          </td>
          <td style="border:none">
            <i><code>Character</code> (i.e. column 0)</i>
          </td>
        </tr>
      </table>
    </td>
  </tr>
</table>

**Output**

| A_age | A_name | B_character | B_nemesis |
| ----- | ------ | ----------- | --------- |
| 27    | Jonah  | Jonah       | Whales    |
| 27    | Jonah  | Jonah       | Spiders   |
| 18    | Alan   | Alan        | Ghosts    |
| 18    | Alan   | Alan        | Zombies   |
| 28    | Glory  | Glory       | Buffy     |
| 28    | Alan   | Alan        | Ghosts    |
| 28    | Alan   | Alan        | Zombies   |

The order of the rows in the output table is not significant.
# --hints--

`hashJoin` should be a function.

```js
assert(typeof hashJoin === 'function');
```

`hashJoin([{ age: 27, name: "Jonah" }, { age: 18, name: "Alan" }, { age: 28, name: "Glory" }, { age: 18, name: "Popeye" }, { age: 28, name: "Alan" }], [{ character: "Jonah", nemesis: "Whales" }, { character: "Jonah", nemesis: "Spiders" }, { character: "Alan", nemesis: "Ghosts" }, { character:"Alan", nemesis: "Zombies" }, { character: "Glory", nemesis: "Buffy" }, { character: "Bob", nemesis: "foo" }])` should return `[{"A_age": 27,"A_name": "Jonah", "B_character": "Jonah", "B_nemesis": "Whales"}, {"A_age": 27,"A_name": "Jonah", "B_character": "Jonah", "B_nemesis": "Spiders"}, {"A_age": 18,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Ghosts"}, {"A_age": 18,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Zombies"}, {"A_age": 28,"A_name": "Glory", "B_character": "Glory", "B_nemesis": "Buffy"}, {"A_age": 28,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Ghosts"}, {"A_age": 28,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Zombies"}]`

```js
assert.deepEqual(hashJoin(hash1, hash2), res);
```
# --seed--

## --after-user-code--

```js
const hash1 = [
  { age: 27, name: 'Jonah' },
  { age: 18, name: 'Alan' },
  { age: 28, name: 'Glory' },
  { age: 18, name: 'Popeye' },
  { age: 28, name: 'Alan' }
];
const hash2 = [
  { character: 'Jonah', nemesis: 'Whales' },
  { character: 'Jonah', nemesis: 'Spiders' },
  { character: 'Alan', nemesis: 'Ghosts' },
  { character: 'Alan', nemesis: 'Zombies' },
  { character: 'Glory', nemesis: 'Buffy' },
  { character: 'Bob', nemesis: 'foo' }
];
const res = [
  { A_age: 27, A_name: 'Jonah', B_character: 'Jonah', B_nemesis: 'Whales' },
  { A_age: 27, A_name: 'Jonah', B_character: 'Jonah', B_nemesis: 'Spiders' },
  { A_age: 18, A_name: 'Alan', B_character: 'Alan', B_nemesis: 'Ghosts' },
  { A_age: 18, A_name: 'Alan', B_character: 'Alan', B_nemesis: 'Zombies' },
  { A_age: 28, A_name: 'Glory', B_character: 'Glory', B_nemesis: 'Buffy' },
  { A_age: 28, A_name: 'Alan', B_character: 'Alan', B_nemesis: 'Ghosts' },
  { A_age: 28, A_name: 'Alan', B_character: 'Alan', B_nemesis: 'Zombies' }
];
const bench1 = [{ name: 'u2v7v', num: 1 }, { name: 'n53c8', num: 10 }, { name: 'oysce', num: 9 }, { name: '0mto2s', num: 1 }, { name: 'vkh5id', num: 4 }, { name: '5od0cf', num: 8 }, { name: 'uuulue', num: 10 }, { name: '3rgsbi', num: 9 }, { name: 'kccv35r', num: 4 }, { name: '80un74', num: 9 }, { name: 'h4pp3', num: 6 }, { name: '51bit', num: 7 }, { name: 'j9ndf', num: 8 }, { name: 'vf3u1', num: 10 }, { name: 'g0bw0om', num: 10 }, { name: 'j031x', num: 7 }, { name: 'ij3asc', num: 9 }, { name: 'byv83y', num: 8 }, { name: 'bjzp4k', num: 4 }, { name: 'f3kbnm', num: 10 }];
const bench2 = [{ friend: 'o8b', num: 8 }, { friend: 'ye', num: 2 }, { friend: '32i', num: 5 }, { friend: 'uz', num: 3 }, { friend: 'a5k', num: 4 }, { friend: 'uad', num: 7 }, { friend: '3w5', num: 10 }, { friend: 'vw', num: 10 }, { friend: 'ah', num: 4 }, { friend: 'qv', num: 7 }, { friend: 'ozv', num: 2 }, { friend: '9ri', num: 10 }, { friend: '7nu', num: 4 }, { friend: 'w3', num: 9 }, { friend: 'tgp', num: 8 }, { friend: 'ibs', num: 1 }, { friend: 'ss7', num: 6 }, { friend: 'g44', num: 9 }, { friend: 'tab', num: 9 }, { friend: 'zem', num: 10 }];
```

## --seed-contents--

```js
function hashJoin(hash1, hash2) {

  return [];
}
```

# --solutions--

```js
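function hashJoin(hash1, hash2) {
  // Inner-join tblA and tblB on the columns named in strJoin ("colA=colB").
  const hJoin = (tblA, tblB, strJoin) => {
    const [jA, jB] = strJoin.split('=');
    // Hash phase: group the rows of tblB by their join-column value.
    const M = tblB.reduce((a, x) => {
      const id = x[jB];
      return (
        a[id] ? a[id].push(x) : (a[id] = [x]),
        a
      );
    }, {});
    // Join phase: scan tblA and emit one combined row per match found in M.
    return tblA.reduce((a, x) => {
      const match = M[x[jA]];
      return match ? (
        a.concat(match.map(row => dictConcat(x, row)))
      ) : a;
    }, []);
  };
  // Merge two rows, prefixing their keys with "A_" and "B_" respectively.
  // The comma operator returns the accumulator even when a value is falsy.
  const dictConcat = (dctA, dctB) => {
    const ok = Object.keys;
    return ok(dctB).reduce(
      (a, k) => ((a[`B_${k}`] = dctB[k]), a),
      ok(dctA).reduce(
        (a, k) => ((a[`A_${k}`] = dctA[k]), a), {}
      )
    );
  };
  return hJoin(hash1, hash2, 'name=character');
}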
```