title: Tokenize a string with escaping
id: 594faaab4e2a8626833e9c3d
challengeType: 5
forumTopicId: 302338

Description

Write a function or program that can split a string at each non-escaped occurrence of a separator character. It should accept three input parameters:
  • The string
  • The separator character
  • The escape character
It should output a list of strings. Rules for splitting:
  • The fields delimited by the separators become the elements of the output list.
  • Empty fields should be preserved, even at the start and end.
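
As a point of comparison for the splitting rules above, the built-in `String.prototype.split` already preserves empty fields at the start, middle, and end, but it has no notion of escaping. The short sketch below only illustrates that behavior; the input strings are made up for the example.

```javascript
// Plain split keeps empty fields, including leading and trailing ones,
// but it knows nothing about escape characters.
const fields = '|a||b|'.split('|');
console.log(fields); // [ '', 'a', '', 'b', '' ]

// An escaped separator is still treated as a boundary here,
// which is why this challenge needs a custom tokenizer.
const naive = 'one^|uno'.split('|');
console.log(naive); // [ 'one^', 'uno' ]
```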
Rules for escaping:
  • "Escaped" means preceded by an occurrence of the escape character that is not already escaped itself.
  • When the escape character precedes a character that has no special meaning, it still counts as an escape (but does not do anything special).
  • Each occurrence of the escape character that was used to escape something should not become part of the output.
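
The escaping rules can be illustrated in isolation, without any splitting. The sketch below uses a hypothetical helper, `stripEscapes`, that is not part of the challenge; it removes each active escape character from a single field and keeps the character it escapes. It assumes that a trailing escape character with nothing after it is simply dropped, a case the rules above do not address.

```javascript
// Hypothetical helper (not part of the challenge): removes escape
// characters from a single field, keeping the character each one escapes.
function stripEscapes(field, esc) {
  let out = '';
  let escaped = false; // true when the previous character was an active escape
  for (const ch of field) {
    if (!escaped && ch === esc) {
      escaped = true; // consume the escape character itself
    } else {
      out += ch; // literal character, possibly an escaped one
      escaped = false;
    }
  }
  // Assumption: a dangling escape at the end of the field is dropped.
  return out;
}

console.log(stripEscapes('three^^^^', '^')); // three^^
console.log(stripEscapes('a^bc', '^'));      // abc
console.log(stripEscapes('one^|uno', '^'));  // one|uno
```

Note how `^^^^` collapses to `^^`: the first and third `^` act as escapes, and the second and fourth are the escaped literals.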
Demonstrate that your function satisfies the following test case. Given the string
one^|uno||three^^^^|four^^^|^cuatro|
and using | as the separator and ^ as the escape character, your function should output the following array:
  ['one|uno', '', 'three^^', 'four^|cuatro', '']

Instructions

Tests

tests:
  - text: <code>tokenize</code> should be a function.
    testString: assert(typeof tokenize === 'function');
  - text: <code>tokenize</code> should return an array.
    testString: assert(Array.isArray(tokenize('a', 'b', 'c')));
  - text: <code>tokenize('one^|uno||three^^^^|four^^^|^cuatro|', '|', '^')</code> should return <code>['one|uno', '', 'three^^', 'four^|cuatro', '']</code>.
    testString: assert.deepEqual(tokenize(testStr1, '|', '^'), res1);
  - text: <code>tokenize('a@&bcd&ef&&@@hi', '&', '@')</code> should return <code>['a&bcd', 'ef', '', '@hi']</code>.
    testString: assert.deepEqual(tokenize(testStr2, '&', '@'), res2);

Challenge Seed

function tokenize(str, sep, esc) {
  return true;
}

After Test

const testStr1 = 'one^|uno||three^^^^|four^^^|^cuatro|';
const res1 = ['one|uno', '', 'three^^', 'four^|cuatro', ''];

// TODO add more tests
const testStr2 = 'a@&bcd&ef&&@@hi';
const res2 = ['a&bcd', 'ef', '', '@hi'];

Solution

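// tokenize :: String -> Character -> Character -> [String]
function tokenize(str, charDelim, charEsc) {
  // Fold over the characters, tracking the escape state, the token being
  // built, and the list of completed tokens.
  const dctParse = str.split('')
    .reduce((a, x) => {
      const blnEsc = a.esc; // was the previous character an active escape?
      const blnBreak = !blnEsc && x === charDelim; // unescaped separator
      const blnEscChar = !blnEsc && x === charEsc; // unescaped escape character

      return {
        esc: blnEscChar,
        // Start a new token at a separator; otherwise append the character,
        // unless it is an active escape character, which is dropped.
        token: blnBreak ? '' : (
          a.token + (blnEscChar ? '' : x)
        ),
        // At a separator, move the finished token into the list.
        list: a.list.concat(blnBreak ? a.token : [])
      };
    }, {
      esc: false,
      token: '',
      list: []
    });

  // The last token has no trailing separator, so append it here.
  return dctParse.list.concat(
    dctParse.token
  );
}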
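
For readers who find the `reduce` version dense, the same state machine can be written as an explicit loop. This is a minimal sketch of an alternative, not the reference solution; the name `tokenizeLoop` is hypothetical. Like the solution above, it silently drops an escape character that appears at the very end of the input.

```javascript
// Hypothetical loop-based alternative to the reduce-based solution.
function tokenizeLoop(str, sep, esc) {
  const tokens = [];
  let current = '';    // the token being built
  let escaped = false; // true when the previous character was an active escape

  for (const ch of str) {
    if (escaped) {
      current += ch; // escaped character is taken literally
      escaped = false;
    } else if (ch === esc) {
      escaped = true; // active escape: consume it, emit nothing
    } else if (ch === sep) {
      tokens.push(current); // unescaped separator ends the current token
      current = '';
    } else {
      current += ch;
    }
  }

  tokens.push(current); // the final field, which may be empty
  return tokens;
}

console.log(tokenizeLoop('one^|uno||three^^^^|four^^^|^cuatro|', '|', '^'));
// [ 'one|uno', '', 'three^^', 'four^|cuatro', '' ]
console.log(tokenizeLoop('a@&bcd&ef&&@@hi', '&', '@'));
// [ 'a&bcd', 'ef', '', '@hi' ]
```

The escape check comes before the separator check, so an escaped separator never ends a token.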