title: Tokenize a string with escaping
id: 594faaab4e2a8626833e9c3d
challengeType: 5

Description

Write a function or program that can split a string at each non-escaped occurrence of a separator character.

It should accept three input parameters:

The string
The separator character
The escape character

It should output a list of strings.

Rules for splitting:

The fields delimited by the separators become the elements of the output list. Empty fields should be preserved, even at the start and end.
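To illustrate the splitting rule on its own, here is a minimal sketch that ignores escaping entirely and relies on the built-in `String.prototype.split`. The input string is made up for illustration; it shows that leading, trailing, and adjacent separators each produce an empty field.

```javascript
// Plain splitting with no escape handling: every separator ends a field,
// so leading, trailing, and adjacent separators all yield empty strings.
const fields = '|a||b|'.split('|');

console.log(fields); // [ '', 'a', '', 'b', '' ]
```

The challenge asks for the same treatment of empty fields, with the extra requirement that escaped separators do not split.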

Rules for escaping:

"Escaped" means preceded by an occurrence of the escape character that is not already escaped itself.
When the escape character precedes a character that has no special meaning, it still counts as an escape (but does not do anything special).
Each occurrence of the escape character that was used to escape something should not become part of the output.
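One way to read these rules is as a small state machine that walks the string once and remembers whether the previous character was an active escape. The sketch below is only an illustration of that reading: `splitEscaped` is a hypothetical helper name, and dropping an escape character that dangles at the very end of the input is an assumption, since the rules above do not cover that case.

```javascript
// splitEscaped is a hypothetical helper name used only for this sketch.
function splitEscaped(str, sep, esc) {
  const fields = [];
  let current = '';
  let escaped = false; // true when the previous character was an active escape

  for (const ch of str) {
    if (escaped) {
      // Escaped character: keep it literally, whatever it is.
      current += ch;
      escaped = false;
    } else if (ch === esc) {
      // Active escape: drop it and mark the next character as escaped.
      // Assumption: an escape at the very end of the input is simply dropped.
      escaped = true;
    } else if (ch === sep) {
      // Unescaped separator: close the current field, even if it is empty.
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }

  fields.push(current); // the last field, possibly empty
  return fields;
}

console.log(splitEscaped('a^|b|c', '|', '^')); // [ 'a|b', 'c' ]
console.log(splitEscaped('a^^|b', '|', '^'));  // [ 'a^', 'b' ]
console.log(splitEscaped('a^bc', '|', '^'));   // [ 'abc' ]
```

The three calls cover an escaped separator, an escaped escape character followed by a real separator, and an escape in front of an ordinary character.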

Demonstrate that your function satisfies the following test case. Given the string

one^|uno||three^^^^|four^^^|^cuatro|
and using
|
as a separator and
^
as the escape character, your function should output the following array:

  ['one|uno', '', 'three^^', 'four^|cuatro', '']

Instructions

Tests

tests:
  - text: <code>tokenize</code> is a function.
    testString: assert(typeof tokenize === 'function', '<code>tokenize</code> is a function.');
  - text: <code>tokenize</code> should return an array.
    testString: assert(typeof tokenize('a', 'b', 'c') === 'object', '<code>tokenize</code> should return an array.');
  - text: <code>tokenize('one^|uno||three^^^^|four^^^|^cuatro|', '|', '^') </code> should return <code>['one|uno', '', 'three^^', 'four^|cuatro', '']</code>
    testString: assert.deepEqual(tokenize(testStr1, '|', '^'), res1, "<code>tokenize('one^|uno||three^^^^|four^^^|^cuatro|', '|', '^') </code> should return ['one|uno', '', 'three^^', 'four^|cuatro', '']");
  - text: <code>tokenize('a@&bcd&ef&&@@hi', '&', '@')</code> should return <code>['a&bcd', 'ef', '', '@hi']</code>
    testString: assert.deepEqual(tokenize(testStr2, '&', '@'), res2, '<code>tokenize("a@&bcd&ef&&@@hi", "&", "@")</code> should return <code>["a&bcd", "ef", "", "@hi"]</code>');

Challenge Seed

function tokenize(str, sep, esc) {
  return true;
}

After Test

const testStr1 = 'one^|uno||three^^^^|four^^^|^cuatro|';
const res1 = ['one|uno', '', 'three^^', 'four^|cuatro', ''];

// TODO add more tests
const testStr2 = 'a@&bcd&ef&&@@hi';
const res2 = ['a&bcd', 'ef', '', '@hi'];

Solution

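// tokenize :: String -> Character -> Character -> [String]
function tokenize(str, charDelim, charEsc) {
  // Fold over the characters, tracking the escape state, the token being
  // built, and the list of completed tokens.
  const dctParse = str.split('')
    .reduce((a, x) => {
      const blnEsc = a.esc; // was the previous character an active escape?
      const blnBreak = !blnEsc && x === charDelim; // unescaped separator
      const blnEscChar = !blnEsc && x === charEsc; // unescaped escape character

      return {
        esc: blnEscChar,
        // A break starts a new token; an active escape character is dropped.
        token: blnBreak ? '' : (
          a.token + (blnEscChar ? '' : x)
        ),
        // On a break, the finished token (possibly empty) joins the list.
        list: a.list.concat(blnBreak ? a.token : [])
      };
    }, {
      esc: false,
      token: '',
      list: []
    });

  // The final token is never followed by a separator, so append it here.
  return dctParse.list.concat(
    dctParse.token
  );
}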