freeCodeCamp


title: Tokenize a string with escaping
id: 594faaab4e2a8626833e9c3d
challengeType: 5
forumTopicId: 302338

Description

Write a function or program that can split a string at each non-escaped occurrence of a separator character. It should accept three input parameters:

The string
The separator character
The escape character

It should output a list of strings. Rules for splitting:

The fields separated by the separators become the elements of the output list.
Empty fields should be preserved, even at the start and end.
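Ignoring escapes for a moment, these splitting rules match the behavior of JavaScript's built-in `String.prototype.split` with a string separator, which keeps empty fields at the start, in the middle, and at the end. A quick check on an arbitrary illustrative input:

```javascript
// Splitting with no escape handling at all: String.prototype.split keeps
// every empty field, including a leading and a trailing one.
const fields = '|a||b|'.split('|');
console.log(fields); // → [ '', 'a', '', 'b', '' ]
```

The escaping rules are what make a plain `split` insufficient for this task: an escaped separator must stay inside its field.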

Rules for escaping:

"Escaped" means preceded by an occurrence of the escape character that is not already escaped itself.
When the escape character precedes a character that has no special meaning, it still counts as an escape (but does not do anything special).
Each occurrence of the escape character that was used to escape something should not become part of the output.
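The escaping rules on their own can be illustrated with a small helper, `unescapeOnly` (a hypothetical name, not part of the challenge), that removes escape characters without doing any splitting. It assumes that a trailing, unpaired escape character is simply dropped, a case the rules above do not specify.

```javascript
// Hypothetical helper: applies only the escaping rules, with no splitting.
// An escape character that is not itself escaped is dropped, and the
// character after it is kept literally, whether or not it is special.
function unescapeOnly(str, esc) {
  let out = '';
  let escaping = false; // true when the previous char was an active escape
  for (const ch of str) {
    if (escaping) {
      out += ch;        // escaped character: keep it verbatim
      escaping = false;
    } else if (ch === esc) {
      escaping = true;  // active escape: consume it, emit nothing
    } else {
      out += ch;        // ordinary character
    }
  }
  // Assumption: a dangling escape at the very end is discarded.
  return out;
}

console.log(unescapeOnly('a^bc^^d', '^')); // → abc^d
```

Here `^b` shows an escape before a character with no special meaning: the `^` still counts as an escape and disappears, while `^^` collapses to a single literal `^`.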

Demonstrate that your function satisfies the following test case: given the string

one^|uno||three^^^^|four^^^|^cuatro|

and using | as a separator and ^ as escape character, your function should output the following array:

  ['one|uno', '', 'three^^', 'four^|cuatro', '']
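One way to satisfy these rules is a single left-to-right pass that tracks whether the previous character was an active escape. The sketch below is an imperative alternative to the challenge's reduce-based reference solution; the name `tokenizeLoop` is hypothetical, and the code is a sketch rather than the official answer.

```javascript
// Single pass: split on unescaped separators, drop active escape chars.
function tokenizeLoop(str, sep, esc) {
  const tokens = [];
  let current = '';
  let escaping = false; // previous char was an escape that is still active

  for (const ch of str) {
    if (escaping) {
      current += ch;        // escaped char (separator, escape, or ordinary)
      escaping = false;
    } else if (ch === esc) {
      escaping = true;      // consume the escape character itself
    } else if (ch === sep) {
      tokens.push(current); // unescaped separator ends the field
      current = '';
    } else {
      current += ch;
    }
  }
  tokens.push(current);     // last field, possibly empty
  return tokens;
}

console.log(tokenizeLoop('one^|uno||three^^^^|four^^^|^cuatro|', '|', '^'));
// → [ 'one|uno', '', 'three^^', 'four^|cuatro', '' ]
```

Iterating with `for...of` walks Unicode code points rather than UTF-16 code units, which only makes a difference if the separator or escape character lies outside the Basic Multilingual Plane.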

Instructions

Tests

tests:
  - text: <code>tokenize</code> should be a function.
    testString: assert(typeof tokenize === 'function');
  - text: <code>tokenize</code> should return an array.
    testString: assert(Array.isArray(tokenize('a', 'b', 'c')));
  - text: <code>tokenize('one^|uno||three^^^^|four^^^|^cuatro|', '|', '^')</code> should return <code>['one|uno', '', 'three^^', 'four^|cuatro', '']</code>
    testString: assert.deepEqual(tokenize(testStr1, '|', '^'), res1);
  - text: <code>tokenize('a@&bcd&ef&&@@hi', '&', '@')</code> should return <code>['a&bcd', 'ef', '', '@hi']</code>
    testString: assert.deepEqual(tokenize(testStr2, '&', '@'), res2);

Challenge Seed

function tokenize(str, sep, esc) {
  return true;
}

After Test

const testStr1 = 'one^|uno||three^^^^|four^^^|^cuatro|';
const res1 = ['one|uno', '', 'three^^', 'four^|cuatro', ''];

// TODO add more tests
const testStr2 = 'a@&bcd&ef&&@@hi';
const res2 = ['a&bcd', 'ef', '', '@hi'];

Solution

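// tokenize :: String -> Character -> Character -> [String]
function tokenize(str, charDelim, charEsc) {
  // Fold over the characters, carrying the escape flag, the field being
  // built, and the list of completed fields.
  const dctParse = str.split('')
    .reduce((a, x) => {
      const blnEsc = a.esc; // was the previous character an active escape?
      const blnBreak = !blnEsc && x === charDelim; // unescaped separator
      const blnEscChar = !blnEsc && x === charEsc; // unescaped escape char

      return {
        esc: blnEscChar,
        // A separator starts a new empty field; an active escape char is
        // dropped; every other character is appended to the current field.
        token: blnBreak ? '' : (
          a.token + (blnEscChar ? '' : x)
        ),
        // On a separator, push the finished field (concat([]) adds nothing).
        list: a.list.concat(blnBreak ? a.token : [])
      };
    }, {
      esc: false,
      token: '',
      list: []
    });

  // The field still being built after the last character is the final one.
  return dctParse.list.concat(
    dctParse.token
  );
}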
