score:0

UTF-8 is a variable-length encoding: ASCII characters take a single byte, while all other characters take two to four bytes.
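To make the variable length concrete, here is a minimal sketch that reads the length of a UTF-8 sequence off its first byte. The helper name `utf8_sequence_length` is made up for illustration, and it only inspects the lead byte; it does not validate the continuation bytes or reject overlong encodings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes in the UTF-8 sequence that starts
   with `lead`, or 0 if `lead` cannot start a sequence. Only the lead byte
   is inspected; the rest of the sequence is not validated. */
static size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* continuation or invalid byte */
}
```

For example, the Greek letter λ (U+03BB) is encoded as the two bytes 0xCE 0xBB, so `utf8_sequence_length(0xCE)` gives 2 and `strlen("\xCE\xBB")` also gives 2, even though the string holds a single character.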

If you are processing the string character by character using a switch statement, then you should probably use a wide-character string instead:

#include <stddef.h>

wchar_t mytext[] = L"ητια ητιααα λουλουδιασμενη!!!1234567890";

A wide character has type wchar_t instead of char, and is intended to be large enough to store any single character in the current locale. A wide-character string constant is prefixed with the L character (uppercase).

In your switch statement you can use wide-character constants in your case expressions (these are also prefixed with the L character):

switch (c)
{
    case L'Λ':
    /* handle capital lambda */
    break;

    case L'Α':
    /* handle capital alpha */
    break;

    /* ... */
}
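As a rough, self-contained version of the switch above, the sketch below walks a wide-character string and counts two letters. The function name `count_letters` is hypothetical, and the letters are written as universal character names (`\u039B` for capital lambda, `\u0391` for capital alpha) so the code does not depend on the encoding of the source file.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: counts capital lambdas and capital alphas in a
   null-terminated wide-character string. */
static void count_letters(const wchar_t *text, size_t *lambdas, size_t *alphas)
{
    *lambdas = 0;
    *alphas = 0;

    for (const wchar_t *p = text; *p != L'\0'; ++p) {
        switch (*p) {
        case L'\u039B':          /* capital lambda */
            (*lambdas)++;
            break;

        case L'\u0391':          /* capital alpha */
            (*alphas)++;
            break;

        default:                 /* any other character is ignored */
            break;
        }
    }
}
```

Because each element of the array is one wchar_t, the loop sees whole characters rather than the individual bytes of a multi-byte sequence.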

score:1

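Since a UTF-8 character can occupy multiple bytes, and strlen just counts bytes up to the first null byte, strlen reports a length larger than the number of characters for any UTF-8 string that contains non-ASCII characters. One solution is to use mbstowcs() to convert the string to a wide-character string, then wcslen() to get the length of the wide-character string. Note that mbstowcs() decodes its input according to the current locale, so a UTF-8 locale has to be selected with setlocale() first.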
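A minimal sketch of the mbstowcs() plus wcslen() approach might look like this. The function name `count_characters` is made up for illustration, and the caller is assumed to have already selected a UTF-8 locale, for example with `setlocale(LC_CTYPE, "")` or `setlocale(LC_CTYPE, "C.UTF-8")`; which locale names exist depends on the system.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns the number of wide characters in the
   multi-byte string `s`, decoded according to the current locale,
   or (size_t)-1 if allocation or conversion fails. */
static size_t count_characters(const char *s)
{
    /* A string never decodes to more wide characters than it has bytes,
       so strlen(s) + 1 elements always leave room for the terminator. */
    size_t capacity = strlen(s) + 1;
    wchar_t *wide = malloc(capacity * sizeof *wide);
    if (wide == NULL)
        return (size_t)-1;

    size_t converted = mbstowcs(wide, s, capacity);
    size_t result = (converted == (size_t)-1) ? (size_t)-1 : wcslen(wide);

    free(wide);
    return result;
}
```

With a UTF-8 locale active, the four-byte string "\xCE\xBB\xCE\xB1" (the letters λα) should be reported as 2 characters, while strlen() reports 4. On platforms where wchar_t is 16 bits wide, characters outside the Basic Multilingual Plane may be counted as two units.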

P.S. Here is a demonstration of the effect mentioned in the question.

