VLC 4.0.0-dev

Modules

 iconv wrappers
 (defined in src/extras/libc.c)
 
 C/POSIX locale functions
 

Files

file  vlc_charset.h
 

Macros

#define FromLocale(l)   (l)
 
#define ToLocale(u)   (u)
 
#define LocaleFree(s)   ((void)(s))
 
#define FromLocaleDup   strdup
 
#define ToLocaleDup   strdup
 

Functions

ssize_t vlc_towc (const char *str, uint32_t *restrict pwc)
 Decodes a code point from UTF-8.
 
static const char * IsUTF8 (const char *str)
 Checks UTF-8 validity.
 
static const char * IsASCII (const char *str)
 Checks ASCII validity.
 
static char * EnsureUTF8 (char *str)
 Removes non-UTF-8 sequences.
 
int utf8_vfprintf (FILE *stream, const char *fmt, va_list ap)
 Formats a UTF-8 string as vfprintf() would, then prints it with the appropriate conversion to the local encoding.
 
int utf8_fprintf (FILE *, const char *,...)
 Formats a UTF-8 string as fprintf() would, then prints it with the appropriate conversion to the local encoding.
 
char * vlc_strcasestr (const char *, const char *)
 Looks for a UTF-8 string within another one in a case-insensitive fashion.
 
char * FromCharset (const char *charset, const void *data, size_t data_size)
 Converts a string from the given character encoding to UTF-8.
 
void * ToCharset (const char *charset, const char *in, size_t *outsize)
 Converts a nul-terminated UTF-8 string to a given character encoding.
 
static char * FromLatin1 (const char *latin)
 Converts a nul-terminated string from ISO-8859-1 to UTF-8.
 

Detailed Description

Macro Definition Documentation

◆ FromLocale

#define FromLocale(l)   (l)

◆ FromLocaleDup

#define FromLocaleDup   strdup

◆ LocaleFree

#define LocaleFree(s)   ((void)(s))

◆ ToLocale

#define ToLocale(u)   (u)

◆ ToLocaleDup

#define ToLocaleDup   strdup

Function Documentation

◆ EnsureUTF8()

static char * EnsureUTF8 ( char *  str)
inline static

Removes non-UTF-8 sequences.

Replaces invalid or over-long UTF-8 byte sequences within a null-terminated string with question marks, so that the string can still be printed at least partially.

Warning
Do not use this where correctness is critical. Use IsUTF8() and handle the error case instead. This function is mainly for display or debugging.
Note
Converting from Latin-1 to UTF-8 in place is not possible (the string size would be increased). So it is not attempted even if it would otherwise be less disruptive.
Return values
str  the string is a valid null-terminated UTF-8 sequence (i.e. no changes were made)
NULL  the string was not a valid UTF-8 sequence (i.e. at least one byte was replaced)

References likely, and vlc_towc().

Referenced by filename_sanitize(), input_item_SetURI(), and InputMetaUser().
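
The in-place replacement described above can be sketched in standalone C. This is not VLC's implementation: sketch_towc() and sketch_ensure_utf8() are hypothetical stand-ins that only follow the documented contract (the decoder returns -1 on an invalid sequence and 0 at the terminator; the wrapper returns NULL once anything was replaced).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for vlc_towc(): returns the sequence length,
 * 0 at the nul terminator, or -1 on an invalid sequence. */
static int sketch_towc(const char *str, uint32_t *pwc)
{
    const unsigned char *p = (const unsigned char *)str;
    uint32_t cp;
    int len;

    if (p[0] < 0x80) {                 /* ASCII, including the terminator */
        *pwc = p[0];
        return p[0] != 0;
    }
    if (p[0] >= 0xC2 && p[0] <= 0xDF)      { len = 2; cp = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0)        { len = 3; cp = p[0] & 0x0F; }
    else if (p[0] >= 0xF0 && p[0] <= 0xF4) { len = 4; cp = p[0] & 0x07; }
    else
        return -1;                     /* stray continuation or invalid lead byte */

    for (int i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)     /* also catches a premature terminator */
            return -1;
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) /* over-long */
     || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)       /* surrogate, out of range */
        return -1;
    *pwc = cp;
    return len;
}

/* Hypothetical sketch of the documented EnsureUTF8() behaviour:
 * each undecodable byte becomes '?', and NULL reports that a change was made. */
static char *sketch_ensure_utf8(char *str)
{
    assert(str != NULL);
    char *ret = str;
    uint32_t cp;
    int n;

    while ((n = sketch_towc(str, &cp)) != 0) {
        if (n > 0) {
            str += n;                  /* valid sequence: skip it untouched */
        } else {
            *str++ = '?';              /* invalid byte: overwrite in place */
            ret = NULL;
        }
    }
    return ret;
}
```

Because one byte is replaced by one byte, the string never grows, which is why this approach works in place whereas a Latin-1 conversion would not.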

◆ FromCharset()

char * FromCharset ( const char *  charset,
const void *  data,
size_t  data_size 
)

Converts a string from the given character encoding to UTF-8.

Returns
a nul-terminated UTF-8 string, or NULL in case of error. The result must be freed using free().

References vlc_iconv(), vlc_iconv_close(), and vlc_iconv_open().

Referenced by vlc_readdir().
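
A conversion with this contract can be sketched on top of the POSIX iconv() API. The function name sketch_from_charset() is hypothetical, and the fixed 4x output buffer is a simplifying assumption: an input that needs more room simply fails here, whereas a production version would retry with a larger buffer.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical sketch: converts data_size bytes from charset to a
 * heap-allocated, nul-terminated UTF-8 string; returns NULL on error. */
static char *sketch_from_charset(const char *charset, const void *data,
                                 size_t data_size)
{
    assert(charset != NULL);

    iconv_t cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t)(-1))
        return NULL;                   /* unknown or unsupported charset */

    size_t out_left = 4 * data_size;   /* assumption: 4 output bytes per input byte suffice */
    char *out = malloc(out_left + 1);  /* +1 for the terminator */
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *)data;          /* iconv() takes char ** but only reads the input */
    size_t in_left = data_size;
    char *dst = out;
    size_t rc = iconv(cd, &src, &in_left, &dst, &out_left);
    iconv_close(cd);

    if (rc == (size_t)(-1)) {          /* invalid input or buffer too small */
        free(out);
        return NULL;
    }
    *dst = '\0';
    return out;                        /* caller releases with free() */
}
```

A caller would pass the byte count explicitly, for example sketch_from_charset("ISO-8859-1", buf, len), since the input is not required to be nul-terminated.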

◆ FromLatin1()

static char * FromLatin1 ( const char *  latin)
inline static

Converts a nul-terminated string from ISO-8859-1 to UTF-8.
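
Since every ISO-8859-1 byte maps directly to the Unicode code point of the same value, the conversion can be illustrated without any library support. The sketch below is a hand-written illustration, not necessarily how VLC implements FromLatin1(); the name sketch_from_latin1() is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: returns a newly allocated UTF-8 copy of a
 * nul-terminated ISO-8859-1 string, or NULL if allocation fails. */
static char *sketch_from_latin1(const char *latin)
{
    assert(latin != NULL);

    /* Bytes 0x80-0xFF expand to two UTF-8 bytes, so 2x is the worst case. */
    char *utf8 = malloc(2 * strlen(latin) + 1);
    if (utf8 == NULL)
        return NULL;

    char *out = utf8;
    for (const unsigned char *p = (const unsigned char *)latin; *p != 0; p++) {
        if (*p < 0x80) {
            *out++ = (char)*p;                    /* ASCII is unchanged */
        } else {
            *out++ = (char)(0xC0 | (*p >> 6));    /* lead byte: top 2 bits */
            *out++ = (char)(0x80 | (*p & 0x3F));  /* continuation: low 6 bits */
        }
    }
    *out = '\0';
    return utf8;                                  /* caller releases with free() */
}
```

The output can be up to twice as long as the input, which is why the result has to be a new allocation rather than an in-place rewrite.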

◆ IsASCII()

static const char * IsASCII ( const char *  str)
inline static

Checks ASCII validity.

Checks whether a null-terminated string is a valid ASCII byte sequence (non-printable ASCII characters 1-31 are permitted).

Parameters
str  string to check
Return values
str  the string is a valid null-terminated ASCII sequence
NULL  the string is not an ASCII sequence

References p.
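
The check amounts to verifying that no byte has its high bit set. A minimal sketch under that reading, with the hypothetical name sketch_is_ascii(), could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: returns str if every byte before the terminator
 * is in the range 1-127 (control characters included), NULL otherwise. */
static const char *sketch_is_ascii(const char *str)
{
    assert(str != NULL);

    for (const unsigned char *p = (const unsigned char *)str; *p != 0; p++)
        if (*p > 0x7F)
            return NULL;               /* high bit set: not ASCII */
    return str;
}
```

Returning the input pointer rather than a boolean lets the result be used directly in an expression such as `name = sketch_is_ascii(s) ? s : fallback;` (where `fallback` is a placeholder).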

◆ IsUTF8()

static const char * IsUTF8 ( const char *  str)
inline static

Checks UTF-8 validity.

Checks whether a null-terminated string is a valid UTF-8 byte sequence.

Parameters
str  string to check
Return values
str  the string is a valid null-terminated UTF-8 sequence
NULL  the string is not a valid UTF-8 sequence

References likely, and vlc_towc().

Referenced by vlc_meta_Set().

◆ ToCharset()

void * ToCharset ( const char *  charset,
const char *  in,
size_t *  outsize 
)

Converts a nul-terminated UTF-8 string to a given character encoding.

Parameters
charset  iconv name of the character set
in  nul-terminated UTF-8 string
outsize  pointer to hold the byte size of the result
Returns
A pointer to the result, which must be released using free(). The nul terminator is included in the conversion if the target character encoding supports it. However, it is not counted in the returned byte size. In case of error, NULL is returned and the byte size is undefined.

References unlikely, vlc_iconv(), vlc_iconv_close(), and vlc_iconv_open().
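
The terminator handling is the subtle part of this contract: the payload is converted first and measured, then the nul is converted separately on a best-effort basis. The sketch below illustrates that ordering with the POSIX iconv() API. The name sketch_to_charset() is hypothetical, and the fixed 4x buffer is a simplifying assumption (no retry with a larger buffer).

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: converts a nul-terminated UTF-8 string to charset.
 * *outsize receives the byte size of the payload, terminator excluded. */
static void *sketch_to_charset(const char *charset, const char *in,
                               size_t *outsize)
{
    assert(charset != NULL && in != NULL && outsize != NULL);

    iconv_t cd = iconv_open(charset, "UTF-8");
    if (cd == (iconv_t)(-1))
        return NULL;

    const size_t in_len = strlen(in);
    const size_t cap = 4 * (in_len + 1);   /* assumption: 4 bytes per input byte */
    char *res = malloc(cap);
    if (res == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *)in;                /* iconv() only reads the input */
    size_t in_left = in_len;
    char *dst = res;
    size_t out_left = cap - 4;             /* keep 4 bytes back for the terminator */

    if (iconv(cd, &src, &in_left, &dst, &out_left) == (size_t)(-1)) {
        iconv_close(cd);
        free(res);
        return NULL;
    }
    *outsize = (size_t)(dst - res);        /* payload size, measured before the nul */

    /* src now points at the nul terminator: convert it too, best effort. */
    in_left = 1;
    out_left += 4;
    (void)iconv(cd, &src, &in_left, &dst, &out_left);

    iconv_close(cd);
    return res;                            /* caller releases with free() */
}
```

With a two-byte encoding such as UTF-16LE, the string "hi" would yield a size of 4 followed by two zero bytes that are present in the buffer but not counted.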

◆ utf8_fprintf()

int utf8_fprintf ( FILE *  stream,
const char *  fmt,
  ... 
)

Formats a UTF-8 string as fprintf() would, then prints it with the appropriate conversion to the local encoding.

References utf8_vfprintf().
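
The relationship between the variadic function and its va_list counterpart follows the usual C pattern, sketched below with hypothetical names. The conversion step is left as a comment: this sketch assumes the local encoding is already UTF-8, which matches the identity ToLocale() macro listed on this page but would not hold on every platform.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch of the va_list variant: format into a heap buffer,
 * then write it out. Returns the number of bytes formatted, or -1. */
static int sketch_vfprintf(FILE *stream, const char *fmt, va_list ap)
{
    assert(stream != NULL && fmt != NULL);

    va_list ap2;
    va_copy(ap2, ap);                      /* first pass only measures */
    int len = vsnprintf(NULL, 0, fmt, ap2);
    va_end(ap2);
    if (len < 0)
        return -1;

    char *buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return -1;
    vsnprintf(buf, (size_t)len + 1, fmt, ap);

    /* A conversion from UTF-8 to the local encoding would happen here.
     * Assumed to be a no-op in this sketch (UTF-8 locale). */
    int ret = (fputs(buf, stream) < 0) ? -1 : len;
    free(buf);
    return ret;
}

/* Hypothetical sketch of the variadic front end: it only packages the
 * arguments and forwards them. */
static int sketch_fprintf(FILE *stream, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int ret = sketch_vfprintf(stream, fmt, ap);
    va_end(ap);
    return ret;
}
```

Formatting into a buffer first, rather than calling vfprintf() directly, is what makes room for a whole-string encoding conversion before anything reaches the stream.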

◆ utf8_vfprintf()

int utf8_vfprintf ( FILE *  stream,
const char *  fmt,
va_list  ap 
)

Formats a UTF-8 string as vfprintf() would, then prints it with the appropriate conversion to the local encoding.

References likely, unlikely, and vasprintf().

Referenced by utf8_fprintf().

◆ vlc_strcasestr()

char * vlc_strcasestr ( const char *  haystack,
const char *  needle 
)

Looks for a UTF-8 string within another one in a case-insensitive fashion.

Beware that this is quite slow. Contrary to strcasestr(), this function works regardless of the system character encoding, and handles multibyte code points correctly.

Parameters
haystack  string to look into
needle  string to look for
Returns
a pointer to the first occurrence of the needle within the haystack, or NULL if no occurrence was found.

References unlikely, and vlc_towc().

◆ vlc_towc()

ssize_t vlc_towc ( const char *  str,
uint32_t *restrict  pwc 
)

Decodes a code point from UTF-8.

Converts the first character in a UTF-8 sequence into a Unicode code point.

Parameters
str  a UTF-8 byte sequence [IN]
pwc  address of a location to store the code point [OUT]
Returns
the number of bytes occupied by the decoded code point
Return values
-1  not a valid UTF-8 sequence
0  null character (i.e. str points to an empty string)
1  (non-null) ASCII character
2-4  non-ASCII character

References likely, and unlikely.

Referenced by EnsureUTF8(), IsUTF8(), print_desc(), vlc_str2keycode(), vlc_strcasestr(), vlc_swidth(), and vlc_xml_encode().
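
The return convention (-1, 0, 1, 2-4) is designed for a simple decoding loop. The sketch below shows such a loop counting code points. sketch_towc() is a hypothetical stand-in that mirrors the documented contract, not VLC's source, and its exact rejection rules (over-long forms, surrogates, values above U+10FFFF) are an assumption based on the UTF-8 definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>                 /* ssize_t (POSIX) */

/* Hypothetical stand-in with the same return contract as vlc_towc(). */
static ssize_t sketch_towc(const char *str, uint32_t *pwc)
{
    assert(str != NULL && pwc != NULL);
    const unsigned char *p = (const unsigned char *)str;
    uint32_t cp;
    size_t len;

    if (p[0] < 0x80) {                 /* ASCII: 1 byte, or 0 for the terminator */
        *pwc = p[0];
        return p[0] != 0;
    }
    if (p[0] >= 0xC2 && p[0] <= 0xDF)      { len = 2; cp = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0)        { len = 3; cp = p[0] & 0x0F; }
    else if (p[0] >= 0xF0 && p[0] <= 0xF4) { len = 4; cp = p[0] & 0x07; }
    else
        return -1;                     /* stray continuation or invalid lead byte */

    for (size_t i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)     /* stops before reading past a terminator */
            return -1;
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) /* over-long */
     || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)       /* surrogate, out of range */
        return -1;
    *pwc = cp;
    return (ssize_t)len;
}

/* Counts code points, or returns -1 if the string is not valid UTF-8. */
static ssize_t count_code_points(const char *str)
{
    ssize_t count = 0;
    ssize_t n;
    uint32_t cp;

    while ((n = sketch_towc(str, &cp)) > 0) {
        str += n;                      /* advance by the bytes just decoded */
        count++;
    }
    return (n == 0) ? count : -1;      /* 0 means a clean end of string */
}
```

The same loop shape, with a different body, underlies validity checks and in-place sanitizing: only the handling of the -1 case changes.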