    C++: Support for UTF-8 in the lexer · 70122b30
    Nikolai Kosjar authored
    
    
    This will save us toLatin1() conversions in CppTools (which already
    holds UTF-8 encoded QByteArrays) and thus loss of information (see
    QTCREATORBUG-7356). It also gives us support for non-latin1 identifiers.
    
    API-wise the following functions are added to Token. In follow-up
    patches these will become handy in combination with QStrings.
        utf16chars() - equivalent of bytes()
        utf16charsBegin() - equivalent of bytesBegin()
        utf16charsEnd() - equivalent of bytesEnd()
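
To illustrate why a token's length in bytes and its length in UTF-16 chars can differ, here is a minimal sketch. It is not the Token implementation: `utf16LengthOfUtf8` is a hypothetical helper, and it assumes the input is well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the Token API: counts how many UTF-16
// code units a well-formed UTF-8 byte sequence occupies once converted.
std::size_t utf16LengthOfUtf8(const std::string &utf8)
{
    std::size_t units = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) == 0x80)
            continue; // continuation byte, belongs to the previous lead byte
        // A 4-byte sequence (lead byte >= 0xF0) encodes a code point outside
        // the BMP, which takes a surrogate pair (two units) in UTF-16.
        units += (c >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

For an identifier such as "\xC3\xA4" "b" (an a-umlaut followed by b), the byte length is 3 while this helper reports 2 UTF-16 chars, which is the kind of difference the new accessors are meant to expose.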
    
    Next steps:
     * Adapt functions from TranslationUnit. They should work with utf16
       chars in order to calculate lines and columns correctly also for
       UTF-8 multi-byte code points.
     * Adapt the higher level clients:
        * Cpp{Tools,Editor} should expect UTF-8 encoded Literals.
        * Cpp{Tools,Editor}: When dealing with identifiers on the
          QString/QTextDocument layer, code points
          represented by two QChars need to be respected, too.
     * Ensure Macro::offsets() and Document::MacroUse::{begin,end}() report
       offsets usable in CppEditor/CppTools.
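
The line and column calculation mentioned in the next steps could look roughly like the following sketch. It is an assumption for illustration only, not the TranslationUnit code: `Position` and `positionForByteOffset` are hypothetical names, lines and columns are 1-based, and the input is assumed to be well-formed UTF-8.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical types and names, for illustration only.
struct Position {
    unsigned line;   // 1-based
    unsigned column; // 1-based, counted in UTF-16 code units
};

// Maps a byte offset into a UTF-8 source to a line and a column, with the
// column counted in UTF-16 code units, as a QString-based editor would see it.
Position positionForByteOffset(const std::string &source, std::size_t byteOffset)
{
    Position pos{1, 1};
    const std::size_t end = std::min(byteOffset, source.size());
    for (std::size_t i = 0; i < end; ++i) {
        const unsigned char c = static_cast<unsigned char>(source[i]);
        if (c == '\n') {
            ++pos.line;
            pos.column = 1;
        } else if ((c & 0xC0) == 0x80) {
            // Continuation byte: already accounted for by its lead byte.
        } else {
            // 4-byte sequences become a surrogate pair in UTF-16.
            pos.column += (c >= 0xF0) ? 2 : 1;
        }
    }
    return pos;
}
```

With the source "int \xC3\xA4 = 0;", the '=' sits at byte offset 7 but at column 7 rather than 8, because the two-byte a-umlaut occupies a single UTF-16 char.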
    
    Addresses QTCREATORBUG-7356.
    
    Change-Id: I0791b5236be8215d24fb8e38a1f7cb0d279454c0
    Reviewed-by: Erik Verbruggen <erik.verbruggen@digia.com>