
VC++: Converting ANSI to Unicode with _MSC_VER and MBCS (Multi-Byte Character Sets)
Having just looked at "ASCII strings to Unicode in C++", here's my preferred solution to this part of the never-ending story of string conversion:
 
 
#include <locale>
#include <string>

// Convert a narrow string to a wide string with the ctype facet of the global locale.
std::wstring widen(const std::string& str)
{
    if (str.empty()) return std::wstring();  // avoid taking &wstr[0] of an empty string
    std::wstring wstr(str.size(), 0);
#if _MSC_VER >= 1400    // use Microsoft's safe library members if possible (>= VS2005, non-standard)
    std::use_facet<std::ctype<wchar_t> >(std::locale())._Widen_s
        (&str[0], &str[0]+str.size(), &wstr[0], wstr.size());
#else
    std::use_facet<std::ctype<wchar_t> >(std::locale()).widen
        (&str[0], &str[0]+str.size(), &wstr[0]);
#endif
    return wstr;
}
 
// Convert a wide string to a narrow string; rep replaces characters
// that have no narrow equivalent.
std::string narrow(const std::wstring& wstr, char rep = '_')
{
    if (wstr.empty()) return std::string();  // avoid taking &str[0] of an empty string
    std::string str(wstr.size(), 0);
#if _MSC_VER >= 1400
    std::use_facet<std::ctype<wchar_t> >(std::locale())._Narrow_s
        (&wstr[0], &wstr[0]+wstr.size(), rep, &str[0], str.size());
#else
    std::use_facet<std::ctype<wchar_t> >(std::locale()).narrow
        (&wstr[0], &wstr[0]+wstr.size(), rep, &str[0]);
#endif
    return str;
}
 
Yes, it does look nasty, but it is the way to go in pure C++. Funnily enough, I never found any good and comprehensive documentation on C++ locales; most books tend to leave the topic untouched.
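
For compilers that lack the Microsoft-specific _Widen_s and _Narrow_s members, the same idea can be sketched with the standard ctype members alone. The names widen_std and narrow_std, and the explicit locale parameter, are assumptions made for this illustration and are not part of the original tip.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: widen using only the standard ctype<wchar_t>::widen.
std::wstring widen_std(const std::string& str,
                       const std::locale& loc = std::locale())
{
    if (str.empty()) return std::wstring();
    std::wstring wstr(str.size(), L'\0');
    std::use_facet<std::ctype<wchar_t> >(loc).widen(
        str.data(), str.data() + str.size(), &wstr[0]);
    return wstr;
}

// Hypothetical helper: narrow using only the standard ctype<wchar_t>::narrow.
// rep is stored for every wide character without a narrow equivalent.
std::string narrow_std(const std::wstring& wstr, char rep = '_',
                       const std::locale& loc = std::locale())
{
    if (wstr.empty()) return std::string();
    std::string str(wstr.size(), '\0');
    std::use_facet<std::ctype<wchar_t> >(loc).narrow(
        wstr.data(), wstr.data() + wstr.size(), rep, &str[0]);
    return str;
}

// Round trip for plain ASCII text.
void widen_narrow_demo()
{
    const std::wstring wide = widen_std("hello");
    assert(wide == L"hello");
    assert(narrow_std(wide) == "hello");
}
```

Like the functions above, this sketch converts element by element, so it is only suitable for single-byte code pages.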
 
Because the functions use the default constructor of std::locale, they pick up the global locale, which is the "C" locale unless the program has changed it, and that locale defines the code page for the conversion. To apply the current system code page instead, call std::locale::global(std::locale("")); before any call to narrow(...) or widen(...).
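
To illustrate how the default constructor follows the global locale, the sketch below checks the name of std::locale() after the global locale has been set. The helper names apply_user_locale and global_locale_name are hypothetical. Since std::locale("") can throw std::runtime_error when the environment names a locale the system does not know, the call is wrapped in a try block.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: make the user's preferred locale the global one.
// Returns false, leaving the global locale unchanged, if it cannot be created.
bool apply_user_locale()
{
    try {
        std::locale::global(std::locale(""));
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// Name of the locale that a default-constructed std::locale refers to.
std::string global_locale_name()
{
    return std::locale().name();
}

void global_locale_demo()
{
    // Start from a known state: the classic "C" locale.
    std::locale::global(std::locale::classic());
    assert(global_locale_name() == "C");

    // From here on, std::locale() yields the user's locale if one is available.
    apply_user_locale();

    // Restore the classic locale so later code is not affected.
    std::locale::global(std::locale::classic());
    assert(global_locale_name() == "C");
}
```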
 
One possible problem with this code is multi-byte character sets. The output strings are pre-sized on the assumption of a 1:1 relationship in size() between the two string formats, which does not hold when a single character occupies several bytes.
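
The mismatch is easy to see with a double-byte code page. The sketch below counts the characters in a code page 932 byte string by skipping the trail byte after each lead byte. The lead-byte ranges 0x81-0x9F and 0xE0-0xFC are those of Windows code page 932; the function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if b starts a two-byte character in code page 932 (Shift-JIS).
bool is_cp932_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical helper: number of characters (not bytes) in a CP932 string.
std::size_t count_cp932_chars(const std::string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++count) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        // A lead byte consumes itself and the trail byte that follows it.
        i += (is_cp932_lead_byte(b) && i + 1 < s.size()) ? 2 : 1;
    }
    return count;
}

void cp932_size_demo()
{
    // HIRAGANA LETTER A (0x82 0xA0), HIRAGANA LETTER I (0x82 0xA2), then 'A'
    const std::string sjis = "\x82\xA0\x82\xA2" "A";
    assert(sjis.size() == 5);              // five bytes ...
    assert(count_cp932_chars(sjis) == 3);  // ... but only three characters
}
```

A std::wstring pre-sized with size() would be two elements too long for this input, and in the narrow direction a buffer of wstr.size() bytes would be too short.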
I found another option that uses the codecvt facet. This code is a "bit" more complex but also works with MBCSs such as code page 932 (Japanese). I have tested it with some Central European and Japanese characters on VS2008:
 
 
#include <cstddef>
#include <cwchar>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

template<std::size_t buf_size = 100>
class cp_converter {
    const std::locale loc;
public:
    cp_converter(const std::locale& loc) :
        loc(loc)
    {
    }
    std::wstring widen(const std::string& in) {
        return convert<char, wchar_t>(in);
    }
    std::string narrow(const std::wstring& in) {
        return convert<wchar_t, char>(in);
    }
private:
    typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_facet;

    // widen: multi-byte to wide characters via codecvt::in
    inline codecvt_facet::result cv(
        const codecvt_facet& facet,
        std::mbstate_t& s,
        const char* f1, const char* l1, const char*& n1,
        wchar_t* f2, wchar_t* l2, wchar_t*& n2) const
    {
        return facet.in(s, f1, l1, n1, f2, l2, n2);
    }

    // narrow: wide to multi-byte characters via codecvt::out
    inline codecvt_facet::result cv(
        const codecvt_facet& facet,
        std::mbstate_t& s,
        const wchar_t* f1, const wchar_t* l1, const wchar_t*& n1,
        char* f2, char* l2, char*& n2) const
    {
        return facet.out(s, f1, l1, n1, f2, l2, n2);
    }

    // Convert in chunks of at most buf_size output characters.
    template<class ct_in, class ct_out>
    std::basic_string<ct_out> convert(const std::basic_string<ct_in>& in)
    {
        using namespace std;
        if (in.empty()) return basic_string<ct_out>();
        const codecvt_facet& facet = use_facet<codecvt_facet>(loc);
        basic_stringstream<ct_out> os;
        ct_out buf[buf_size];
        mbstate_t state = mbstate_t();
        codecvt_facet::result result;
        const ct_in* ipc = &in[0];
        const ct_in* const end = &in[0] + in.size();
        do {
            const ct_in* const start = ipc;
            ct_out* opc = buf;
            result = cv(facet, state,
                ipc, end, ipc,
                buf, buf + buf_size, opc);
            os << basic_string<ct_out>(buf, opc - buf);
            // No progress (e.g. a truncated multi-byte sequence): stop instead of looping forever.
            if (ipc == start && opc == buf) break;
        } while ((ipc < end) && (result != codecvt_facet::error));
        // std::exception has no string constructor in standard C++, so use runtime_error.
        if (codecvt_facet::ok != result) throw std::runtime_error("result is not ok!");
        return os.str();
    }
};
 
To use the class template, create an object from it with the locale you want to use. The widen(...) member then converts text from the selected locale/code page to a std::wstring; narrow(...) converts wide characters to a std::string in the selected locale/code page.
 
cp_converter<> conv_polish(std::locale("Polish"));
 
// test LATIN SMALL LETTER D WITH STROKE
assert(conv_polish.widen("\xF0") == L"\x0111");
 
// When a wide character is converted to a code page in which it cannot be represented,
// it is not possible to control the outcome manually.
cp_converter<> conv_english(std::locale("English"));
// VC++ 2008 returns LATIN SMALL LETTER D, which is a close match.
assert(conv_english.narrow(L"\x0111") == "\x64");
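
Locale names such as "Polish" and "English" are understood by the Microsoft runtime; other platforms use different names, and std::locale throws std::runtime_error for a name it does not recognise. A minimal sketch of a guarded construction, using the hypothetical helper make_locale_or_classic, might look like this:

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: build a named locale, or fall back to the
// classic "C" locale if the platform does not know the name.
std::locale make_locale_or_classic(const char* name)
{
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

void locale_fallback_demo()
{
    // "C" is available on every implementation.
    assert(make_locale_or_classic("C").name() == "C");
    // A name that no platform is expected to know falls back to "C".
    assert(make_locale_or_classic("no-such-locale-xyz").name() == "C");
}
```

Falling back silently is a choice made for the example. Real code may prefer to report the failure, since converting with the wrong code page produces wrong text.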
 
Important: Setting buf_size to odd values (e.g. 99) may result in buffer overflows in buf. It seems that VC++ 2008 disregards the value of l2 in facet.out(...) if a wchar_t is expanded to more than one char and this happens at the last byte of the buffer. As MBCS Windows code pages are limited (that is just a guess, correct me if I'm wrong) to 1 or 2 bytes per character, it should be safe to set buf_size to any even number.
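
If the guess about the two-byte limit holds, odd sizes can be avoided at compile time. The sketch below rounds a requested size up to the next even number. The even_buf_size template is hypothetical and deliberately written without C++11 features so that it would also compile on VC++ 2008.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: rounds an odd requested size up to the next even number.
template<std::size_t requested>
struct even_buf_size {
    enum { value = requested + (requested % 2) };
};

void even_buf_size_demo()
{
    assert(even_buf_size<99>::value == 100);   // odd sizes are rounded up
    assert(even_buf_size<100>::value == 100);  // even sizes are kept
    char buf[even_buf_size<99>::value];        // usable as an array bound
    assert(sizeof(buf) == 100);
}
```

The result could then be passed as the template argument, for example cp_converter<even_buf_size<99>::value>.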
 
Important II: The C++ standard does not state what the native encoding should be, neither for char nor for wchar_t. It just happens to be ANSI and UTF-16 in VC++ on Windows. Other C++ compilers may be standard-compliant yet use other encodings. In that case this tip would still convert between encodings, but the result won't be as expected.
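
One quick way to see such a difference is the size of wchar_t, which is 16 bits with VC++ on Windows but 32 bits with many Unix compilers. The sketch below only reports the width. The mapping from width to an encoding name is a rule of thumb assumed for this example, not something the standard guarantees, and both function names are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Width of wchar_t in bits on the current compiler.
std::size_t wchar_bits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// Hypothetical helper: a guess at the wide encoding based on the width alone.
std::string guess_wide_encoding()
{
    switch (wchar_bits()) {
    case 16: return "UTF-16 (likely)";
    case 32: return "UTF-32 (likely)";
    default: return "unknown";
    }
}

void wide_encoding_demo()
{
    // The guess is never empty, whatever the platform.
    assert(!guess_wide_encoding().empty());
}
```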


Copyright © 1998-2024 UCanCode.Net Software, all rights reserved.
Other product and company names herein may be the trademarks of their respective owners.

Please direct your questions or comments to webmaster@ucancode.net