Talk:International Components for Unicode

Support of "illegal UTF-8"

This sentence reads very strangely:

for C/C++ UTF-8 is supported, including "illegal-UTF-8".

I checked the reference, and it turns out the linked source says that ICU began to process "illegal UTF-8" as per the best practice. That is hardly "support of illegal UTF-8". On the contrary, it is active rejection of illegal UTF-8.

I suppose the wording should be changed to either "including correct processing of illegal UTF-8", or we should remove the part after the comma altogether.

I'll do the former if no one objects.

--Adah1972 (talk) 11:50, 10 September 2018 (UTC)

Changed.

--Adah1972 (talk) 11:02, 14 September 2018 (UTC)
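
To illustrate the distinction drawn in this thread between "supporting" ill-formed UTF-8 and handling it correctly, here is a minimal sketch. It uses Python's built-in UTF-8 decoder as a stand-in, which is an assumption made purely for illustration; it is not ICU's API, and the helper names are hypothetical. The point is only that ill-formed input is either rejected or replaced with U+FFFD, never decoded as if it were valid.

```python
from typing import Optional

# Illustration only: Python's built-in codec, not ICU.
# 0xC0 0xAF is an overlong (and therefore ill-formed) encoding of "/".
ILL_FORMED = b"a\xc0\xafb"


def decode_strict(data: bytes) -> Optional[str]:
    """Return the decoded text, or None if the bytes are not well-formed UTF-8."""
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


def decode_replacing(data: bytes) -> str:
    """Decode, substituting U+FFFD for ill-formed byte sequences."""
    return data.decode("utf-8", errors="replace")
```

In both modes the overlong sequence never turns into a "/" character, which is what "correct processing of illegal UTF-8" amounts to in practice.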

October 2006

I've just expanded the article a little and made some corrections. I used 2 sources not referenced in the article:

At one stage, IBM sold a C++ kit called the "Taligent Internationalization Library". I don't know if this came from CommonPoint, or was an early name for ICU4C.

Perhaps the article should mention a major design difference between ICU and C locales? In ICU, locales are just labels that a program can use to load an appropriate formatter, date converter, string bundle, etc. In C, locales carry all the locale-specific information with them, so one setlocale() call can change all locale-related settings.

Cheers, CWC(talk) 07:42, 15 October 2006 (UTC)
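
The design difference described above can be sketched roughly as follows. This is not ICU code: the process-wide half uses Python's standard `locale` module (a thin wrapper over C's `setlocale`), and the "label" half uses a hypothetical lookup table and a hypothetical `format_decimal` helper invented for this example.

```python
import locale

# C-style model: one process-wide setting that every later call consults.
# The "C" locale is used here because it is always available.
locale.setlocale(locale.LC_NUMERIC, "C")
global_point = locale.localeconv()["decimal_point"]

# ICU-style model (hypothetical stand-in, not ICU's API): the locale is only
# a label used to look up a formatter, and no global state changes.
HYPOTHETICAL_DECIMAL_POINTS = {"en_US": ".", "de_DE": ","}


def format_decimal(value: float, locale_label: str) -> str:
    """Format with two decimals using the separator registered for the label."""
    point = HYPOTHETICAL_DECIMAL_POINTS[locale_label]
    return f"{value:.2f}".replace(".", point)
```

With the label-based model, two threads can format for `en_US` and `de_DE` at the same time, whereas the `setlocale` call affects every caller in the process.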

Neutrality

User Hdante (talk · contribs) has tagged the article with Template:POV-check for the words "much richer", in the sentence

ICU provides much richer internationalization facilities than the standard libraries for C or C++, and most operating systems.

Being richer than standard C or C++ is quite easy. Being richer than "standard Unix" isn't much harder.

Presumably Hdante is concerned that ICU is not "much richer" than operating systems such as Windows (see Uniscribe) and OS X (see ATSUI). Note that Uniscribe and ATSUI both provide rendering, whereas ICU does not. Can someone familiar with Uniscribe and ATSUI tell us how they compare to ICU for text processing? Cheers, CWC(talk) 03:10, 5 November 2006 (UTC)

The term "richer" is in question!?

C and C++ have almost no internationalization features when compared to .NET or Java. The POSIX API specification (Unix), which some people mistake for part of the C programming language, has some basic internationalization features. Unfortunately, most of the POSIX internationalization framework requires a whole application to use one locale at a time through setlocale, instead of allowing a multithreaded application to use multiple locales at once.

C and C++ do not include or promise the following:

  • A Unicode-based regular expression engine to handle text in multiple languages
  • A Unicode-based collation algorithm and language-sensitive string searching
  • Handling of BiDi (bidirectional text) issues
  • Access to all Unicode properties needed for proper handling of text in multiple languages
  • Calendars besides the Gregorian calendar
  • An extensive timezone API. The majority of POSIX implementations don't even provide the full Olson timezone ID or rules for the timezone.
  • A guarantee that Unicode is always available. There are many legacy codepages that are not portable enough to use reliably in source code.
  • ... and many other features.
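
To make a couple of the list items concrete, here is a small sketch of what "Unicode properties", "BiDi" and "language-sensitive" ordering refer to. It uses Python's standard `unicodedata` module rather than ICU, purely as an assumption for illustration; the `describe` helper is hypothetical.

```python
import unicodedata


def describe(ch: str):
    """Return (general category, bidirectional class) for one character."""
    return unicodedata.category(ch), unicodedata.bidirectional(ch)


# Plain code point ordering, which is all a byte-wise or code-unit-wise
# string comparison gives you: uppercase sorts before lowercase, and
# accented letters sort after all unaccented ones.
code_point_order = sorted(["apple", "Zebra", "\u00e4pfel"])
```

A Hebrew or Arabic letter reports a right-to-left bidirectional class, which a layout or BiDi algorithm needs to know, and the code point ordering shown is not what a dictionary in most languages would use; that gap is what a collation algorithm fills.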

If you're a pure Windows programmer, it's more difficult to say that ICU has a richer internationalization framework than Windows. Windows has a great internationalization framework integrated into the OS that is available throughout C and C++. Mac OS X also has some great internationalization features, but Mac OS X already uses ICU for many of these features (so it's not a useful point to compare ICU to ICU in this case).

There is a reason why "most" is used in this sentence. There are many other operating systems besides Windows that don't provide good internationalization features without ICU, like Linux, FreeBSD, NetBSD, OpenBSD, z/OS, i5/OS, Palm OS, Solaris, AIX and many other lesser-known operating systems. This is why many companies and open source projects already use ICU. If ICU didn't have a richer internationalization API than what C/C++ provides, no one would be using it.

ICU's layout engine is just a small part of ICU4C.

In general, I agree with CWC's comments.

User:UTF-8 (User:Rursus added by manually examining history: comment was from 08:13, 15 November 2006)

1. Four tildes (~~~~) please!
2. I think "richer" is too vague a phrasing, and too POVvy. I'll change it to "more extensive" or some such. Rursus dixit. (mbork3!) 06:48, 10 June 2011 (UTC)

appears to be a variant

The cited ICU license is the MIT-X11 license, which for whatever reason appears to be slighted by OSI. It's used by ncurses, e.g., as noted here TEDickey (talk) 01:13, 3 February 2014 (UTC)

Nice page, thanks for the link. At some point before 1995, porting PC Curses 1.4 to something with fewer, or at least more interesting, bugs and nearer to SVR3.2 was a part of my job; it was later published as version 3.5.1.1 ;-)
For the purposes of NetRexx MIT and Expat are similar enough: Expat License redirects to the MIT License mess, and commons:Template:Expat is a proper subset of commons:Template:MIT.
If you're confident, fix the affected page(s) as you see fit. IANAL, I was worried about "All rights reserved" and about the missing upper case. But apparently that is no problem; the commons template even requires the copyright blurb, and the MIT license article here shows the upper case variant as, well, variant. –Be..anyone (talk) 08:46, 5 February 2014 (UTC)
done TEDickey (talk) 01:41, 6 February 2014 (UTC)

edit

SVG logo at http://source.icu-project.org/repos/icu/icuhtml/trunk/design/iculogo/iculogo.svg. If somebody willing to log in wants it here and uploads it to commons, the ICU license could be something like {{MIT|International Business Machines Corporation and others|Expat|199?-201?}}, matching the previous section or whatever the SUBJECTPAGE says. Or {{PD-textlogo}} if it's one of your bold days. 2A03:2267:0:0:4CC8:8386:D33A:A45E (talk) 12:25, 1 August 2016 (UTC)

Update: 1999-2016, since May ICU is a part of Unicode with a similar license. 2A03:2267:0:0:3516:3A59:1E44:7C41 (talk) 19:59, 1 August 2016 (UTC)