Jump to content

Talk:UTF-8

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

Table should not only use color to encode information (but formatting like bold and underline)

[edit]

As in a previous comment https://en.wikipedia.org/wiki/Talk:UTF-8/Archive_1#Colour_in_example_table? this has been done before, and is *better* so that everyone can clearly see the different part of the code. Relying on color alone is not good, due to color vision deficiencies and varying color rendition on devices. — Preceding unsigned comment added by 88.219.179.109 (talkcontribs) 02:26, 17 April 2020‎ (UTC)[reply]

[edit]
   and Microsoft has a script for Windows 10, to enable it by default for its program Microsoft Notepad
   "Script How to set default encoding to UTF-8 for notepad by PowerShell". gallery.technet.microsoft.com. Retrieved 2018-01-30.
   https://gallery.technet.microsoft.com/scriptcenter/How-to-set-default-2d9669ae?ranMID=24542&ranEAID=TnL5HPStwNw&ranSiteID=TnL5HPStwNw-1ayuyj6iLWwQHN_gI6Np_w&tduid=(1f29517b2ebdfe80772bf649d4c144b1)(256380)(2459594)(TnL5HPStwNw-1ayuyj6iLWwQHN_gI6Np_w)()

This link is dead. How to fix it? — Preceding unsigned comment added by Un1Gfn (talkcontribs) 02:58, 5 April 2021 (UTC)[reply]

That text, and that link, appears to have been removed, so there's no longer anything to fix. Guy Harris (talk) 23:43, 21 December 2023 (UTC)[reply]

The article contains "{{efn", which looks like a mistake.

I would've fixed it myself but I don't know how to transform the remaining sentence to make sense. 2A01:C23:8D8D:BF00:C070:85C1:B1B8:4094 (talk) 16:17, 2 April 2024 (UTC)[reply]

I fixed it, I think. I'm not 100% sure it's how the previous editors intended. I invite them to review and confirm. Indefatigable (talk) 19:03, 2 April 2024 (UTC)[reply]

Should "The Manifesto" be mentioned somewhere?

[edit]

More specifically, this one: https://utf8everywhere.org -- Preceding unsigned comment added by Rudxain (talk o contribs) 21:52, 12 July 2024 (UTC)[reply]

Only if it's got significant coverage in reliable sources. Remsense 22:10, 12 July 2024 (UTC)[reply]
It's kind of ahistorical, since the Microsoft decisions that they deplore were made while developing Windows NT 3.1, and UTF-8 wasn't even a standard until Windows NT 3.1 was close to being released. There was more money to be made from East Asian customized computer systems than Unicode computer systems in 1993, so Unicode was probably not their main focus at that time... AnonMoos (talk) 20:30, 15 July 2024 (UTC)[reply]

The number of 3 byte encodings is incorrect

[edit]

This sentence is incorrect:

Three bytes are needed for the remaining 61,440 codepoints...

FFFF - 0800 + 1 = F800 = 63,488 three byte codepoints.

The other calculations for 1, 2, and 4 byte encodings are correct. Bantling66 (talk) 02:56, 23 August 2024 (UTC)[reply]

You forgot to subtract 2048 surrogates in the D800–DFFF range. – MwGamera (talk) 08:58, 23 August 2024 (UTC)[reply]

Multi-point flags

[edit]

I'm struggling to assume good faith here with this edit. A flag which consists of five code points is already sufficiently illustrative of the issue being discussed. That an editor saw fit to first remove that example without discussion, and then to swap it out for the other example when it was pared down to one flag, invites discussion of why that particular flag was removed, and the obvious answer isn't a charitable one. Chris Cunningham (user:thumperward) (talk) 12:35, 17 September 2024 (UTC)[reply]

Yes it was restored to the pride flag for precisely the reasons you state. Spitzak (talk) 20:48, 17 September 2024 (UTC)[reply]

Why was the "heart" of the article, almost the whole section of UTF-8#Encoding (Old revision) removed instead of adding a note?

[edit]

NOTE: The section seems to have been renamed to UTF-8#Description in this edit.

I don't understand why such a large part of Encoding (old revision) was suddenly removed in this edit instead of either:

  • Adding a note about parts of it being written poorly.
  • Rewriting some of it. (the best and the most difficult option)
  • Carefully considering removing parts that were definitely redundant (such as arguably the latter part of Examples (old revision)).

The large removal edit I'm talking about was done by user:Thumperward.

I am strongly of the mind that the deleted text had the two most important parts of the whole article, which absolutely should be included:

  1. The Codepage layout (old revision), in my opinion the most important part of any article about a binary character encoding. This part was also in my opinion written exemplarily well here, I see no problems with it at all.
    - Precedents/Examples in other articles from the category:
  2. The first list (numbered 1..7) of Examples (old revision) that clearly, by simple example demonstrates how UTF-8 works. (I agree it could be rewritten, the language used is quite verbose)



As for the edit note:

→‎Encoding: this entire section is almost completely opaque and its inclusion stymies the addition of some clear prose describing how unicode is decoded

To me, it reads as if conflating UTF-8 with Unicode, and stating this as the reason the parts should be removed, but from the wrong article.

This article does talk about Unicode quite a bit more than I would like, maybe some of those parts should be moved to the Unicode article instead? (Such as the example of how some "graphical characters can be more than 4 bytes in UTF-8", which isn't caused by UTF-8 or any other encoding, but by Unicode itself and happens in ANY and all encodings capable of supporting those codepoints, not just UTF-8. Mossymountain (talk) 05:09, 20 September 2024 (UTC)[reply]