Wikipedia talk:Salting is usually a bad idea

Latest comment: 1 day ago by Alfa-ketosav in topic Titles

The rule of thumb I use is whether it seems more important to the repeat-creators to add their content at a specific title, or just anywhere they can. If someone's trying to get, say, their company's page on Wikipedia, salting and blacklisting usually work: they're not going to play l33tspeak games with the title like "Micr0søft Inc.", there's a finite number of title variants they'll try, and Special:Linksearch is usually pretty good at finding them at the stage between salting the first one or two titles and progressing to a blacklist regex. The other end is comparable to semi-protecting WP:AUTOBIO, which is just insane - every edit reverted from a page like that is one that was immediately seen, that never showed up in mainspace, and that usually didn't turn into a draft that someone had to review and decline and eventually delete. —Cryptic 22:41, 23 August 2022 (UTC)

Edit filters

Also, I think edit filters work best for LTAs: the filter stops them from creating the page, even though they may still try to create it. Thingofme (talk) 15:03, 28 August 2022 (UTC)

Titles

Actually, titles cannot be longer than 255 bytes, not 256 characters. Since the bytes 00 to 1F (decimal 0 to 31) and 7F (decimal 127) are not valid in titles, only 223 byte values remain. This means that while there are still very many possibilities, their number is less than the square root of the one given in this essay. Alfa-ketosav (talk) 17:16, 2 April 2024 (UTC)

Actually:

  • {, }, |, [, ], <, > and # can't be part of a title, meaning only 215 bytes are available.
  • A title's first byte can't be :, a space, or in the 80–BF range (these are continuation bytes, which only follow a lead byte; the lead byte also encodes how many bytes the sequence uses), reducing the available first bytes to 149.
  • F8 to FF are unused in UTF-8 for reasons of compatibility with UTF-16, reducing the number of available bytes to 207 (141 for the first byte).
  • C0 and C1 are unused to prevent longer-than-necessary byte sequences, reducing the available bytes to 205 (139 for the first byte).
  • The final byte can't be higher than BF, so it can have 150 different values. The penultimate byte can't be higher than DF, so it can have 181 different values, and the 3rd-to-last can't start with F, so it can have 197 values.
  • The space is treated as equivalent to _, reducing these values to 204 (138 for the first byte, 149 for the last, 180 for the penultimate and 196 for the one before that).
  • Finally, upper- and lowercase letters are treated as equal at the start of the title, reducing the first byte's number of possible values to 112.

Thus, a better upper limit of the number of possible titles is   almost 20 billion times lower than the one above. Alfa-ketosav (talk) 18:24, 3 April 2024 (UTC)
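The per-position counts in the list above can be re-derived with a short script. This is only a sketch of the counting argument, not anything MediaWiki itself runs: the function name and set names are made up for illustration, and it assumes the final-byte figure of 150 comes from also excluding a trailing space, which the list does not spell out.

```python
def title_byte_counts() -> dict[str, int]:
    """Re-derive the per-position byte counts from the list above."""
    # 00-1F and 7F are not valid in titles: 223 byte values remain.
    general = set(range(0x100)) - set(range(0x20)) - {0x7F}
    assert len(general) == 223

    # { } | [ ] < > # cannot be part of a title: 215.
    general -= {ord(c) for c in "{}|[]<>#"}

    # F8-FF and C0-C1 never occur in UTF-8: 205.
    general -= set(range(0xF8, 0x100)) | {0xC0, 0xC1}

    # First byte: no ':', no space, no continuation byte (80-BF): 139.
    first = general - {ord(":"), ord(" ")} - set(range(0x80, 0xC0))

    # Tail positions: a lead byte needs room for its continuation bytes.
    # ASSUMPTION: the final byte also excludes a trailing space; that is
    # how the figure of 150 (rather than 151) is reproduced here.
    last = {b for b in general if b <= 0xBF} - {ord(" ")}    # 150
    penultimate = {b for b in general if b <= 0xDF}          # 181
    third_last = {b for b in general if b < 0xF0}            # 197

    # Space and '_' are equivalent, so only one of them is counted.
    underscore = {ord("_")}
    general, first, last, penultimate, third_last = (
        s - underscore for s in (general, first, last, penultimate, third_last)
    )

    # The first letter is case-insensitive: drop the 26 lowercase letters.
    first -= set(range(ord("a"), ord("z") + 1))

    return {
        "general": len(general),
        "first": len(first),
        "last": len(last),
        "penultimate": len(penultimate),
        "third_last": len(third_last),
    }


if __name__ == "__main__":
    print(title_byte_counts())
```

Under those assumptions the script gives 204 general bytes, 112 first bytes, 149 final bytes, 180 penultimate bytes and 196 third-to-last bytes, matching the figures in the list.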

@Alfa-ketosav: Corrected. Although, does this account for the fact that sequences of multiple spaces and underscores are treated as a single space, and cannot end a title? (For instance, X_______X (band) actually just has one space and Lozman v. City of Riviera Beach, 585 U.S. ___ actually ends at ".") Or would that not change things by more than an order of magnitude? -- Tamzin[cetacean needed] (they|xe) 00:30, 17 August 2024 (UTC)
Partly (not for multiple spaces, yes for _). Alfa-ketosav (talk) 03:34, 17 August 2024 (UTC)
I changed the above number due to errors I made when checking. Alfa-ketosav (talk) 12:42, 18 August 2024 (UTC)

F5–F7 are also unused, as the highest code point, U+10FFFF, is encoded as F4-8F-BF-BF (I checked this by encoding a code point whose first byte is F4). Thus, the upper bound is  , c. 2.24% of the above number. Alfa-ketosav (talk) 12:42, 18 August 2024 (UTC)
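The F4 check can be repeated with Python's built-in UTF-8 codec. This is a minimal sketch, with helper names chosen here for illustration only; it relies on the codec rejecting any sequence that would decode above U+10FFFF.

```python
def max_code_point_bytes() -> bytes:
    """UTF-8 encoding of the highest Unicode code point, U+10FFFF."""
    return chr(0x10FFFF).encode("utf-8")


def is_valid_four_byte_lead(lead: int) -> bool:
    """True if some 4-byte sequence starting with `lead` decodes as UTF-8."""
    for second in range(0x80, 0xC0):
        try:
            bytes([lead, second, 0x80, 0x80]).decode("utf-8")
            return True
        except UnicodeDecodeError:
            continue
    return False


if __name__ == "__main__":
    print(max_code_point_bytes().hex("-"))  # highest encodable sequence
    for lead in range(0xF0, 0xF8):
        print(f"{lead:02X}", is_valid_four_byte_lead(lead))
```

The lead bytes F0 to F4 each admit at least one valid sequence, while F5, F6 and F7 admit none, which is why they drop out of the count.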

The first 2 bytes may not be b, c, d, f, m, n, q, s, v or w followed by a colon (ten 2-byte sequences). If the first byte is in C2–F4, the second byte must be in 80–BF (except for F4, where it is limited to 80–8F because code points stop at U+10FFFF) (7035 sequences removed), and if the first byte is in 20–7E, the second byte cannot be in 80–BF (5568). "./" cannot occur at the beginning of a title (1), as it redirects to the title without "./". So the number of valid 2-byte starting sequences is no more than  . The theoretical maximal title number is thus  , less than half the above value. Alfa-ketosav (talk) 12:46, 19 August 2024 (UTC)