User:Azylber/RaBOTnik/Task0/Question1

Hi guys,

I've been working on RaBOTnik's task zero and have run into a huge number of problems. For example, people call the lang-ru template with more crazy and invalid content than you would imagine.

I've been filtering out all the non-Russian stuff I've found inside calls to lang-ru: Latin letters, numbers, symbols, wrongly encoded characters, references (yes, some people cite sources inside calls to lang-ru...), links (yes, some people put wikilinks inside calls to lang-ru...) and many other things.
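To give a rough idea of what this kind of filtering involves, here is a minimal sketch in Python. It is not RaBOTnik's actual code: the function name `russian_words` and the regular expressions are assumptions made for illustration, and they cover only references, wikilinks and non-Cyrillic characters.

```python
import re

# Hypothetical patterns, for illustration only.
# Self-closing <ref .../> is tried first so it cannot swallow later text.
REF_RE = re.compile(r"<ref[^>]*/>|<ref[^>]*>.*?</ref>", re.DOTALL | re.IGNORECASE)
# [[Target|label]] or [[Target]]; the visible label is captured.
WIKILINK_RE = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")
# Runs of Cyrillic letters, optionally carrying the combining acute (stress mark).
WORD_RE = re.compile(r"[\u0400-\u04FF\u0301]+")


def russian_words(template_arg):
    """Return the Cyrillic words found in the text passed to lang-ru."""
    text = REF_RE.sub(" ", template_arg)   # drop cited sources
    text = WIKILINK_RE.sub(r"\1", text)    # keep only the link label
    # Note: this simple pattern also cuts a word at any Latin letter,
    # so words containing Latin look-alikes need a separate check.
    return WORD_RE.findall(text)


print(russian_words("[[Moscow|Москва\u0301]]<ref>Some source</ref> 1147"))
```

A real pass would need many more rules (wrongly encoded characters, nested templates and so on); the sketch only shows the general shape.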

But now, after about 1,000 lines of code and 5,000 hairs pulled out, I think I've managed to filter out everything that is not a Russian name and to create a proper list containing each and every Russian name used in calls to lang-ru.

I've further cleaned up that list, and I'm ignoring words where the stress is obvious, such as those that have only one vowel and those that contain ё.
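The "obvious stress" rule can be sketched roughly as follows. This is only an illustration of the two cases named above (a single vowel, or the letter ё); the function name `needs_stress_mark` is an assumption, not something taken from the bot.

```python
COMBINING_ACUTE = "\u0301"            # the stress mark used in lang-ru calls
YO = "\u0451"                         # Cyrillic letter ё (always stressed)
RUSSIAN_VOWELS = set("аеиоуыэюя") | {YO}


def needs_stress_mark(word):
    """Return True if the stress position of the word is not obvious."""
    bare = word.replace(COMBINING_ACUTE, "").lower()
    if YO in bare:
        return False                  # ё carries the stress
    vowel_count = sum(1 for ch in bare if ch in RUSSIAN_VOWELS)
    return vowel_count > 1            # one vowel leaves no choice


for name in ["Иванов", "П\u0451тр", "Минск"]:
    print(name, needs_stress_mark(name))
```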

I've also found many errors in the existing calls to lang-ru, for example:

1) I've detected 600 calls to lang-ru that contain Russian words where some of the letters are not actually Cyrillic letters - they're Latin letters that look like Cyrillic letters. For example, Барковa. If you look closely, you will notice that the last a is not a Cyrillic a, it's a Latin one. I think it would be nice if RaBOTnik fixed those entries. I might incorporate it into task 1, or separate it as task 2.

2) When analysing the data, I found what I call "incompatible stress marks". That is, certain Russian words appear in different articles with the stress marks placed on different vowels. I think some of them might be errors, and some of them might be different words.
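For point 1, a check for Latin look-alike letters could look something like the sketch below. The mapping table and the function name `fix_lookalikes` are assumptions for illustration; the table lists only the most common look-alikes and is not claimed to be complete.

```python
# Latin letter -> visually similar Cyrillic letter (illustrative, not exhaustive).
LOOKALIKES = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
    "A": "\u0410", "B": "\u0412", "C": "\u0421", "E": "\u0415",
    "H": "\u041d", "K": "\u041a", "M": "\u041c", "O": "\u041e",
    "P": "\u0420", "T": "\u0422", "X": "\u0425",
}


def is_cyrillic(ch):
    """True for characters in the basic Cyrillic Unicode block."""
    return "\u0400" <= ch <= "\u04FF"


def fix_lookalikes(word):
    """Return the corrected word, or None if the word is not a mixed-script case."""
    has_cyrillic = any(is_cyrillic(ch) for ch in word)
    has_lookalike = any(ch in LOOKALIKES for ch in word)
    if not (has_cyrillic and has_lookalike):
        return None
    return "".join(LOOKALIKES.get(ch, ch) for ch in word)


# "Барков" in Cyrillic followed by a Latin "a", as in the example above.
suspect = "\u0411\u0430\u0440\u043a\u043e\u0432" + "a"
fixed = fix_lookalikes(suspect)
print(fixed, fixed != suspect)
```

Requiring at least one genuine Cyrillic letter keeps purely Latin words from being rewritten by mistake.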

So now I'm going to show you the list of all the "incompatible stress marks" that I've found in the entire English Wikipedia. There are 49 pairs.
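One way to find such pairs, sketched here under the assumption that stress is always written with the combining acute accent (U+0301), is to group the words by their unstressed spelling and keep the groups that have more than one stressed form. The function name `find_incompatible_stress` is hypothetical.

```python
from collections import defaultdict

COMBINING_ACUTE = "\u0301"  # assumed to be the only stress notation in use


def find_incompatible_stress(words):
    """Map each unstressed spelling to its conflicting stressed forms."""
    variants = defaultdict(set)
    for word in words:
        if COMBINING_ACUTE not in word:
            continue                      # no stress mark, nothing to compare
        bare = word.replace(COMBINING_ACUTE, "")
        variants[bare].add(word)
    return {bare: sorted(forms)
            for bare, forms in variants.items()
            if len(forms) > 1}


sample = ["Ива\u0301нов", "Ивано\u0301в", "Серге\u0301евич", "Ивано\u0301в"]
for bare, forms in find_incompatible_stress(sample).items():
    print(bare, forms)
```

Repeated occurrences of the same stressed form collapse into one entry, so only genuinely different placements are reported.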

What I need your help with:

Please go through this list, and tell me, for each pair, which one is right and which one is wrong. Please put "--" next to the one that is correct. If both are valid, please put "++" next to both members of the pair.

Azylber (talk) 08:39, 25 September 2013 (UTC)

0	Ива́нов++<!--rare but valid; could be a Bulgarian last name or some such-->
1	Ивано́в++<!--a very common last name-->
2	Сергее́вич
3	Серге́евич--<!--a common patronymic-->
4	Абрамо́вич++<!--a valid last name-->
5	Абра́мович++<!--a valid patronymic-->
6	Каме́нский++<!--rare but valid; usually as the last name-->
7	Ка́менский++<!--a common adjective and a relatively common last name-->
8	Александро́вич++<!--rare; likely a last name-->
9	Алекса́ндрович++<!--a common patronymic-->
10	Само́йлович--<!--patronymic-->
11	Самойло́вич<!--might be a rare last name or just wrong-->
12	Антоно́вич<!--might be a rare last name-->
13	Анто́нович--<!--a common patronymic-->
14	Павло́вский++<!--rare but valid; usually as the last name-->
15	Па́вловский++<!--common adjective-->
16	Наумо́вич++<!--rare but valid; usually as the last name-->
17	Нау́мович++<!--a valid patronymic-->
18	Кушни́р<!--no idea, but this one 'sounds' more natural-->
19	Ку́шнир
20	Адольфо́вич<!--might be a rare last name or just wrong-->
21	Адо́льфович++<!--a valid patronymic-->
22	Лазаре́вич++<!--rare but valid; usually as the last name-->
23	Ла́заревич++<!--a valid patronymic-->
24	Фо́мич
25	Фоми́ч--<!--a valid patronymic-->
26	Владими́р
27	Влади́мир++<!--a common first name-->
28	Ва́димович<!--a common mistake-->
29	Вади́мович--
30	Жу́ковский++
31	Жуко́вский++
32	Ду́бовский++
33	Дубо́вский++
34	О́стровский++<!--rare-->
35	Остро́вский++
36	Воскре́сенский
37	Воскресе́нский--
38	Соко́льский--
39	Со́кольский<!--could be a rare usage-->
40	Ме́нделевич--<!--more likely to be a patronymic-->
41	Менделе́вич--<!--more likely to be the last name-->
42	Кото́вский++<!--a valid last name-->
43	Ко́товский--<!--a valid but rare adjective-->
44	Кога́н++
45	Ко́ган++
46	Новико́в
47	Но́виков--
48	Сусли́н++
49	Су́слин++<!--I think this one is more common-->
50	Лаби́нск
51	Ла́бинск
52	авто́номная
53	автоно́мная--
54	Се́рги--<!--a toponym; e.g. [[Verkhniye Sergi]]-->
55	Серги́
56	Вы́готский<!--no idea, but this one 'sounds' more natural-->
57	Выго́тский
58	Быко́вский++<!--rare but valid-->
59	Бы́ковский++<!--a common adjective-->
60	Ароно́вич++<!--rare but valid; usually as the last name-->
61	Аро́нович++<!--a valid patronymic-->
62	У́да
63	Уда́
64	Ко́рсаков++
65	Корса́ков++<!--may also be Корсако́в-->
66	Ни́колай
67	Никола́й--
68	Максими́лиан
69	Максимилиа́н--
70	Гурко́
71	Гу́рко
72	Кере́нский
73	Ке́ренский--
74	Кара́
75	Ка́ра
76	Оре́ст--
77	О́рест
78	О́ла
79	Ола́
80	И́льич
81	Ильи́ч--
82	Максимо́вич++<!--rare but valid; usually as the last name-->
83	Макси́мович++<!--a common patronymic-->
84	Ахту́ба
85	А́хтуба ++<!--correct, but not a person's name-->
86	Гу́ставович--<!--correct-->
87	Густа́вович++<!--rare but can be correct-->
88	Ру́бин++<!--a valid last name-->
89	Руби́н++<!--a noun meaning "ruby"-->
90	Быко́во <!--correct, but not a person's name-->
91	Бы́ково--
92	Ра́кетный
93	Раке́тный--
94	Чу́па
95	Чупа́ ++<!--correct, but not a person's name-->
96	Алексее́вич <!--incorrect-->
97	Алексе́евич--<!--correct-->