Wikipedia talk:Wikipedia Signpost/2023-01-01/Technology report

Boards approve high-level plans, not detailed implementation. Doc James (talk · contribs · email) 06:21, 1 January 2023 (UTC)

  • I think the report's critiques of the current progress are fair and useful, but I'm actually glad that the WMF is taking a shot at the project. It might work out, it might not, but it's the kind of interesting, innovative idea that's well worth exploring. —Ganesha811 (talk) 13:21, 1 January 2023 (UTC)
  • For those reading this who might not be aware, Doc James has extensive WMF Board experience and knows what he is talking about. Best, Barkeep49 (talk) 16:43, 1 January 2023 (UTC)
    • And in fact he was one of the Board members who approved the resolution in question.
      Speaking of the Board, one of the current trustees who appears to be most informed about the project may be the recently re-elected Shani, who has been conducting several interview-like conversations with Denny about it. This part (from May 2022) is interesting with regard to the question about the scope of Wikifunctions: In response to a question about what Wikifunctions is and why it is needed, Denny spends several minutes explaining the pedestrian Wikipedia-focused use cases like calculating a person's age, but never mentions the expansive vision of democratizing programming, providing a platform "where scientists and analysts could create models together", etc. Regards, HaeB (talk) 19:12, 1 January 2023 (UTC)
      This makes it sound like the full scope of Wikifunctions was somehow obscured or hidden. It never was.
      Here are quotes from the Introduction from the April 2020 paper: "Wikilambda is a wiki of functions. Functions take some input and return an output. [...] Wikilambda aims at making access to functions and writing and contributing new functions as simple as contributing to Wikipedia, thus aiming to create a comprehensive repository of algorithms and functions." Note that Wikilambda was the working name for the project, and was later changed by a community vote to Wikifunctions.
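      (To make the quoted "wiki of functions" idea concrete: below is a minimal illustrative sketch in Python, with invented names; it is not actual Wikifunctions code and does not reflect its function model, it only shows the shape of a small reusable function such as the age calculation mentioned earlier in this thread.)

          from datetime import date

          def age_in_years(date_of_birth: date, on_date: date) -> int:
              """Return a person's age in whole years on a given date.

              A toy stand-in for the kind of small, shareable function a
              "wiki of functions" would catalogue: well-defined input,
              well-defined output, reusable from any caller.
              """
              years = on_date.year - date_of_birth.year
              # Subtract one year if the birthday has not yet occurred this year.
              had_birthday = (on_date.month, on_date.day) >= (date_of_birth.month, date_of_birth.day)
              return years - (0 if had_birthday else 1)

          # Example: someone born on 15 January 1952 is 70 on 1 January 2023.
          print(age_in_years(date(1952, 1, 15), date(2023, 1, 1)))  # 70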
      The project plan that was presented to the community also introduced the new wiki project as "a project to create, catalog, and maintain an open library (or repository) of “functions”, which has many possible use cases" (see the proposal for a new sister project, May 2020 version). You can find the vision described in detail, and to its full extent, already in the May 2020 overview page.
      The (mostly weekly) news updates also frequently discussed the scope of functions, e.g. in the second edition, where we explicitly speak about democratizing functions; when introducing the mission statement; or when talking about inclusion despite math, or diversity and equity in programming.
      I think there are many valid criticisms that can be raised against Wikifunctions, but implying that we were hiding the "expansive vision of democratizing programming", either before the project started or during its ongoing development, is not one of them. --DVrandecic (WMF) (talk) 19:20, 2 January 2023 (UTC)

I'm always disappointed when a response is entirely defensive. It offers a terrible start at a dialog, if nothing else, and will predictably divide responses into polarized sides. I sympathize more with the critique than the response, myself, and so perhaps this isn't a neutral reaction, but I really wish the official foundation response had found opportunities to embrace criticism and a few opportunities to admit that a change in direction might be warranted. I.e., a "these are the critiques we feel are more valid" instead of a blanket "none of the critiques are valid". You don't have to agree, but offer a counterproposal at least. Doubling down on the original plan with no changes after the initial year's experience seems to indicate management failure, regardless of the technical merits. No project survives initial implementation completely unchanged. C. Scott Ananian (talk) 15:43, 1 January 2023 (UTC)

I agree with you that it would be very disappointing if a response were entirely defensive. If I were to solely rely on The Signpost's reporting above, it might easily seem that way. Fortunately, the entire evaluation is available - and it is lengthy, as The Signpost correctly states. As we write in the response: "We have or plan to implement many of the recommendations the fellows have made regarding security, the function model, the system’s usability, etc. Many of those did not make it in either the evaluation or this answer, as both documents focused on the remaining differences, and less on the agreements."
There are a few recommendations we do not agree with. But we agreed with many, and we either already implemented them, sometimes together with the fellows, or have tasks on our task board to implement them, many before launch. --DVrandecic (WMF) (talk) 19:31, 2 January 2023 (UTC)
Denny, it's very disappointing to see you double down on such deceptive communications tactics here. I'm excerpting below the full set of recommendations from m:Abstract Wikipedia/Google.org Fellows evaluation#Recommendations:

Wikifunctions

  • Wikifunctions should extend, augment, and refine the existing programming facilities in MediaWiki. The initial version should be a central wiki for common Lua code. [...]
  • Don’t invent a new programming language. [...] It is better to base the system on an existing, proven language.
  • The Z-Object system as currently conceived introduces a vast amount of complexity to the system. If Wikifunctions consolidates on a single implementation language (as we believe it should), much of the need for Z-Objects goes away. If there is a need to extend the native type system provided by the chosen implementation language, it should be with a minimal set of types, which should be specified in native code. They likely do not need to be modifiable on wiki.

Abstract Wikipedia

  • The design of the NLG system should start with a specification of the Abstract Content [...].
  • Rather than present to users a general-purpose computation system and programming environment, provide an environment specifically dedicated to authoring abstract content, grammars, and NLG renderers in a constrained formalism.
  • Converge on a single, coherent approach to NLG.
  • If possible, adopt an extant NLG system and build on it. One of two alternatives we mentioned above is Grammatical Framework, which already have a vibrant community of contributors.
  • Alternatively, create a new NLG system adapted to the scope and contributor-profile of Abstract Wikipedia, as previously suggested by Ariel Gutman
As far as I can see, you have dismissed every single one of these 8 recommendations (three for Wikifunctions and five for Abstract Wikipedia). What's more, a CTRL-F through the entire document shows that these are the only statements that the authors refer to as "recommendations" (or where they use the term "recommend").
So Cscott seems quite correct in describing your reaction to the evaluation's criticism as a blanket rejection, at least with regard to the resulting recommendations. The "many of" handwaving to obscure that fact and the goalpost-shifting (nobody had claimed that you had disagreed with every single thing the fellows had said outside this evaluation) really don't look good.
Regards, HaeB (talk) 21:20, 3 January 2023 (UTC)
@HaeB: Apologies. You are right, we indeed reject these eight recommendations. To explain what I meant: throughout the evaluation the fellows give many suggestions or proposals and raise many good points, and of these, we accepted many. We have implemented many, and others are currently open tasks and planned to be implemented. I did express myself badly. I apologize for using the wrong word here. It is indeed an excellent point that, if a paper calls a section "Recommendations", then when I refer to recommendations it should mean the points in that section, and not the generic sense of "things that are suggested throughout the paper". Sorry! --DVrandecic (WMF) (talk) 22:53, 4 January 2023 (UTC)

I feel like the team's response to the criticism kind of missed the mark. The criticism raised some risks, and then suggested some solutions. The response seemed to focus on the suggested solutions and why they didn't go with them originally, which isn't what I would call the meat of the criticism. The meat of the criticism comes down to pretty generic concerns - too much waterfall, not agile enough (e.g. trying to do everything all at once), too much NIH syndrome, too much scope creep, not enough focus on an MVP. These are all very common project management risks to focus on in the tech world, and there are many solutions. The critics suggest one possible thing to do, but certainly not the only thing possible. I would expect a response to something like this to talk about how those risks will be mitigated (or dispute the significance of these risks), not just talk about how they don't like one potential solution. I am also pretty unconvinced by the appeal to avoid cultural bias. Not because I don't think that is important, but because it is being treated as a binary instead of something to minimize. Yes, it's an important thing to try to reduce, but you will never fully eliminate it, as everything anyone does is informed by cultural context. It is a risk that needs to be balanced against other risks. You can't think of it as something to eliminate entirely, as the only way to do that is to do nothing at all. Bawolff (talk) 22:05, 1 January 2023 (UTC)

These are great points, thank you. The response indeed focused very much on the points of disagreement, and not so much on the agreements. A lot of the things we agreed with have already been implemented, or are in the course of being implemented. This particularly includes the project management risks you call out. It is, for example, thanks to the fellows that we refocused, which allowed us to launch the Wikifunctions Beta during the fellowship. The fellows also contributed to our now much more stable and comprehensive testing setup. We have already reduced scope and are continuing to do so, in order to speed up the launch of Wikifunctions proper and to focus more on an MVP, given where we are now.
Some of the criticisms that are raised, though, are difficult to fix: we would love to have two dedicated teams, one to work on Wikifunctions and one to work on Abstract Wikipedia, but we do not have the resources available for that. Other criticisms would have made a lot of sense to discuss in 2020 around the original proposal, but seem less actionable now, given the development in the meantime; e.g. the Python and JavaScript executors are already implemented and running on Beta.
I found the evaluation very helpful. I promise that I will keep the evaluation in mind. We will continue to focus on getting to an MVP and getting to launch. That is our priority. --DVrandecic (WMF) (talk) 19:52, 2 January 2023 (UTC)
Agreed with bawolff on "the appeal to avoid cultural bias." I hope the team finds ways to work with / extend GF or an equivalent! And I hope a global template repo is still one of the core early goals, since it is mentioned prominently in both the initial design and in this critique.
I am delighted to see this depth and clarity of discussion about the scope and impact of a project; this is something we have been missing across Wikimedia for some time. Thanks to all involved for tackling new ideas substantive enough to warrant this. – SJ + 16:36, 3 January 2023 (UTC)
  • To find out if something will work, at some stage it is necessary to test it in the real world, but if it fails that does not always prove that the concept was wrong. Shit happens, not all of it predictable. Some flexibility is often useful. I supported this project at proposal stage, not because I knew it would succeed, but because I thought it might succeed. If it works, fantastic. If it doesn't, we might be able to work out why, and not in the superficial "I told you so" sort of way. I wish it success, and the luck these things often need. · · · Peter Southwood (talk): 09:57, 3 January 2023 (UTC)
There are two kinds of cultural bias involved, really. In terms of content, there is a cultural bias built into Wikidata anyway, just on the basis of Wikidata demographics (Western views, interests, preoccupations, etc.). The linguistic bias, in terms of being able to handle agglutinative or ergative grammars etc., is a different one. I think it will have a negligible impact on community demographics and the amount of content bias there is (I don't foresee large numbers of, say, Niger-Congo language speakers coming in and taking over if their language can be handled well).
Personally, I've always been worried that Wikidata and Abstract Wikipedia will create a sort of digital colonialism, not least because the companies likely to benefit most are all in the US, and multilingual free content is their ticket to dominating new markets currently still closed to them. Andreas JN466 17:00, 3 January 2023 (UTC)
Leaving aside Wikidata (where the Wikimedia approach has basically succeeded with "semantic web" ideas by intelligent selection), I would say that the Silicon Valley approach to language translation is firmly based at present on machine learning, massive corpus computation, and other empirical ideas. What Abstract Wikipedia intends, as can be seen already in painstaking lexeme work, is so different as almost to be considered orthogonal to current orthodoxy. The outputs from the abstract syntax are heavily conditional. If you can give a formal description of how enough sentences work in language L, and can supply enough accurate translations for nouns, verbs etc. into L from abstracted concepts, you can start getting paragraphs of Wikipedia-like content, typically of assertoric force on factual subjects. All this can generate debate and refinement of the linguistic inputs via L; and possibly cultural feedback too. It seems a long way from quick wins such as machine translation offers now, and the time scale is around ten years to see what "production mode" might mean. (I base some of this on conversations around Cambridge with people having relevant business experience.) Charles Matthews (talk) 12:35, 4 January 2023 (UTC)
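(To illustrate the conditional pipeline described above, here is a deliberately toy Python sketch, with invented names and data: a formal description of how a sentence is built in a language L, plus translations of abstracted concepts into L, yields text. Real Abstract Wikipedia constructors, lexemes and renderers are far richer; this only shows the shape of the dependency.)

    # Language-independent "abstract content": an assertion about Jupiter.
    abstract = {"constructor": "instance_of", "subject": "Jupiter", "class": "planet"}

    # Per-language lexicon: translations of the abstracted concepts into L.
    lexicon = {
        "en": {"Jupiter": "Jupiter", "planet": "planet", "copula": "is a"},
        "de": {"Jupiter": "Jupiter", "planet": "Planet", "copula": "ist ein"},
    }

    # Per-language renderer: a (here trivial) formal description of how
    # sentences of this shape work in the language.
    def render(content: dict, lang: str) -> str:
        words = lexicon[lang]
        return f"{words[content['subject']]} {words['copula']} {words[content['class']]}."

    print(render(abstract, "en"))  # Jupiter is a planet.
    print(render(abstract, "de"))  # Jupiter ist ein Planet.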
Charles, according to our article on it, the idea for Abstract Wikipedia was first developed in a Google paper (see also HaeB's intro above) and we're discussing the input of Google Fellows seconded to the project, whose stated aim was "to support the backend of Wikifunctions, enabling the team to speed up their schedule" (see Wikipedia:Wikipedia_Signpost/2022-06-26/News_and_notes). So I wouldn't think that Google and others have no interest in this work. Simple articles in languages like Yoruba, Igbo or Kannada, drawing on Wikidata's vast storehouse of data, would surely be a boon to any search engine maker wanting to display knowledge panels in languages that are currently poorly served and have very small online corpora, and the same goes for makers of voice assistants. (Having said that, I wouldn't discount the possibility that machine translation may advance fast enough to significantly reduce the perceived need for something like Abstract Wikipedia.) Andreas JN466 14:08, 4 January 2023 (UTC)
I wasn't discounting your argument as it applies to Wikidata. I expect Google and other big players (what's the current acronym?) would be interested in AW just because they should "keep an eye on" this part of the field. The approach in machine learning can take decisions well enough from noisy data, can make money and so on. Basically it extends the low-hanging fruit available. Trying to get AW to populate (say) the Luganda Wikipedia in order to create "core" articles in medicine, chemistry and biology is a very different type of project. It is fundamentally about mobilising people rather than machines. Wikimedia should try to get to the point where it is a routine application of known technology.
To get back to the contentious point here: if there was no real need for innovative tech in this area, I would think a rival to AW would have been announced by now (and even a spoiler project started). I would say the type of innovation required probably has unpredictable consequences, rather than predictable ones. It should increase somewhat the connectedness of the Web. By the way, the voice assistant expert I talked to obviously thought the basic approach was wrong. Charles Matthews (talk) 16:06, 4 January 2023 (UTC)
@Charles Matthews I'm having trouble parsing this part of your reply: if there was no real need for innovative tech in this area, I would think a rival to AW would have been announced by now (and even a spoiler project started). If there was no need for innovative tech, someone would have announced a rival?
One interesting example given in the presentation on Meta was the Simple English article on Jupiter (see m:Abstract_Wikipedia/Examples/Jupiter#Original_text). Having CC0 or CC BY-SA (this is a decision the Abstract Wikipedia team has dithered about; last I looked, it was postponed) articles like this available in dozens of languages, spoken collectively by hundreds of millions of people in Asia, Africa, South America and Oceania, would surely be of interest to voice assistant makers. I can't imagine that they would turn their noses up at it, given that they're all over Wikipedia as it is.
The other question is whether articles like this, written in simple English, are actually within reach of machine translation capability today. (DeepL certainly produces perfect translations of that Jupiter article.)
I always thought an alternative – or complementary – approach to serving more languages might be to actually put some energy into Simple English Wikipedia: write texts about key topics in that project that are designed for machine translation and avoid all the things translation engines (and learners of English) have problems with – and then advise users to let emerging translation engines have a go, have them translate articles on the fly, and review what problems remain.
This might be easier and quicker than the WMF and a very limited subset of volunteers coming up with
  • Wikifunctions grammar,
  • thousands of articles written in that grammar – essentially a special meta-language for writing articles only used in that project – and
  • natural-language generators to translate this metalanguage into dozens upon dozens of human languages.
I understand that the idea is to leverage Wikidata content, making ultimately for a far more powerful product; I just fear it might take decades, i.e. so long that by the time it could be done, everybody else will have moved on. Andreas JN466 10:38, 5 January 2023 (UTC)
@Jayen466: Well, you may even be right about the feasibility of "simple English" as a starting point for some progress in the direction of creating "core" content. I was thinking, rather, of the application of existing linguistic theory to provide some substitute for AW: there were mails early on its list saying "you do realise that certain things are already done/known to be very hard" about the prior art in this field. I don't know that prior art, so I can't comment. The approach being taken is blue skies research. It has my approval for that reason: if once a decade the WMF puts resources behind such a project, that seems about right.
The use of lexemes in AW is a coherent strategy. Wikidata has begun to integrate Commons and Wikisource with the Wikipedias in a way I approve of also. What I wrote below about the actual linguistic approach being adopted is something like "brute-force solution mitigated by the use of functional programming in a good context for it". It hasn't been said explicitly, but seems quite possible as an eventual outcome, that the Module: namespace in Wikipedia would, via Wikifunctions, be broadened out to include much more diverse code than the Lua that is currently used.
There are people asking all the time for more technical attention across Wikimedia, and they will go on doing that. We see some incremental changes on Wikipedia, of the meat-and-potatoes kind. It seems to me satisfactory that there is an ambitious project launched in 2020 which might look pretty smart by 2030. In any case I come down here on Denny's side, rather than supporting the critics, because it seems care and thought are going into the design, as opposed to a race for quick results. My own limited experience of software pipelines suggests that everyone is a critic. Charles Matthews (talk) 11:20, 5 January 2023 (UTC)
  • I am interested in the Wikifunctions project and I hope that it will be easy enough to contribute to it. In the last 2 years I created several scripts to convert an input to source code. I focused on spreadsheet functions and self-defined blocks from the visual programming languages Snap and Scratch, and I hope that Wikifunctions can help democratize programming.--Hogü-456 (talk) 21:28, 3 January 2023 (UTC)
  • I have been following the AW project with considerable interest from the outset. I have some (entirely theoretical) background in functional programming (FP), but am not in any sense a developer. I come at all of this rather obliquely. For me, there have been a number of "reveals" from Denny, and at some point I felt convinced that what AW was trying to achieve in the language field was worthy if ambitious. FP is well-worn ground in computer science, but in some sense it still feels like a "European" approach (that was true 30 years ago, for sure). The assembler-level approach to FP is by combinators, and to "get" the idea you have to understand that combinator to constructor is not a big step, from a purely algebraic point of view. To get a universal view of language translation, you need something of the power of rewriting combinators to other combinators, using rules. Then you have a chance of treating language syntax on its merits, case by case. I think Leibniz might have got that point. Then in order to have an implementation of that rewriting that is modular, transparent, clean, all the good virtues in what will be massively complex ultimate code, you should invoke FP because it is best in breed for those things. (I apologise to Denny if I have misunderstood. FP in itself is not innovative, but the Wikifunctions take is post-Python, to put it shortly, in terms of programming language design.) I can quite see that traditional project management starts by saying "how much of this do you actually need, and what here is just nice-to-have?". The WMF emphasis on linguistic universality argues against premature rationalisation here, is what I also see. Charles Matthews (talk) 12:12, 4 January 2023 (UTC)
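    (A minimal, entirely invented Python sketch of the "rewriting combinators to other combinators, using rules" idea mentioned above: expressions are nested tuples and a rule rewrites one shape into another, e.g. to model a language whose surface word order differs from the abstract order. This is not how Wikifunctions or Abstract Wikipedia actually represent such rules.)

        # Expressions are nested tuples: (combinator, argument, ...).
        def rule_flip(expr):
            """FLIP(f, x, y) -> (f, y, x): a toy word-order rewrite rule."""
            if isinstance(expr, tuple) and expr and expr[0] == "FLIP":
                _, f, x, y = expr
                return (f, y, x)
            return None

        def rewrite(expr, rules):
            """Rewrite subexpressions bottom-up, then apply rules at the top until none match."""
            if isinstance(expr, tuple):
                expr = tuple(rewrite(e, rules) for e in expr)
            changed = True
            while changed:
                changed = False
                for rule in rules:
                    result = rule(expr)
                    if result is not None:
                        expr, changed = result, True
            return expr

        # An abstract clause whose surface order in some target language is flipped.
        clause = ("FLIP", "CLAUSE", ("SUBJ", "Jupiter"), ("PRED", "is-the-largest-planet"))
        print(rewrite(clause, [rule_flip]))
        # -> ('CLAUSE', ('PRED', 'is-the-largest-planet'), ('SUBJ', 'Jupiter'))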
  • Other than the sections on asynchronous content rendering speed and issues with support for multiple programming languages (which I don't think I can give a sufficiently informed opinion on), I disagree with a lot of the criticism in the evaluation, and agree with a lot of the team's response. Most critically:
    • I don't think that an Abstract Wikipedia project separated from Wikifunctions, with a separate NLG system, would be feasible from a volunteer community-development perspective.
    • I very strongly agree with this line from the team's response: "[E]very step of the process of natural language generation should be visible, defeasible, and subject to participation by the broader community. As much as possible, the line between producer/consumer or technologist/user must be made to evaporate."
  • We can't be dependent on a separated "back-end" away from wiki procedures, and still remain a Wikimedia-style effort. (As an aside: I'm seeing some parallels between this dispute and the argument about whether Wikidata should've had a hard-coded ontology and hard-constrained properties/data structure.) --Yair rand (talk) 03:40, 5 January 2023 (UTC)
  • "would be feasible from a volunteer community-development perspective." Would this new programming language have any non-Wiki uses? If not who would learn it? Wakelamp d[@-@]b (talk) 08:35, 5 January 2023 (UTC)Reply
  • It's not the case that I "disputed the Foundation's characterization of her concerns", as if it were a false statement. I clarified and refined it in my comment. A sort of tl;dr of that lengthy comment is that I do, in fact, state that I agree with the Foundation's response that the Fellow's statement of "There are sufficiently general NLG systems which could cover all (written) human languages (for example: Grammatical Framework, templatic system)" deserved to be critiqued and corrected. Those two systems currently won't do for many languages. I gave a short explanation why they don't. It is those reasons that are different from those mentioned in the Foundation's answer. (And, fwiw: I do have lots more to say about AW as well, but that won't quite fit buried in a comment to an article. TBC.) Keet10 (talk) 11:32, 9 January 2023 (UTC)
  • allowing the functions in Wikilambda to be called from wikitext - THAT is not happening without consensus, and there's going to be quite a bit of opposition considering the endless disruption and warfare ever since the ability to call Wikidata was added to wikitext. We don't even have consensus on whether Wikidata is acceptable for use in infoboxes, yet Wikidata enthusiasts are constantly wasting time shoving Wikidata calls everywhere else - which inevitably results in more time wasted on RFCs to ban/rollback those deployments, and then yet more labor wasted actually performing deletion or reversal of it all. Alsee (talk) 15:06, 14 January 2023 (UTC)
  • For those interested in hearing more about this, I was interviewed on Yaron Koren's podcast Between the Brackets, giving some more context about the evaluation and answering some of the counter-arguments given in the answer. Ariel Gutman (talk) 18:53, 18 January 2023 (UTC)
  • I think that this project is aiming to become the (open) alternative to WolframAlpha. I see some similarities in how the focus on NLP, and creating functions behind it, might be beneficial, but I think the wheel has already been invented; maybe try to work together with Stephen Wolfram instead? Hopefully this project doesn't repeat the mistake of the Knowledge Engine (search engine) in trying to compete with Google. Bennylin (talk) 09:46, 28 February 2023 (UTC)