Wikipedia talk:Google Books and Wikipedia

Latest comment: 6 months ago by Joe Roe in topic New related essay
WikiProject Essays (Low-impact)
This page is within the scope of WikiProject Wikipedia essays, a collaborative effort to organise and monitor the impact of Wikipedia essays. If you would like to participate, please visit the project page, where you can join the discussion. For a listing of essays see the essay directory.
This page has been rated as Low-impact on the project's impact scale.
The above rating was automatically assessed using data on pageviews, watchers, and incoming links.

Title

ALL CAPS title feels silly but Wikipedia:Google Books goes elsewhere. What about Wikipedia:Google Books considered harmful or Wikipedia:Avoid Google Books or Wikipedia:Say no to Google Books or whatever? Nemo 23:45, 7 March 2020 (UTC)

I don't care. I just need something easy to remember, and all caps is an easy way, but it can also be a redirect. But then someone else will rename it to GOOGLEBOOKSHARMFUL or something difficult to remember :) Just wanted something short and simple. -- GreenC 16:14, 8 March 2020 (UTC)
Wikipedia:No Google Books? WP:NGB is free! Nemo 18:17, 8 March 2020 (UTC)
That could work, but it kind of sounds like a rallying cry to eliminate GB, which might be a trigger (granted, the essay does take that tone anyway). The idea is to communicate how unreliable GB is, so maybe WP:Google Books Reliability / WP:GBR would be a neutral descriptor. -- GreenC 19:29, 8 March 2020 (UTC)
Hm, not sure "reliability" is wide enough to cover problems such as for-profit exploitation, privacy, information science standards etc. Maybe "suitability"? But then it's your essay! Nemo 19:37, 8 March 2020 (UTC)
Thanks, your feedback is helpful. What about WP:Google Books and Wikipedia / WP:GBWP? -- GreenC 20:19, 8 March 2020 (UTC)
Sounds like a good solution! Nemo 21:41, 8 March 2020 (UTC)

Link trend

The number of links in articles didn't change much between December 2019 and February 2020 (I'm still waiting for the March 2020 dump):

$ bzip2 -dck enwiki-20200220-pages-articles-multistream.xml.bz2 | grep -c books.google
1365146
$ bzip2 -dck enwiki-20191220-pages-articles-multistream.xml.bz2 | grep -c books.google
1366451

Nemo 09:10, 12 March 2020 (UTC)

The numbers are quite different in the 2020-03-01 dump, which shows a shift of about 60k URLs (GB: 1306076; IA: 400382, up from 346353 three months earlier). Nemo 15:19, 15 March 2020 (UTC)
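A note on method: `grep -c` counts matching lines, not matches, so a line of wikitext holding two Google Books URLs is counted once, and the figures above are best read as lower bounds on the number of occurrences. A minimal sketch for counting every occurrence instead, in the same shell style; the helper name `count_links` is hypothetical, `grep -o` is a GNU/BSD extension rather than strict POSIX, and the `archive\.org/details/` pattern for Internet Archive book links is an assumption, not necessarily the pattern used for the IA figures above:

```shell
#!/bin/sh
# count_links (hypothetical helper): count every occurrence of a URL
# pattern in text read from stdin. grep -o prints each match on its own
# line, so wc -l counts matches rather than matching lines (as grep -c does).
# tr strips the padding some wc implementations add to the number.
count_links() {
  grep -o "$1" | wc -l | tr -d ' '
}

# Usage against the dump named above (commented out: the file is large).
# The archive.org pattern is an assumption about how IA book links look.
# bzip2 -dck enwiki-20200220-pages-articles-multistream.xml.bz2 | count_links 'books\.google'
# bzip2 -dck enwiki-20200220-pages-articles-multistream.xml.bz2 | count_links 'archive\.org/details/'
```

Escaping the dot (`books\.google`) keeps it from matching any character, which the unescaped pattern in the commands above would also allow.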


Unbalanced opinion and faulty arguments

This essay starts off on the wrong foot with its lede statement:

"The document helps to educate why we prefer not to use Google Books (GB) where better options exist. "

The essay is clearly attempting to speak for all editors by saying "...why we prefer...", and has not made the argument that "...better options exist".

Regarding the proposed reasons under "Why we should stop when possible":

  • "Commercial book seller vs. non-profit archival library."
It makes no difference to editors if Google offers a book for sale; the option to buy is the choice of a given editor. In fact, it's convenient that an editor can see where to purchase a book right off the bat. Further, if you click on "Buy this book in print", Google gives you several links to click on, including those for Oxford University Press, Amazon.com, Barnes&Noble.com, Books-A-Million, and IndieBound. See example.
  • "WP:Verifiability. Links that break create problems with verification."
Any URL can go south, including the ones at archive.org. If and when they do, we fix them, as we've done for more than 16 years.
  • "WP:Link rot. Links should be reliable and stable."
This is just repeating the same idea as the above statement. Again, there is no guarantee that any URL link will last forever.
  • "Lacks Controlled Digital Lending. Free previews are great, but viewing the complete book for free is better."
Yes, previews are great, and if we take away the Google links we deprive thousands of editors of this option. Editors are still free to go to archive.org to view the book, if that option is available. As with Google Books, viewing is often limited. If they want to borrow a book, they have to sign up and establish an account. Often there is a wait list to borrow a given book. Or they can always find an inexpensive used copy on eBay and have the book in their hands within days, or the next day if they're in a hurry. None of this is a reason to stop using Google Books here at WP.

Given the above realities, we actually have a good case for preferring Google Books as sources in many cases. This is not to dump on archive.org. They are a great source for public domain literature. Currently, active editors make good use of books from both Google and archive.org. We should keep it that way.

See further discussion about this here. -- Gwillhickers (talk) 23:16, 30 March 2020 (UTC)

For the sake of historical reference, let me just state that I raised questions about all this a short while ago (initially archived without an answer). It is particularly disturbing that this is being pushed by editors who get paid by Archive.org (and they didn't even bother to mention it properly on their user page, only updating it after I raised questions). The attitude of GreenC in all this is terrible, to put it mildly: when asked by someone else to change the previous ALL CAPS page title, the user's answer was "I don't care", followed by a suggestion that someone could change it to GOOGLEBOOKSHARMFUL. Shamelessly. My unpleasant feeling is that those who are behind the extreme pumping up of Archive.org – compare also the mass archiving of Flickr pages, of uploads that have already passed review (in a trial the reviews were even removed), of which I was told it's just because FlickrReview is not up to date, etc. – are not bothered by any criticism, because they get their monthly cheque, unlike volunteers. I have enough struggles going on already, but I really don't know why paid editors with zero consideration for their surroundings should continue working on Wikimedia Commons and Wikipedia. Eissink (talk) 00:01, 31 March 2020 (UTC).

Can you call off your bot?

This discussion was archived, but has been moved here for those still concerned about this issue.

Re this edit: the bot switched several Google Books links, which in most cases are preferred, to those found at archive.org. Several of the new links also return errors. Is it possible to let active editors decide where to link to? Sometimes we indeed use archive.org, but not as a rule, and usually only in cases where a publication can't be found via Google Books. Would appreciate it greatly if the editors who are most familiar with the publications were allowed to make the decision. Thanx for your efforts. Cheers. -- Gwillhickers (talk) 19:40, 29 March 2020 (UTC)

Gwillhickers, Wikipedia is moving in the opposite direction now, and the bot is just realizing that movement. Please see WP:GOOGLEBOOKS. -- CYBERPOWER (Chat) 18:38, 30 March 2020 (UTC)
There are many books whose Google URL still holds. What guarantee do we have that the URLs at archive.org will last forever? Many publications are not listed on archive.org; where do we go then? Amazon? The only thing we use Google for is to verify source info, i.e. ISBN, date of publication, number of pages, etc. Many of the books your bot linked to have different ISBNs, while some of them have no ISBN listed, like this one and this one. As for URLs in general, they are used in citations throughout WP and could also change. Regardless of whether Google Books tries to encourage personal accounts, its links are only used as book references and have nothing to do with public domain concerns here. Many WP articles refer to books not in the public domain, so I fail to see any real concern about Google on that account. Also, regarding this statement: "a link to a 1985 edition might in the future become the 2019 edition because the publisher released a new edition." Nearly all reprints have the exact same format, page for page. What guarantee do we have that archive.org will not switch to a different year of publication? Given the faulty arguments presented in this 'essay', what I'm seeing is something written by someone, a disgruntled publisher perhaps, who has personal issues with Google, because the writer is obviously reaching for reasons for everyone else not to use it. It seems using this bot is creating more problems than we already have, such as they are. Until WP establishes a policy, based on a clear consensus and sound reasoning, that forbids linking to Google Books, it is not right to try to prod editors away from such links by means of a bot. The best approach is to bring the issue up in the proper forum. -- Gwillhickers (talk) 20:48, 30 March 2020 (UTC)

Gwillhickers, I made some changes. When you revert the bot, it will now learn from that and not restore the IA links; this was a design oversight, not intentional. It took me a while to understand what you were saying about 'forbidding Google links', but now that I see your predicament it's understandable, because the bot was acting like a dictator. If it continues to fight your editorial decisions, let me know ASAP; I now understand the situation. The bot has mostly completed its work: it converted fewer than 10% of the installed Google Books links, and is now only running on newly added links, which results in fewer than 100 changes a week vs. thousands of new Google links added during the same period. Google the elephant and IA the mouse. -- GreenC 04:52, 31 March 2020 (UTC)

Thank you for your words of understanding. Hope I didn't come off too curt. Just a thought about URLs in general. As I've indicated elsewhere, any URL can go south at any time. That's the nature of the beast we call the internet. Regarding all the new Google Books links arriving: will these also be assigned archive.org addresses, and if so, aren't we just passing the problem along to a different URL? I'm not sure focusing on Google links is accomplishing anything, as the bulk of WP articles use websites/URLs in their citations, with new ones appearing every day also. Again, there's no guarantee any URL is more stable than any other, including Google's. These concerns are further expressed on the Essay Talk page here. Best. -- Gwillhickers (talk) 19:47, 31 March 2020 (UTC)
It's no problem, and sorry for the bad experience; I can totally understand the frustration of a bot reverting over and over. Archive.org runs the Wayback Machine, which is the largest web archiving site in the world (hundreds of billions of pages). They are in the business of link stability; there is probably no more reliable website in existence, because it is their core "business" (a non-profit Internet archival library). Google, we know from verifying their links on Wikipedia via bot and running stats, has link stability problems. I've never seen an Internet Archive book link stop working. I suppose it is possible, for example if a book is taken down at the request of an author, but they don't really have a dead link problem that I can detect. There is a policy to use non-profit over for-profit sources when possible, in the same way we discourage linking books at amazon.com. -- GreenC 20:12, 31 March 2020 (UTC)
Well, you linked to a guideline, not a policy. In any case, from my experience over the last 13+ years, Google Books links have been stable. In the rare case that one goes south, the template still carries the title, author name, date, ISBN, etc., making it only a matter of routine to search for the latest URL. Also consider that WP articles make use of millions of URLs in their citations. When you also consider that all the articles on video games, etc., very often have citations that link to the company's website (promotional), I just can't bring myself to see Google Books as anything that editors should make a specific issue over. ... -- Gwillhickers (talk) 22:23, 1 April 2020 (UTC)

The issue continues

@Eissink: and anyone else interested: after being told by GreenC that the IA bot would not return to the American Revolutionary War article, it has been back at least twice. See latest discussion here. -- Gwillhickers (talk) 21:24, 19 April 2020 (UTC)

@Gwillhickers: see also the discussion currently on Wikipedia:Village_pump_(policy)#Stop_InternetArchiveBot_from_linking_books. Eissink (talk) 02:09, 16 June 2020 (UTC).

Thank you for writing this

I find it shocking that a site like Wikipedia would wilfully link to a Google service. It goes completely against the ethos of freedom that this project was built on. Not only do we have editors linking to Google Books for sources, which is bad enough, we also have multiple templates which link to Google (for instance, "find sources"). While I understand that we're currently pretty much trapped in a world of technology not yet fit for purpose because it isn't yet ethical, it's still frustrating to see this happening. Something really needs to be done. How can this problem be solved overall? I think it will take a lot of effort and time, but sitting idle and doing nothing will only allow the problem to get worse and worse, as has been the case for years now. DesertPipeline (talk) 07:20, 11 March 2021 (UTC)

@DesertPipeline: How can this problem be solved overall? As you can see above in § Can you call off your bot?, there is (or at least there was) a bot that converts Google Books links to Internet Archive links in citation templates. As you may know, major publishers sued the Archive last year and that lawsuit is ongoing. If the Archive loses that lawsuit then the accessibility of many (non-free) books hosted at the Archive may change. Biogeographist (talk) 15:45, 11 March 2021 (UTC)
The Google conversion bot only ran for about 6 weeks; it has not run since. The lawsuit only concerns full access (borrowing complete digital copies). Preview access, like Google offers, is perfectly legal, so even if archive.org loses the full-access lawsuit, preview links will still work. Notably, the court has not ordered archive.org to stop offering full access while the case is proceeding. -- GreenC 16:46, 11 March 2021 (UTC)
Wikipedia linked to Google Books during Google's epic 10+ year lawsuit with the publishers. There was never a discussion or concern raised by the community over the legality of these links, even though the legality was still an open question. Also, we have about a 12:1 ratio of Google to Internet Archive links, and this ratio continues to climb. We are and have been a Google Books shop, for whatever reasons. -- GreenC 16:46, 11 March 2021 (UTC)
It just shows you how deeply embedded this problem is. The unethical technology giants are "winning" right now, but in the end, everyone loses, including them, due to their actions...
I think the first step is to foster a culture on Wikipedia where people understand the importance of freedom. This is something that I think has to happen in general for technology woes to be solved. It would be all well and good if developers could somehow be convinced to be ethical, and develop libre software, but if the general public doesn't understand why it's important, they won't defend their freedom, and we'll lose it.
Some other ethical problems we need to solve on Wikipedia:
1. Non-neutrality towards libre software. I realise that we can't call it "free/libre software" and be done with it, unfortunately, but at the very least we could use a neutral name for it rather than something that suggests "free" refers to "zero price". Right now, I see a lot of "free and open source" on Wikipedia. People will assume the "free" means gratis. "free (libre) and open source" would be a neutral term, and the FSF supports it, but quite frankly I'm not sure if even that works; I don't consider the two to be very comparable, considering one is about ethics and the other doesn't bring that up at all. I prefer the term "free (libre) or open source", although I suppose that also has problems. I don't really know. Any better suggestions?
2. Links to YouTube on Wikipedia. It makes me very uncomfortable any time I see those – we're sending people right into the jaws of Google spyware without so much as a warning. I realise that we don't do disclaimers in articles, but at the very least, couldn't a box pop up when clicking a YouTube link that warns the user of the dangers? Then they could at least make an informed choice about whether to actually visit it.
3. Loaded terms on Wikipedia. They're embedded deep within public consciousness, and that's no surprise, but we need to fix this. Terms like "consumer" (which is practically meaningless anyway – for instance, "consumer electronics". How many people do you know that eat electronics? In a non-literal sense, all it's really communicating is that "people use this". Thanks for the clarification, I wasn't sure if it'd be people or space aliens...!) – or "intellectual property" (a made-up concept to group together several different and unrelated laws and to make people think of ideas and concepts in the same way as physical items), and other words that I'm probably forgetting too. (Two useful resources: https://www.gnu.org/philosophy/words-to-avoid.html https://www.gnu.org/philosophy/not-ipr.html)
4. Not exactly related to Wikipedia itself, but the Wikimedia Foundation has encouraged the use of proprietary software. There was some video conference last year (I couldn't figure out what it was about), but to my horror the page mentioned Zoom – yes, that Zoom. I was disgusted. The Wikimedia Foundation are supposed to be committed to libre software, aren't they? If even they won't refuse to use it in some instances, then what message do they send to others? Even worse in this case because whoever wanted to participate in the conference had to run proprietary software.
There are probably more, but that's all I could think of right now. DesertPipeline (talk) 03:26, 12 March 2021 (UTC)
@DesertPipeline: Free and open-source software is a widely used term. Sure, we all have our preferences and wish that another term were more widely used than some term, but this talk page isn't the place to try to change the usage of a widely used term. Regarding the word "consumer", you may enjoy this anthropological discussion: Graeber, David (August 2011). "Consumption" (PDF). Current Anthropology. 52 (4): 489–511. doi:10.1086/660166. JSTOR 10.1086/660166. Google may be evil, but I must confess that deep down in my darkest most evil self I love Google Scholar. Biogeographist (talk) 13:52, 12 March 2021 (UTC)
Be sure to check out scholar.archive.org if you haven't already. Not sure how it compares with Google. -- GreenC 14:57, 12 March 2021 (UTC)

New related essay

I recently wrote Wikipedia:Product placement to try and explain why we should avoid projectspace recommendations of commercial products like Google Books. It was triggered by a discussion about links to Google in the AfD instructions. – Joe (talk) 06:05, 8 November 2023 (UTC)