Talk:Vanishing gradient problem

This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

This article has been rated as Mid-importance on the project's importance scale.

This article is supported by WikiProject Computer science.

Uh... what is the problem itself?

Shouldn't the article define what the problem is? --Doradus (talk) 02:29, 23 January 2015 (UTC)

I made an attempt. It is difficult to explain this in a non-technical way. Bhny (talk) 17:12, 23 January 2015 (UTC)

Well, I am a student in ML and I understand everything the article says, but it says nothing about what the problem actually is. Linguiloce (talk) 14:04, 1 October 2016 (UTC)

I just came back to this article, and found this quote: "The problem is that in some cases, the gradient will be vanishingly small, effectively preventing the weight from changing its value." Works for me. --Doradus (talk) 16:35, 2 December 2018 (UTC)
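
For readers who want to see the quoted definition in numbers, here is a minimal sketch. It is not taken from the article; it assumes a chain of scalar sigmoid units that all share the same weight and the same pre-activation, which is a deliberate simplification. The function name `backprop_factor` and its parameters are hypothetical and exist only for this illustration. The derivative of a sigmoid is at most 0.25, so the product of the per-layer derivatives shrinks geometrically with depth.

```python
import math


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def backprop_factor(depth, weight=1.0, pre_activation=0.0):
    """Magnitude of d(output)/d(input) through `depth` stacked scalar
    sigmoid units, assuming (for illustration only) that every unit has
    the same weight and the same pre-activation."""
    s = sigmoid(pre_activation)
    local = weight * s * (1.0 - s)  # derivative contributed by one layer
    return abs(local) ** depth


if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        print(depth, backprop_factor(depth))
```

With the defaults, each layer contributes a factor of exactly 0.25, so after 10 layers the gradient reaching the first weight has been scaled by 0.25 to the power 10, which is below one millionth. An update proportional to that value barely changes the weight, which is what the quoted sentence describes.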

Size of Problem?

How many nodes in an unfolded RNN are viable without LSTM? That is, where is the practical cutoff point up to which the gradient hasn't vanished? There must be some rule of thumb: if your patterns in time span fewer than N samples, you can use a plain RNN; if they span more than M samples, you are better off with LSTM. robertbowerman (talk) 04:30, 9 February 2017 (UTC)
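
This thread does not give such a rule of thumb, and the sketch below does not supply one. It only shows how the question could be framed under strong assumptions: that the gradient is scaled by one constant factor between 0 and 1 at every unfolded time step, and that "vanished" means falling below an arbitrary threshold. The name `steps_until_vanished`, the per-step factor, and the threshold are all hypothetical choices for illustration; real networks have step-dependent factors.

```python
def steps_until_vanished(factor, threshold=1e-6):
    """Count the unfolded time steps needed for a gradient of 1.0 to drop
    below `threshold` when every step multiplies it by `factor`.
    Both arguments are illustrative assumptions, not measured values."""
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must lie strictly between 0 and 1")
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    steps = 0
    gradient = 1.0
    while gradient >= threshold:
        gradient *= factor
        steps += 1
    return steps


if __name__ == "__main__":
    for factor in (0.25, 0.5, 0.9):
        print(factor, steps_until_vanished(factor))
```

Under these assumptions the cutoff depends heavily on the per-step factor: a factor of 0.25 reaches the threshold after 10 steps and 0.5 after 20, while a factor close to 1 takes far longer. This suggests that any N or M would depend on the weights and activation function rather than being one fixed number.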

Suggested rename: extreme gradient problem

I really don't see the point of having both vanishing gradient and exploding gradient pages. We could just have two inbound redirects and bold both inbound terms in the lead. Should be fine IMO. — MaxEnt 00:10, 21 May 2017 (UTC)

It is a well-known problem in ML and pretty much everyone calls it the vanishing gradient problem. Sometimes they'll say vanishing/exploding gradient problem, but even that is rare. I've never heard it called the extreme gradient problem. Themumblingprophet (talk) 02:21, 15 April 2020 (UTC)

Fundamentally, the problem is about attractors in the parameter space of the error function; the problematic regions are stabilisers when you consider derivatives of this space parallel to various axes. This perspective is probably more abstract than the level at which most programmers operate, whereas "vanishing gradient" is reasonably concrete. 80.230.156.224 (talk) 08:55, 7 May 2024 (UTC)
