Talk:Hamming distance

Latest comment: 24 days ago by Rudxain in topic Normalized HD

Distance vs. difference edit

Why the difference between the bits called as "Hamming distance" and not "Hamming difference" —Preceding unsigned comment added by 163.152.45.51 (talkcontribs)

I don't know the historical reasons, but I can think of a couple of plausible explanations. First, Hamming distance forms a metric and functions with that property are often called distances. Second, the difference of A and B is a quantity that, when added back to B, gives A again; that is, it is inverse to addition. Hamming distance doesn't have that property, so it doesn't make sense to call it a difference. One function that is called a difference and is closely related to Hamming distance is the symmetric difference. —David Eppstein 04:53, 5 January 2007 (UTC)Reply

2-D graph edit

What about the hamming distance between two points in a 2-D graph for example? —The preceding unsigned comment was added by 128.200.69.104 (talkcontribs).

Source code edit

Per Wikipedia:WikiProject Computer science/Manual of style (computer science) (a draft in progress, but I think very relevant), "Multiple source code implementations are not appropriate unless they contrast specific aspects of the code and that contrast is important to the encyclopedic content of the article. If possible, accentuate differences by providing the alternate implementation in the same language as the original." The article had three implementations, not very different; I kept the most compact of them (the Python), modified the most different of them to be even more different (the C++, which uses numeric arguments rather than strings, and which I modified to take time proportional to the number of differing bits rather than the total bitlength of the arguments to further differentiate it from the Python), and removed the third example (Javascript) which I think does not add anything vis-a-vis the other two: it is not significantly different algorithmically from the Python, and its added length relative to the Python has more to do with Javascript programming than anything relevant to Hamming distance. For reference, here is the removed code:

function hamdist(s1, s2)
{ 
   // We alert user if the two strings differ in length.
   if (s1.length != s2.length) {
      alert("Both sequence must be of the same length!");
      return -1;
   }
   
   var distance = 0;      // zero = no difference between two strings.
   
   // Match the two strings, character by character.  If they are NOT
   // identical, increment distance by 1.
   for(var i = 0; i < s1.length; i++)
      if (s1.charAt(i) != s2.charAt(i))
         distance++;
   return distance;
}

David Eppstein 06:46, 8 June 2007 (UTC)Reply

I also note that the C implementation is buggy. The numbers should be unsigned ints. Dr Tom Conway (talk) —Preceding undated comment was added at 03:06, 5 August 2008 (UTC)Reply

Someone added a C# implementation, and incorrectly copied its running time from the C implementation; I corrected this (and also clarified the Python description). I question whether we even want this third example in a different language, though. Joule36e5 (talk) 00:09, 20 July 2013 (UTC)Reply

I have removed it, as I don't think it adds anything useful — for numeric input, the algorithm of the C implementation is always better. —David Eppstein (talk) 00:44, 20 July 2013 (UTC)Reply

on a grid edit

The article once claimed

On a grid (such as a chessboard), the points at a Hamming distance of 1 constitute the von Neumann neighborhood of that point.

That seems incorrect to me. Would it be best to simply delete this sentence from the article?

I temporarily fixed this to

On a grid (such as a chessboard), the points at a Lee distance of 1 constitute the von Neumann neighborhood of that point.

but that begs the question: What are the points on a chessboard with a Hamming distance of 1 from some particular point?

  • If we represent the points with a 2 symbol string ("a1", "h7", etc.), then points on a chessboard with a Hamming distance of 1 from that point represent possible rook's graph single-move destinations of a rook on that point.
  • If we represent the points with a bit string, then points on a chessboard with a Hamming distance of 1 from that point represent various 2x1 "boxes" used in a Karnaugh map -- for large maps in N binary variables (N = 6 variables to get a map the size of a chessboard), those N "neighboring" points may appear to be quite far away (in Euclidean distance) -- is there a name for this kind of neighborhood?
  • Other ways of representing points, such as the bit string representing the distance along a Hilbert curve covering the chess board -- as used in a map of the internet -- seem like they might give yet another set of points with a Hamming distance of 1 from some particular point.

Do any of these neighborhoods really help this article explain "Hamming distance"? --68.0.124.33 (talk) 01:39, 30 December 2009 (UTC)Reply

Not only do we have Hamming distance 1 equivalent to single rook move, but more generally, Hamming distance measures the number of rook moves required. This seems like a good intuitive connection, so I made the change. I didn't explicitly mention that it remains true in more than two dimensions, because that would seem to require defining the rook as a piece that changes exactly one dimension at a time, and ultimately this is just a restatement of Hamming distance anyway. Joule36e5 (talk) 19:48, 21 July 2013 (UTC)Reply

I think your recent edits are definitely an improvement, but there is a real sense in which the sentence at the top of this talk section about the von Neumann neighborhood is correct. Namely, grid graphs are examples of partial cubes, graphs that can be labeled by bitvectors so that graph distance equals Hamming distance, and with that labeling the 1-ball of a point is its von Neumann neighborhood. I suspect it would be more confusing than helpful to say that in the article, though. —David Eppstein (talk) 20:15, 21 July 2013 (UTC)Reply

Error detection edit

I've removed the new section on error detection. Firstly, it is unsourced; it should at the very least refer to the Error detection and correction article. Secondly, it confuses the general concept of Hamming distances with the more specific concept of the the minimal Hamming distance of an error-detecting code. Thirdly, it is incorrect; an error-detecting code with minimal Hamming distance dmin can reliably detect up to dmin - 1 errors and correct dmin - 2 errors. DES (talk) 15:05, 21 March 2011 (UTC)Reply

I've added a referenced version. Your math is wrong. 5.12.85.61 (talk) 02:46, 2 May 2015 (UTC)Reply

Dead link edit

The link for original Hamming article is dead. — Preceding unsigned comment added by 189.35.30.138 (talk) 12:07, 1 June 2011 (UTC)Reply

while(val) edit

I'm 10 years out of practice with C. What does while(val) check for? A non-zero value for val? A non-negative value for val? May want to clarify in the article. --1000Faces (talk) 22:28, 22 September 2013 (UTC)Reply

A nonzero value. And this article is not the place to discuss the basic syntax of a widely used programming language. —David Eppstein (talk) 05:14, 23 September 2013 (UTC)Reply

"Humming" Distance? edit

I've found quite a few references to "Humming distance", and I'm not sure if these are just typos or some subtle variation on Hamming distance.

The phrase appears in print fairly often, but mostly seems to be in translated far-east work, for example "Manufacturing Challenges in Electronic Packaging", edited by Yung-Cheng Lee, W.T. Chen, page 249.

Google lists 400K hits for "Hamming distance", and 4K hits for "Humming distance", about half of which are obvious typos or OCR errors. There are only 1K hits for "Hemming distance", which all seem to be errors.

My operating assumption is that "Humming distance" is just a typo for "Hamming distance". 137.229.78.7 (talk) 05:11, 10 December 2013 (UTC)Reply

correction to Python example edit

The current Python example code is:

def hamming_distance(s1, s2):
    #Return the Hamming distance between equal-length sequences
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

But that simply tells us how many characters are different (that is, how many bytes are difference), not how many bits are different. I discovered this discrepancy as I was solving a cryptographic challenge. The given hamming_distance function says that the Hamming distance between this is a test and wokka wokka!!! is 14; the true distance is 37. Therefore I am replacing the code in Hamming distance with:

def hamming_distance(s1, s2):
    #Return the Hamming distance between equal-length sequences
    if len(s1) != len(s2):
        raise ValueError("Undefined for sequences of unequal length")
    bytes1, bytes2 = bytearray(s1), bytearray(s2)
    diff = 0
    for i in range(len(bytes1)):
        if bytes1[i] != bytes2[i]:
            diff += bin(bytes1[i] ^ bytes2[i]).count("1")
    return diff

and I welcome improvements. Sumana Harihareswara 20:35, 15 October 2014 (UTC)Reply

The Hamming distance between two lists of things is the number of things that are different. The things do not have to be bits. —David Eppstein (talk) 20:40, 15 October 2014 (UTC)Reply

Too Narrow edit

The hamming distance is more general than just string difference calculations. It can be thought of more generally as the number of differences of some attribute between two objects. This allows for the string distance currently shown as well as distances between nodes as shown in the figures. I suggest that the page be made more general with a listing of applications to include the string differences and graph applications, along with some others.Cephaeus (talk) 16:24, 7 April 2015 (UTC)Reply

Metric space rather than vector space? edit

In the article currently: "For a fixed length n, the Hamming distance is a metric on the vector space of the words of length n (also known as a Hamming space), as it fulfills the conditions of non-negativity, identity of indiscernibles and symmetry, and it can be shown by complete induction that it satisfies the triangle inequality as well.[1] ". I'm not confident enough in my understanding to alter this but I believe that it's not quite right. Specifically:

  • satisfying the requirements of being a metric gives you a metric space, but not necessarily a vector space (after all Levenshtein distance is also a metric - satisfies triangle inequality etc).
  • Hamming distance gives you a vector space only for binary strings; otherwise the Hamming space is a metric space with the Hamming distance as a metric (but surely not a vector space?)

If this is correct it should be stated explicitly; right now it sounds like Hamming distance always gives you a vector space and that this is merely because it fulfills the metric conditions. 118.209.2.85 (talk) 05:35, 5 October 2015 (UTC)Reply

You're right, it's not in general a vector space (e.g. not for alphabets of size six). I changed "vector space" to "set". —David Eppstein (talk) 06:10, 5 October 2015 (UTC)Reply

Doesn't the same problem apply to the phrase in the beginning of the article: "block codes, in which the equal-length strings are vectors "? It seems to me that the word "vector" is misused here. Helenuh (talk) 19:55, 18 August 2022 (UTC)Reply

Complexity edit

The article currently states that the running time of the C function hamming_distance is proportional to the Hamming distance of its inputs rather than the number of bits in the inputs—in other words, proportional to the output, dist. This appears to be incorrect. Unless the /= and > operators are assumed to require 0 time, the running time would appear to be proportional to   for  .

P. N. Hilfinger (talk) 22:44, 24 March 2019 (UTC)Reply

That algorithm is suitable for quantities representable in machine arithmetic values (int, long, or long long). It is very standard to assume in algorithm analysis that single bitwise operations on machine words take a single unit of time. This is in the RAM model of computing, which matches pretty well what actual computers do. What model of computing are you using that can run C code and yet takes a non-constant amount of time for an equality test? —David Eppstein (talk) 03:40, 25 March 2019 (UTC)Reply

Definition edit

This article would benefit from a definition section with proper references. — Preceding unsigned comment added by AVM2019 (talkcontribs)

Done (section and one ref, anyway). Dicklyon (talk) 05:03, 13 June 2020 (UTC)Reply

Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other. In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences. It is named after the American mathematician Richard Hamming. Hamming distance edit

Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other. In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences. It is named after the American mathematician Richard Hamming.

Hamming distance 223.186.35.189 (talk) 00:48, 16 February 2022 (UTC)Reply

Normalized HD edit

There is no mention of "relative Hamming Distance", defined as HD(a, b) / length Rudxain (talk) 03:54, 9 April 2024 (UTC)Reply