Talk:Double-precision floating-point format

This is the talk page for discussing Double-precision floating-point format and anything related to its purposes and tasks.
This is not a forum for general discussion of the subject of the article.

Archives: 1: 6 months

Computing: Software Low‑importance

	This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
	This article has been rated as Low-importance on the project's importance scale.
	This article is supported by WikiProject Software (assessed as Low-importance).
	This article is supported by Computer hardware task force (assessed as Low-importance).

Semi-protected edit request on 28 December 2024

Latest comment: 1 year ago · 9 comments · 5 people in discussion

This edit request has been answered. Set the |answered= parameter to no to reactivate your request.

1.) change: 'The sign bit determines the sign of the number (including when this number is zero, which is signed).' into: 'The sign bit determines the sign of the number (including when this number is zero, which is signed). "1" stands for negative.'

2.) change: 'The 53-bit significand precision gives from 15 to 17 significant decimal digits precision (2^−53 ≈ 1.11 × 10^−16). If a decimal...' into: 'The 53-bit significand precision gives from 15 to 17 significant decimal digits precision (2^−53 ≈ 1.11 × 10^−16) for "normal" numbers; denormal values have gracefully degrading precision, down to only one bit for the smallest value different from zero. If a decimal...'

3.) add a section "Additional info and curiosities" above "Notes and references" with the following content: '== Additional info and curiosities == The IEEE 754 standard allows two different views / decodings for the numbers; see Section 3.3 "Sets of floating-point data" in the 2019 version of the standard. One is described above, with a fractional understanding of the significand and a bias of 1023 for the exponent; the other understands the significand as a binary integer, 2^52 times larger, and in turn uses an exponent bias 52 larger, 1075, which produces smaller effective exponents and thereby the same final result. The fractional view is common for binaryxxx datatypes, while the integral one is common for decimalxxx datatypes.' 176.4.142.98 (talk) 23:37, 28 December 2024 (UTC)
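The two decodings described in proposal (3) can be compared in a short C sketch. It assumes `double` is IEEE 754 binary64 (true on mainstream platforms, though ISO C does not require it); `struct b64_fields`, `split_binary64`, `decode_fractional` and `decode_integral` are hypothetical names, and only normal numbers are handled. Both decoders should return the original value, since moving 52 bits from the fraction into an integer significand is exactly compensated by raising the bias from 1023 to 1075.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* The three fields of a binary64 value (assumes double is binary64). */
struct b64_fields { int sign; int E; uint64_t F; };

static struct b64_fields split_binary64(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);                 /* reinterpret the 64 bits */
    struct b64_fields f;
    f.sign = (int)(u >> 63);                  /* 1 stands for negative */
    f.E = (int)((u >> 52) & 0x7FF);           /* 11-bit biased exponent */
    f.F = u & ((UINT64_C(1) << 52) - 1);      /* 52 stored fraction bits */
    return f;
}

/* Fractional view, bias 1023: (-1)^sign * 1.F * 2^(E - 1023). */
static double decode_fractional(struct b64_fields f)
{
    assert(f.E != 0 && f.E != 0x7FF);         /* normal numbers only */
    double m = 1.0 + ldexp((double)f.F, -52); /* 1.F, exact in 53 bits */
    double v = ldexp(m, f.E - 1023);
    return f.sign ? -v : v;
}

/* Integer view, bias 1075 = 1023 + 52: (-1)^sign * (2^52 + F) * 2^(E - 1075). */
static double decode_integral(struct b64_fields f)
{
    assert(f.E != 0 && f.E != 0x7FF);         /* normal numbers only */
    uint64_t c = (UINT64_C(1) << 52) | f.F;   /* integer significand, < 2^53 */
    double v = ldexp((double)c, f.E - 1075);
    return f.sign ? -v : v;
}
```

For example, `decode_fractional(split_binary64(-6.25))` and `decode_integral(split_binary64(-6.25))` should both give -6.25, and the sign field of -6.25 is 1.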

Not done: please provide reliable sources that support the change you want to be made. MadGuy7023 (talk) 23:41, 28 December 2024 (UTC)

While (1) and (2) are almost OK for me (just note that the standard term is "subnormal", not "denormal"), (3) does not make sense; it is so badly written that I can hardly see what the user wants to say; there is a possible confusion between what the standard describes for its internal specification and what is allowed to do (by whom?). — Vincent Lefèvre (talk) 01:30, 29 December 2024 (UTC)

@Vincent Lefèvre: if you find correct information 'badly written', just improve it instead of suppressing it, both in the standard and in Wikipedia.

176.4.142.98 (talk) 10:48, 29 December 2024 (UTC)

@MadGuy (nice name): the reliable source is the standard itself. 1) and 2) are obvious; for 3) I pointed to the section. A more detailed quote: "It is also convenient for some purposes to view the significand as an integer; in which case the finite floating-point numbers are described thus: ...".

176.4.142.98 (talk) 10:47, 29 December 2024 (UTC)

For (3), you are misreading the standard. Concerning the ability to view the significand as an integer or some other way, this is a generality (independent from the IEEE 754 standard) already covered by both Floating-point arithmetic and Significand (if not detailed enough, these articles could be improved). — Vincent Lefèvre (talk) 11:43, 29 December 2024 (UTC)

Not done for now: please establish a consensus for this alteration before using the {{Edit semi-protected}} template. – Anne drew (talk · contribs) 03:54, 31 December 2024 (UTC)

Hello, I think we have consensus for points 1.) and 2.), and they provide valuable information. For 3.) it's difficult to find consensus with Vincent Lefèvre; he's a notorious 'no no' reverter and prefers his very own understanding of 'good' or right. IMHO the info provided is correct, qualified, backed by citation, and valuable for users to see the differences between the encodings and understandings; otherwise some may be confused by the different options. To keep the main article 'clean' I proposed putting it into a separate section as described, but it is relevant info and should not be suppressed because one particular user is not familiar with it. As the cited IEEE 754 standard is behind a paywall and can't be checked by everybody, I provide a longer citation:

"In the foregoing description, the significand m is viewed in a scientific form, with the radix point immediately following the first digit. It is also convenient for some purposes to view the significand as an integer; in which case the finite floating-point numbers are described thus:
― Signed zero and non-zero floating-point numbers of the form (−1)^s × b^q × c, where
― s is 0 or 1.
― q is any integer e_min ≤ q + p − 1 ≤ e_max.
― c is a number represented by a digit string of the form d_0 d_1 d_2 … d_(p−1) where d_i is an integer digit 0 ≤ d_i < b (c is therefore an integer with 0 ≤ c < b^p).
This view of the significand as an integer c, with its corresponding exponent q, describes exactly the same set of zero and non-zero floating-point numbers as the view in scientific form. (For finite floating-point numbers, e = q + p − 1 and m = c × b^(1−p).)"

This info isn't widespread, but it is relevant, at least for people who want to understand or deal with binary and decimal datatypes. The info provided is correct; Vincent's 'you read wrong' is simply wrong. He knows about the point and accepts the info elsewhere, but, for whatever reason, doesn't want it in this article. That's personal preference; technically and encyclopedically it belongs in this article because this datatype is affected. If it's 'not well written', I encourage every experienced editor to improve it, but do not suppress it! So pls. implement or explain why not. 176.4.135.141 (talk) 15:35, 31 December 2024 (UTC)

These two views are just used for the internal specification in the standard ("In the foregoing description"). There are no requirements on which view(s) to choose by implementations (for their own descriptions, API, etc.). For instance, the ISO C language chooses a 3rd one, where the fractional point is before the first digit (most significant digit). Note also that the article Floating-point arithmetic about the generalities already mentions the above two views as they are quite general common views, often used in practice. Moreover, while the text from the IEEE 754 is clear, yours is unclear and has various mistakes. For instance, there are two (internal) views, but decoding is not affected (and the above citation has nothing to do with decoding). — Vincent Lefèvre (talk) 09:17, 3 January 2025 (UTC)
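The third view used by ISO C shows up directly in `<float.h>`: C's floating-point model places the radix point before the leading digit, so its exponent limits for binary64 are −1021 and 1024 rather than IEEE 754's −1022 and 1023. A minimal sketch, assuming an implementation whose `double` is binary64; the function names are hypothetical:

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* ISO C models a double as s * 2^e * 0.f1 f2 ... fp (radix point before the
   leading digit), so for binary64 DBL_MIN_EXP and DBL_MAX_EXP are one larger
   than IEEE 754's emin = -1022 and emax = 1023. */
static int c_limits_match_binary64(void)
{
    return FLT_RADIX == 2 && DBL_MANT_DIG == 53
        && DBL_MIN_EXP == -1022 + 1     /* -1021 */
        && DBL_MAX_EXP == 1023 + 1;     /* 1024 */
}

/* Smallest positive normal double in the C view: 0.1 (binary) * 2^DBL_MIN_EXP. */
static double c_view_min_normal(void)
{
    return ldexp(0.5, DBL_MIN_EXP);
}
```

On such an implementation, `c_view_min_normal()` equals `DBL_MIN`, i.e. 2^−1022, the same number IEEE 754 writes as 1.0 × 2^e_min.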

More on the two (or maybe three) "views"

Latest comment: 3 months ago · 3 comments · 3 people in discussion

I think the underlying issue here is that it can be hard for a newcomer to understand exactly where the radix point is. I had a lot of difficulty with this the first time I started actually digging into IEEE754 floating point at the bit level, and I assume lots of other learners do, too.

Everybody knows (every description explains) that there's a fractional part fff, a hidden bit H, and an exponent ee. But if you don't know any better, there are at least three possibilities for how to put them together to compute the represented value:

#1: Hfff. × 2^ee
#2: H.fff × 2^ee
#3: 0.Hfff × 2^ee

Now, in fact, IEEE754 primarily uses formulation #2, and most descriptions of IEEE754 do, too. (The other important fact is that the formulation H.fff × 2^ee holds regardless of whether the hidden bit H is 1 or 0, that is, whether we're dealing with normal or subnormal numbers. For the subnormal numbers, of course, there's an additional wrinkle with the value of ee.)

Formulation #1, on the other hand, though it's not typically used when discussing IEEE754 floating point, does have a certain amount to recommend it. In particular, represented that way, the significand is an integer, which may make manipulating it easier. I recently noticed that formulation #1 is used rather extensively by Muller et al. in their Handbook of Floating-Point Arithmetic, where they call the exponent in that representation the 'quantum'.

And then to complete the picture, at least if you're a C programmer, formulation #3 is effectively what the standard library function frexp gives you.

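(Needless to say, for formulations 1, 2, and 3 to represent the same value, they all have to use different values for ee, differing by offsets tied to the number of significand bits: for binary64, with its 53-bit significand, ee in #1 is 52 less than in #2, and ee in #3 is 1 more than in #2, i.e. 53 more than in #1.)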

Now, I realize that I haven't said anything here that Vincent Lefèvre hasn't said, or that 176.4.1xx.xxx hasn't said, or that the passage from IEEE 754 cited by 176.4.1xx.xxx hasn't said, or that various Wikipedia articles haven't said — somewhere. I'm summarizing this just to make the point that although it's all second nature to the experts, it can really be pretty hard to "get" at first, so it's worth working to make sure that our description(s) are clear and complete (but hopefully also concise). The mechanics of floating point formats are — necessarily but perhaps unfortunately — spread out in lots of articles: in the descriptions of specific formats like this one and Single-precision floating-point format and Quadruple-precision floating-point format, but also the more general articles like IEEE 754 and Floating-point arithmetic. There's a delicate balance to be struck between saying everything everywhere, versus having thumbnail summaries in most articles but referring to one, central article for the gory details.

We're probably doing a good enough job of striking that balance already — I'm not trying to suggest otherwise. And going back to my first point, this article, at least, does make it nice and explicit where the radix point is. (I wonder if it's been rewritten since the time I had such trouble understanding this point?). But there's always room for improvement, and I think having an aside, somewhere, along the lines of IEEE754's "It is also convenient for some purposes to view the significand as an integer" would be useful. —scs (talk) 14:47, 11 January 2025 (UTC)

You should be careful mentioning the radix point when IEEE 754 includes decimal formats. First, the decimal formats don't have a hidden 1; second, the position of the decimal point might be different from that of the binary point. Gah4 (talk) 08:49, 16 February 2026 (UTC)

The "hidden bit" is actually a notion that occurs only at the encoding level, and BTW, I now think that it is better to view it as being encoded in the exponent field E (as mentioned by Eric Postpischil in the stds-754 list): it is 1 for E > 0 (i.e. the normal numbers), and 0 for E = 0 (i.e. the subnormal numbers and zero). So, E encodes both the actual exponent e and the leading bit; and the exponent e_min has 2 associated values of E: E = 1, for which the leading bit is 1, and E = 0, for which the leading bit is 0.
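This reading of E translates into a decoder that treats normal and subnormal numbers uniformly as H.F × 2^e. A minimal C sketch for finite binary64 bit patterns; `leading_bit`, `actual_exponent` and `decode_finite` are hypothetical names, and the bit pattern is assumed to be supplied as a `uint64_t`:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Read the 11-bit exponent field E of binary64 as encoding both the leading
   significand bit and the exponent e (finite values only, E != 2047). */
static int leading_bit(unsigned E)
{
    return E != 0;                          /* 1: normal; 0: subnormal or zero */
}

static int actual_exponent(unsigned E)
{
    return (E != 0 ? (int)E : 1) - 1023;    /* E = 0 and E = 1 both give e_min = -1022 */
}

/* Decode a finite binary64 bit pattern as (-1)^s * H.F * 2^e. */
static double decode_finite(uint64_t bits)
{
    unsigned E = (unsigned)((bits >> 52) & 0x7FF);
    uint64_t F = bits & ((UINT64_C(1) << 52) - 1);
    assert(E != 0x7FF);                                 /* no infinities or NaNs */
    double m = leading_bit(E) + ldexp((double)F, -52);  /* H.F, exact */
    double v = ldexp(m, actual_exponent(E));            /* exact: result is representable */
    return (bits >> 63) ? -v : v;
}
```

For instance, the bit patterns 0x0010000000000000 (E = 1) and 0x000FFFFFFFFFFFFF (E = 0) decode to 2^−1022 and 2^−1022 − 2^−1074, both with e = −1022.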

@Scs: In the Handbook of Floating-Point Arithmetic, the quantum is defined as the weight (or unit) of the last represented digit, i.e. β^(e−p+1), like in the IEEE 754 standard. And FYI, #3 is also used by GNU MPFR as it makes more sense in multiple precision: thus the radix point falls on a word boundary; and when doing computations on the exponent and the precision, this avoids "+1" in formulas compared to #2.

— Vincent Lefèvre (talk) 15:40, 16 February 2026 (UTC)

"Fortran was one of the first languages to provide DPFP"

Latest comment: 2 months ago · 9 comments · 3 people in discussion

While this statement may be literally true, stating it so bluntly in the lead suggests that computer programmers were waiting for Fortran or any other programming language to provide DPFP. In fact DPFP developed in the programming community in the ~~late 1940s~~ early 1950s, about as soon as there was digital hardware to run it and before any automatic compilers. Fortran did not "provide" DPFP until sometime between 1958 and 1961. ~2026-79235-7 (talk) 23:24, 13 February 2026 (UTC)

The 704 was the first commercial computer to have floating point hardware, at 36 bits. In software, you can do floating point in any size you want, though I suppose one size can still be called double relative to some other one. But okay, can we say that Fortran was the first high-level language with double precision floating point? Did some assemblers provide support for it? Gah4 (talk) 08:57, 16 February 2026 (UTC)

All assemblers "support" jumping into a provided routine, and code libraries for DPFP had been developed. ~2026-79235-7 (talk) 19:57, 28 February 2026 (UTC)

Fortran supports calling subroutines for quintuple precision floating point, but that doesn't mean that Fortran supports quintuple precision. An assembler might support generation of single or double precision floating point constants, though I would be surprised to see that before hardware floating point. Gah4 (talk) 06:49, 2 March 2026 (UTC)

Again, programmers didn't wait for the tools to provide formal DPFP support when a task demanded precision. Thus one finds DPFP computing already in practice in 1952. (I previously found a report in which Hopper refers to double precision around that time, but I'm having difficulty finding that report again.) What I'm challenging is the way that the article presents the history of DPFP in reverse, foregrounding formal expressions over the tradition that those formalizations arose from. Admittedly when these early sources refer to "double precision" they may not mean precisely 52 bits, but the idea of a floating point type for calculations in progress vs. final results was already in place. ~2026-79235-7 (talk) 18:35, 3 March 2026 (UTC)

Here's the Hopper report that I mentioned. The report refers to double precision on a "UNIVAC", although Hopper notes at the beginning that she uses that term loosely. ~2026-79235-7 (talk) 20:32, 3 March 2026 (UTC)

In the WP article, since this is about the programming language, what matters is just what the specifications of the language say. — Vincent Lefèvre (talk) 14:24, 4 March 2026 (UTC)

This is not an article about a programming language. ~2026-79235-7 (talk) 20:29, 6 March 2026 (UTC)

I meant that the sentence in the WP article (One of the first programming languages [...]) was about the programming language. — Vincent Lefèvre (talk) 14:59, 8 March 2026 (UTC)
