Floats
This page documents Loom’s binary IEEE-754 floating-point types. Use these for real-number arithmetic, scientific formulas, graphics, and simulations where fractional values are required.
Quick reference
| Type | Bits | IEEE-754 format | Significand precision (bits / ~decimal digits) | Smallest normal … largest finite (magnitude) | Default? |
|---|---|---|---|---|---|
| f16 | 16 | binary16 | 11 bits ≈ 3–4 digits | ~±6.10e−5 … ±6.55e+4 | No |
| f32 | 32 | binary32 | 24 bits ≈ 6–7 digits | ~±1.18e−38 … ±3.40e+38 | No |
| f64 | 64 | binary64 | 53 bits ≈ 15–16 digits | ~±2.23e−308 … ±1.79e+308 | Yes |
- Representation: IEEE-754 with sign, exponent, fraction; round-to-nearest, ties-to-even by default.
- Specials: +0.0, -0.0, +∞, -∞, NaN (quiet NaN used by default).
- Subnormals: Supported (gradual underflow).
f16 is space-efficient but imprecise; prefer f32 for UI/graphics and f64 for scientific or financial modeling (for exact currency amounts, see the FAQs). Bare float literals without a suffix infer to f64 unless context suggests otherwise.
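To make the precision column concrete, here is a minimal sketch in Python rather than Loom. It assumes Python's struct module as a stand-in for rounding a value to binary16 ("e") and binary32 ("f"); Python's own float is binary64. The helper name round_to is hypothetical. It shows how 0.1 is stored at each width and how ties-to-even resolves halfway cases in f16.

```python
import struct

def round_to(fmt: str, x: float) -> float:
    """Round x to the width named by a struct format, then widen back to binary64."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

as_f16 = round_to("<e", 0.1)   # binary16: 0.0999755859375
as_f32 = round_to("<f", 0.1)   # binary32: 0.10000000149011612
as_f64 = 0.1                   # binary64: closest double to 0.1

print(as_f16, as_f32, as_f64)

# f16 has an 11-bit significand, so above 2048 only even integers are representable.
# Halfway cases round to the neighbor whose significand is even.
print(round_to("<e", 2049.0))  # 2048.0
print(round_to("<e", 2051.0))  # 2052.0
```

Widening back to binary64 is exact, so the printed values are the stored f16 and f32 values themselves.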
Literals
Decimal literals with optional exponent and underscores:
let a: f64 = 1.0
let b: f32 = 3.1415927f32
let c = 6.022_140_76e23 # inferred f64
let d: f16 = 1.0e-3f16
let n: f64 = 0.0 # +0.0 and -0.0 both exist
Type suffixes: f16, f32, f64.
Special constants:
let inf = f64::INFINITY
let ninf = f64::NEG_INFINITY
let nan = f64::NAN
Operators & semantics
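- Arithmetic: + - * /
- Remainder: a % b is the truncated remainder a - trunc(a/b) * b (as in C's fmod); the result takes the sign of a. This is not the IEEE-754 remainder operation, which rounds the quotient to the nearest integer.
- Comparisons: == != < <= > >= (see NaN rules below)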
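The two remainder definitions are easy to confuse, so here is a short Python sketch of the difference (Python stands in for Loom here, as an assumption for illustration). math.fmod follows the formula a - trunc(a/b) * b, while math.remainder implements the IEEE-754 remainder, which rounds the quotient to nearest.

```python
import math

x, y = 5.5, 2.0

truncated = math.fmod(x, y)        # quotient 2.75 truncates to 2, so 5.5 - 4.0
ieee = math.remainder(x, y)        # quotient 2.75 rounds to 3, so 5.5 - 6.0

print(truncated)                   # 1.5
print(ieee)                        # -0.5

# The truncated remainder takes the sign of the dividend.
print(math.fmod(-5.5, 2.0))        # -1.5
```

Note that Python's own % operator on floats is a third variant (floored), so it is not used in this sketch.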
let x: f32 = 5.5
let y: f32 = 2.0
let q = x / y # 2.75
let r = x % y # 1.5
Exceptional results
- Divide by zero: (+ value) / 0.0 → +∞, (- value) / 0.0 → -∞
- 0.0 / 0.0, ∞ - ∞, sqrt(-1.0) → NaN
- Overflow → ±∞; underflow → subnormal or ±0.0 with loss of precision
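Overflow and gradual underflow can be observed directly in any binary64 implementation. The sketch below uses Python as a stand-in for Loom (an assumption for illustration). Python raises exceptions for x / 0.0 and math.sqrt(-1.0) instead of returning the IEEE default results, so those two rules are not shown.

```python
import math
import sys

overflow = 1e308 * 10.0                  # exceeds the largest finite binary64
indeterminate = math.inf - math.inf      # no meaningful result, so NaN

smallest_normal = sys.float_info.min     # about 2.23e-308
subnormal = smallest_normal / 2.0        # below the normal range, still nonzero

smallest_subnormal = 5e-324              # smallest positive binary64
flushed = smallest_subnormal / 2.0       # halfway to zero, ties-to-even gives 0.0

print(overflow)                          # inf
print(math.isnan(indeterminate))         # True
print(0.0 < subnormal < smallest_normal) # True
print(flushed)                           # 0.0
```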
NaN, ±0.0, and comparisons
- Any comparison with NaN is false except !=, which is true.
- +0.0 == -0.0 is true, but they have different signs.
- For a total ordering (sorting, maps), use total_cmp(a, b).
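The rules above can be checked with a short Python sketch (Python stands in for Loom). The function total_order_key is hypothetical and assumes that total_cmp follows the IEEE-754 totalOrder predicate, which the text does not confirm. It maps each float's bit pattern to an integer whose natural order matches that predicate.

```python
import math
import struct

# Build a NaN with a known bit pattern (sign bit clear, quiet).
nan = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000000))[0]

print(nan == nan, nan != nan, nan < 1.0)      # False True False
print(0.0 == -0.0)                            # True
print(math.copysign(1.0, -0.0))               # -1.0, the sign bit is still observable

def total_order_key(x: float) -> int:
    """Reinterpret the bits as a signed integer; flip the low 63 bits of negatives."""
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    return bits ^ 0x7FFFFFFFFFFFFFFF if bits < 0 else bits

values = [nan, 1.0, -0.0, 0.0, -math.inf, math.inf]
ordered = sorted(values, key=total_order_key)
print(ordered)                                # [-inf, -0.0, 0.0, 1.0, inf, nan]
```

Under this ordering -0.0 sorts before +0.0, and a NaN with its sign bit set would sort before -∞.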
Helpers:
x.is_nan()
x.is_finite()
x.is_infinite()
x.is_subnormal()
x.signum() # +1.0, -1.0, or NaN
copysign(mag, sign_source)
Rounding & next-after
- Default rounding: nearest, ties-to-even.
- Step to adjacent representable values:
x.next_up()
x.next_down()
Conversions
Float ↔ Float
- Widening (e.g., f32 → f64) is exact.
- Narrowing (e.g., f64 → f32) rounds; use:
  - as cast (rounds to nearest)
  - to_f32_checked() → (f32, overflowed: bool) (flags ∞, NaN, or out-of-range)
  - to_f32_saturating() → clamps to finite max/min
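The three narrowing behaviors can be sketched in Python (a stand-in for Loom, using the struct module to round to binary32). The function names mirror the Loom methods but are hypothetical re-implementations. Two details are assumptions: the overflowed flag is set whenever the result is not finite, and the saturating variant passes NaN through unchanged.

```python
import math
import struct

F32_MAX = (2.0 - 2.0**-23) * 2.0**127   # largest finite binary32, exact in binary64

def to_f32(x: float) -> float:
    """Plain narrowing: round to nearest binary32; out-of-range becomes ±inf."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:               # struct refuses values that round to infinity
        return math.copysign(math.inf, x)

def to_f32_checked(x: float):
    """Return (value, overflowed); the flag covers ∞, NaN, and out-of-range inputs."""
    v = to_f32(x)
    return v, not math.isfinite(v)

def to_f32_saturating(x: float) -> float:
    """Clamp to the finite binary32 range before narrowing (NaN policy assumed)."""
    if math.isnan(x):
        return x
    return to_f32(max(-F32_MAX, min(F32_MAX, x)))

print(to_f32(0.1))                 # 0.10000000149011612
print(to_f32_checked(1e40))        # (inf, True)
print(to_f32_saturating(1e40) == F32_MAX)   # True
```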
Integer ↔ Float
- Int → Float: exact if within mantissa precision; else rounded.
- Float → Int: truncates toward zero.
  - Checked variants: to_i32_checked(), to_u32_checked(), …
  - Saturating variants: to_i32_saturating(), …
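As an illustration of these rules, here is a Python sketch (a stand-in for Loom). The function to_i32_saturating is a hypothetical re-implementation, and mapping NaN to 0 is an assumed policy that the text does not specify.

```python
import math

# Int → float: 2**53 + 1 needs 54 significant bits, so it rounds (ties-to-even).
big = 2**53 + 1
print(float(big) == float(2**53))   # True

# Float → int: truncation toward zero, not rounding or flooring.
print(int(3.9), int(-3.9))          # 3 -3

I32_MIN, I32_MAX = -2**31, 2**31 - 1

def to_i32_saturating(x: float) -> int:
    """Truncate toward zero, clamping out-of-range values to the i32 limits."""
    if math.isnan(x):
        return 0                    # assumed policy for NaN
    if x >= I32_MAX:
        return I32_MAX
    if x <= I32_MIN:
        return I32_MIN
    return int(x)

print(to_i32_saturating(3e9))       # 2147483647
print(to_i32_saturating(-math.inf)) # -2147483648
```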
let i: i32 = (3.9f32) as i32 # 3
let f: f64 = (1_000_000i32) as f64
Promotions & mixed-type rules
- f16, f32, f64 in an expression promote to the widest present (f64 > f32 > f16).
- Mixing ints and floats promotes the int to the float’s type.
- No implicit conversion between floats and strings; use parse/format APIs.
Math library (selected)
- Roots & magnitudes: sqrt, cbrt, hypot
- Rounding: floor, ceil, round, trunc, fract
- Exponentials & logs: exp, exp2, ln, log10, log2, powf(y), powi(k)
- Trig: sin, cos, tan, asin, acos, atan, atan2(y, x)
- Hyperbolic: sinh, cosh, tanh, …
- FMA: fma(a, b, c) (computes a*b + c with a single rounding)
- Decompose/compose: frexp() → (mantissa, exp), ldexp(mantissa, exp)
- Split: modf() → (int_part, frac_part)
- Classification: classify() → enum of {Zero, Subnormal, Normal, Inf, NaN}
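A few of these functions can be illustrated with Python's math module (a stand-in for Loom). The classify function below is a hypothetical sketch that returns a string in place of the enum. Note that Python's math.modf returns the fractional part first, the reverse of the (int_part, frac_part) order listed above.

```python
import math
import sys

def classify(x: float) -> str:
    """Name the IEEE-754 class of a binary64 value."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Inf"
    if x == 0.0:
        return "Zero"                       # covers both +0.0 and -0.0
    if abs(x) < sys.float_info.min:         # below the smallest normal value
        return "Subnormal"
    return "Normal"

print([classify(v) for v in (0.0, 5e-324, 1.0, math.inf, math.nan)])
# ['Zero', 'Subnormal', 'Normal', 'Inf', 'NaN']

mantissa, exp = math.frexp(8.0)             # 8.0 == 0.5 * 2**4
print(mantissa, exp)                        # 0.5 4
print(math.ldexp(mantissa, exp))            # 8.0, the inverse of frexp

frac_part, int_part = math.modf(3.75)
print(int_part, frac_part)                  # 3.0 0.75

print(math.hypot(3.0, 4.0))                 # 5.0
```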
let r: f64 = f64::hypot(3.0, 4.0) # 5.0
let z: f32 = 1.0f32.fma(1e10f32, -1e10f32) # 1.0*1e10 + (-1e10), rounded once; exactly 0.0 here
Formatting & parsing
let v: f64 = 1234.56789
print(v) # 1234.56789 (default)
printf("fixed=%.2f sci=%.3e gen=%g\n", v, v, v)
# fixed=1234.57 sci=1.235e+03 gen=1234.57
let a: f32 = f32.parse("3.14")
let b: f64 = f64.parse("6.022e23")
let o_opt = f64.parse_opt("NaN") # → Option<f64>
Parsing accepts inf, +inf, -inf, and nan (case-insensitive). nan(payload) is permitted, but payload bits are not guaranteed to round-trip across platforms.
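The printf-style output shown above matches C formatting rules, which Python's % operator also follows. The sketch below uses Python as a stand-in for Loom. The function parse_opt is a hypothetical analogue of f64.parse_opt that returns None instead of an Option. Python's float() does not accept the nan(payload) form, so that case is not shown.

```python
import math

v = 1234.56789
line = "fixed=%.2f sci=%.3e gen=%g" % (v, v, v)
print(line)   # fixed=1234.57 sci=1.235e+03 gen=1234.57

def parse_opt(text: str):
    """Return the parsed float, or None when the text is not a valid float."""
    try:
        return float(text)
    except ValueError:
        return None

print(parse_opt("6.022e23"))             # 6.022e+23
print(parse_opt("-INF"))                 # -inf
print(math.isnan(parse_opt("NaN")))      # True
print(parse_opt("not a number"))         # None
```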
Bit-level access & endianness
let bits: u32 = (3.5f32).to_bits() # raw IEEE-754
let x: f32 = f32.from_bits(bits)
let be = x.to_be_bytes() # explicit byte order for I/O
let y = f32.from_be_bytes(be)
Performance, precision & determinism
- Prefer f64 for numerically sensitive work; f32 for memory/throughput.
- Floating arithmetic is not associative: (a+b)+c may differ from a+(b+c).
- For reproducible results across platforms:
  - Avoid “fast-math” compilation for critical code.
  - Use fma, stable algorithms (e.g., Kahan summation), and fixed evaluation order.
- Binary floats cannot exactly represent many decimals (e.g., 0.1). Compare with tolerances.
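Two of the points above, non-associativity and inexact decimals, can be reproduced in a few lines. The sketch uses Python as a stand-in for Loom, since Python floats are binary64. math.isclose is Python's standard tolerance check, and its formula differs slightly from the helper defined next.

```python
import math

a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c
right = a + (b + c)
print(left, right)            # 0.6000000000000001 0.6
print(left == right)          # False: grouping changed the rounding

# Exact equality fails for values that are mathematically equal,
# while a relative tolerance accepts them.
print(0.1 + 0.2 == 0.3)                            # False
print(math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9))  # True
```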
pub func approx_eq(a: f64, b: f64, eps: f64 = 1e-9): bool {
    ret (a - b).abs() <= eps * (1.0 + a.abs().max(b.abs()))
}
Examples
Kahan (compensated) summation
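To show the effect compensation is meant to have, the Python sketch below (a stand-in for Loom) sums one million copies of 0.1 with a plain loop and with a direct port of the Kahan routine, then compares both against math.fsum, which returns the correctly rounded sum. The explicit loop in sum_naive is deliberate: recent Python versions already compensate inside the built-in sum().

```python
import math

def sum_naive(xs):
    """Left-to-right accumulation; rounding error can grow with the length."""
    s = 0.0
    for x in xs:
        s += x
    return s

def sum_kahan(xs):
    """Kahan summation: c carries the low-order bits lost in each addition."""
    s = 0.0
    c = 0.0
    for x in xs:
        y = x - c          # apply the correction from the previous step
        t = s + y          # low-order bits of y may be lost here
        c = (t - s) - y    # recover what was lost (negated)
        s = t
    return s

xs = [0.1] * 1_000_000
reference = math.fsum(xs)

naive_error = abs(sum_naive(xs) - reference)
kahan_error = abs(sum_kahan(xs) - reference)
print(naive_error > kahan_error)   # True
```

The Loom version of the same routine follows.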
pub func sum_kahan(xs: []f64): f64 {
    var s = 0.0 # running sum
    var c = 0.0 # compensation for lost low-order bits
    for x in xs {
        let y = x - c
        let t = s + y
        c = (t - s) - y
        s = t
    }
    ret s
}
Fast, precise linear interpolation
# lerp(a, b, t) = a + t*(b - a), but fma avoids extra rounding
pub func lerp(a: f32, b: f32, t: f32): f32 {
    ret (b - a).fma(t, a)
}
Safe normalization with edge cases
pub func normalize(x: f64, y: f64): (f64, f64) {
    let d = f64::hypot(x, y) # robust sqrt(x*x + y*y)
    if d == 0.0 { ret (0.0, 0.0) }
    ret (x / d, y / d)
}
Constraining to a finite range
pub func clamp01(x: f32): f32 {
    if x.is_nan() { ret 0.0f32 } # define your policy
    ret x.max(0.0f32).min(1.0f32)
}
FAQs
Q: Why did my 0.1 + 0.2 become 0.30000000000000004?
A: 0.1 and 0.2 are not exactly representable in binary floating point. Compare using a tolerance (see approx_eq).
Q: Should I store currency in floats?
A: Prefer scaled integers (e.g., cents in i64) or decimal types. Use floats only for approximate calculations.
Q: When should I use f16?
A: For dense arrays where memory/bandwidth matter and precision requirements are low (e.g., ML activations, approximate textures). Convert to f32/f64 for computation if needed.
Q: How do I sort values with NaNs?
A: Use total_cmp(a, b); it defines a total order (e.g., NaN ordered after finite numbers).
See also
- Integers: i8/i16/i32/i64, u8/u16/u32/u64
- Complex numbers (if enabled by your profile)
- Numerics & math utilities in the standard library