
Strings

string is Loom’s mutable, first-class, UTF-8 text type. It stores valid Unicode by default, supports efficient in-place edits, and offers character-aware operations (so you don’t accidentally split code points).


Quick reference

  • Encoding: UTF-8 (always)
  • Mutability: Mutable (edit in place)
  • Indexing: Prefer character-aware APIs; byte indexing is allowed with care
  • Lengths: bytes_len() vs chars_len()
  • Default literal type: string

Literals

Basic strings

let a: string = "Hello, Loom!"
let b          = "Line 1\nLine 2\tTabbed"
let c          = "Snowman: \u2603"        # Unicode escape
let d          = "Rocket: \U0001F680"     # 🚀

Escapes

  • \n \r \t \\ \" \'
  • \xNN (single byte, hex)
  • \uNNNN (4 hex digits; code points in the BMP)
  • \UNNNNNNNN (8 hex digits; any code point up to U+10FFFF)
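
As a cross-check on the escape widths, here is a small sketch in Python, whose \u and \U escapes happen to have the same shape. It is not Loom code, and the variable names are illustrative only. It shows the code point each escape denotes and how many UTF-8 bytes that character occupies.

```python
# Python illustration (not Loom): code points and UTF-8 widths of escaped characters.
snowman = "\u2603"       # 4 hex digits: a BMP code point
rocket = "\U0001F680"    # 8 hex digits: a code point beyond the BMP

snowman_bytes = snowman.encode("utf-8")
rocket_bytes = rocket.encode("utf-8")

print(hex(ord(snowman)), len(snowman_bytes))  # 0x2603 3
print(hex(ord(rocket)), len(rocket_bytes))    # 0x1f680 4
```

The byte counts are a property of UTF-8 itself, so the same widths should apply to any UTF-8 string type.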

Raw strings (no escaping, multi-line OK)

let path = r"""C:\Users\loom\docs\readme.txt"""
let json = r'''{"quote": "He said: \"ok\""}'''
  • Use r""" ... """ or r''' ... ''' (pick the other delimiter if your text contains """ or ''').
  • Contents are taken verbatim (no escape processing).

Length, indexing, and slicing (Unicode-aware)

UTF-8 characters vary in byte length. Prefer character-aware helpers unless you explicitly need bytes.

let s = "café"                 # bytes_len=5, chars_len=4

s.bytes_len()                  # 5 (O(1))
s.chars_len()                  # 4 (O(n))

let c0: char = s.char_at(0)    # 'c'
let b0: u8   = s.byte_at(0)    # 0x63

let head   = s.slice_chars(0, 3)   # "caf"          (new string)
let suffix = s.slice_bytes(3, 5)   # "é"            (must fall on boundaries)
  • char_at(i) indexes by character (Unicode scalar).
  • byte_at(i) reads the i-th byte.
  • slice_chars(start, end) / slice_bytes(start, end) create new strings.
  • Byte-slicing that lands inside a code point raises a runtime error.
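
The byte/character distinction above comes from UTF-8 rather than from Loom, so it can be checked in any language. The sketch below uses Python as a neutral illustration (it is not Loom code): it reproduces the two lengths of "café" and shows why a byte slice that cuts through a code point has to be rejected.

```python
# Python illustration (not Loom) of byte length vs. character length.
s = "caf\u00e9"              # "café" with a precomposed 'é'
data = s.encode("utf-8")

byte_len = len(data)         # 5: 'é' takes two bytes in UTF-8
char_len = len(s)            # 4: Python counts code points

# Slicing bytes [3, 5) covers the whole 'é', so it decodes cleanly.
suffix = data[3:5].decode("utf-8")

# Slicing bytes [3, 4) cuts 'é' in half; strict decoding rejects it.
try:
    data[3:4].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(byte_len, char_len, suffix, split_ok)
```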

In-place editing APIs

Because string is mutable, you can modify text without creating a new string for each edit (the underlying buffer may still grow when the text gets longer):

var s: string = "Hello"
s.push_char('!')                      # "Hello!"
s.append(" world")                    # "Hello! world"
s.insert_chars(0, 1, "Hey, ")         # "Hey, Hello! world"   (pos by char)
s.replace_in_place("world", "Loom")   # "Hey, Hello! Loom"
s.remove_range_chars(5, 7)            # "Hey, llo! Loom"      (removes chars [5,7) by char index)
s.trim_in_place()                     # trims ASCII/Unicode whitespace in place

Selected mutators:

  • append(str), push_char(ch)
  • insert_chars(pos, count, str) / insert_bytes(pos, bytes)
  • remove_range_chars(start, end) / remove_range_bytes(start, end)
  • clear(), reserve(cap), shrink_to_fit()
  • to_upper_in_place(), to_lower_in_place(), replace_in_place(from, to)
  • trim_in_place(), trim_start_in_place(), trim_end_in_place()

Pure (non-mutating) counterparts also exist, e.g., to_upper() returns a new string.


Concatenation & formatting

var s = "Hello"
s += ", "                   # in-place
s += "Loom"
s = s + "!"                 # builds a new string; use `+=` to mutate

printf("name=%s id=%u\n", name, id)

let msg  = string.format("({0}, {1})", x, y)                # positional
let msg2 = string.format("{x} × {y} = {p}", {x:6, y:7, p:42}) # named

Searching & inspection

let t = "Hello, 世界"

t.is_empty()                 # bool
t.starts_with("Hell")        # bool
t.ends_with("界")            # bool
t.contains("lo, ")           # bool

t.find("lo")                 # Option<usize> (byte offset)
t.rfind("l")                 # Option<usize>
t.find_char('界')            # Option<usize> (char index)
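
Because find reports a byte offset while find_char reports a character index, the two numbers diverge as soon as a multi-byte character appears earlier in the text. The following Python sketch (not Loom code; byte_offset_of is a hypothetical helper written for this illustration) computes both for the same text.

```python
# Python illustration (not Loom): character index vs. UTF-8 byte offset.
t = "Hello, 世界"

def byte_offset_of(text: str, needle: str) -> int:
    """Hypothetical helper: UTF-8 byte offset of the first match, or -1."""
    char_index = text.find(needle)   # Python's find counts code points
    if char_index < 0:
        return -1
    # Bytes before the match = encoded length of the preceding characters.
    return len(text[:char_index].encode("utf-8"))

char_index = t.find("界")              # counts characters
byte_offset = byte_offset_of(t, "界")  # counts UTF-8 bytes
print(char_index, byte_offset)         # 8 10
```

For the ASCII-only prefix the two agree ("lo" is at 3 either way); the gap of 2 comes from 世 occupying three bytes instead of one.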

Iteration:

for ch in t.chars() { print(ch) }    # Unicode scalars
for b  in t.bytes()  { print(b) }    # raw UTF-8

Conversions

Bytes ↔ string

let bytes: []u8 = t.to_utf8()                         # copy out UTF-8
let ok:  string = string.from_utf8(bytes)?            # validates; fails on invalid UTF-8
let los: string = string.from_utf8_lossy(bytes)       # U+FFFD for invalid
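
The difference between strict and lossy decoding can be seen with any UTF-8 decoder. This Python sketch (not Loom code) feeds the same invalid input to both modes; Python's errors="replace" is used here as a stand-in for the lossy behavior described above.

```python
# Python illustration (not Loom) of strict vs. lossy UTF-8 decoding.
raw = b"caf\xe9"   # Latin-1 bytes for "café"; a lone 0xE9 is not valid UTF-8

try:
    raw.decode("utf-8")              # strict: rejects the input
    strict_result = "accepted"
except UnicodeDecodeError:
    strict_result = "rejected"

# Lossy: the invalid byte is replaced with U+FFFD, the rest is kept.
lossy = raw.decode("utf-8", errors="replace")
print(strict_result, lossy == "caf\ufffd")
```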

Characters / arrays

let chars: []char = t.to_chars()
let u: string     = string.from_chars(chars)

Numbers ↔ string

let n: i32 = i32.parse("123")
let o: Option<i32> = i32.parse_opt("x")               # None

let hex = string.format("0x{0:X}", 48879)             # "0xBEEF"

Equality, ordering, hashing

  • == / != compare by value (byte sequence of valid UTF-8).
  • <, <=, >, >= use lexicographic byte order.
  • For locale-aware collation, use a collation library (future std extension).
  • hash() is stable within a process; not guaranteed across versions/platforms.
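
Byte order is cheap and deterministic, but it is not dictionary order. The Python sketch below (not Loom code) sorts a few words by their UTF-8 bytes to show the effect: uppercase ASCII sorts before lowercase, and any non-ASCII character sorts after all ASCII.

```python
# Python illustration (not Loom) of lexicographic byte ordering.
words = ["apple", "Zebra", "\u00e9clair", "banana"]   # "\u00e9clair" is "éclair"

# Sort by the UTF-8 encoding, i.e. plain byte order.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
print(by_bytes)  # ['Zebra', 'apple', 'banana', 'éclair']
```

This is the kind of result that a locale-aware collation library is meant to correct.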

I/O and encodings

  • File APIs read/write string as UTF-8 by default.
  • Other encodings via codecs:
let sjis = codecs.encode("こんにちは", "shift_jis")   # []u8
let back = codecs.decode(sjis, "shift_jis")?         # string
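
For comparison, the same round trip can be written with Python's built-in codecs. This is not Loom code; it only illustrates that the encoded bytes differ in length from the UTF-8 form while the decoded text comes back unchanged.

```python
# Python illustration (not Loom) of converting to and from a legacy encoding.
text = "こんにちは"

sjis = text.encode("shift_jis")    # bytes in Shift JIS
back = sjis.decode("shift_jis")    # back to a Unicode string

print(len(text.encode("utf-8")), len(sjis))  # 15 10
print(back == text)                          # True
```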

Performance tips

  • Use in-place APIs (+=, append, insert_*, replace_in_place) to avoid temporary allocations.
  • Call reserve(cap) before large concatenations to reduce re-allocations.
  • bytes_len() is O(1); chars_len() may be O(n).
  • When comparing human text in security-sensitive contexts, consider case-folding and normalization (nfc(), nfd()) to avoid deceptive mismatches.
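
To make the last tip concrete, here is a Python sketch (not Loom code) of two strings that look identical but compare unequal until they are normalized. The canonical helper is hypothetical and deliberately simple; a production comparison may need a more careful fold-then-normalize sequence.

```python
# Python illustration (not Loom) of normalization and case folding.
import unicodedata

composed = "caf\u00e9"      # 'é' as one code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

print(composed == decomposed)   # False: different code points, same appearance

def canonical(text: str) -> str:
    """Hypothetical helper: NFC-normalize, then case-fold."""
    return unicodedata.normalize("NFC", text).casefold()

print(canonical(composed) == canonical(decomposed))   # True
print(canonical("STRASSE") == canonical("straße"))    # True
```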

Examples

In-place sanitization

pub func sanitize_line(s: string): string {
    s.replace_in_place("\r\n", "\n")
    s.replace_in_place("\r", "\n")
    s.trim_in_place()
    ret s
}

Safe prefix (by characters)

pub func take_prefix(s: string, k: usize): string {
    let n = s.chars_len()
    if k >= n { ret s }
    ret s.slice_chars(0, k)
}

Join with separator (pre-reserve for speed)

pub func join(parts: []string, sep: string): string {
    if parts.len() == 0 { ret "" }
    var out: string = ""
    # Reserve the final size up front: all parts plus the separators between them
    var cap: usize = 0
    for p in parts { cap += p.bytes_len() }
    cap += sep.bytes_len() * (parts.len() - 1)
    out.reserve(cap)

    for i in 0..parts.len() {
        if i > 0 { out += sep }
        out += parts[i]
    }
    ret out
}

Case-insensitive (ASCII) compare without allocating

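pub func eq_ignore_ascii_case(a: string, b: string): bool {
    # ASCII case folding never changes byte length, so unequal lengths cannot match
    if a.bytes_len() != b.bytes_len() { ret false }
    for i in 0..a.bytes_len() {
        let x = a.byte_at(i)
        let y = b.byte_at(i)
        # Lowercase only 'A'..'Z'; bytes of multi-byte characters (>= 0x80) pass through unchanged
        let xl = if x >= 'A' && x <= 'Z' { x + 32u8 } else { x }
        let yl = if y >= 'A' && y <= 'Z' { y + 32u8 } else { y }
        if xl != yl { ret false }
    }
    ret true
}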

FAQs

Q: Do strings guarantee valid UTF-8? A: Validating constructors such as from_utf8 do, and from_utf8_lossy also yields valid UTF-8 by replacing invalid bytes with U+FFFD. Low-level APIs may create unchecked strings for performance; use them with care.

Q: Are strings null-terminated? A: No. Strings store a length and bytes; embedded \0 is allowed.

Q: Is character indexing O(1)? A: Not necessarily. Characters are variable width in UTF-8; use iterators or cache offsets if you need repeated access.

Q: How do raw strings handle embedded quotes? A: Use the other triple-quote flavor (r"""...""" vs r'''...''') so you don’t need escaping.
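
The "cache offsets" suggestion in the character-indexing answer can be sketched as follows. This is Python rather than Loom, and char_starts and char_at are hypothetical helpers written for this illustration: one O(n) pass records where each character begins, after which any character can be fetched without rescanning.

```python
# Python illustration (not Loom): cache character start offsets once,
# then look characters up without rescanning the UTF-8 bytes.
def char_starts(data: bytes) -> list:
    """Byte offsets of every character start in valid UTF-8 data."""
    # Continuation bytes look like 0b10xxxxxx; every other byte starts a character.
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]

def char_at(data: bytes, starts: list, k: int) -> str:
    """Hypothetical helper: the k-th character, using the cached offsets."""
    end = starts[k + 1] if k + 1 < len(starts) else len(data)
    return data[starts[k]:end].decode("utf-8")

data = "Hello, 世界".encode("utf-8")
starts = char_starts(data)          # built once, O(n)
print(char_at(data, starts, 7))     # 世
print(char_at(data, starts, 8))     # 界
```

The cache is only valid while the text is unchanged; any in-place edit would require rebuilding it.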


See also

  • char (Unicode scalar values)
  • Arrays & slices ([]u8, []char)
  • Formatting & I/O (print, printf, string.format)