Strings

string is Loom’s mutable, first-class, UTF-8 text type. It stores valid Unicode by default, supports efficient in-place edits, and offers character-aware operations (so you don’t accidentally split code points).

Quick reference

Encoding: UTF-8 (always)
Mutability: Mutable (edit in place)
Indexing: Prefer character-aware APIs; byte indexing is allowed with care
Lengths: bytes_len() vs chars_len()
Default literal type: string

Literals

Basic strings

let a: string = "Hello, Loom!"
let b          = "Line 1\nLine 2\tTabbed"
let c          = "Snowman: \u2603"        # Unicode escape
let d          = "Rocket: \U0001F680"     # 🚀

Escapes

\n \r \t \\ \" \'
\xNN (single byte, hex)
\uNNNN (BMP, 16-bit hex)
\UNNNNNNNN (full 21-bit hex)

Raw strings (no escaping, multi-line OK)

let path = r"""C:\Users\loom\docs\readme.txt"""
let json = r'''{"quote": "He said: "ok"" }'''

Use r""" ... """ or r''' ... ''' (pick the other delimiter if your text contains """ or ''').
Contents are taken verbatim (no escape processing).

Length, indexing, and slicing (Unicode-aware)

UTF-8 characters vary in byte length. Prefer character-aware helpers unless you explicitly need bytes.

let s = "café"                 # bytes_len=5, chars_len=4

s.bytes_len()                  # 5 (O(1))
s.chars_len()                  # 4 (O(n))

let c0: char = s.char_at(0)    # 'c'
let b0: u8   = s.byte_at(0)    # 0x63

let head   = s.slice_chars(0, 3)   # "caf"          (new string)
let suffix = s.slice_bytes(3, 5)   # "é"            (must fall on boundaries)

char_at(i) indexes by character (Unicode scalar).
byte_at(i) reads the i-th byte.
slice_chars(start, end) / slice_bytes(start, end) create new strings.
Byte-slicing that lands inside a code point raises a runtime error.

In-place editing APIs

Because string is mutable, you can modify text without allocating new strings:

var s: string = "Hello"
s.push_char('!')                      # "Hello!"
s.append(" world")                    # "Hello! world"
s.insert_chars(0, 1, "Hey, ")         # "Hey, Hello! world"   (pos by char)
s.replace_in_place("world", "Loom")   # "Hey, Hello! Loom"
s.remove_range_chars(5, 7)            # remove chars [5,7) by char index
s.trim_in_place()                     # trims ASCII/Unicode whitespace in place

Selected mutators:

append(str), push_char(ch)
insert_chars(pos, count, str) / insert_bytes(pos, bytes)
remove_range_chars(start, end) / remove_range_bytes(start, end)
clear(), reserve(cap), shrink_to_fit()
to_upper_in_place(), to_lower_in_place(), replace_in_place(from, to)
trim_in_place(), trim_start_in_place(), trim_end_in_place()

Pure (non-mutating) counterparts also exist, e.g., to_upper() returns a new string.

Concatenation & formatting

var s = "Hello"
s += ", "                   # in-place
s += "Loom"
s = s + "!"                 # builds a new string; use `+=` to mutate

printf("name=%s id=%u\n", name, id)

let msg  = string.format("({0}, {1})", x, y)                # positional
let msg2 = string.format("{x} × {y} = {p}", {x:6, y:7, p:42}) # named

Inspection & search

let t = "Hello, 世界"

t.is_empty()                 # bool
t.starts_with("Hell")        # bool
t.ends_with("界")            # bool
t.contains("lo, ")           # bool

t.find("lo")                 # Option<usize> (byte offset)
t.rfind("l")                 # Option<usize>
t.find_char('界')            # Option<usize> (char index)

Iteration:

for ch in t.chars() { print(ch) }    # Unicode scalars
for b  in t.bytes()  { print(b) }    # raw UTF-8

Conversions

Bytes ↔ string

let bytes: []u8 = t.to_utf8()                         # copy out UTF-8
let ok:  string = string.from_utf8(bytes)?            # validate (throws on invalid)
let los: string = string.from_utf8_lossy(bytes)       # U+FFFD for invalid

Characters / arrays

let chars: []char = t.to_chars()
let u: string     = string.from_chars(chars)

Numbers ↔ string

let n: i32 = i32.parse("123")
let o: Option<i32> = i32.parse_opt("x")               # None

let hex = string.format("0x{0:X}", 48879)             # "0xBEEF"

Equality, ordering, hashing

== / != compare by value (byte sequence of valid UTF-8).
<, <=, >, >= use lexicographic byte order.
For locale-aware collation, use a collation library (future std extension).
hash() is stable within a process; not guaranteed across versions/platforms.

I/O and encodings

File APIs read/write string as UTF-8 by default.
Other encodings via codecs:

let sjis = codecs.encode("こんにちは", "shift_jis")   # []u8
let back = codecs.decode(sjis, "shift_jis")?         # string

Performance tips

Use in-place APIs (+=, append, insert_*, replace_in_place) to avoid temporary allocations.
Call reserve(cap) before large concatenations to reduce re-allocations.
bytes_len() is O(1); chars_len() may be O(n).
When comparing human text in security-sensitive contexts, consider case-folding and normalization (nfc(), nfd()) to avoid deceptive mismatches.

Examples

In-place sanitization

pub func sanitize_line(s: string): string {
    s.replace_in_place("\r\n", "\n")
    s.replace_in_place("\r", "\n")
    s.trim_in_place()
    ret s
}

Safe prefix (by characters)

pub func take_prefix(s: string, k: usize): string {
    let n = s.chars_len()
    if k >= n { ret s }
    ret s.slice_chars(0, k)
}

Join with separator (pre-reserve for speed)

pub func join(parts: []string, sep: string): string {
    if parts.len() == 0 { ret "" }
    var out: string = ""
    # Reserve approximate capacity
    var cap: usize = 0
    for p in parts { cap += p.bytes_len() }
    cap += sep.bytes_len() * (parts.len() - 1)
    out.reserve(cap)

    for i in 0..parts.len() {
        if i > 0 { out += sep }
        out += parts[i]
    }
    ret out
}

Case-insensitive (ASCII) compare without allocating

pub func eq_ignore_ascii_case(a: string, b: string): bool {
    if a.bytes_len() != b.bytes_len() { ret false }
    for i in 0..a.bytes_len() {
        let x = a.byte_at(i)
        let y = b.byte_at(i)
        let xl = if x >= 'A' && x <= 'Z' { x + 32u8 } else { x }
        let yl = if y >= 'A' && y <= 'Z' { y + 32u8 } else { y }
        if xl != yl { ret false }
    }
    ret true
}

FAQs

Q: Do strings guarantee valid UTF-8? A: Yes for constructors that validate; from_utf8_lossy permits replacement of invalid bytes. Low-level APIs may create unchecked strings for performance—use with care.

Q: Are strings null-terminated? A: No. Strings store a length and bytes; embedded \0 is allowed.

Q: Is character indexing O(1)? A: Not necessarily. Characters are variable width in UTF-8; use iterators or cache offsets if you need repeated access.

Q: How do raw strings handle embedded quotes? A: Use the other triple-quote flavor (r"""...""" vs r'''...''') so you don’t need escaping.