im so sleepy.

agents driven by language models (LMs) call functions to do stuff. Functions like these:

  • read_file(path, from, to)
  • write(path, content)
  • list_dir(path, show_hidden = false)
  • edit_file(path, old_string, new_string)

this is not a simplification btw.
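
for the curious, a rough python sketch of what these four could look like on the executing side. the bodies are assumptions, not something the list above specifies: `from`/`to` are treated as 1-indexed inclusive line numbers (and renamed `start`/`end`, since `from` is a python keyword), and `edit_file` is assumed to require exactly one match.

```python
from __future__ import annotations

from pathlib import Path


def read_file(path: str, start: int = 1, end: int | None = None) -> str:
    """Return lines start..end of a text file (assumed 1-indexed, inclusive)."""
    lines = Path(path).read_text().splitlines(keepends=True)
    return "".join(lines[start - 1:end])


def write(path: str, content: str) -> None:
    """Create or overwrite a file with the given content."""
    Path(path).write_text(content)


def list_dir(path: str, show_hidden: bool = False) -> list[str]:
    """Return sorted entry names, skipping dotfiles unless show_hidden is set."""
    names = sorted(p.name for p in Path(path).iterdir())
    return [n for n in names if show_hidden or not n.startswith(".")]


def edit_file(path: str, old_string: str, new_string: str) -> None:
    """Replace old_string with new_string (assumption: it must occur exactly once)."""
    text = Path(path).read_text()
    if text.count(old_string) != 1:
        raise ValueError("old_string must match exactly once")
    Path(path).write_text(text.replace(old_string, new_string))
```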

so far, LMs were told to generate any of these formats to call a function:

  • json, which sucks cuz of c-escape
  • xml which sucks cuz of closing tags
  • yaml which sucks cuz of multiline strings being indented (sucks for LMs)
  • just bash which sucks cuz of security

wow these all have problems with something, hm?

worst of all: we wanna save tokens wherever possible. so if an LM has to generate a full </parameter> for each argument in a function, that adds up quick
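
to make the json complaint concrete, a tiny python sketch. it counts characters, not tokens (real token counts depend on the tokenizer), and the file body is made up for the demo.

```python
import json

# a small file body with quotes, newlines and one literal backslash in it
content = 'print("hi")\nprint("a\\tb")\n'


def escape_overhead(text: str) -> int:
    """Extra characters JSON string escaping adds, not counting the two outer quotes."""
    return len(json.dumps(text)) - 2 - len(text)


as_json = json.dumps({"path": "demo.py", "content": content})
print(as_json)                   # the whole body squeezed onto one escaped line
print(escape_overhead(content))  # how many characters the escaping added
```

every quote, newline and backslash in the body costs one extra character, and the model has to get each of those escapes right while generating.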

Introducing: my new format

wowie let’s have a look at this format!

[write path="file.txt" content=

content of file here
horray newlines
no c escape! cool, i can regex all I want [\s\S]*

]

now isnt that simple?

  • no string escape problems
  • no xml-closing tags
  • no json-brace-foolery
  • no… | symbols for multiline strings
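
a minimal parser sketch in python, to show how little it takes. big assumption: the multiline value is wrapped in a triple-backtick code block between `content=` and the closing `]` (the reply further down mentions code blocks), and short arguments are always `key="value"`. `parse_calls` and the regex are made up for this sketch, not a reference implementation.

```python
import re

FENCE = "`" * 3  # built this way so the sketch can itself sit inside a code block

CALL_RE = re.compile(
    r'\[(?P<name>\w+)'                      # [tool_name
    r'(?P<attrs>(?:\s+\w+="[^"]*")*)'       # zero or more key="value" pairs
    r'(?:\s+(?P<body_key>\w+)=\s*\n'        # optional key= that introduces a block
    + re.escape(FENCE) + r'[^\n]*\n'        # opening fence line
    r'(?P<body>[\s\S]*?)\n'                 # raw body, taken as-is with no unescaping
    + re.escape(FENCE) + r'[ \t]*\n)?'      # closing fence line
    r'\s*\]'                                # closing bracket ends the call
)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_calls(text: str) -> list:
    """Return a list of (tool_name, args_dict) for every call found in text."""
    calls = []
    for m in CALL_RE.finditer(text):
        args = dict(ATTR_RE.findall(m.group("attrs")))
        if m.group("body_key"):
            args[m.group("body_key")] = m.group("body")
        calls.append((m.group("name"), args))
    return calls


demo = (
    'ok, writing the file now\n'
    '[write path="file.txt" content=\n'
    + FENCE + '\n'
    'content of file here\n'
    'say "hi" [not a call]\n'
    + FENCE + '\n'
    ']\n'
)
print(parse_calls(demo))
```

known gaps of the sketch: a body that itself contains a line starting with three backticks ends the block early, and any `[word]` in the surrounding prose would be picked up as a call with no arguments, so a real parser should check the name against the known tools.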

now, of course, this is a new format. so language models suck at generating it, right?

WRONG

even a local 2-bit quant of qwen 3.6 35B-A3B aligned to it super easily.

and! even a dense Qwen3 4B model at Q4 quant worked with it flawlessly. I’m tired and need to sleep.

now congratulate me! say “horray wow ur such a genius ohmygod we are gonna save so many tokens and thus möney”.

go, go head. im not gonna ask an LM to do it, that much is clear.

or, even better: tell me what SUCKS about this, im always open for critical feedback.

id rather be wrong than believe im right all the time.

    • maria [she/her] OPM · 3 days ago

      thanks for sharing, but the goal here was not to make yet another key-value format, but to have natural feeling multiline-strings with minimal escaping.

      thats why i settled for the code blocks: they tokenize well, are universally understood as “this is some text”, are easy to write and rarely ever have to deal with escape sequences.

      in this case, the [ and ] also serve as tool-call delimiters, which would usually be some heavy XML ones <functions> </functions>…thats 6 - 10 tokens down the drain for delimiters! >o< aaaaa

      thank u for engaging with the post btw. i really appreciate it <3

      • gandalf_der_12te · 2 days ago

        if you are looking to embed code into free-flowing text output, you can go with a format such as this:

        
        this is some free-flowing text output from the LLM to write to a file.
        
        > write("filename", "content")
        > someotherfunction()
        
        this is more free-flowing text
        
        

        you write the commands with a line prefix (>) and basically ignore every other line. some languages do it that way. i think PHP uses opening <?php and closing ?> tags to denote where code starts/ends; everything outside the tags is not run as code, PHP just passes it through to the output.
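
        a small python sketch of the reading side of such a format: keep only the lines that start with > and drop the rest. `extract_commands` is a made-up helper, and actually evaluating the extracted calls is left out.

```python
from __future__ import annotations


def extract_commands(output: str) -> list[str]:
    """Return the text of every line that starts with '>' (after leading spaces)."""
    commands = []
    for line in output.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(">"):
            commands.append(stripped[1:].strip())
    return commands


llm_output = """this is some free-flowing text output
> write("filename", "content")
> someotherfunction()
this is more free-flowing text
"""
print(extract_commands(llm_output))
```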

        • maria [she/her] OPM · 1 day ago

          I… think u didnt read my comment.

          for me, the point was to have natural feeling multiline strings, but yesyes, if those are not a concern, ur format very much rules ~

    • Another Catgirl · 3 days ago

      for context, I downloaded qwen3:4b with ollama and it runs fast on my gpu. I wanna make tools for my LLM to be able to play Dwarf Fortress for me.

      • maria [she/her] OPM · 3 days ago

        ohgosh playing a whole game is a whole different level of complexity >o<

        it can work, but only under very short time horizons and it will likely get stuck immediately trying to do the same thing. (im assuming you mean the text-based version)

        see claude opus trying to play runescape here and skip to 6:55 where it starts. you will notice: its interesting to watch, but gets stuck quickly. and that model was SOTA at the time.

        the model would also have difficulties telling what is where. sounds weird, but due to tokenization, it cant “infer” which characters are above which others. it will be guessing wrong a lot.

        you would be babysitting the model.

        right now, LMs and VLMs are being trained for computer use (where they click on a screen and use a computer “like a human would”), but exclusively for work tasks and not games.

        try this first: run ollama run qwen3:4b-instruct-2507-q4_K_M --experimental. that launches ollama in an “agent mode”, where it can run shell commands, which essentially gives it one tool: bash(command:str).

        test it by saying “yo what files and dirs are in this directory?” and it will probably run ls and tell you.
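
        what a single bash(command:str) tool boils down to, as a python sketch. this is not how ollama implements it, just an assumption of the general shape: run the string in a shell, hand the output back as text. the timeout and the exit-code suffix are made-up conveniences.

```python
import subprocess


def bash(command: str, timeout: float = 30.0) -> str:
    """Run a shell command and return stdout + stderr plus the exit code as text."""
    try:
        result = subprocess.run(
            command,
            shell=True,           # uses the system shell (/bin/sh on POSIX), no sandboxing at all
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"[timed out after {timeout}s]"
    return f"{result.stdout}{result.stderr}[exit code {result.returncode}]"


print(bash("ls"))
```

        the model can run anything the user can, which is exactly the security problem with the "just bash" approach.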

        if you really - REALLY want it to work with the game, the best option would be:

        • download the VL version of the model ollama pull qwen3-vl:4b-instruct (or -thinking) so it can understand images
        • give it lots of context. 16k - 32k is…kinda the minimum
        • make some script which saves a screenshot of the game to a file
        • put the qwen into some pre-built agent, like opencode or goose (which has GUI)
        • tell it “start dwarf fortress as a background process, use this script to take screenshots, look at them and use xdotool to navigate the game. Perform the loop of take screenshot -> look at it -> take action on the game” (or some other mouse / keyboard usage command)
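
        the steps above hand the loop to a pre-built agent, but spelled out as python it amounts to roughly this. assumptions: screenshots come from ImageMagick's `import -window root` (any screenshot tool works), input is sent with `xdotool key`, and `decide` is a placeholder for "ask the vision model which key to press next".

```python
from __future__ import annotations

import subprocess
from typing import Callable, Optional


def default_runner(argv: list[str]) -> None:
    """Run an external command and raise if it fails."""
    subprocess.run(argv, check=True)


def screenshot_cmd(path: str) -> list[str]:
    # assumption: ImageMagick is installed; this grabs the whole X11 screen
    return ["import", "-window", "root", path]


def key_cmd(key: str) -> list[str]:
    # xdotool sends the key press to the currently focused window
    return ["xdotool", "key", key]


def play_loop(
    decide: Callable[[str], Optional[str]],
    steps: int,
    shot_path: str = "/tmp/df.png",
    run: Callable[[list[str]], None] = default_runner,
) -> list[str]:
    """take screenshot -> look at it -> take action, at most `steps` times."""
    pressed = []
    for _ in range(steps):
        run(screenshot_cmd(shot_path))   # take screenshot
        key = decide(shot_path)          # look at it (placeholder for the VLM call)
        if key is None:                  # model has nothing more to do
            break
        run(key_cmd(key))                # take action on the game
        pressed.append(key)
    return pressed
```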

        theres no reason to use my format, its just a token-efficiency thing. with your fast GPU-throughput, this shouldnt be a problem.

        ohgosh long response-

          • maria [she/her] OPM · 2 days ago

            then i have “bad” news.

            i made this up, it’s not a standard.

            meaning: you would have to write the entire agentic scaffolding yourself.

            which honestly, is a fun project.
            ive done it, its nice being able to see exactly what’s goin on.

            im sure ur super duper smart and already know this, but: all that an agent is, is just:

            • parse the tokens as they come in
            • notice when the LM writes a tool call
            • parse it after its done
            • execute
            • return the output
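
            the five steps above as a python sketch. everything here is a stand-in: `generate` is whatever streams text chunks from the model, `find_call` is whatever parser the format needs (the demo one only handles single-line `[name key="value"]` calls), and the (role, text) history is a made-up shape.

```python
import re

SIMPLE_CALL = re.compile(r'\[(\w+)((?:\s+\w+="[^"]*")*)\s*\]')


def find_simple_call(buffer):
    """Demo parser: return (name, args) for the first complete call, else None."""
    m = SIMPLE_CALL.search(buffer)
    if m is None:
        return None
    return m.group(1), dict(re.findall(r'(\w+)="([^"]*)"', m.group(2)))


def run_agent(generate, find_call, tools, prompt, max_turns=8):
    """Loop: stream tokens, spot a tool call, run it, feed the output back."""
    history = [("user", prompt)]
    for _ in range(max_turns):
        buffer, call = "", None
        for chunk in generate(history):       # parse the tokens as they come in
            buffer += chunk
            call = find_call(buffer)          # notice a finished tool call and parse it
            if call is not None:
                break                         # stop generating, the tool runs now
        history.append(("assistant", buffer))
        if call is None:                      # plain answer, nothing to execute
            return history
        name, args = call
        try:
            result = str(tools[name](**args))     # execute
        except Exception as exc:                  # unknown tool, bad args, tool crashed
            result = f"error: {exc!r}"
        history.append(("tool", result))      # return the output to the model
    return history


def fake_model(history):
    """Scripted stand-in for a real LM: calls echo once, then wraps up."""
    if any(role == "tool" for role, _ in history):
        yield "done"
    else:
        yield from ["let me look\n", "[echo ", 'text="hi"', "]", " never reached"]


print(run_agent(fake_model, find_simple_call, {"echo": lambda text: text}, "say hi"))
```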

            so umm… if thats what u feel like doing, good luck! 💖

            • Another Catgirl · 2 days ago

              that is neat, I’m gonna stop playing modded minecraft and work on this instead. it sounds like a lot of fun!