• brucethemoose@lemmy.world
    link
    fedilink
    English
    arrow-up
    13
    ·
    edit-2
    3 months ago

    They seem to have held back the “big” locally runnable model.

    It’s also kinda conservative/old, architecture wise: 16-bit weights, sliding window attention interleaved with global attention. No MTP, no QAT (yet), no tightly integrated vision, no hybrid mamba like Qwen/Deepseek, nothing weird like that. It’s especially glaring since we know Google is using an exotic architecture for Gemini, and has basically infinite resources for experimentation.

    It also feels kinda “deep fried” like GPT-OSS to me, see: https://github.com/ikawrakow/ik_llama.cpp/issues/1572

    it is acting crazy. it can’t do anything without the proper chat template, or it goes crazy.


    IMO it’s not very interesting, especially with so many other models that run really well on desktops.

  • brucethemoose@lemmy.world
    link
    fedilink
    English
    arrow-up
    12
    ·
    edit-2
    3 months ago

    Also, for any interested, desktop inference and quantization is my autistic interest. Ask my anything.

    I don’t like Gemma 4 much so far, but if you want to try it anyway:


    But TBH I’d point most people to Qwen 3.5/3.6 or Step 3.5 instead. They seem big, but being sparse MoEs, they can run quite quickly on single-GPU desktops: https://huggingface.co/models?other=ik_llama.cpp&sort=modified

      • brucethemoose@lemmy.world
        link
        fedilink
        English
        arrow-up
        9
        ·
        edit-2
        3 months ago

        Ughhh, I could go on forever, but to keep it short:

        Basically, the devs are Tech Bros. They’re scammer-adjacent. I’ve been in local inference for years, and wouldn’t touch ollama if you paid me to. I’d trust Gemini API over them any day.

        I’d recommend base llama.cpp or ik_llama.cpp or kobold.cpp, but if you must use an “turnkey” and popular UI, LMStudio is way better.

        But the problem is, if you want a performant local LLM, nothing about local inference is really turnkey. It’s just too hardware sensitive, and moves too fast.

  • mrnobody@reddthat.com
    link
    fedilink
    English
    arrow-up
    5
    ·
    3 months ago

    Why would anyone care about Gemini or other AI here? I mean, i get this is the tech Space, but AI=bad.

    • brucethemoose@lemmy.world
      link
      fedilink
      English
      arrow-up
      12
      ·
      3 months ago

      There’s a whole lot of interest in locally runnable ML. It was there even before ChatGPT 3.5 started the tech bro hype train, when tinkerers were messing with GPT-J 6B and GAN models.

      In a nutshell, it’s basically Lemmy vs Reddit. Local and community-developed vs toxic and corporate.

    • Axum
      link
      fedilink
      English
      arrow-up
      1
      ·
      3 months ago

      Openly downloadable ai models are here. You may as well download one and play with it on your own hardware so that you can learn the it’s and outs of it as well as limitations and use cases