My office computer has a Ryzen 7 5700, an RX 580X, and 32 GB of RAM. Running Ollama with DeepSeek-V2 or Llama 3 is much slower than ChatGPT in the browser. Same with my newer, more powerful home computer.

What kind of hardware do you need to run a local model with responsiveness comparable to ChatGPT? How much does it cost? Presuming such hardware is commercially available, where do you find it?
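For concreteness, here is a minimal sketch of one way to measure local throughput, using Ollama's documented /api/generate endpoint (the model name and prompt below are placeholders; this assumes Ollama is listening on its default port, 11434):

```python
# Minimal sketch: measure local generation speed via Ollama's REST API.
# Assumes Ollama is running on the default port and that the model named
# below has already been pulled; adjust both to taste.
import json
import urllib.request

MODEL = "llama3"  # placeholder: whatever model you have pulled locally
PROMPT = "Explain the difference between RAM and VRAM in two sentences."

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({"model": MODEL, "prompt": PROMPT, "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# eval_count and eval_duration are documented response fields;
# eval_duration is reported in nanoseconds.
tok_per_sec = result["eval_count"] / (result["eval_duration"] / 1e9)
print(f"{result['eval_count']} tokens at {tok_per_sec:.1f} tokens/sec")
```

(`ollama run <model> --verbose` prints similar eval-rate statistics if you'd rather not script it.)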

  • TootGuitar@sh.itjust.works · 1 day ago (edited)

    It depends on what you mean by “comparable responsiveness”, but you can absolutely get ~4 tokens/sec on R1 671B (Q4-quantized) from a system costing a fraction of the number you quote.
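    For intuition on why that's plausible: decode speed on big models is mostly memory-bandwidth bound, so you can estimate tokens/sec as usable bandwidth divided by bytes read per token. A rough back-of-envelope (the ~37B active-parameter figure is from the R1 release; the bandwidth numbers are ballpark ceilings, and real systems land below them):

    ```python
    # Very rough estimate: token generation on large models is mostly
    # memory-bandwidth bound, so tokens/sec ~= bandwidth / bytes read per token.
    # All numbers here are assumptions/estimates, not measurements.

    active_params = 37e9    # R1 is MoE: ~37B of the 671B params are active per token
    bytes_per_param = 0.5   # ~4-bit quantization
    bytes_per_token = active_params * bytes_per_param  # ~18.5 GB read per token

    for name, bandwidth_gbs in [
        ("dual-channel DDR4 desktop", 50),
        ("8-channel DDR4 server", 200),
        ("12-channel DDR5 server", 460),
    ]:
        tok_per_sec = bandwidth_gbs * 1e9 / bytes_per_token
        print(f"{name}: up to ~{tok_per_sec:.1f} tokens/sec")
    ```

    This is why the budget R1 builds you see use server platforms with many memory channels rather than consumer desktops.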

    • Xanza@lemm.ee · 4 hours ago

      This is the point everyone downvoting me seems to be missing. OP wanted something comparable to the responsiveness of chatgpt.com, which is simply not possible without insane hardware. Like sure, if you don’t care about token-generation speed, you can install an LLM on incredibly underpowered hardware and it technically works, but that’s not at all what OP was asking for. They wanted a comparable experience, which requires a lot of money.

      • TootGuitar@sh.itjust.works · 3 hours ago

        Yeah, I definitely get your point (and I didn’t downvote you, for the record). But I will note that ChatGPT generates text far faster than most people can read, and 4 tokens/second, while perhaps slower than some people’s reading speed, is not that bad in my experience.
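        To put numbers on that (assuming the common ~0.75 words-per-token heuristic and ~250 wpm as a typical silent reading speed; both are rough):

        ```python
        # Rough conversion from generation speed to reading speed.
        tok_per_sec = 4
        words_per_token = 0.75   # rough heuristic for English text
        wpm = tok_per_sec * words_per_token * 60
        print(f"{tok_per_sec} tok/s ≈ {wpm:.0f} words/min, vs ~250 wpm typical reading speed")
        ```

        So roughly 180 words/min: a bit below typical reading pace, but in the same ballpark.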