Cohere's First Model for Developers

(cohere.com)

46 points | by hmokiguess 4 days ago

4 comments

  • moojacob 31 minutes ago
    I was a fan of Cohere's general-purpose LLM (Command A, I think?) before they came out with their reasoning model.

    More competition is better.

    • SubiculumCode 4 minutes ago
      I always forget the VRAM requirements on these MoE things.
      • sipjca 0 minutes ago
        FWIW, because relatively few params are activated per token, offloading to system RAM is quite feasible. You can see endless numbers of people doing this on r/LocalLLaMA with Qwen 3.6 35B-A3B.
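A rough back-of-the-envelope sketch of the point above, not a measurement. It assumes the "35B-A3B" name means about 35B total and 3B activated parameters, and it assumes 4-bit quantization. It counts weights only, ignoring KV cache, activations, and runtime overhead. The function name `weight_gib` is made up for illustration.

```python
# Weight-memory estimate for a mixture-of-experts (MoE) model.
# All figures below are illustrative assumptions, not benchmarks.

def weight_gib(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 2**30

TOTAL_B = 35.0   # total parameters in billions (assumed from "35B")
ACTIVE_B = 3.0   # parameters activated per token in billions (assumed from "A3B")
BITS = 4.0       # assumed 4-bit quantization

total_gib = weight_gib(TOTAL_B, BITS)    # must live somewhere: VRAM plus system RAM
active_gib = weight_gib(ACTIVE_B, BITS)  # roughly what is read per generated token

print(f"all weights:      {total_gib:.1f} GiB")
print(f"active per token: {active_gib:.1f} GiB")
```

The whole model still has to fit across VRAM and system RAM, since which experts fire changes from token to token. Only a small slice is touched per token, though, which is why keeping most of the weights in slower system RAM stays tolerable.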
  • tonyrice 1 hour ago
    I'm excited to see more OSS models
  • moralestapia 2 hours ago
    >Our plan for becoming profitable is to give away mediocre stuff for free
  • cyanydeez 2 days ago
    Looks like it's just Qwen 3.6 Coder.
    • lumost 2 hours ago
      It's worse at code than Qwen 3.6 Coder.
    • SubiculumCode 1 hour ago
      Do you mean it's based on Qwen 3.6 Coder?
      • daemonologist 53 minutes ago
        There is no "coder" version of Qwen 3.6; I think they just mean it's a coding-focused model of similar size and performance (to Qwen 3.6 35B-A3B).

        Regular Qwen 3.6 benchmarks slightly better and has much wider software support though, so this is probably of interest only to organizations which disallow models trained in China.

        • kadoban 29 minutes ago
          I mean, Qwen 3.6 kicks ass. I don't know who these people are, but if their first outing is "not quite as good as Qwen 3.6", that's not a bad start by any means.

          30B vs 35B isn't nothing either.

          If it ends up just being some tweaks to someone else's weights, then meh.