Waikit Kan

AI landscape

3 min read

Keeping up with the latest AI news gets exhausting fast. Something new comes out every other day, and it’s easy to see how confusing it gets when you’re trying to find the right tool to use. I’ve been trying out a bunch of AI models that I could integrate into my own SaaS product, so consider the following a snapshot of my experience as of 11th Feb 2025.

I’ll break this down by type of model to make it easier to follow.

Text generation

I’ve used OpenAI, Claude, Perplexity, and DeepSeek. The use case is to get the AI to write about a topic in detail. My preference is Claude, as I found it more elaborate when asked to write about a topic, and its writing style is more varied than the others. OpenAI gives the shortest output, so if you want more text to work with, it might not be the best option. Perplexity doesn’t have its own LLM and integrates others. Its strength is the citation feature, but I’ve heard it’s not as good through the API as in the web app, since the web app does something extra.

The reasoning models from OpenAI and DeepSeek do not seem to improve the output, but given the affordability of DeepSeek R1, it could be a sensible default: it also provides the chain of thought for free, at the cost of a slightly slower final output.
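If you do use R1 for writing, you’ll usually want to keep the chain of thought away from the text you show to users. Here is a minimal sketch of that, assuming the raw output wraps its reasoning in `<think>...</think>` tags, which is how the open R1 weights typically emit it. Some hosted APIs return the reasoning in a separate field instead, so check what your provider gives you. The function name `split_reasoning` and the sample text are made up for illustration.

```python
import re

# Assumption: the model's raw text wraps its reasoning in <think>...</think>.
# Some providers strip this and return the reasoning as a separate field.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw model output."""
    # Collect every reasoning block, trimmed of surrounding whitespace.
    reasoning = "\n".join(block.strip() for block in THINK_BLOCK.findall(raw))
    # Whatever is left after removing the blocks is the final answer.
    answer = THINK_BLOCK.sub("", raw).strip()
    return reasoning, answer


# Hypothetical raw output, not a real model response.
sample = (
    "<think>The user wants a short intro.\nKeep it friendly.</think>\n\n"
    "Here is your intro."
)
reasoning, answer = split_reasoning(sample)
print(answer)  # prints "Here is your intro."
```

Text without any `<think>` block comes back unchanged as the answer with an empty reasoning string, so the same function is safe to run on non-reasoning models too.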

Another thing to consider is data policy. The OpenAI and Claude APIs are likely preferable for your users, as their data policies do not use your data for training. If using DeepSeek, just ensure you’re using a third-party hosted version.

DeepSeek is the only one with an open source option out of the ones I mentioned.

Vector database

I just want to mention these because if you’re building something with RAG, you’ll likely come across vector databases. I’ve only had a little experience with Pinecone, so I’ve no particular opinion, but it does require you to embed your data first, so you’ll need to rely on something like OpenAI’s embedding endpoint to do this.
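To make the “embed first, then store and search” flow concrete, here is a toy sketch in plain Python. It is not Pinecone’s API: `toy_embed` is a stand-in for a real embedding endpoint (it just counts words), and `TinyVectorStore` is a made-up in-memory class that mimics the general idea of upserting vectors and querying by similarity.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding endpoint: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store; a real vector database replaces this."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def upsert(self, doc_id: str, text: str) -> None:
        # The text is embedded BEFORE it is stored, as described above.
        self.items.append((doc_id, toy_embed(text)))

    def query(self, text: str, top_k: int = 1) -> list[str]:
        # The query is embedded the same way, then compared to every item.
        q = toy_embed(text)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:top_k]]


store = TinyVectorStore()
store.upsert("pricing", "video generation pricing per second")
store.upsert("faces", "realistic faces with image generation")
store.upsert("rag", "retrieval augmented generation with a vector database")

print(store.query("how does a vector database work"))  # prints ['rag']
```

In a real setup the embedding call and the store are both network services, but the order of operations is the same: embed, upsert, then embed the query and search.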

Image generation

I’ve used Stable Diffusion, DALL-E, Flux, and Midjourney extensively, and tried a bunch of others. The use case is to create realistic faces. The named ones are all commercially viable. All models have text-to-image and, with the exception of DALL-E, all support image-to-image. My preference is Flux by a long shot, as it gives a great base output. Basically, the core team left Stability AI (Stable Diffusion) and created Flux. It’s open source, but you’ll have to use it through one of their hosted providers in order to use it commercially. Its edge is that you can train a LoRA with it. Midjourney gives great results, but you’ll have to rely on their Discord or web app as there is no official API.

Even with a good base image model, you’ll likely run the output through other models to get the results you want, e.g. upscaling, fixing issues, controlling the pose, inpainting, etc. Just make sure their licence allows you to commercialise it.

Video generation

To be honest, I’ve not much experience here, though I’ve played around with Hunyuan Video and Video-01 and they seem like the higher quality models (Kling also, but I haven’t tried it). The issue right now for video generation is that it’s not cheap. With pay-as-you-go Video-01 costing US$0.50 for a 6s video, testing is going to be very expensive. If you’re using video generation for your SaaS, you’re looking at quite expensive user plans for little output: a $20 monthly plan that gets your users 120s worth of video generation could still leave you at a loss.
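To put rough numbers on that, here is a small sketch using only the price quoted above (US$0.50 per 6s clip). The function names are made up for illustration, and it assumes you are billed per whole clip.

```python
import math

PRICE_PER_CLIP_USD = 0.50  # quoted pay-as-you-go price for one clip
CLIP_SECONDS = 6           # length of one generated clip


def generation_cost(seconds: int) -> float:
    """Raw generation cost in USD, assuming billing per whole clip."""
    clips = math.ceil(seconds / CLIP_SECONDS)
    return clips * PRICE_PER_CLIP_USD


def seconds_for_budget(budget_usd: float) -> int:
    """Seconds of video a budget buys if every clip is usable."""
    clips = int(budget_usd // PRICE_PER_CLIP_USD)
    return clips * CLIP_SECONDS


print(generation_cost(120))    # prints 10.0
print(seconds_for_budget(20))  # prints 240
```

So 120s of video already eats $10 of a $20 plan in generation alone, and the full $20 only buys 240s if every single clip is a keeper. Failed generations, retries, and any other running costs (assumptions here, not figures from above) come out of what is left.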

Another factor to consider is that the top quality models mentioned are all coming from China, and data is sent back to their respective companies, which may or may not matter to you or your users.

So that’s my experience so far. I’m looking forward to video generation becoming cheaper.

  • ai
  • products
  • generative ai
  • llm