Hey!
Summary of this email:
- Why Postgres is the best database (+ Automatic Embeddings in Postgres course)
- How to do embedding chunking
- The AGENTS.md standard is finally here!
Estimated reading time: 3 minutes.
🐘 Why Postgres is the best database
Postgres, by default, is a SQL database, but you can also use it as NoSQL (it has native JSON support), as a key-value store, as a full-text search engine...
Up to this point we could say it's a database on steroids, but it's actually much more than that.
And that's thanks to its extensions and its community.
With extensions (some created by major companies like Timescale or Supabase), Postgres is also capable of:
✔︎ Making HTTP requests: with extensions like pgsql-http (synchronous) or pg_net (asynchronous), it can make calls to other servers.
✔︎ Having a cron system: thanks to pg_cron, you can schedule a function to run at a fixed interval.
If you combine it with the HTTP capability, you can simplify your stack a lot.
✔︎ Being a message queue system: SQS, RabbitMQ... are great, but if we don't have experience with them and want to get started with message queues, pgmq lets us have one inside Postgres itself.
Everything is native to the database: it all works through tables created in its own schema.
It even has a retry system!
Additionally, with other extensions it can be a time-series database, a graph database, or a geospatial database (with PostGIS, widely used in the industry).
For these reasons, if we have to choose a general-purpose database today, we go with Postgres.
Also, if you combine HTTP requests, crons, and a queue system, you can delegate embedding generation to the database. This makes adding AI-related features much faster and simpler.
If you want to learn how to build these combinations and delegate embedding generation to the database (or understand why you shouldn't), we've just published the complete Automatic Embeddings in Postgres course, included in the standard plan.
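To picture how those three pieces fit together, here's a toy in-memory model in Python, not real Postgres code. The deque stands in for a pgmq queue, `embed` stands in for the HTTP call to an embedding provider, and one call to `process_queue` stands in for one scheduled cron run. Every name here (`Job`, `process_queue`, `max_attempts`) is hypothetical and only illustrates the flow, not the API of any extension.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Job:
    """One pending row whose text still needs an embedding (hypothetical)."""
    row_id: int
    text: str
    attempts: int = 0


def process_queue(
    jobs: list[Job],
    embed: Callable[[str], list[float]],
    max_attempts: int = 3,
) -> tuple[dict[int, list[float]], list[int]]:
    """Drain the queue once, retrying failed jobs up to max_attempts times.

    Returns the embeddings keyed by row id, plus the ids that kept failing.
    """
    queue = deque(jobs)
    embeddings: dict[int, list[float]] = {}
    failed: list[int] = []
    while queue:
        job = queue.popleft()
        job.attempts += 1
        try:
            # In the real setup this would be the HTTP request to the provider.
            embeddings[job.row_id] = embed(job.text)
        except Exception:
            if job.attempts < max_attempts:
                queue.append(job)  # back of the queue for another attempt
            else:
                failed.append(job.row_id)  # give up after max_attempts
    return embeddings, failed
```

The point of the queue is that a failed call to the provider doesn't lose the row: the job simply waits for another attempt, and only rows that keep failing need manual attention.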
✂️ How to do embedding chunking
If you've gotten into the world of embeddings and RAG, you've probably asked yourself what the best strategy is for embedding chunking.
Here we're sorry to bring you bad news, but there's no golden rule to follow. You have to resort to the classic trial and error approach due to the non-deterministic nature of these systems.
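As a starting point for that trial and error, here's a minimal sketch of one common baseline: fixed-size chunks of words with some overlap between neighbors. The function name `chunk_words` and the default values for `chunk_size` and `overlap` are our own placeholders, not recommendations; they're exactly the knobs you'd tune against your own content.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words, so a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1):
        print(chunk)
```

From here you can compare retrieval quality across a few sizes and overlaps, or swap the word window for sentence or paragraph boundaries if your content has a clear structure.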
Last Tuesday we did a live stream with Jesus Serrano, whose company builds a CMS used by half of Spain (Mediaset, RTVE, eldiario.es, Prensa Iberica...). He told us how they implemented their feature for finding previous and similar content. You can watch it here.
It's Python code, but with proper software architecture and typing, which is rare and exciting to see. 🥹
🤖 The AGENTS.md standard is finally here!
OpenAI + Google + Cursor have come together to recognize this file (whether at the root or in nested directories) and add that information to the agent's context.
It's pure markdown.
The downside is that, for now, Claude Code doesn't support it (although they'll likely add native support soon).
But it's as simple as running this command and you're set: mv CLAUDE.md AGENTS.md && ln -s AGENTS.md CLAUDE.md
Then remember to add CLAUDE.md to your .gitignore, and you'll have standardized rules.
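If you'd rather script that step (for example, to apply it across several repos), here's a rough Python equivalent of the command above. The function name `standardize_agent_rules` and the safety checks are our own additions, and it assumes a system where symlinks are available (Linux or macOS).

```python
from pathlib import Path


def standardize_agent_rules(repo_root: Path) -> None:
    """Rename CLAUDE.md to AGENTS.md and leave a CLAUDE.md symlink behind."""
    claude = repo_root / "CLAUDE.md"
    agents = repo_root / "AGENTS.md"

    if claude.is_symlink():
        return  # already migrated, nothing to do
    if agents.exists():
        raise FileExistsError(f"{agents} already exists, merge the files by hand")

    claude.rename(agents)  # mv CLAUDE.md AGENTS.md
    claude.symlink_to("AGENTS.md")  # ln -s AGENTS.md CLAUDE.md (relative target)


if __name__ == "__main__":
    standardize_agent_rules(Path.cwd())
```

The relative symlink target keeps the link valid if the repo is cloned or moved to another path.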
Now let's see if they can agree on standardizing the MCP connection file.
Tomorrow at 9am CEST, on Cafe con Codely, we'll be discussing this news and much more. You can follow it on YouTube or Twitch.
And since you've made it this far in the newsletter, here's the joke of the week, which I know you were waiting for:
> I had a problem programming in C, so I decided to use Java. Now I have an AbstractProblemFactory. 😂
Cheers!