Current vector databases often treat embeddings as standalone entities, detached from their original source data. This separation forces applications to track the relationship between embeddings and the data they represent, adding bookkeeping and synchronization work whenever the source data changes. It also weakens context and diminishes the effectiveness of embedding-based searches, particularly in applications where maintaining data context is crucial, such as Retrieval-Augmented Generation (RAG) and semantic search.
Existing Solutions
- Dedicated Vector Databases: Platforms like Pinecone, Weaviate, and LanceDB store embeddings separately from source data.
- Vector Extensions for Traditional Databases: Tools like pgvector for PostgreSQL enable vector operations within general-purpose databases.
- Multiple Database Systems: Teams often juggle vector databases, metadata databases, and lexical search indexes (e.g., Elasticsearch).
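The multiple-systems pattern above can be sketched with a toy example. Nothing here is a real client API; `fake_embed` is a stand-in for an embedding model, and the three dictionaries stand in for a metadata database, a vector database, and a lexical index:

```python
def fake_embed(text):
    # Stand-in for a real embedding model: vowel-frequency vector.
    return [text.count(c) for c in "aeiou"]

metadata_store = {}   # e.g., a relational row per document
vector_store = {}     # e.g., a vector database: id -> embedding only
lexical_index = {}    # e.g., a search index: term -> set of ids

def index_document(doc_id, text):
    # Every write must touch all three systems, with no shared
    # transaction to keep them consistent.
    metadata_store[doc_id] = {"text": text}
    vector_store[doc_id] = fake_embed(text)
    for term in set(text.split()):
        lexical_index.setdefault(term, set()).add(doc_id)

index_document("doc1", "postgres stores rows")
index_document("doc2", "vectors approximate meaning")

# Even a simple lookup must reconcile ids across stores by hand.
ids = lexical_index.get("vectors", set())
results = [(i, metadata_store[i]["text"], vector_store[i]) for i in ids]
```

The shared document ID is the only thing tying the stores together, which is exactly the bookkeeping burden the rest of this section describes.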

Shortcomings of Treating Embeddings as Standalone Entities
- Disconnection from Source Data: Embeddings stored apart from their source data must be kept in sync by the application, creating ongoing synchronization challenges.
- Complex Synchronization Pipelines: Manual ETL processes are required to keep embeddings updated, increasing the risk of errors.
- Increased Operational Complexity: Managing multiple systems necessitates additional monitoring, alerting, and maintenance efforts.
- Risk of Data Inconsistency: Manual synchronization is prone to oversights, resulting in stale or incorrect data being served to users.
- Difficulty in Model Upgrades: Upgrading embedding models or changing data representations is cumbersome and risky due to tight coupling.
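The stale-data risk in the list above can be made concrete with a minimal sketch. The stores and `fake_embed` are illustrative stand-ins, not real APIs:

```python
def fake_embed(text):
    # Stand-in for a real embedding model: vowel-frequency vector.
    return [text.count(c) for c in "aeiou"]

# Source text and its embedding live in separate systems.
source_db = {"doc1": "cats purr"}
vector_db = {"doc1": fake_embed(source_db["doc1"])}

# The source data changes, but the manual re-embedding step is forgotten.
source_db["doc1"] = "dogs bark"

# Nothing flags the drift; searches silently use the old embedding.
stale = vector_db["doc1"] != fake_embed(source_db["doc1"])
print("embedding stale:", stale)  # → embedding stale: True
```

No error is raised anywhere in this flow, which is what makes the inconsistency easy to miss in production.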
A Proposed Solution
Storing source documents and their corresponding embeddings together maintains data relationships and ensures that embeddings are directly associated with their source data. This approach simplifies data management by keeping everything within a single database system, leveraging its features for data integrity and consistency.
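A minimal sketch of the co-located approach: each record holds both the source text and its embedding, so a single write path updates them together. As before, `fake_embed` stands in for a real embedding model:

```python
def fake_embed(text):
    # Stand-in for a real embedding model: vowel-frequency vector.
    return [text.count(c) for c in "aeiou"]

documents = {}  # one store: id -> {text, embedding}

def upsert(doc_id, text):
    # Text and embedding change in one operation; they cannot drift.
    documents[doc_id] = {"text": text, "embedding": fake_embed(text)}

upsert("doc1", "cats purr")
upsert("doc1", "dogs bark")  # the update re-embeds automatically
print(documents["doc1"]["text"])  # → dogs bark
```

Because there is only one write path, the failure mode of a forgotten re-embedding step disappears by construction.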
An interesting new post from Timescale proposes an alternative approach called the “vectorizer” abstraction, which treats embeddings as database indexes rather than independent data. By introducing this concept and presenting their implementation—the pgai Vectorizer for PostgreSQL—they seek to simplify embedding management, reduce operational overhead, and improve synchronization between embeddings and source data for teams building AI applications.
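The "embeddings as indexes" idea can be illustrated with a toy sketch: an embedding is declared once, like an index definition, and the store, not the application, keeps it current on every write. The class and method names below are invented for illustration and are not the pgai Vectorizer API; `fake_embed` stands in for a model such as an OpenAI embedding endpoint:

```python
def fake_embed(text):
    # Stand-in for a real embedding model: vowel-frequency vector.
    return [text.count(c) for c in "aeiou"]

class Table:
    """Toy table where embeddings behave like declared indexes."""

    def __init__(self):
        self.rows = {}
        self.vector_indexes = {}  # name -> (column, embed_fn, data)

    def create_vector_index(self, name, column, embed_fn):
        # Declared once, like CREATE INDEX; backfills existing rows.
        data = {rid: embed_fn(row[column]) for rid, row in self.rows.items()}
        self.vector_indexes[name] = (column, embed_fn, data)

    def upsert(self, row_id, row):
        self.rows[row_id] = row
        # The store maintains every declared index on each write.
        for column, embed_fn, data in self.vector_indexes.values():
            data[row_id] = embed_fn(row[column])

docs = Table()
docs.create_vector_index("body_embedding", "body", fake_embed)
docs.upsert("doc1", {"body": "cats purr"})
docs.upsert("doc1", {"body": "dogs bark"})  # the index updates with the row
```

The point of the abstraction is that application code only writes rows; embedding maintenance becomes the database's responsibility, just as B-tree maintenance is today.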

While Timescale’s approach with the vectorizer abstraction offers a promising solution, it does come with certain limitations. Currently in early access, it is limited to PostgreSQL databases and requires an external worker process. It supports only OpenAI embedding models at this time and depends on existing PostgreSQL extensions. Despite these constraints, this method signals a shift toward more integrated and efficient management of embeddings within existing database systems.
As the industry matures, we’re likely to witness a proliferation of sophisticated embedding management solutions across different platforms. These solutions will likely incorporate features such as automatic embedding updates, version control for embedding models, and native integration with popular AI frameworks. Just as the data lakehouse ecosystem coalesced around open table formats like Apache Iceberg and Delta Lake, we can expect a similar convergence in embedding management solutions. Major database and lakehouse vendors, along with leading cloud providers, are likely to introduce integrated embedding solutions that promote standardization and best practices emphasizing data consistency, operational simplicity, and seamless integration with existing systems.
Related Content
- Choosing the Right Vector Search System
- The Data Exchange podcast episodes on vector databases
- The Vector Database Index
- GraphRAG: Design Patterns, Challenges, Recommendations
If you enjoyed this post, please support our work by encouraging your friends and colleagues to subscribe to our newsletter:
