As a long-time SQL user, I still turn to it whenever I need to work with tabular data, preferring it over dataframe libraries like Pandas. SQL’s syntax is easier to recall and aligns well with set theory, making it intuitive for anyone familiar with basic mathematical logic. The optimizer adds to the appeal: it translates declarative queries into efficient execution plans, so users can focus on what they want rather than how to get it. However, SQL can become cumbersome, especially in complex queries involving subqueries and temporary tables. For more intricate data structures, like property graphs, I find Cypher to be a more natural choice.
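The point about declarative queries and the optimizer can be seen in a small sketch using Python's built-in `sqlite3` module. The `orders` table, the `idx_orders_customer` index, and the helper names are hypothetical and chosen only for illustration. The query text stays the same while the engine is free to pick a different plan once an index exists.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


def plans_before_and_after_index():
    """Ask for the plan of the same query before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical toy schema, used only for this illustration.
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        sql = "SELECT amount FROM orders WHERE customer = 'alice'"
        before = query_plan(conn, sql)
        conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
        after = query_plan(conn, sql)
        return before, after
    finally:
        conn.close()


if __name__ == "__main__":
    before, after = plans_before_and_after_index()
    print("without index:", before)
    print("with index:   ", after)
```

The exact wording of the plan text varies between SQLite versions, so treat the printed output as informative rather than stable.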
Surprisingly, even highly skilled developers often struggle with SQL. Although it’s the standard language for querying databases, SQL can impede productivity, especially in data-intensive fields like AI. The problem affects a wide range of users, from developers to data scientists, who rely on SQL for their daily tasks. When fluency is hard to achieve, users cannot take full advantage of the language, and that slows innovation.
A new paper from a team at Google aims to enhance SQL by introducing a pipe-structured data flow syntax, known as Google Pipe Syntax, to make SQL more flexible, extensible, and easier to use. The approach targets SQL’s rigid clause order (SELECT, FROM, WHERE, GROUP BY, and so on), which makes it hard to express operations in any other order without subqueries or workarounds. With pipe syntax, users can compose operations in any order, which increases flexibility, simplifies the user experience, and allows clean language extension. The paper includes an example query, which is not reproduced here.
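As a rough illustration of the clause-order problem, and not the paper’s own example, the sketch below runs a standard SQL query that has to aggregate twice, which forces a subquery. The `orders` table, its rows, and the function name are hypothetical. The pipe-syntax version is included only as text for comparison: SQLite does not parse it, and it assumes the `FROM`, `|> AGGREGATE ... GROUP BY`, and `|> WHERE` operators behave as in Google’s pipe syntax.

```python
import sqlite3

# Hypothetical toy data: (customer, order amount).
ROWS = [
    ("alice", 30.0),
    ("alice", 50.0),
    ("bob", 20.0),
    ("carol", 70.0),
    ("carol", 40.0),
]

# Standard SQL: filtering on one aggregate and then aggregating again
# needs a subquery, so the query reads from the inside out.
STANDARD_SQL = """
SELECT COUNT(*) AS big_spenders
FROM (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
) AS per_customer
WHERE total > 60
"""

# The same logic written in pipe syntax, top to bottom.
# Shown as text only; it is never sent to SQLite.
PIPE_SQL = """
FROM orders
|> AGGREGATE SUM(amount) AS total GROUP BY customer
|> WHERE total > 60
|> AGGREGATE COUNT(*) AS big_spenders
"""


def count_big_spenders(rows):
    """Count customers whose summed order amount exceeds 60."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        (count,) = conn.execute(STANDARD_SQL).fetchone()
        return count
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_big_spenders(ROWS))
```

In the pipe version each step consumes the table produced by the step above it, so adding a second aggregation is one more line rather than another level of nesting.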
Adopting Pipe Syntax for SQL could bring about several practical benefits:
- Enhanced Productivity. Simplifies the process of writing, reading, and maintaining SQL queries, leading to faster development cycles.
- Improved Code Readability and Maintainability. The new syntax enables more intuitive query structures, making them easier to understand and modify.
- Better Tooling and IDE Support. By making SQL more composable, the new syntax could pave the way for advanced tools and features in SQL IDEs, such as improved auto-completion and debugging capabilities.
- Faster Adoption of New SQL Features. The Pipe Syntax offers a streamlined approach to integrating new functionalities, reducing the need for extensive retraining or system migrations.
Exploring SQL Alternatives
A number of alternatives have been developed to address SQL’s limitations, each with distinct strengths and challenges. PRQL offers composable relational operations but struggles with adoption due to its unfamiliar syntax and divergence from traditional SQL. SQL++ extends SQL to better handle structured data types like JSON but doesn’t resolve the core syntax issues that make SQL cumbersome for complex queries.
Python DataFrames are popular for data manipulation but lack the declarative power and optimization essential for large-scale processing, making them less effective for AI tasks. Tools like KQL and Apache Beam also attempt to improve on SQL but face adoption challenges due to their specialized use cases and steep learning curves.
These alternatives highlight the difficulty in finding a balance between enhancing SQL’s usability and ensuring seamless integration with existing systems. None have yet fully succeeded in overcoming SQL’s limitations while maintaining the broad compatibility and ease of use that SQL offers.
Pipe SQL: From Concept to Widespread Use at Google
In contrast to these less successful attempts, Pipe SQL appears to have gained significant traction at Google. After an initial implementation phase involving a small group of early users, Google stabilized the Pipe SQL language and made the pipe syntax widely available. Over the following six months, adoption steadily increased, with initial spikes following announcements on a SQL users mailing list and the removal of opt-in settings, making the pipe syntax the default. Usage continued to grow as more users incorporated pipe syntax into their daily work, with significant uptake following a SQL workshop at a user conference, where a 40-minute tutorial on the syntax generated excitement and further adoption.
Pipe SQL: Pros, Cons, and Future Prospects
The introduction of Pipe SQL is a step forward in making SQL more adaptable and user-friendly, especially for complex data processing tasks. However, it is not without its limitations. The potential for parsing ambiguities, the complexity of tree-like query structures, and the current lack of IDE support suggest that while Pipe SQL holds promise, there is still work to be done. Additionally, there will be an adjustment period as users familiarize themselves with the new syntax.
- Readability boost. The linear flow of Pipe SQL really stands out to me. It mirrors the logical sequence of query processing, which could significantly lighten the cognitive load for developers and AI teams working with complex data structures.
- Flexibility in Query Structure. By aligning SQL with data manipulation patterns seen in other languages like Python, Pipe SQL opens the door to more natural and expressive query writing. This flexibility could be a game changer for teams that need to translate complex algorithms into SQL, making the entire data preparation phase more efficient.
- Tooling opportunities. The simplified, linear structure of Pipe SQL presents an opportunity for the development of more powerful tools. Imagine an IDE that not only auto-completes your SQL but also suggests optimizations on the fly.
- Ergonomic considerations. While some believe Pipe SQL enhances SQL’s usability, I’m not entirely convinced. In some cases the new syntax will introduce unnecessary complexity, offsetting any ergonomic benefits and making it harder for teams to onboard new members quickly.
- Unnecessary Syntax Sugar. Some see Pipe SQL as a cosmetic upgrade rather than a substantive one. I share this skepticism; unless it offers real, tangible benefits, it risks adding complexity without enough payoff, potentially leading to fragmented coding practices.
- Potential for Misuse. Flexibility comes with the risk of misuse. Inefficient query patterns could emerge, particularly among teams that are still adapting to the new syntax, potentially leading to degraded performance in large-scale data processing tasks.
- Fragmentation risk. The risk of SQL fragmentation with Pipe SQL is a major concern. I worry that introducing another dialect might further splinter the SQL ecosystem, complicating cross-database compatibility and making long-term adoption more problematic.
While Pipe SQL is an intriguing development that could make SQL more accessible and powerful, I’m adopting a wait-and-see approach. I’ll reserve judgment until I see how quickly it gets adopted by the broader community—particularly in Postgres, which remains my favorite database system. Let’s see if Pipe SQL and similar variants can succeed where Esperanto didn’t—by staying close enough to the familiar to actually catch on.