The landscape of data is in perpetual flux, and the role of the analytics engineer is becoming increasingly central to organizational success. The 2025 State of Analytics Engineering Report, produced by dbt Labs, provides an in-depth look at the present state, evolving challenges, and anticipated direction of this growing field. The report shows a sector rapidly adapting to three significant forces: the expanding integration of artificial intelligence, especially generative AI (GenAI); a notable surge in investment in data initiatives; and the enduring, fundamental importance of ensuring data quality.
Artificial Intelligence: A Collaborative Force in Data Teams
A prominent theme in the report is the transformative influence of artificial intelligence on data teams. Contrary to earlier apprehensions about job displacement, the report suggests that AI is primarily enhancing the capabilities of data teams rather than replacing them. Interestingly, the report indicates an increase in the size of data teams, suggesting that organizations recognize the amplified value of combining human expertise with AI tools.
The report states that current applications of generative AI (GenAI) and Large Language Models (LLMs) within the day-to-day activities of data teams primarily involve the generation of analytics code (including SQL, Jinja, and Python) and the creation of data documentation. While there is considerable interest in leveraging AI to answer data-related questions through generated SQL or a semantic layer, its more common application is in the creation or description of data assets. The report indicates that 70% of respondents are using AI for analytics development in at least some capacity.
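To make the idea of generated analytics code concrete, the sketch below shows the kind of dbt-style SQL and Jinja model an assistant might draft from a prompt like "summarize daily revenue by order status." The source, model, and column names are hypothetical; the point is the shape of the output, not any specific project.

```sql
-- models/daily_revenue.sql (hypothetical model an AI assistant might draft)
-- Summarizes order count and revenue per day and status; source and columns are illustrative.
{{ config(materialized='table') }}

with orders as (

    select * from {{ source('raw', 'orders') }}

),

daily as (

    select
        cast(order_date as date) as order_day,
        status,
        count(*)                 as order_count,
        sum(amount)              as revenue
    from orders
    group by 1, 2

)

select * from daily
```

In practice the same tools are often asked to produce the accompanying documentation as well, such as a description for each column of the model above.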
Most data teams are presently using general-purpose LLMs such as ChatGPT, Claude, and Gemini for their code development needs. However, one limitation identified in the report is that these general-purpose tools can struggle to access important context within an analytics project, such as the codebase or metadata. This challenge is prompting a movement towards specialized GenAI solutions integrated into development tooling, with approximately 25% of respondents already utilizing these tools.
The report forecasts two key developments: the integration of data context into general-purpose systems through the establishment of open standards for managing context for LLMs, and the increasing prominence of specialized tooling deeply integrated with data development use cases. Despite the widespread use of AI for code and documentation, querying data in natural language remains an evolving use case: 29% of respondents indicated that their organizations do not currently use AI tools for this purpose but would like to, while another 23% have experimented with it.
Among the 30% who are currently using AI to consume data assets by answering questions in natural language, two-thirds are doing so with standard SQL generation, compared to one-third using a semantic layer. Previous research suggests that employing a semantic layer tends to yield significantly higher accuracy in GenAI-based natural language queries of data. Encouragingly, 27% of respondents plan to increase their investment in semantic layer tooling over the next year.
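A rough sketch of why the semantic layer tends to help: with plain text-to-SQL, the model must re-derive joins, filters, and metric logic for every question, whereas a semantic layer lets it reference a metric that is already defined and governed. The table, column, and metric names below are illustrative, and the semantic-layer call is shown only schematically, not as any particular product's API.

```sql
-- (a) Plain text-to-SQL: the LLM infers the metric logic itself on every question.
select
    date_trunc('month', order_date) as order_month,
    sum(amount)                     as revenue
from orders
where status = 'completed'
group by 1;

-- (b) Semantic-layer request (schematic only): the question resolves against a
-- governed metric definition, so joins, filters, and aggregation logic are
-- reused rather than re-generated, e.g.
--   query(metrics=['revenue'], group_by=['order_month'])
```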
While a small segment of respondents remains skeptical, the majority of analytics professionals see considerable potential, although their optimism is tempered with a degree of caution. The report highlights a noticeable disparity between this optimism and the perceived real-world impact, suggesting that while AI-driven tools are enhancing workflows, they haven’t yet delivered the transformative results people hoped for. Areas where AI is expected to significantly benefit data workflows include:
- Building analytics code
- Facilitating self-serve data exploration
- Testing and monitoring code
- Proactive data monitoring and alerting
- Debugging data pipeline issues
- Analyzing data and extracting insights
Heightened Investment in Data Initiatives
The report says that following a period of economic prudence, data budgets are experiencing a notable increase, with AI at the forefront of this growth. AI tooling stands out as the primary area of investment for data teams over the past year, and a substantial 45% of respondents intend to boost their investment in AI tools within the next 12 months.
The report clearly demonstrates a significant upswing in data team budgets. While only 9% of respondents reported budget increases last year, this figure surged to 30% this year. Beyond AI, data quality and observability represent the second-largest focus area for increased investment, with 38% of respondents planning to allocate more resources in the coming year. This underscores the urgency with which organizations are seeking to resolve issues related to data quality.
Other areas where increased investment is anticipated include the following:
- Data platform infrastructure (data warehouse, lake, or lakehouse)
- Data transformation tools
- BI and data visualization tools
- Data extract/load solutions
- Semantic layer technologies
- Centralized data catalogs
- Reverse ETL tools
- Data orchestration platforms
Crucially, the report emphasizes that these investments in AI and data tooling are not leading to a reduction in workforce; data teams are also expanding. 40% of respondents reported an increase in headcount over the past year, a substantial leap from the 14% reported in the previous year’s study. This indicates that organizations recognize the vital role data teams play in achieving success and are backing this recognition with greater investment in both technological resources and personnel.
The discipline of analytics engineering is also broadening its footprint across diverse industries. While the technology sector continues to be the largest segment of the community at 34%, its proportion has slightly decreased year-over-year. Notably, heavily regulated sectors such as financial services (15%) and healthcare and life sciences (10%) now constitute a significant portion of the analytics engineering community. This highlights the essential capabilities that analytics engineering provides, even within complex data ecosystems subject to stringent compliance and regulatory requirements.
Data Quality: The Persistent and Paramount Concern
Despite the excitement surrounding AI and the increase in investment, the report unequivocally states that establishing trust in data remains the paramount objective for organizations. Data teams understand that unreliable data inevitably leads to unreliable outcomes, particularly as they integrate AI and LLMs. As we all know, “garbage in, garbage out” is true for all data projects.
Data quality continues to be the most critical challenge for data teams to address. Over half of the respondents (56%) identified poor data quality as the challenge they encounter most frequently. Their day-to-day experience as data practitioners has not fundamentally changed despite the adoption of AI. A considerable portion of their time, 57%, is still dedicated to maintaining or organizing data sets. This highlights the ongoing necessity of addressing the foundational issue of data quality.
While challenges such as documenting and maintaining data sets, and limitations on compute resources, have become less significant obstacles, the issue of poor data quality stubbornly persists. Furthermore, the prevalence of ambiguous data and a lack of data literacy among stakeholders suggests that the challenges are not solely technical but also organizational.
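On the technical side, much of this routine quality work can be encoded as tests that run alongside the transformation code, so that bad data is caught before it reaches consumers. The sketch below is a dbt-style singular test, a query that should return zero rows when the data is healthy; the model and column names are made up for the example and reuse the hypothetical daily_revenue model from earlier.

```sql
-- tests/assert_no_negative_revenue.sql (hypothetical singular test)
-- In dbt, a singular test fails if the query returns any rows.
select
    order_day,
    status,
    revenue
from {{ ref('daily_revenue') }}
where revenue < 0
```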
The Expanding Scope of Analytics Engineering
The report portrays analytics engineering as a dynamic field with an expanding scope and increasing influence. The emergence of analytics engineers was initially propelled by data analysts adopting best practices from the software engineering discipline. This trend continues with a noticeable inclination among nontechnical business users to engage in data transformation. Nearly 65% of respondents believe that empowering nontechnical business users to create transformed and governed data sets would significantly enhance their organization’s data value and efficiency.
The report observes a continued increase in the adoption of the hybrid model, where work is allocated based on both business area and function. This indicates that members of data teams, regardless of their specialization, are becoming more deeply integrated within their respective organizations. While data teams overwhelmingly feel valued by their organizations, there remains an opportunity for improvement in establishing clear objectives and providing a roadmap for how data teams are expected to contribute to the organization’s overall success.
The Course Forward for Data Teams
The 2025 State of Analytics Engineering Report makes one thing abundantly clear: data teams are proactively adapting and evolving. With AI reshaping workflows, investment in data on the rise, and data quality remaining a key challenge, organizations must take a proactive stance in defining their data strategy. The report suggests that the path forward involves a symbiotic relationship between AI and human expertise, where data reliability is non-negotiable and the influence of analytics engineering continues to expand in both impact and scope across organizations.
In conclusion, the 2025 State of Analytics Engineering Report offers a valuable perspective on a field in constant evolution. AI is rapidly becoming an essential tool, investment in data is increasing, and while challenges persist, particularly concerning data quality, the future of analytics engineering is brimming with opportunities for making a significant impact and driving innovation.
References
- dbt Labs, 2025 State of Analytics Engineering Report (original) (archived)