Navigating the Future of AI in the Creative Industries

The Impact of Text-to-Video Models on Video Production

Sora is a large-scale AI system from OpenAI capable of generating high-fidelity videos up to a minute long using just text prompts. It employs neural networks (“diffusion transformer architecture”) to acquire a diverse range of video simulation capabilities that could profoundly impact the entertainment and video production industry.

Sora’s capabilities include generating videos of diverse durations, aspect ratios, and resolutions, extending videos in time, editing input videos, and interpolating between videos. Notably, Sora showcases emergent simulation capabilities, such as 3D consistency and long-range coherence, setting a new standard for what AI can achieve in video production.

The Automation Wave and Its Implications

Sora’s release has accelerated competition in AI video generation, with tech giants like Google, Meta, and others working on similar systems. This rapid progress indicates an evolving landscape of creative AI tools for media production.

Despite this progress, challenges remain in accurately simulating cause-and-effect relationships and in maintaining coherent object trajectories over extended periods. Visible artifacts still appear in some Sora outputs, and further advances are required before professional adoption in film or TV. Expectations should be calibrated accordingly: Sora’s current capabilities do not realistically extend to replacing human creatives.

Concerns also exist around potential misuse, like deepfakes for propaganda, and lack of consent in training datasets. There are reasonable worries about automating jobs in location scouting, design, production, and acting. Sora could severely impact livelihoods even if it enhances efficiency.

Transforming Video Production: Opportunities and Challenges

I recently examined the multifaceted impact of Sora and similar AI technologies on the video production industry, delving into the potential for automation, the challenges and opportunities they present, and the ethical considerations they necessitate.

The impact of generative AI on video production will be profound, with implications across various aspects of the industry. From enhancing production processes to democratizing access to high-quality video creation, AI technologies like Sora promise to redefine the creative landscape. However, this shift also necessitates a recalibration of skills, with a growing emphasis on AI literacy and ethical content creation. As with other applications of generative AI, the future of video production will likely witness a symbiotic relationship between AI and human creativity, with AI augmenting the creative process while humans provide ethical and strategic guidance.

[This graphic utilizes data from recent U.S. online job postings related to video to illustrate the probable influence of generative AI on video production.]

Navigating the Future with Caution and Optimism

Predicting AI’s societal impacts involves weighing complex factors. Video generation tools like Sora showcase impressive technical capabilities, yet also raise understandable concerns given their potential to disrupt industries reliant on human creativity. While AI video generation will keep improving rapidly, it’s unwise to make overly optimistic or pessimistic predictions about societal impact. Past experience shows predictions often miss the mark: for example, some early concerns about AI art generators’ role in disinformation have so far proved overstated. However, as Sora and similar tools grow more capable, vigilance around ethical risks remains prudent. With conscientious development and use, video AI could yield innovations that serve the public good.

Sora will spark a wave of creative AI models, akin to DALL-E’s impact on AI art generators

Sora represents just the initial wave of creative AI – its biggest disruption may be in spurring other models that unlock new generative possibilities like multimodal video-text-audio synthesis. Just as DALL-E unleashed a wave of AI art generators in its wake, Sora is likely the tip of the spear for a new generation of creative models.

In my view, unresolved issues around copyright and IP pose a serious threat that could severely hamper future innovation if not properly addressed. There are also reasonable fears about disproportionate job impacts even if overall productivity rises. As routine creative tasks get automated by AI, uniquely human skills like creativity, ethics, and strategic thinking will be increasingly valued. However, the current education system seems unprepared to cultivate these skills. AI and data literacy and governance also become imperative to ensure responsible development, but progress remains painfully slow on these fronts.


Data Exchange Podcast

1. 2024 Themes and Trends in AI. Our annual outlook for the year ahead explores AI themes including making AI more accessible and efficient through advances in model training and inference, providing tools to facilitate enterprise adoption, democratizing hardware, progressing generative models, and integrating AI across business functions.

2. Where AI Systems Are Heading Next. Jerry Kaplan’s new book explores emerging AI trends like continuous learning, sensory integration, self-improvement, specialized training, and open source models to democratize and optimize AI systems for superior, adaptive results.


From Managing the Risks and Rewards of Large Language Models

Mistral’s Impact on the AI Landscape

Mistral models, recognized for their open nature, have quickly become leaders among open-source LLMs. The company burst onto the scene by releasing capable open-weights models, Mistral 7B and Mixtral 8×7B. Mistral is partnering with Microsoft Azure to provide its models through Azure AI Studio and Azure Machine Learning, in addition to its own platform and self-deployed options.

Mistral is perceived as a competitor to OpenAI, potentially altering leverage and negotiation dynamics within the industry by introducing more competition. This influences innovation, pricing, and access to AI technologies. 

Their newest offering, Mistral Large, represents their most sophisticated language model to date. Now available through Mistral’s platform and on Azure, Mistral Large marks an exciting advancement in AI capabilities. Building on the success of Mistral Medium, which has achieved high rankings on the LMSys leaderboard, Mistral Large is expected to climb the leaderboards rapidly. Distinguished by exceptional performance in reasoning, knowledge benchmarks, multilingual capabilities, and tasks related to coding and math, Mistral Large showcases Mistral’s commitment to pushing the boundaries of what large language models can achieve. With strengths such as fluency in multiple languages, a 32,000 token context window, precise instruction following for content moderation, and native function calling, Mistral Large represents meaningful progress in conversational AI that can enable new applications and innovations.

Analysis
  • Mistral appears to be stepping back from full ‘openness,’ at least for its largest LLMs. I’ve noted that the number of teams routinely publishing open LLMs is fairly limited. That number now appears to have shrunk: Mistral’s choice not to plainly pledge open weights or models for Mistral Large, paired with its Microsoft affiliation, stokes worries that more of its models will be closed moving forward. Although Google recently unveiled a pair of open models, Meta currently stands as the lone consistent source of open LLMs, at least for those over 30 billion parameters.
  • Despite the “open source” label on earlier models, a closer examination reveals Mistral LLMs don’t fully meet open source standards. They provide necessary components, such as model weights and appropriate licenses, for deployment and optimization but fall short in other open source criteria.
  • The practice of open sourcing models, while beneficial, introduces broader potential for security vulnerabilities.
  • Comparing Mistral to OpenAI, the latter reported $2 billion in annualized revenue in December 2023, with projections to double this within a year. Despite this, OpenAI faces financial losses due to hefty expenses in research, development, personnel, and computing.
  • OpenAI’s tools have demonstrated potential to bring value to companies in certain areas. As the capabilities develop further, businesses continue exploring optimal applications and use cases. For OpenAI’s offerings to reach their full potential, it is important that companies discover clear benefits in using them.
  • Both OpenAI and Mistral face competition from other AI entities, which poses a threat to their market positions if they fail to maintain technological innovation or secure adequate funding.
  • These AI startups are burning through traditional venture funding incredibly fast. Venture investors typically bet on early-stage companies where a large return on investment is possible. But for companies already valued at many billions of dollars, exponentially large outcomes are needed to generate those returns. So traditional VCs are largely priced out at this stage.
  • As a result, AI startups like OpenAI, Mistral and Anthropic must court large companies and sovereign wealth funds that can write nine-figure checks. There are only a handful of such funders globally that can make a meaningful investment at this scale. Going public is also an option, but these startups have such massive capital needs that even public markets may not provide enough funding fuel for their ambitions.
  • The enormous investment needs for pursuing Artificial General Intelligence (AGI) ambitions may force these startups to reconsider their goals. While the allure of AGI fits OpenAI’s ambitious vision, the practical reality is that most companies want AI solutions for specific business needs rather than a computational cure-all. The path forward lies in developing profitable products that solve these real-world problems, even if they are less glamorous than AGI.

The pace of innovation in AI and foundation models brings many benefits. Developers and AI teams now have numerous model options, with impressive new foundation models regularly released. As I recently highlighted, I’m particularly excited about Google’s Gemini family of models. Their capabilities push boundaries while maintaining rigor around ethics and transparency.


Discover more from Gradient Flow