Microsoft's big OpenAI investment + generative AI brings data & copy rights to the fore

While the MS investment in OpenAI is getting headlines, many interesting trends are playing out in the background around generative AI.

Jan 18, 2023

Welcome to RiD - Your weekly dive into “exactly how long is it till the robots take over?” or you could just call it your weekly Language AI update. Either works.

My vibes on one of the main hot takes out there..

Jo Kristian Bergum @jobergum

“LLMs will kill Google Search” vibes are so funny, Like G doesn't know about LLMs.

Pat Verga @pat_verga

New preprint: Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models. https://t.co/IFQPhUWJqv In this work we ask two key questions: 1) How to measure Attribution? and 2) How well do current SotA models perform on AQA? 3/ https://t.co/umdmMUJ6X8

But ChatGPT as the Google killer is for another newsletter… Ok, here is the TL;DR list of this week's newsletter:

Big story of the last week: Microsoft's proposed investment in OpenAI of $10B.
Have we solved / democratized the core language modeling component (with transformers), so that the differentiator is now training costs (in the billions) and data? That said…
Will generative AI bring the question of who owns the data to the fore? Huge implications, and the battle has begun.
Still some decent activity in smaller startups. How, if at all, will they differentiate from the big tech capabilities?
Amateur and commercial spin-offs and counterpoints abound around ChatGPT.
Usual language AI highlights in commercial, technical, ethical and general interest items.

Microsoft goes LOOOONNNGGG OpenAI

The biggest Language AI commercial story this week was obviously the possible $10B investment by MS. Various other parties are involved and the whole payoff structure is quite complex (by the way, check this AI newsletter out and add it to your content stable). The 2019 investment by MS was a buy-in to the generative AI world, and so far it looks like an era-defining move by an establishment player. Which raises the question…

Will Big Tech ultimately dominate the core LLM / Generative AI market?

Although it is very much debated, suppose we have discovered the fundamental architecture for LLMs, and possibly AGI, in transformers (Sam Altman, OpenAI CEO, essentially thinks we have. Listen here to his thinking). If so, the barrier to entry for creating generative AI is basically a very large capital investment (in the billions) to train the models (with a few million to pay those pesky humans who set up the pipelines). Does that mean we’ll have a few LLM creators and we all just use APIs? I think this is a big theme / mega trend to watch over the next few years. As I said, many people certainly DO NOT think we have found the secret sauce…

Grady Booch @Grady_Booch

AGI will not happen in your lifetime Or in the lifetime of your children Or in the lifetime of your children’s children

Regardless, even if the big firms dominate, can others leverage these base models, at some price, and create specialized capabilities trained on proprietary data? Reinforcement Learning from Human Feedback (RLHF) is an extremely powerful loop in the training process which, coupled with proprietary data, can enable specific variants for specific use cases. Examples abound, but see below (in the Commercial Happenings section) for an example in the legal space and the creation of a unique and vast data set. This data set, however, is a collaborative academic project leveraging data from the American Bar Association. So…

Who really owns the data?

This question is coming to the fore because ChatGPT and other LLMs derive their abilities from the data they are trained on, much of which is proprietary or copyrighted. Mere citizens had no chance in the web 2.0 world, but with clear copyright laws around content, the whole data question seems to be getting more interesting. See below for a write-up on this broader point; the battle, however, has already started in the generative AI art space. Stable Diffusion is now the target of a class action lawsuit, on behalf of three plaintiffs, on this exact point. This, too, is a big trend to watch.

Spin-offs & creator economy reactions

Another trend I am seeing on Twitter, which is visible in the market more broadly, is just an absolute sh/t load of spin-off or counterpoint apps to ChatGPT. People are integrating it with virtual agents, video, Chrome extensions, data analytics… everywhere. Below I also link to some initial apps that try to tell you when ChatGPT is “hallucinating”.

This, to me, seems like where a lot of creative destruction could happen, as previously well-funded companies with previously SOTA AI capabilities may just get swamped by iterations on ChatGPT or similar services. These apps seem distinct from the proprietary data and RLHF examples, in that they leverage the generic API for more basic services.

Additionally, on the content side, the creator economy has exploded with “how to” and other videos, articles etc. This seems a perfect example of how our new frontier, a very large and fluid creator market, can pivot and respond in real time to the current zeitgeist in ways that traditional media cannot or will not.

Other bits & pieces:

Is Anthropic (the recipient of huge funding amounts in 2022, mostly from FTX; see last week’s newsletter) in trouble under any clawback related to SBF? The clawback team related to Madoff was extremely aggressive in this regard, so it doesn’t seem out of the question in relation to FTX.
Are we entering a VC hype bubble around generative AI? More next week on this.

Basic market data tracker:

Happenings this week:

Commercial:

Funding events:
- Inbenta ($40m)
NLP Market size Worth USD 161.81 Billion by 2029
DeepL, the AI-based language translator, raises over $100M at a $1B+ valuation
Some seed activity / new kids on the block:
- Attention uses NLP to help sales reps sell faster
- Seek AI Finds $7.5M in Seed to Grow Its Generative AI Platform

Technical:

Microsoft researchers are working on a text-to-speech (TTS) model that can mimic a person's voice after a mere three seconds of training.
Google AI Unveils Muse For Faster Image Generation
Meet Med-PaLM: A LLM Supporting the Medical Domain
AI for legal - Legal NLP Dataset Called ‘MAUD’
More on TTS, which seems to be getting an upgrade: a new beta product at https://www.elevenlabs.io/ lets you choose gender, age, accent, pitch and speaking style. Marry this with ChatGPT and we’re off. Discovery credit -
Pete @nonmayorpete
Eleven Labs (@elevenlabsio) released a generative voice AI. Configure the following: - Gender - Age - Accent - Pitch - Speaking style Press generate. You get a new voice every time 🤯 Samples:
3:38 AM ∙ Jan 13, 2023
ChatGPT guardrails (maybe): As everyone using ChatGPT knows, it can spew out some very confident but inaccurate answers. There are various options for dealing with this. One is another AI acting as a truth agent. Got It AI claims it is making great strides in that regard. Another beta product.

Ethical & other:

Watermarking ChatGPT
- OpenAI looking at this
- As are others
The question of who owns the data:
- ChatGPT: An Author Without Ethics
- Stable Diffusion lawsuit
AI Art curation - Could Instagram’s algorithm curate an art exhibition

Other media:

Pods:
- Is ethical AI possible? Vox media
- ChatGPT implications: Besties All in pod

»»»»»»»»»»»»»»»»»»>
