40 AI Startup Founders on the Current State of AI

2023-09-29

Category: News

This video interviews 40 AI startup founders, who share how they use AI in their everyday lives. Below is a transcript.

The interview covers the typical applications of today's LLM-based AI, such as speech writing, voice bots, AI-assisted coding, and AI image generation, along with AI's current strengths and characteristic problems: it can do creative work and is good at reading and processing data, but it can "hallucinate", so the correctness of its output cannot be guaranteed. Building a reliable AI application therefore means coordinating deterministic software with non-deterministic probabilistic models; common techniques include decomposing the problem into steps, steering the model appropriately, and iterating and debugging continuously.

Question: What is the most unexpected way you use AI in your everyday life?

A: Yeah, I don't know if I should say this, but I use it to write wedding speeches.

B: Yeah, we both set our answering machines to our voice bot.

(They place a live call.)

Human: Hey, how's it going?

AI: The Horde is strong, Padawan. (The first half is a World of Warcraft reference; a "Padawan" is an untrained Jedi in Star Wars.)

Nice.

C: I think this one was programmed to talk about the Horde.

D: Now we have it open every day to help us code.

E: It helps us code.

F: It's great at coding.

G: It's making me code infinitely faster.

A: You can describe to the AI what change you want to make to your UI, like adding dark mode, and it will edit the code to implement that feature.

H: These tools will make humans more and more like narrators, people who describe what they want; the models will then actually create something even better than what humans would have made themselves.

I: I think fundamental problem-solving skills will always be really important. Understanding the technology, its constraints, and how to leverage it is crucial.

Question: What are current AI tools good at, generally speaking?

J: Yeah, really good question. One of the counterintuitive things about generative AI tools is that they're really good at what we thought they'd be really bad at, which is creative storytelling work.

K: AI for creativity.

L: Our goal is to make software so that anyone can make something like South Park from their bedroom.

M: For example, you can make a photo of me, Eric, and Rihanna playing volleyball on the beach.

(Looking at the AI-generated photos.)

I like this.

That one's my favorite, or I like this one.

I like this one.

H: We are building truly human-like AI voices that are very conversational, just like a human.

O: Without AI, the voices sounded horrible. Now they're nearly indistinguishable from a human voice.

P: For us personally, what it's amazing at is semantic search, something that never really worked before. You can take a random piece of text and find the relevant things.
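As a rough sketch of the semantic-search pattern P describes (my illustration, not from the interview): embed the documents and the query as vectors, then rank by cosine similarity. This sketch assumes the sentence-transformers library and its all-MiniLM-L6-v2 model; any embedding API would work the same way.

```python
# Minimal semantic-search sketch: rank documents by cosine similarity
# between embedding vectors. Assumes the sentence-transformers library.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

docs = [
    "Refunds are accepted within 30 days of purchase.",
    "Dark mode can be enabled from the settings panel.",
    "Our voice bot answers calls when you are away.",
]

def search(query: str, top_k: int = 2):
    # normalize_embeddings=True makes the dot product equal cosine similarity
    doc_vecs = model.encode(docs, normalize_embeddings=True)
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ query_vec
    best = np.argsort(-scores)[:top_k]
    return [(docs[i], float(scores[i])) for i in best]

print(search("how do I turn on the dark theme?"))
```

Note that the query's keywords never appear in the matching document; the embedding captures the meaning, which is exactly what keyword search used to miss.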

Q: LLMs have a great ability to read.

R: They're pretty good at taking arbitrary data and answering questions about it.

S: Because there are so many new types of data, it's really hard to adapt, so we're currently fine-tuning all of our models on these new types of data to make them as accurate as possible.

T: For fashion, very specifically, new terms pop up all the time, and you have to keep updating these models to know: oh, this month the trend is the "mermaid skirt", but next month maybe it's the "ballet skirt".

U: AI tools are pretty good at giving you around an 85% to 90% solution, but there's a lot more fine-tuning, or a lot more hacks, that you need to put in place on top of them to ensure you can deliver genuine value.

V: You can use a bunch of simple operations to do something really complicated.

W: You need to give them clear structure and direction; give them one well-defined task and they do it very well.

V: If you can think through the process you go through, you can engineer a prompt, or a sequence of steps, so that the entire process is more reliable than relying on the model on its own.
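As a sketch of the "sequence of steps" idea V describes (my illustration, not from the interview): break one fuzzy task into narrow LLM calls, with deterministic code validating each step in between. `call_llm` here is a hypothetical wrapper around whatever chat-completion API you use.

```python
# Sketch: decompose one open-ended task into narrow steps, with ordinary
# deterministic checks between the probabilistic LLM calls.
# call_llm() is a hypothetical wrapper around any chat-completion API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

def triage_ticket(ticket_text: str) -> dict:
    # Step 1: a narrow extraction task, easier to verify than one big prompt.
    product = call_llm(
        f"Extract only the product name from this support ticket:\n{ticket_text}"
    ).strip()
    if not product:
        raise ValueError("extraction returned nothing; retry or escalate")

    # Step 2: classification constrained to a fixed label set.
    category = call_llm(
        "Classify this ticket as exactly one of: bug, billing, feature_request.\n"
        f"Ticket: {ticket_text}"
    ).strip().lower()
    if category not in {"bug", "billing", "feature_request"}:
        category = "unknown"  # fail closed instead of trusting free-form output

    # Step 3: generation, now grounded by the structured fields above.
    summary = call_llm(
        f"Summarize this {category} ticket about {product} in one sentence:\n{ticket_text}"
    )
    return {"product": product, "category": category, "summary": summary}
```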

A: It's important to keep iterating and debugging your prompts as you go.

Y: What you think is the solution may change over time. Your data can change, and the quality of the underlying model can change with it, so continuous iteration is required.

Z: I think the hardest part is that you're trying to marry deterministic software with non-deterministic probabilistic models, and we sit right in the middle of that.

Q: It's quite an exciting thing to work with, because in the past the computer just followed your instructions to a T, and you could expect the same results given the same inputs; now the same inputs might produce different results.

N: If we can introduce some randomness into the outputs, it helps explore the space better, and our models will keep learning and improving from all of these different choices.
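The randomness N mentions is usually exposed as a sampling temperature. A toy sketch of how temperature reshapes a next-token distribution (illustrative numbers, not from the interview):

```python
# Sketch: temperature scaling of a toy next-token distribution.
# Low temperature -> near-deterministic; high temperature -> more varied output.
import numpy as np

rng = np.random.default_rng(0)
tokens = ["the", "a", "one", "zebra"]
logits = np.array([2.0, 1.0, 0.5, -1.0])   # toy model scores for 4 tokens

def sample(temperature: float) -> str:
    probs = np.exp(logits / temperature)
    probs /= probs.sum()                    # softmax over scaled logits
    return str(rng.choice(tokens, p=probs))

print([sample(0.2) for _ in range(5)])      # almost always "the"
print([sample(1.5) for _ in range(5)])      # same input, varied outputs
```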

M: It's not reliable in the way you expect it to be reliable, which is great for us, because we do entertainment: as long as it's funny, nothing else matters. But if you're operating a car, that seems more complicated.

Q: So it's a double-edged sword. Sometimes it hallucinates and makes up something that wasn't intended.

A: I would define a hallucination as the AI generating something that doesn't exist, but looks like it might, or should, exist.

J: They're still really bad at distinguishing fact from fiction. They're good creative storytellers, but surprisingly bad at knowing the difference between what's true and what's false.

1: If you're a doctor, verifying the diagnosis GPT came up with probably takes a ton of time, and if there's any mistake, you're in a lot of trouble.

X: At what point do you trust the AI over the doctor?

J: There's been a lot of effort in the industry recently to prevent these hallucinations, but that's created the opposite problem: now it will often think things aren't real, or pretend not to know things it really should know, right? It will tell you it has never heard of an article even though that article is definitely in the training set, right? It's right there.

2: It's a bit like a human: when you read something, take something away from it, and internalize it, you can't necessarily remember exactly where you read it. So when you use these models with real-world data, it's actually even harder to disambiguate what's a hallucination and what's real data; you can't consistently ask for citations, which is still a challenge. So the trustworthiness there has some way to go.

X: It's not enough to say the accuracy metrics are better; there's more at play, especially when it comes to human trust. Trust is a key component if you want to develop a technology that people will actually accept and use.

Q: We still have to steer them on a lot of the details, which is why you hear so much about "human-in-the-loop".

N: It's really important to keep a human in the loop.

U: You always need humans in the loop to assess whether the corrections that were made are accurate.

N: For now, someone needs to supervise and make sure no false information gets through.
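A minimal sketch of the human-in-the-loop pattern these founders describe (my illustration; `call_llm` is the same hypothetical wrapper as in the sketch above): the model drafts, and a person approves, edits, or rejects before anything is used.

```python
# Sketch: gate every model output behind a human reviewer.
# call_llm() is a hypothetical wrapper around any chat-completion API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

def draft_with_review(prompt: str) -> str:
    draft = call_llm(prompt)
    print("--- model draft ---")
    print(draft)
    verdict = input("accept (a) / edit (e) / reject (r)? ").strip().lower()
    if verdict == "a":
        return draft                            # human confirmed the output
    if verdict == "e":
        return input("enter corrected text: ")  # human supplies the fix
    raise RuntimeError("rejected: nothing ships without human sign-off")
```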

Q: So there are a lot of pros and cons, and it's really about figuring out the right ways to steer it. That's the challenge I think all the YC companies working on AI are facing: we've been given this new tool, and we're all just trying to figure it out.

Z: I never want to lose sight of the fact that, ultimately, this technology is in service of humans; we can't forget that core point. We get to keep humans as the ones who say the "so what".

3: Ideally, it actually deepens human connection: it's much more about interacting with people and figuring out what's actually valuable to them.
