Grammar Girl Quick and Dirty Tips for Better Writing

Can AI really write? A no-nonsense discussion, with Christopher Penn

Episode Summary

1021. This week, Christopher Penn talks about the role of AI tools like ChatGPT in writing and editing. We look at common misconceptions about how AI works and best practices for writing prompts. We also talk about privacy concerns, bias, fact-checking, and our concerns for the future. Whether you use these tools daily, tried them a long time ago and decided they aren't for you, or are just curious, you'll find something of interest.

Episode Notes

1021. This week, Christopher Penn talks about the role of AI tools like ChatGPT in writing and editing. We look at common misconceptions about how AI works and best practices for writing prompts. We also talk about privacy concerns, bias, fact-checking, and our concerns for the future. Whether you use these tools daily, tried them a long time ago and decided they aren't for you, or are just curious, you'll find something of interest.

Find out more about Christopher and his books at trustinsights.ai and ChristopherSPenn.com.

🔗 Share your familect recording in a WhatsApp chat.

🔗 Watch my LinkedIn Learning writing courses.

🔗 Subscribe to the newsletter.

🔗 Take our advertising survey.

🔗 Get the edited transcript.

🔗 Get Grammar Girl books.

🔗 Join Grammarpalooza. Get ad-free and bonus episodes at Apple Podcasts or Subtext. Learn more about the difference.

| HOST: Mignon Fogarty

| VOICEMAIL: 833-214-GIRL (833-214-4475).

| Grammar Girl is part of the Quick and Dirty Tips podcast network.

| Theme music by Catherine Rannus.

| Grammar Girl Social Media Links: YouTube. TikTok. Facebook. Threads. Instagram. LinkedIn. Mastodon.

Episode Transcription

LIGHTLY EDITED TRANSCRIPT

Mignon: Grammar Girl here. I'm Mignon Fogarty, and today we're going to talk with Christopher Penn about some of the common misconceptions around AI tools like ChatGPT that I've seen among writers and editors and just people in general.

Whether you love it or despise it, I think it's really important to understand AI. And Chris is a guy who can really help us do that.

Christopher Penn, welcome to the Grammar Girl podcast.

Christopher: Thank you so much. It is nice to be on one of the OG podcasts. You and I have been podcasting, I think since what, ‘05?

Mignon: You were ‘05; I think I was ‘06. But yeah, it's been a while. And so actually, because I've been subscribed to your newsletter for so long, one of the things I've noticed about you is that you are always on the cutting edge of trends and data. I mean, you called the housing crisis in 2007.

I remember I was looking at buying a house, and you actually said to me, “Mignon, I think maybe you should wait.” And I did not take your advice, which is one of the worst financial decisions I've ever made, but we loved the house. It turned out okay, but you were definitely right. And then in 2019 and 2020, you again predicted how bad the pandemic was going to be, way before a lot of other people realized it.

And so, I guess it was about a year and a half ago, I noticed your Almost Timely newsletter became entirely about AI, and I would love to hear more about how that came about, like whether it was a gradual thing or did you just have an “aha” moment where you were like, “Okay, this needs to be everything.”

Christopher: So AI has existed in some form since the 1950s. I started to have an interest in 2013, and there are three branches of AI: what's called regression, classification, and generation. Regression is needle-in-a-haystack. "Hey, here's a bunch of data and here's an outcome. What in the data is like this outcome," right?

We call this "finding" AI. Then there's classification, which is "Here's a bunch of data. Let's organize it." Like, what do we got in here? You see this particularly with people like marketers trying to organize things like social media posts. There are so many; how do we organize and classify them?

So that's the second branch. And then there's this new branch that started in 2017 called "generative," based on an architecture and a paper created by Google called "Attention Is All You Need." The architecture is called transformers. That became a product family really around 2020, just as the pandemic got rolling, when companies started to say, "Maybe we can get this architecture to do something interesting, like predict the next word in a sequence in a way that's never been done before."

By 2021, there were usable models that would write coherent text, not factually correct, but at least coherent and readable. Prior to that, it looked like, you know, you were rolling your face on a keyboard. And that was when I started paying attention to this technology. I started using OpenAI's GPT-3, which had come out in 2021.

And then in November 2022, OpenAI released a tool called ChatGPT. And suddenly everyone's an AI expert. And having that background of more than a decade working in the core technologies, I could see where this was going, because ChatGPT changed people's relationship with AI.

Prior to that, you needed coding skill, right? You needed the ability to work in these models and stuff like that and build your own models, and then suddenly there's a web interface you can chat with just like you were drunk texting with your uncle Fred, and suddenly, okay, wow, people can make use of this. And it was at that point that I started pivoting a lot of my content to say, "Okay, here's what this stuff does, and here's how we should be thinking about it."

And of course, you know, the rest of the world pivoted as well as people started to understand the implications of these tools. But today, these tools have accelerated faster than I've ever seen any technology evolve. It is absolutely breathtaking how quickly they've evolved, and I'll give you an example. In the first release of ChatGPT, the back-end model was this obscure, poorly named model called GPT-3.5 Turbo that required an entire server room of machines to run and serve up answer text.

About a month ago, Meta, the parent company of Facebook, released a model called Llama 3.1. There's a version of it you can run that is more capable than GPT-3.5, about as capable as its successor, GPT-4, and it runs on your laptop with no internet connection.

[Hey, I'm just jumping in here to say that we recorded this episode about a month ago, and since then, Meta has released an even newer model called Llama 3.2.]

So we've gone from an immature technology that is extremely expensive to use to a mature technology that you can run yourself. I've used it on a plane with no internet. To have generative AI capabilities available all the time, in less than two years, that is crazy fast evolution.

Mignon: Yeah, I agree. It's amazing. I haven't seen anything like it since the introduction of the internet. And, you know, if people looked into it six months ago or eight months ago and gave it a try, it's different today. It's so much better. And there are so many misconceptions.

So, you know, one of the things someone asked me a few weeks ago was, "Well, does ChatGPT use AP style or Chicago style?" And, you know, I remember having that question when I first started playing around with it too. It's a reasonable thing for an editor or a writer to ask, but it's not how it works.

So can you maybe start by explaining why that's not the right question to ask, and sort of how it works at a basic level for people who haven't looked into it as deeply as you have, or maybe as I have.

Christopher: Sure. So the way these tools work is they understand pieces of words and how those pieces of words relate to every other word around them in the sentence, in the paragraph, in the document itself. They are trained on such huge quantities of text. To give you an idea of what the average model is trained on, it would be a bookshelf that wraps around the equator twice. That's how much raw text is needed to train one of these models. And what they end up learning is the statistical distribution of language, and implicit in that is how language works. So if I say, "I pledge allegiance to the … " the next word probably is "flag."

It probably is not "rutabaga." And as a result, the models understand the likelihood of everything that we say. If I say, "God save the …" right, depending on your nationality and, you know, your knowledge base, you might say "the queen" or "the king," but again, probably not "rutabaga." When you ask about something like Chicago style or AP style or any writing style, you're asking about a specific format.
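
[To make the next-word idea concrete, here's a minimal sketch of pulling next-token probabilities from a small open model. It assumes the Hugging Face transformers library and the little GPT-2 model as a stand-in for the much larger models Chris is describing; the names and numbers are illustrative, not anything from the conversation.]

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "I pledge allegiance to the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, sequence_length, vocab_size)

# Probability distribution over the next token, given the prompt so far
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
# " flag" is likely to rank at or near the top; nothing like "rutabaga" will.
```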

And these models have seen all of that and then some. What they spit out, by definition, is probability-based text. So, if you give it a very short, naive, terrible prompt, like, "Write me a blog post about B2B marketing," it's going to give you the most probable words, phrases, and concepts for that in a written format that is probably going to be technically correct, probably going to be boring, and probably going to be completely unoriginal, because it is invoking its best guess at the probabilities for those words.

When we talk to these tools, we are prompting them; we're writing to them, chatting with them, et cetera. Every word that we type in a prompt kind of has an invisible word cloud around it, and where those word clouds intersect is how it knows what to respond with, right? So these word clouds are huge, and they narrow down.

So if you write a prompt like "Write me a short fiction story about two women in love," you're going to get something pretty generic, right, because it's so generic. But if you say, "Write me a short story about two women in love set during the Victorian period, but set in Wales instead of London. One woman comes originally from Scotland. The other woman comes from France originally," you see all these extra words, and all these word clouds start overlapping, and they get more and more precise, and you get more and more text that is unique and different. Because you're getting more specific.

The one thing I say in all the keynote talks that I do is if you remember nothing else, the more relevant, specific words you use when you prompt these tools, the better the output will be.

I say, think of these things like the world's smartest interns. You have this brand-new intern who comes into the office. This intern has 255 PhDs; they've got a PhD in everything. But it's still day one. They still don't know who you are. They don't know where the restroom is. They don't know where the coffee machine is.

And so if you say to the intern, "Hey intern, go write me a book," you're going to get crap out of that human intern, just like you're going to get crap out of a machine. If you, on the other hand, give the intern "Here's our style guide. Here's our brand safety guidelines. Here's the background information. Here's the outline I want you to use. I want you to use AP style," and you give the intern a big old pile of directions, that intern is going to come up with a much better result.

Mignon: Right. Yeah. And I feel like we shouldn't go past the idea of training on, like, the entire world's libraries without mentioning the concerns about copyright infringement, and the lawsuits that are currently happening. So that's like a whole other podcast. That's a big issue in the field.

So if you weren't aware of that, I want you to know we're not going to focus on it very much today …

Christopher: We're not going to focus on it, but there's an interesting distinction, and this is what the law will hinge on, and it depends on where you are. So in the EU, the EU has ruled that the use of someone else's intellectual property for creating models infringes on their rights. 

In Asia, China and Japan have ruled that what is inside a model, if you pull it open, is a big pile of numbers, it in no way resembles the original works, and therefore, a model does not infringe on someone's rights because it in no way could be mistaken for the original.

And so the question is going to be resolved in every jurisdiction separately as to whether what AI does is infringing on the originals or not. And it's not clear how that's going to play out.

Mignon: Oh, thank you. That's fascinating. I wasn't aware that there were country differences at this point. That's super interesting. Does that mean people in the EU can't use these models right now?

Christopher: There are some tools that are not permitted and therefore are not available in the EU.

Mignon: Wow. Okay. Okay. I didn't know that. Oh, and I did want to ask, back to the AP-Chicago issue and thinking about the way it works: would it be fair to say that if you asked it to write a news article for you, it would be very likely to follow AP style, because it is looking for words from that style of writing that originally would have been written in that style? Or is that not …

Christopher: It's probable that it will invoke something that looks more like AP style than Chicago style, as opposed to, say, writing a nonfiction book, which is probably going to be more Chicago style than AP style. Again, with these tools, if you tell it, "I want this in AP style," it will write it in AP style.

If you leave it to its own devices, it will do the highest probability for the data set that you're working with. One of the things people knock these tools for is saying, "Oh, they always use the exact same phrases, 'in a world of' whatever, and, you know, the word 'delve' and all this stuff." Well, that's within that context.

If you say, write some Sapphic fiction, the word "delve" is not going to appear because in the training data, it's seen Sapphic fiction writers don't use that word, right? So it's not going to be within that set. So part of prompting is understanding what words we want to invoke and what range of diction we want to invoke. That is very often not something that human writers think about.

Mignon: Mhm. Great. So another misconception: I hear from a lot of people who are concerned about privacy and confidentiality that they don't want to upload their information because they don't want it to be used as training data, or they have clients that don't want their, you know, proprietary information getting into these models.

And I think that sometimes it's the case that you wouldn't want to do that, but I don't think it's always the case. And, you know, how can people deal with these issues carefully and still use AI if they want to?

Christopher: Here is the golden rule: if you ain't paying, you're the product, right? It's been that way for social media. It's been that way for everything. And the same is true with AI. If you ain't paying, your data is the product. And that is being used to train other people's models and things. Now there's two different branches of thought on this.

The obvious one is: pay for tools, and then make sure that the privacy settings of those tools reflect what you expect. If you're using, for example, the developer edition or the enterprise edition of any tool, in the contract it says we will not use your data to train on, and stuff.

And so if you are working with sensitive information, you absolutely should be doing that. If there is some data you absolutely, positively just cannot give to a third party, there are certain versions of AI, like Meta's Llama 3.1 405B model, that you can run on your own servers inside your own company.

And that is totally under your IT department's control. It is reassuringly expensive: to do that, you're going to spend tens of thousands of dollars on hardware. But if you are, say, a three-letter agency based in Langley, Virginia, then that's a pretty safe investment, and you will have all the benefits of generative AI, and you know your data will never, ever leave the protective confines of your compound.

So that's one aspect. The second aspect, when it comes to what goes on with your data, is knowing the level of sensitivity of it, right? So if your company or your work is bound by guidelines like ISO 27001 or SOC 2 compliance, you know what you're supposed to be doing with your data of any kind; you know what's allowed and not allowed.

So you can just look at, you know, "Here's the requirement in general that my system's supposed to have, like HIPAA compliance for healthcare." Look at ChatGPT: not HIPAA compliant. "Okay, clearly then, I can't use HIPAA data in a non-HIPAA system." I mean, that's kind of a no brainer. So think about the regulations you already have to adhere to, and what systems are qualified for those regulations.

Mignon: Right. Well, what if you're just like a lowly person like me who maybe has a ChatGPT or Claude subscription or something like that? Do you feel like that is, you know, and I think they say that they won't use your data for training, but also it's pretty cheap. So, you know, am I putting my stuff out there at risk?

Christopher: If you've confirmed the settings say they will not train on your data, then, I mean, the basis of contract law is that they will not train on your data. If it turns out that they do, then you get to join the class-action lawsuit and sue them for a whole bunch of money if it comes out that they were not adhering to those terms.

And again, this comes down to sensitivity. So if you are writing a book, and it turns out that they, in fact, were training on your data, yeah, you should join the lawsuit. If you are processing case notes from a field agent, and that leaks and that gets that agent killed, you should not be using ChatGPT, period.

No matter the license. You should be running that internally, you know, in a protected system, because that is literally life or death. And I would say the same is true for all these different use cases: what are the consequences if something goes wrong? That's a question people don't ask often enough: "What could go wrong?"

Asking it unironically.

Mignon: Yeah. So, let's talk a little bit about hallucinations and fact checking. So, you know, one of the things I see the most people doing actually is using ChatGPT, Claude, Perplexity as a search engine, as a replacement for Google, essentially. And yet we know that they make mistakes. So what are your … how do you approach thinking about using these tools for search, and what advice do you have for people who are doing that?

Christopher: So hallucination is a probability error. If you think of these tools like a library, like a model is like a library, and there's a librarian, and you ask the librarian questions, depending on the obscurity of the question, the librarian may or may not be able to find a book that's appropriate to your question.

So if you walked into a library, and we'll pretend it's obscured, you walk in and say, "Hey, I'd like the 'Joy of Cooking.'" And the librarian wanders around a bit and comes back with the "Joy of Sex." And he's like, "This is close, right?" No. That's not close. Not at all, but semantically, it thinks it's close enough.

That's what's happening in a hallucination. The models have three basic directives: be helpful, be harmless, be truthful. And it's a delicate balancing act. But they try to be as helpful as possible, sometimes at the exclusion of being truthful, right? So they will say, "Hey, you want this thing? I'm gonna try my hardest to give you this thing," and it gives you this thing.

That's the thing. You're like, "Yeah, but that's not right." We didn't say it was right. And so using these tools as search engines is probably not the best use case for a language model by itself. If you dig into the raw technology stuff, and you've got to get super nerdy here, all models hallucinate a hundred percent of the time.

When they're first built, it requires a lot of tuning to even just stop them from lying, period, because they're always just assembling probabilities. And so there are a lot of extra steps that go into refining a model to make it more truthful. And then there are additional safeguards built on top of it.

Now, what's happening in the field is that these model makers recognize the validity of the use case of "most internet experiences suck," right? Go to someone's website, like, I want to go check out a recipe, and I've got to wade through 42 pages of your grandmother's second cousin's roommate's dog's, you know, reasons why they like this recipe.

I don't care. Tell me how many cups of sugar to use. And so Perplexity, and Google, and ChatGPT, and all these companies said, you know what, that experience sucks. So instead, we're going to generate the answer for you that says you need two cups of sugar for this recipe. The consumer sees that as a much better experience.

And so we now have tools like Perplexity. We have Google's AI Overviews. We are going to shortly have OpenAI's SearchGPT. These use search data as the basis of their responses, and then they craft responses from that data. And they still do get it wrong. Perplexity, in particular, will ingest the webpage content and summarize it down, so it's more accurate than just making it up from the model's long-term knowledge, but it's still sometimes wrong. Google famously, when Search Generative Experience became AI Overviews, had the case of, "Hey, how do you make cheese stay on pizza?" "Add a quarter cup of glue to the pizza."

That was from a Reddit forum post where someone said that in jest, and Google does not have a sense of humor and therefore was unable to detect that. So for consumers, there are some tools which are likely to lead to a higher probability of success, like SearchGPT, like Perplexity, like Google AI Overviews, because they are rooted in search results.

Other tools like ChatGPT or straight, you know, regular consumer Gemini or regular Claude, you should not use those as search engines because they're not pulling from at least known sources on the web. However, they are still subject to the vagaries and the flaws of search itself, right? You can find in a search engine, absolute garbage, completely incorrect information.

And whether you're using generative AI or traditional search, you will still get the answer to what you were looking for, even if the answer is completely wrong.

Mignon: Right. So especially if you're searching for something important, it's critical that you actually check the facts somewhere else. So if, say, you're using Perplexity, it shows you the sources where it got the information. Click through on those sources and double check that what you've got is right and not somehow taken out of context on that page, or something like that.

Christopher: Yeah, and know the difference between sources. For example, we had to postpone this interview. This interview was supposed to happen last week, but I was in the hospital getting emergency surgery. As part of recovery, I wanted to figure out what I could do to accelerate recovery. So I went to Perplexity.

I said, "You know, find me peer-reviewed research on post-surgical recovery methods." And it points out these different papers. I went and downloaded the papers themselves after checking which journals they were in: Nature, Cell, Science, NIH, PubMed. I know these to be credible sources, as opposed to, you know, Bob's Random Journal of whatever. And then I stuck it into a system like NotebookLM, which is a product by Google specifically made for academic research, where you give it the data and ask questions of the data. And if you didn't provide the answer, it will say, "I don't have the answer." It will not try to make it up. And I was able to design a post-surgical recovery regimen that's working about 40 percent faster than normal because I was able to go get the data, but I had to be the informed consumer to do that. The tools don't do it for you.

Mignon: Yeah, that's great. I'm glad you're getting better faster. And yeah, I do the same thing. I go to Perplexity when I have medical questions like that, but I do, I always download the journal article and read it. You know, I used to be a scientist and a tech writer, so I'm comfortable reading journals, and 90 percent of the time it's great.

But I have found errors occasionally, you know, things that Perplexity didn't quite present properly in the summary. Yeah. 

So, you know, some of the people I encounter who are most opposed to generative AI are fiction writers, which I think is kind of funny because not that I'm laughing at their concern, but that fiction writing is one of the things I think that AI is really bad at.

Um, but like, what do you think? What do you find are the things that, for writers, AI is the best for?

Christopher: So there are six broad use cases of generative AI: generation, summarization, extraction, rewriting, classification, and question answering. Those are the big six. Of those six, there are two it's less good at: generation, aka writing, and question answering, because of the factuality issues. Summarization, taking big data and making it into smaller data, they are almost flawless at. They are so good at extraction: taking data out of data. Super powerful, super helpful when … you know, we're in the midst of a presidential election here in the USA, and this one group released one big document. And I said, "Extract out just these parts" because I care about this, and I wanted to read just that part of it. These tools also do a fantastic job rewriting: take one form of data, turn it into another form of data. So, for example, one of the things these tools do incredibly well is take complex things and explain them in a way you understand.

So if you are a fiction writer, you can have it explain something in a story format, something that's maybe an arcane concept, like how the spike protein on the SARS-CoV-2 virus works: you know, explain this to me as though it were a fantasy epic. And it will do an incredible job that's still factually correct.

And classification: take the data and organize it.

With fiction writing, here's why generative AI gets a bad rap: because people can't prompt it well for fiction writing. That's really what it boils down to. And because they're trying to do too much. I use generative AI for fiction writing all the time. I write fan fiction with it. And my fan fiction is good enough that people can't tell unless I tell them this was written with generative AI, and the process and procedure goes like this.

Number one, I preload the model's conversation window, the context window, which is its short-term memory, with the topic, whatever the general idea is. Then I brainstorm with the machine what the overall plot should be. And then I have it build out character cards, and we go back and forth about who the characters should be.

We look at, you know, what are their fatal flaws? We do things like, you know, Christopher Booker's seven major story types, et cetera. And then I say, "Let's start writing the outline for the story." And I say, "First, let's do 15 chapters," right? I get the outline for each of the 15 chapters, three-act format, and so on and so forth.

And I say, "Great. Now let's divide each chapter into five sections." And we'll then break down a chapter in five sections. "Now I want you to, using all the stuff that we've got, you're going to write chapter one, section one with a minimum word count of 1,200 words. Do not repeat yourself."

And so on and so forth. Give it a bunch of instructions. And then I will provide it a writing style. So I'll take one of my previous human written stories that I've written and say, "I want you to copy my writing style exactly because I have a very specific way that I like to write." And so what it does is instead of that sort of generic kind of wishy-washy machine probability of writing, it replicates me.

And I have assembled these pieces, you know, one section at a time and created, you know, 50- or 60,000-word pieces of fiction that are pretty decent. You know, it's certainly good enough for fan fiction because you can't make any money on it anyway; it's illegal to. But for me, it's great when I want to express a story that I don't want to sit down and hand write out, because I'll have an idea, like, you know what, this would be a cool story, but I don't know that I really want to spend six months writing it. Could I get something that's 90 percent as good as me in six hours instead? The answer is yes. And now I have my story, and whether or not I even publish it, at least it exists.

I was doing a piece the other day. I had this wild idea of, you know, it was from a Reddit post talking about how in the "Terminator" franchise, Skynet should have won almost immediately. The fact that Skynet didn't, and they keep having more and more "Terminator" movies, indicates that Skynet wasn't trying to win. Why? Because it was built for the purpose of war. If it wins the war, it has no purpose anymore. So its goal then is to prolong the war as long as possible so that it continues to have purpose.

So I wrote a fan fiction story where the resistance, instead of trying to save John Connor over and over again, sent a Terminator back to try and reprogram Skynet from the beginning, and it turned into a machine love story.

Mignon: That's fun. You know, you reminded me of something that came up. So the length of things you're trying to get out of AI: you know, one of the primary things I use it for is transcription. You know, we've been doing these interview podcasts, but I feel like it's really important for accessibility to have good transcripts.

So we use AI to make the transcripts, and it enables me really to do these podcasts. You know, I can do two shows a week now because we can get those AI transcripts. That's one of the reasons. And yet, like the other day, I put in the audio, and it came out perfectly formatted, really great — the first half. And then halfway through it just started putting the text on there with no punctuation, no capitalization. It was this block of vomited text, and it had been so good for the first half. And like, what's the deal? Why did it just, I mean, I know it doesn't have intention, but why did it just get tired and give up halfway through?

Christopher: It depends. So which model or engine were you using?

Mignon: I was using MacWhisper.

Christopher: Okay. So you're using the Whisper model. The Whisper model loses coherence after a while depending on its settings. This is one of the reasons why we don't recommend people use it unless they have some technical knowledge, because there's a way to start that particular engine with very specific parameters for voice.

Mignon: Hmm.

Christopher: Especially if there are pauses longer than three seconds, it blows up, it just loses its crap and just goes crazy. Generally speaking, for transcription, I will typically recommend people use, if they have the budget, use a vendor like Fireflies. Fireflies charges you by the minute of uploaded audio.

And then obviously, once you get the transcript, just delete the audio, and you can stay on the lowest level subscription plan. And then from there, you can … you would put it into a tool like Google Gemini to essentially clean up the transcript and remove stop words and all of these other things. 

Whisper is a fantastic model. I use it all the time. I use it to transcribe YouTube videos, but its transcripts always require cleanup as well. So with all these tools, you need to have almost like a workflow: here's the raw audio, turn it into a raw transcript, take the raw transcript, refine it so it's correct, you know, grammar, punctuation, spelling, whitespace, and all this stuff, and then you can take that and do other stuff with it.
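
[For listeners who want to try this, here's a minimal sketch of the kind of settings Chris is alluding to, using the open-source openai-whisper Python package. The file name, model size, and parameter values are illustrative; condition_on_previous_text=False is one knob often used to tame the kind of runaway, unpunctuated output described above.]

```python
# pip install openai-whisper
import whisper

model = whisper.load_model("medium")      # model size is a speed vs. accuracy tradeoff

result = model.transcribe(
    "episode_audio.mp3",                  # placeholder file name
    language="en",
    temperature=0.0,                      # deterministic decoding
    condition_on_previous_text=False,     # helps prevent degenerate output partway
                                          # through long recordings
    no_speech_threshold=0.6,              # be stricter about treating silence as silence
)

with open("raw_transcript.txt", "w") as f:
    f.write(result["text"])               # raw transcript; still needs cleanup afterward
```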

Mignon: Yeah, another thing I've used it for really successfully is having it teach me how to use Google Spreadsheets. So, you know, I had the spreadsheet with the URL and the teaser for every page on our website. And, you know, I've posted them to social media over the years, and I always keep the teaser.

And so I have the teaser and the URL, so I can use them again in the future. And when we redesigned our website, every URL changed, and the new URLs didn't get into my spreadsheet. So it broke my spreadsheet, and it was that way for years. And I didn't post as much to social media because my spreadsheet was broken.

And then I realized I could use ChatGPT to show me how to, like, run a script to replace all those URLs and match them to what they needed to go to. And it's something I don't know that I ever could have figured out with a Google search, that particular problem. I mean, what do you call that?

I mean, I didn't hear that among your five or six things that AI can do.

Christopher: That's question answering, but what you're really talking about is coding. You are having these tools help you code, because, you know, if you're writing a script or you're writing some macro, that's code. And these tools are phenomenal at coding. In fact, these tools are so good at coding that it's having a negative effect on hiring of coders, because if the current coders that you have on staff start picking up generative AI and using it, they can become 2x, 3x, 4x more productive overnight. Now that means you don't have to hire new coders; you don't have to add headcount. You can get a lot more efficiency out of the people that you already have. And the reason for this is that code is much less ambiguous than the written word, right?

Unlike how we read and write language, a statement in Python is either correct or not; it runs or it doesn't. As opposed to, say, you know, an expression like "That's what she said," right? There are so many ways to say that if you change the emphasis, like "THAT'S what she said," or "That's what SHE said." Just the inflection changes the meaning of the sentence, and on a page, you can't see that.

So language models struggle with human language, but with machine languages, like Python or R or C, there's very little room for ambiguity. And so they are much better at machine languages than human languages.

So when you're doing stuff like that, even just writing a macro or script or whatever, yeah, you are writing code. You can have these tools help you write code and answer questions with code, and they're really good at it.
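
[As an illustration of the kind of script ChatGPT might walk someone through, here's a hypothetical Python sketch that remaps old URLs to new ones in a spreadsheet exported as CSV. The file and column names are made up for the example; the actual fix Mignon used isn't shown in the conversation.]

```python
import pandas as pd

# Hypothetical exports from the spreadsheet: one sheet of teasers plus URLs,
# and one sheet mapping old URLs to their post-redesign equivalents.
teasers = pd.read_csv("social_teasers.csv")       # columns assumed: "teaser", "url"
redirects = pd.read_csv("url_redirects.csv")      # columns assumed: "old_url", "new_url"

mapping = dict(zip(redirects["old_url"], redirects["new_url"]))

# Swap in the new URL wherever there's a match; keep the original otherwise.
teasers["url"] = teasers["url"].map(mapping).fillna(teasers["url"])

teasers.to_csv("social_teasers_updated.csv", index=False)
```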

Mignon: Yeah, it was amazing. The other thing that I always forget about AI is that it can help you solve the problem you're having with it. So the first time it told me how to write the script, it didn't work. But I put the result back in, and I said, "Okay, this is the error message I'm getting."

I put it back in and said, "Okay, this is the error message. What do I do now?" And then it told me what to do, and I got it right. I find that sort of confusing. Like, how can it get it wrong the first time but then know the second time what to do? It's weird. This technology is weird.

It's not like anything I've used before. And you have to remember that it's like an interaction. It's, I don't know. Can you talk a little bit about that?

Christopher: So the nature of the underlying technology is such that the more you talk about something, the smarter it gets on that thing, because it has this memory. Remember we were talking about how every piece of a word associates with another word, the whole idea of word clouds. If you ask it something, and you don't provide a lot of information up front, it's going to do its best to infer what it is you're talking about. And it's going to get it wrong more often than it should, because it's trying to just guess what your intent is. The single biggest thing that people do wrong with these tools is their prompts are too short. My average prompt is three to five pages of text before I ever ask it for anything. That is how much information it needs. And to give you a sense of how much they can handle: the short-term conversation window of something like ChatGPT can hold 90,000 words, right? So that's like, this is a 75,000-word book, right? [He's holding up a book.] This can be a prompt. This whole thing can be a prompt.

Google's Gemini, the new version of Gemini, can hold two of these as a prompt, right? 1.4 million words is how much information can go into a prompt. The more data you provide, the better. So when I prompt these tools, we have a format that we call RACE: role, action, context, execute.

You tell the model who it is: you are a Pulitzer Prize-winning author who specializes in writing Sapphic fiction, right? Give it the action: your first task today is to write the outline for a Sapphic short story that's going to be between 1,500 and 2,500 words, and it will encapsulate a slice-of-life scenario in the character's arc.

Here's the background. Here's who the character is. Here's their goals and motivations, everything that you would expect in a character card in a piece of fiction. Execute: write the outline for the story in whatever format you want. And so if you prompt with enough information up front, the likelihood that it gets it wrong the first time goes down.

The more data you provide, the less it screws up. That's so critical for people to understand.
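
[Here's a minimal sketch of what a RACE-style prompt can look like when sent through an API instead of a chat window, using the OpenAI Python SDK as an example. The model name and prompt details are illustrative, not Chris's actual prompt.]

```python
from openai import OpenAI

# The four RACE pieces assembled into one long prompt. In real use, the CONTEXT
# section would run to pages: character cards, style samples, brand guidelines, etc.
race_prompt = """
ROLE: You are a Pulitzer Prize-winning author who specializes in Sapphic fiction.

ACTION: Your first task today is to write the outline for a Sapphic short story of
1,500 to 2,500 words that captures a slice-of-life moment in the main character's arc.

CONTEXT: The story is set in Victorian-era Wales. One woman is originally from
Scotland, the other from France. [Character cards, goals, motivations, and a sample
of the author's own writing style would go here.]

EXECUTE: Write the outline, chapter by chapter, in plain prose.
"""

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",                  # placeholder model name
    messages=[{"role": "user", "content": race_prompt}],
)
print(response.choices[0].message.content)
```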

Mignon: I can imagine some people are wondering, "Okay, you're writing a six-page prompt. Isn't it easier just to write it on your own?" Like, how do you decide when, you know …

Christopher: That's a very good question. The answer is it depends on how reusable you want it to be. Google just came out with Gems, which are their version of ChatGPT's GPTs. You might have something that you do frequently that you would want to reuse over and over again. And yeah, the first time you build that prompt, it's going to take a long time.

It's going to take you 15 minutes, 20 minutes to build out that six-page prompt and stuff. Yeah. But then every subsequent time you use it, it's seconds. I'll give you a real simple example. Um, well actually, uh, how much fun do you want to have on this show? 

Mignon: I want to have as much fun as we can possibly have.

Christopher: All right, let's see. Let's do this: I'm going to share my screen and …

Mignon: Okay. Keep in mind, this is audio. So we want …

Christopher: This is audio. So what we're going to do today: I'm going to go into Google Gemini, and this is, again, something that you can do in ChatGPT or Claude or whatever. I'm going to give it a sequence of prompts that is going to basically explore a topic.

And your topic is best practices for being a sensitivity reader. So for those folks who don't know, a sensitivity reader's role is to read through a piece of text and understand, like, is this biased? Did you say something that's inadvertently racist or misogynist, or is it cultural appropriation, all of those things that authors shouldn't do but that just kind of happen.

And so, what Google's Gemini is doing on screen is it is asking itself questions about what the best practices are for being a sensitivity reader. So: what are those best practices? What are common mistakes that less experienced sensitivity readers make? What are things that are generally believed to be true about sensitivity readers that are actually false, and things that are believed to be false about being a sensitivity reader that are actually true?

And then expert tips and tricks that, uh, you know, true experienced veteran sensitivity readers would know. What we're doing here is we're having the model explain what it knows on a topic. And this is part of prompting, because every part of the conversation that you've had becomes part of the prompt for the next section of the conversation.

So we are now at, let's see, it has gone through and has written a whole bunch of stuff. It has written 2,300 words of its prompt just on sensitivity reading. So these are all the best practices, and this is now probably like five or six pages. The next thing that we want to do is have the model learn how to evaluate, because these models, as you pointed out, they're really good proofreaders, right?

They're really good explainers. So I'm going to give it a prompt that says, "I want you to build a scorecard, a scoring rubric, for sensitivity." I'm going to hit stop there because that's the wrong one; delete that. "I want you to score whether something is sensitive." So we're going to build a scorecard, the idea being that the model would know, based on all the stuff we just talked about, how it should score a piece of content as to whether it is using condescending or patronizing tone, use of othering language, reinforcement of negative stereotypes.

Um, so it's going to build out this gigantic scoring rubric. Now, as it finishes that up, I'm going to say, "Our next step is going to be to convert this into system instructions." So we're going to say, "I'd like you to convert this entire conversation into a prompt to be used with large language models like Google Gemini. The purpose of this prompt is to instruct the LLM to accept input from the user to be scored using the sensitivity rubric."

This will return scores and analysis, as well as recommendations for what the user should do differently in their text. And I'm going to tell it to incorporate everything we talked about. So now we're having the model create its own programming. It is now writing its own software to become a sensitivity reader, which is pretty ambitious.

And so what's being shown on screen, for the folks who are listening, is the process: you're going to assume the role, analyze the user's text with the scoring rubric, generate a report based on it, avoid common mistakes, and empower the user. So I'm going to copy this. And now we're going to go over to Google's regular Gemini, the consumer one, and create one of those new Gems.

And I call this Gemini Sensitivity Reader. I'm going to paste in those instructions that we just made along with the scoring rubric that it just manufactured. I'm going to hit save. Now what we've done is we've created an app, right? We have, in less than 10 minutes, created an app, and I can now give this any piece of text and have it analyze it as to whether or not it is good or bad.

So I'm going to give it a piece of content here. And I'll say, "Score this." You'll note this prompt is two words, because those system instructions that were pages long are incorporated into this Gem, this app, on the back end.

To your question: for prompts that are five, six pages, is it just easier to do it yourself? For a one-off task, yes. But a sensitivity reader should not be a one-off task. A sensitivity reader should be an every-time task. The ability for you to say, "I'm about to write a blog post, or write an email, or put up a social media post. Am I doing something that would be offensive?"

This one scored an 85, right? So it says, on diversity and inclusion: it provides a technological focus, it doesn't actively exclude any groups, but incorporating more diverse voices could enhance inclusivity. On avoidance of stereotypes and harmful tropes: it successfully avoids those, but again, the article is not as inclusive as it could be.

So what we've done is we've built an app in 10 minutes that is reusable and that every writer should be using.
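
[For a rough sense of why the two-word "Score this" prompt works, here's a hypothetical Python sketch: the long instructions live in the system role, the way they live inside a Gem or GPT, so the user's message can stay tiny. The instruction text, file name, and model are placeholders, not the rubric Gemini actually generated, and the OpenAI SDK stands in for whichever tool you use.]

```python
from openai import OpenAI

# Stand-in for the pages of generated instructions and the scoring rubric that
# would live inside the Gem/GPT. The rubric items below are the ones mentioned
# in the conversation; everything else is illustrative.
SENSITIVITY_READER_INSTRUCTIONS = """
You are an experienced sensitivity reader. Score the user's text on a 100-point
rubric covering: condescending or patronizing tone, othering language, and
reinforcement of negative stereotypes. Return the scores, your analysis, and
concrete recommendations for what the writer should change.
"""

client = OpenAI()
draft = open("blog_post_draft.txt").read()       # placeholder file

response = client.chat.completions.create(
    model="gpt-4o-mini",                         # placeholder model name
    messages=[
        {"role": "system", "content": SENSITIVITY_READER_INSTRUCTIONS},
        # The user prompt can be two words because the long instructions
        # are baked into the system role above.
        {"role": "user", "content": "Score this:\n\n" + draft},
    ],
)
print(response.choices[0].message.content)
```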

Mignon: Yeah, that's a great example of how, you know, it's so much more than just hopping onto ChatGPT and uploading a document and saying, "Do a sensitivity read on this." You know, bringing up sensitivity reads, we should talk about, you know, I know a lot of people are really concerned about the biases in social media — not social media — in AI.

You know, it's trained on biased material, so it perpetuates biases. Like, how do you get around that? How do you approach that as a problem?

Christopher: Well, I'll put it a couple of ways. Number one, you have to be aware of your own biases when you are creating content. If something reads okay to you, one of the questions should be, "Am I biased?" The answer is yes. Everyone's biased. And if so, in what way does that show up in my approval of this content in general?

Anything that goes out to the public should never go out without some form of human review, especially if it is money, if it is legal, if it is health, right? Your money or your life. Anything high risk where, you know, you could get sued, you don't want machines talking to the general public. You have to understand the biases in the machine; the biases in the machine are mirrors of us.

So everything that you do or don't love about the human species is in these tools, and they can, and many of them have some level of safety mechanisms, but the reality is they're about as good as the safety mechanisms in your HR process. Which is to say you can still hire a jerk. And three, using safeguards, like having a Gem, for example, or a GPT that is a well-trained sensitivity reader where you can say, "Yeah, check my work."

It is impossible to build an AI that is unbiased. It is just flat-out impossible. It is just as impossible to raise a human being who is unbiased, but you absolutely can raise human beings who are aware of biases and can check themselves, and you absolutely can do the same thing with AI.

Mignon: Great. Well, thank you so much, Chris. I'm going to wrap up the main section here for our regular, um, listeners, and we'll continue the discussion for our wonderful Apple Podcasts supporters and Grammarpaloozians. We're going to talk about job loss, the concerns about job loss, and a great blog post you had about how to make AI writing sound like you. But to say goodbye to the big audience: you know, where can people find you? I imagine that after hearing this conversation, many people will want to follow you now. So where can they find you?

Christopher: You can find me for work at trustinsights.ai. That is my company, and if you want help with your AI efforts, we do that; we're management consultants. If you want to read my stuff, you can find me at ChristopherSPenn.com.

Mignon: Super. Thanks so much, Chris. 

And I have one more announcement before we go. On Saturday, yes Saturday, Grammarpalooza subscribers are going to get another bonus episode. This time, it's an interview with Drew Ackerman, better known as Scooter from the Sleep With Me podcast. If you don't already know about it, Drew came up with the trippiest idea that became a huge hit: he bores people to sleep. He tells rambling, boring, and yes, sometimes trippy stories that people put on to help them sleep or to "get through the deep, dark night," as Drew says. And on Saturday, which is timely following today's show, you can hear Drew and me talk about the ways he is and isn't using AI to make his show, how the hallucinations that are usually such a problem for people have actually helped him, but also our worries and the ways it hasn't been helpful. So if you're a Grammarpalooza subscriber, look for the Sleep With Me interview in your feed Saturday, and if you're not a subscriber yet, you can always check it out with a free trial on Apple Podcasts or Subtext. All the details are in the show notes.

That's all, thanks for listening.