AI in Clinical Practice: Effectively Using Large Language Models

Guest James Norton, BSN, RN, FPCNA, describes the use of AI in nursing practice, focusing on Large Language Models (LLMs). James shares how to effectively craft a prompt to get the results you need, whether you are looking for information on clinical references or guidelines or drafting appeal letters for denied prior authorizations, and explains the importance of reviewing AI outputs with a critical eye.

Episode Resources

Article: Artificial Intelligence: Opportunity for Positive Transformations in Cardiovascular Disease Management
CE Course: The Role of Artificial Intelligence in Cardiovascular Care: ATTR Case Study
CE Course: Artificial Intelligence: Leveraging AI for CVD Management

Transcript

I’m Yvonne Commodore-Mensah, Board President for PCNA. I’d like to welcome you to Heart to Heart Nurses. PCNA supports your professional journey with accessible continuing education, practical patient resources and a vibrant community that understands the unique challenges and rewards of cardiovascular nursing. Together, we’re advancing the knowledge that defines excellence in cardiac care while celebrating the difference you make every day.

Geralyn Warfield (host): (00:31)

Welcome, everyone, to today’s episode on artificial intelligence. And James Norton, you are here to share your expertise with us. But first of all, could you share about your background, please?

James Norton (guest): (00:40)

Absolutely. Thanks so much for having me on. I’m James Norton, Nurse Coordinator at the Stanford Center for Inherited Cardiovascular Disease. I’m also our department’s Epic super user and through that sort of morphed into the AI educator role and I’ve been involved with all of the AI pilots at the point of care for cardiovascular medicine for the last two years now at Stanford.

Geralyn Warfield (host): (01:04)

So, there’s a lot of excitement and anxiety around using this kind of technology. And I’m hoping we could start off kind of with the basics and maybe talk a little bit about LLMs or large language models, first of all.

James Norton (guest): (01:17)

Yeah, that’s a perfect spot to start. There’s so much happening in the world of AI right now. And I think the most accessible point for cardiovascular nurses is the large language model. And those are basically the ChatGPTs, the Claudes, the Geminis, as well as the more medical-focused models like OpenEvidence.

And we, in the outpatient ambulatory cardiovascular nursing environment, end up relying on those tools now almost every day. We think of them sort of as adjuncts to our workflow, ways to accelerate, remove friction, and sort of ease some of the cognitive load of day-to-day tasks, both administrative and clinical.

Geralyn Warfield (host): (02:04)

So how does this look in practice?

James Norton (guest): (02:06)

So, I guess if you look at sort of a day in the life of us as a Nurse Coordinator, there’s about 4-5 areas where every single day we’re utilizing one of these tools. The first one that comes to mind is OpenEvidence. And that is free for providers and all clinicians. You just need to sign up with your NPI number. And if you’re not aware, RNs can sign up for an NPI number, so they have access to this tool as well. And I think that’s something that not a lot of nurses are aware of.

That has largely replaced searching in clinical reference databases, traditional databases for finding drug information, for finding current treatment protocols, clinical guidelines. It allows you to ask in a conversational manner for clinical recommendations.

We don’t typically use that for patient-specific questions, more just general guidelines.

And then at Stanford, we also have secure access to most of the other large language models, including everything from OpenAI and Anthropic and Google, all in an environment hosted by Stanford that’s safe for patient health information.

We also have integration of those tools within Epic, so they can use context from the patient’s chart when answering a question. And with those tools, we’re drafting appeal letters for insurance, we’re crafting custom patient education, we’re working on other aspects of the prior auth process, and we’re also doing chart summarization and clinic prep.

Geralyn Warfield (host): (03:56)

So, could you give us a real-life example of how this looks?

James Norton (guest): (04:00)

Yeah, definitely. So, we work in genetic cardiology, and we have these novel myosin inhibitor therapies, and it’s an almost constant stream of denials from insurance companies. So, one of the tasks that we perform on an almost daily basis is drafting appeal letters. This process has just been completely revolutionized by using large language models, especially when they have access to patient data.

So, we can feed in the PDF of the denial. We can ask our large language model to analyze it and then draft an appeal letter using the context of the patient’s chart. And if we have a system that’s embedded within Epic or our EHR, that’s great. If not, if we have a secure large language model, we can copy and paste clinic notes in.

And that will take the process of writing a long appeal letter with all the clinical evidence down from something that would take well over an hour to something that can be done in about 10-15 minutes.

Geralyn Warfield (host): (05:08)

Sounds like a great savings of time and effort. I love that. We’re going to take a quick break and we will be right back.

Geralyn Warfield (host):

A lot of times, James, we hear the phrase “Garbage in, garbage out.” And when it comes to AI, what we get out is only as good as what we put in, and that is in reference to prompting structures. And I’m wondering if you could help us be better at not getting garbage out because we’re putting such great prompts in that we’re getting what we want.

James Norton (guest): (05:11)

Yeah, and that’s such a good point. There’s so much research now that shows exactly that: the quality of the prompt largely dictates the quality of the output from your large language model.

When I’m speaking about prompting, that’s one of the most common questions that I get asked. I like to talk about formatting every prompt in four sort-of chunks. You have the role, you have the task, you have the context, and then you have the constraints. It gives you a clear framework for communicating effectively with a large language model.

It’s funny, when large language models first became popular, prompting was talked about as being something that was going to be a career for people. There would be prompt engineers, and this would be an entire category, an entire new career path. I think now it’s becoming more and more clear that prompt engineering is a skill, and in healthcare, in a way, it almost becomes a clinical skill.

Being able to interact with AI is going to be, or even is now, as crucial as the ability to read an EKG, or to use a stethoscope, or use any of the other technology tools that are available to us in the clinical setting.

So going back to the structure, every time we write something into a large language model, we first define the role. So often, my prompts will start with something like, “You are writing in service of James Norton, a Nurse Coordinator at Stanford Center for Inherited Cardiovascular Disease, working with the attending physician, so and so.”

Then we’ll move into the task: “We have to draft an appeal letter for a medication denial. I will be uploading the PDF.”

Then we move into the context section. And that context will be the patient’s chart, if I’m copying and pasting it in, and the PDF of the denial letter that we received from insurance.

And then the final part of the prompt will be constraints. And that’s how we want it structured, what we should be including, shouldn’t be including. And what I encourage everybody to do is experiment within that category and then save things that work.
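The four-part structure described above (role, task, context, constraints) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something shown in the episode: the names `PromptParts` and `build_prompt`, the section labels, and the wording of the example constraints are all hypothetical, and the context is a placeholder rather than real patient data.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """The four chunks of a prompt: role, task, context, constraints."""
    role: str
    task: str
    context: str
    constraints: str


def build_prompt(parts: PromptParts) -> str:
    """Join the four chunks into one labeled prompt, in a fixed order."""
    sections = [
        ("ROLE", parts.role),
        ("TASK", parts.task),
        ("CONTEXT", parts.context),
        ("CONSTRAINTS", parts.constraints),
    ]
    # Refuse to build a prompt with a missing chunk.
    for label, text in sections:
        if not text.strip():
            raise ValueError(f"{label} section is empty")
    return "\n\n".join(f"{label}:\n{text.strip()}" for label, text in sections)


# Hypothetical example values; the context is a placeholder, not patient data.
example = PromptParts(
    role=(
        "You are writing in service of a Nurse Coordinator at an inherited "
        "cardiovascular disease clinic, working with the attending physician."
    ),
    task="Draft an appeal letter for a medication denial.",
    context="[clinic notes and the text of the denial letter go here]",
    constraints=(
        "Use a formal letter format. Cite only facts present in the context. "
        "List anything that needs clinician verification at the end."
    ),
)

prompt = build_prompt(example)
print(prompt)
```

Keeping the four chunks as separate fields makes it easy to experiment with one of them, such as the constraints, while leaving the rest unchanged.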

So, on my work computer, I have a whole catalog of prompts for all the common situations that I encounter every day that I can just copy and paste in. And then I’ve got sections where I can input patient data or upload PDFs of test results, whatever that might be.
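A saved catalog of prompts with blanks for patient-specific input could look something like the following Python sketch. It is a hypothetical illustration, not the guest's actual setup: the catalog keys, the `fill_prompt` helper, the slot names, and the template wording are all assumptions, and the filled-in values are placeholders.

```python
from string import Template

# Hypothetical catalog: one reusable template per common situation.
# Slots such as $author and $context are filled in at the time of use.
PROMPT_CATALOG = {
    "appeal_letter": Template(
        "ROLE:\nYou are writing in service of $author, $title.\n\n"
        "TASK:\nDraft an appeal letter for a denied prior authorization "
        "for $medication.\n\n"
        "CONTEXT:\n$context\n\n"
        "CONSTRAINTS:\nCite only facts found in the context."
    ),
    "patient_education": Template(
        "ROLE:\nYou are writing in service of $author, $title.\n\n"
        "TASK:\nWrite a patient education handout about $topic.\n\n"
        "CONTEXT:\n$context\n\n"
        "CONSTRAINTS:\nUse plain language and short sentences."
    ),
}


def fill_prompt(name: str, **fields: str) -> str:
    """Look up a saved prompt by name and fill in its slots.

    Template.substitute raises KeyError if any slot is left unfilled,
    so an incomplete prompt is never produced silently.
    """
    return PROMPT_CATALOG[name].substitute(**fields)


# Placeholder values only; no real patient data.
letter_prompt = fill_prompt(
    "appeal_letter",
    author="a Nurse Coordinator",
    title="working with the attending physician",
    medication="a myosin inhibitor",
    context="[clinic notes and denial letter text go here]",
)
print(letter_prompt)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a forgotten slot raises an error instead of leaving a stray `$context` in the prompt.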

Geralyn Warfield (host): (09:23)

So, when you said you’ve saved those, do you save those within a Word document, within an Excel sheet? What’s literally the nitty gritty of how do you categorize them, much like photos in a photo album? How does that look on your computer?

James Norton (guest): (09:35)

It’s traditionally always been in a Word document and that’s just my method. I’m sure it’s not the most efficient. I had a Word document with just prompt after prompt broken down into different categories.

But recently at Stanford, and I think many other institutions are doing this, they are allowing users to save favorites. So, we have one-click buttons that we can assign, and there’s an infinite number of assignments that you can pick. So, we have sort of drop-downs of all the different categories that we use that can recall our favorite prompts.

Geralyn Warfield (host): (10:09)

I love that. That’s a wonderful way to use technology to help you with technology. There’s one further thing I’d love for us to discuss a little bit more.

And one of my colleagues recently shared the verbiage of something that you can use currently, that’s realistic, versus what’s aspirational.

And I realize I’m asking you a loaded question because sitting in this room today, what’s realistic versus what’s aspirational when it comes to this is going to be different than three months from now, maybe even three minutes from now, but we’re going to talk about what we know in this moment. So, what’s realistic in terms of how you can use these and what’s a little bit more aspirational?

James Norton (guest): (10:48)

Yeah, that’s a great question. And it really, it makes you consider all of the pitfalls and guardrails that we should have on our use, and the way that we interact with large language models.

I think first and foremost, it’s important to know that these systems, these large language models, regardless of who’s making them, are largely sort of black box type systems. We don’t really know how they’re arriving at their conclusions. And in some ways, even the engineers aren’t 100% clear about why one prompt will give you one answer, but a slightly different prompt will give you something completely different. And so just like humans in many ways, they are very imperfect.

And it’s important to always be aware of that when you’re interacting. You are always the safety net. These tools can accelerate workflows. They can crunch large amounts of data to make it much more consumable. But ultimately at the end of the day, you have to look at the output and verify it yourself.

And that also leads into something else that’s been noted more and more in research now. There’s a strong risk of de-skilling. If we overly rely on large language models, we now see that we start losing the skills ourselves. Something that we repeatedly automate with ChatGPT becomes harder and harder for our own brains to automate.

So that’s definitely a big risk.

And I think that regardless of the tool, there are limitations as far as what it does when it doesn’t have all of the data necessary. This leads to hallucinations. Everybody talks about hallucinations and this happens regardless of the model. It’s even been noted in the purely medical ones, like OpenEvidence, that sometimes it will just come up with something that we don’t know where it came from. The engineers don’t know exactly where it came from, but it’s not real. And that again goes back to us having to be that final safety net.

The other thing that I talk about a lot, and I think is really important to note, is that all of these models were trained off of data that we created. And humans are very imperfect. We’ve built large data sets, but many of those numbers are biased, whether it was sampling bias or whatever the case may be. And it was all of that historical data that we’ve created that these models were trained on. So, we see time and time again that it’s perpetuating biases that have existed in clinical care and, of course, outside of that.

And so, it’s up to nurses to look at all of this with a critical eye, look at every output and every prompt, and purposely work that review into their workflow. It’s up to nurses to be critical of every output and keep that in mind as they use these outputs in their day-to-day practice.

Geralyn Warfield (host): (14:15)

So, based on our conversation today, what one key takeaway would you like the audience to leave with?

James Norton (guest): (14:21)

I think the most important takeaway is that AI is here and if it’s not currently in widespread use at your center, it will be very shortly. And using AI is a critical part of maintaining patient safety. It’s a skill that every nurse needs to have and without knowledge of how to effectively use AI, the risk of patient harm goes up exponentially.

Geralyn Warfield (host): (14:52)

James, thank you so very much for being here today and talking us through AI and how it might affect our patients, how it might affect us and our workflows. And we’ll look forward to more conversations in the future about how things have changed.

James Norton (guest): (15:04)

Thank you so much for having me. It’s a pleasure.

Geralyn Warfield (host): (15:06)

This is Geralyn Warfield, your host, and we will see you next time.

Thank you for joining us for this episode of Heart to Heart Nurses. We invite you to visit pcna.net for education and resources that will empower you to provide preventive cardiovascular care with confidence and expertise.
