Prompt Engineering a Lesson Plan: Harnessing AI for Effective Lesson Planning

By Kristen DiCerbo, Ph.D., Chief Learning Officer at Khan Academy

With the launch of Khanmigo, our AI-powered tutor for students and assistant for teachers, we wanted to explore the potential of AI in creating high-quality lesson plans. However, we soon discovered that though AI can generate large amounts of information, crafting effective lesson plans requires more than just regurgitating facts. It requires a deep understanding of pedagogical principles, curriculum standards, and the diverse needs of learners.

This realization led us to dive into the art and science of prompt engineering, the process of crafting precise instructions that guide AI responses. Through careful experimentation and refinement, we developed a unique approach to prompt engineering that enables Khanmigo to produce lesson plans that are not only informative but also engaging, differentiated, and aligned with best practices.

In this blog post, I’ll take you behind the scenes of our prompt-engineering process by sharing the steps involved in designing prompts that capture the essence of effective teaching. I’ll also share insights into the challenges we encountered and the strategies we developed to overcome them.

First tries

All of the activities that use Khanmigo on Khan Academy are created by using prompts, which are written instructions that tell the large language model, GPT-4, how to act. An early lesson-planning prompt looked like this:

You are an experienced math and science educator with strong knowledge of both the subject area under discussion and pedagogical principles.

I am a teacher, and I need help from you to create a very engaging lesson for my students.

You should always ask me about my goals and preferences, one at a time, including the following:

After discussing these topics, you should always produce a math lesson that is the following:

The structure of the lesson must include the following:

At first glance, it wasn’t bad. It produced what looked to be a decent lesson plan—at least on the surface. However, on closer inspection, we saw some issues.

Time to iterate and improve

We set about trying to improve on this output. We needed a way to judge whether or not the changes we made were actually improving the lesson plan. We also needed to come to an agreement about what “good” looked like. We decided to evaluate Khanmigo’s lesson plans using the same rubrics that are used to evaluate classroom teachers.

When the lesson plans produced by those early prompts were evaluated against these rubrics, they didn’t fare very well. They routinely scored as “unacceptable” across all dimensions or, at best, reached “developing” in one or two. Ultimately, while these lesson plans looked the part, teachers would still need to do most of the heavy lifting to put them into action.
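Rubric scoring like this is a human judgment, but even a lightweight record of the scores makes it easier to compare prompt iterations. Here is a minimal sketch of one way to track that; the dimension names and four-level scale below are illustrative, not the actual teacher rubric.

```python
from dataclasses import dataclass, field

# Illustrative four-level scale; the post mentions "unacceptable",
# "developing", "proficient", and "exemplary" ratings.
LEVELS = ("unacceptable", "developing", "proficient", "exemplary")

@dataclass
class RubricScore:
    dimension: str  # e.g. "learning objectives" (hypothetical dimension name)
    level: str      # one of LEVELS

@dataclass
class LessonPlanEvaluation:
    prompt_version: str
    scores: list[RubricScore] = field(default_factory=list)

    def summary(self) -> dict[str, int]:
        """Count how many rubric dimensions landed at each level."""
        counts = {level: 0 for level in LEVELS}
        for score in self.scores:
            counts[score.level] += 1
        return counts

# Compare two prompt iterations on the same rubric.
v1 = LessonPlanEvaluation("v1", [RubricScore("learning objectives", "unacceptable"),
                                 RubricScore("differentiation", "developing")])
print(v1.summary())
```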

There were two main problems that emerged from this testing:

  1. Khanmigo didn’t have enough information. There were too many undefined details for Khanmigo to infer and synthesize, such as state standards, target grade level, and prerequisites, not to mention the limits of Khanmigo’s subject matter expertise. This resulted in lesson plans that were too vague and/or inaccurate to provide significant value to teachers.
  2. We were trying to accomplish too much with a single prompt. The longer a prompt got and the more detailed its instructions were, the more likely it was that parts of the prompt would be ignored. Trying to produce a document as complex and nuanced as a comprehensive lesson plan with a single prompt invariably resulted in lesson plans with neglected, unfocused, or entirely missing parts.

To confront these problems, we made two fundamental changes to our approach:

  1. We backed the lesson-planning tool with Khan Academy content. By selecting a content piece around which to build each lesson plan, we can give Khanmigo access to more, and more useful, information. This includes metadata like standards alignment and linked prerequisite lessons. It also includes the actual content on the page: article text, video transcripts, and expert-written practice problems and explanations.
  2. We broke the prompt into separate sections that we could chain together. By building a more complex and detailed prompt for each part of the lesson plan, we could make updates to each part of the lesson without compromising Khanmigo’s performance in other areas. We could also improve the specificity and overall consistency of the output. (A simplified sketch of this chaining approach appears after this list.)
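To make those two changes concrete, here is a simplified sketch, not our actual prompt chain: a shared context block built from the selected Khan Academy content, plus one focused prompt per lesson-plan section, each generated with its own model call. The function names, content fields, and prompt wording are all illustrative.

```python
from openai import OpenAI

client = OpenAI()

def build_content_context(content: dict) -> str:
    """Fold the selected Khan Academy content (metadata plus page text)
    into a context block shared by every section prompt."""
    return (
        f"Lesson content: {content['title']}\n"
        f"Standards alignment: {', '.join(content['standards'])}\n"
        f"Prerequisites: {', '.join(content['prerequisites'])}\n"
        f"Article text / video transcript:\n{content['transcript']}\n"
        f"Sample practice problems:\n{content['practice_problems']}\n"
    )

# One focused instruction per lesson-plan section (illustrative wording).
SECTION_PROMPTS = {
    "learning_objective": "Write a teacher-facing and a student-facing learning objective, and cite the relevant standard.",
    "warm_up": "Write a 5-minute warm-up activity that activates the listed prerequisites.",
    "guided_practice": "Write guided practice modeled on the sample problems.",
}

def generate_lesson_plan(content: dict) -> dict:
    context = build_content_context(content)
    plan = {}
    for section, instruction in SECTION_PROMPTS.items():
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are an experienced educator writing one section of a lesson plan."},
                # Chain the sections: each call sees the shared content context
                # plus the sections generated so far, so later sections stay
                # consistent without one giant prompt doing everything.
                {"role": "user", "content": f"{context}\n\nSections so far:\n{plan}\n\n{instruction}"},
            ],
        )
        plan[section] = response.choices[0].message.content
    return plan
```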

Once we made these changes, we found it easier to make incremental improvements to the lesson plans Khanmigo was producing. We could tailor each separate prompt to perform a highly specific function and validate our progress by continuing to test against the same rubric.

Here’s an example of the Learning Objective section from a lesson plan before and after that iterative process.

Before:

Objective: By the end of this lesson, students will be able to calculate the average rate of change of polynomials and understand its real-world applications.

After:

Learning objective: Students will calculate the average rate of change of polynomial functions, specifically cubic and quadratic functions, over specified intervals. They will also interpret the average rate of change from a graph.

Student-facing objective: By the end of this lesson, I’ll be able to find the average rate of change of a polynomial function over a given interval and understand what it means on a graph.

Standards: CCSS.Math: HSF.IF.B.6

If we evaluate those two objectives against the rubric, we can see some of the progress we made:

The “before” objective could be graded (generously) as in the low “developing” range. The “after” objective could be graded as “exemplary.”

Continuous improvement

With this latest release, Khanmigo is now a much more capable lesson-planning partner, but we know there is still work to be done. Right now, Khanmigo’s output generally scores in the developing to proficient range on the rubric. We think that is good enough for it to act as a partner for teachers, but we will continue to improve with teacher feedback. For instance, we’re working to add more emphasis on potential student misconceptions throughout the lesson plan.

We have found that prompt engineering is an art and a science. Having clear guidance in rubric form for what the output should be helps us evaluate how we are doing. And we’re looking forward to hearing from teachers who are giving it a try so we can continue to improve.

Kristen DiCerbo, Ph.D., is the Chief Learning Officer at Khan Academy. She brings her expertise in learning science to leading the content, design, product management, and community support teams. Sometimes she even dips her hand into prompt engineering.
