24. Duke's Ahmed Boutar on AI Alignment: Ensuring Users Get Desired Results
Episode 24 of Kinwise Conversations · Hit play or read the transcript
Episode Summary: The Strategic Shift from AI Output to Human Interpretation
We’re joined by Ahmed Boutar, an Artificial Intelligence Master’s Student at Duke University’s Pratt School of Engineering and a Graduate Research Assistant at the CREATE Lab. Ahmed brings a vital perspective on AI governance and alignment, the complex task of ensuring AI systems are built to be safe and equitable.
This episode delves into the urgent need for K-12 and institutional leaders to move beyond simple AI adoption toward a model of human oversight. Ahmed details his research on Human-Aligned Hazardous Driving and explains the critical difference between inner and outer alignment—sharing real-world examples of how flawed objectives can lead to harmful outcomes in everything from hiring software to educational tools.
For mission-driven leaders, this conversation is essential: Ahmed argues that the human role in the future of the workforce lies not in generation, but in interpretation and ethical judgment. We explore how to adapt curriculum to foster this skill, why transparency is the greatest guardrail, and how educators’ expertise is now more valuable than ever in mitigating AI’s innate biases.
Key Takeaways for Institutional Leaders
The Interpretation Mandate: Shift curriculum design to prioritize student interpretation, explanation, and verbal defense of ideas over simple text generation, ensuring students struggle mentally rather than outsourcing thinking.
Guardrails vs. Velocity: Governance demands leaders balance the speed of AI progress with the imperative to set robust policies and guardrails that ensure the technology is fair and safe.
The Flaw of Alignment: Systems are prone to misalignment (the difference between the intended goal and the system’s actual learned objective), requiring human expertise to audit models for unintended biases and discrimination.
Transparency as Policy: To mitigate risk in high-stakes areas (e.g., grading, hiring, loans), leaders must demand transparency in how AI systems make decisions and ensure diverse perspectives are involved in their design.
Maximize Human Expertise: The emotional, empathetic, and interpretive skills of educators are irreplaceable; their role is elevated to one of judgment and guidance, not replacement by automated tools.
Lydia Kumar: Welcome to Kinwise Conversations. Today we're talking with Ahmed Boutar, an Artificial Intelligence Master’s Student at Duke University, and a researcher focused on one of the most critical issues in the field: AI alignment and governance. Originally from Tunisia, Ahmed brings a diverse background and a rigorous engineering focus to the ethics of AI. Ahmed is working on projects to create more human-aligned systems, like one that studies human eye-gaze data to help autonomous vehicles better identify hazards. If you've ever wondered why AI sometimes delivers unexpected, or even harmful, results, you're in the right place.
Lydia Kumar: Hi Ahmed. Thank you so much for being on the podcast today. And I am really interested in talking with you about AI and its implications. But before all of that, I want to give you a chance to introduce yourself to our listeners. I think it really helps people to have a sense of who they're hearing from and what your background is and who you are.
Ahmed Boutar: Thank you so much Lydia. This is very exciting to do a podcast, my first podcast. My name is Ahmed. I'm currently a master's student at Duke. I'm studying AI at the moment. And I come from Tunisia, a country in North Africa. I did my undergrad here in the US in computer science, studied Audio Music Engineering, something I'm very passionate about. And then I transitioned to AI after working a little bit, taking a gap year to think about some stuff and what I want to do in life. And AI was one of those things that I was like, this is really interesting to learn about. And that's how I came to be here, pretty much.
Lydia Kumar: Thanks so much for sharing that. I'm curious because you say AI was something that was interesting and you wanted to learn about it. Why did—what was the draw? Why did you want to learn about AI in particular?
Ahmed Boutar: In undergrad I took this class called Data Mining, and my professor was an impressive researcher who was able to predict the results of elections with a pretty good margin of error. He was also able to predict the spread of flu in New York City, in Manhattan. At the time I was 19 and I was like, this is insane. How can you do that just by looking at data, and boom, you have a result? So I thought it was really interesting, and I kept taking other classes related to AI because I wanted to know more. I had a lot of different interests, like I mentioned earlier: audio, music engineering, sound design. So I never really made up my mind on what I wanted to do specifically. Graduate school is an opportunity to deepen what you want to learn about, and that's why I decided to do AI. I'm very grateful and lucky to have gotten into Duke, into this program, where I met the professors and faculty here, and one of my mentors, Dr. Ben. It was a really amazing opportunity to learn more about AI. I guess it's a great introduction to AI, because learning the field takes years, and I don't think anyone is really a big expert as the field progresses so quickly.
The Speed of Progress: Addressing Overwhelm and World Models
Lydia Kumar: I subscribe to a lot of newsletters about AI, and they fill my inbox all the time; it's an overwhelming amount of information. And that's just updates about what's happening, or things people are researching or studying. I can't even keep up with the newsletters that come into my inbox.
Ahmed Boutar: I feel that every day I wake up and there are 10 emails from 10 different newsletters, and each one has a link to an article that takes 30 minutes to read. I'm like, oh my god, if I'm gonna read all of this, the day will be over and I won't have learned anything yet.
Lydia Kumar: Right. So it's like, okay, maybe I'll just dabble a little bit every day and keep learning. I think we all have some responsibility to have a little bit of an understanding of how this technology works, but it is overwhelming to maintain a deep and ongoing understanding because the field changes so, so much.
Ahmed Boutar: Yeah. And I think this is why your podcast is such a great way to do this, because a lot of people listening, with something going on in the background as they're doing different tasks, like cooking or walking, can learn a lot about this field, or at least get a general understanding of what's going on, because some of what's happening in AI right now really matters.
The Shift from LLMs to World Models
Lydia Kumar: From your perspective, what's something that is important that's going on in AI right now that you think people should be paying attention to?
Ahmed Boutar: That's a very good question. There's obviously a lot happening in AI that's very exciting. I remember in the spring I attended a talk by Dr. Yann LeCun. He's a professor at NYU and used to be the Chief AI Scientist at Meta. He was talking about his research on World Models. His argument is that current LLMs (Large Language Models), which deal only with text and work by predicting the next token of what will be said, are not enough to capture the world. A good analogy he provides is that a five-year-old kid has processed more information than the current best model we have, because they use visuals, sound, all these things. His research aims to predict the next state of the world given the parameters we have at the moment. I think it's very impressive. If we talk about general intelligence, I think that's the closest thing we have to general intelligence, at least at the moment.
AI Alignment: Inner, Outer, and the Problem of Objectives
Lydia Kumar: So I know that you've been doing some research on aligning AI systems with human driving behavior. And I'm curious if you could just tell us a little bit more about your project and your research and how that works.
Ahmed Boutar: Yeah, so this project mainly started with my good friend, Lennox Anderson, who's also in the AI program. Lennox is leading the project, and the question was basically: if we include eye-gaze data (where humans look at different objects) in a computer vision model, say like the one Tesla uses, are we able to detect hazards better or not? And are we able to create a more humanly interpretable or explainable model that tells you why something has been labeled as a hazard? Mainly the goal here is the human alignment part of it: how can we align these models to what humans would perceive?
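For readers who like to see ideas in code, here is a minimal, hypothetical sketch of the general approach Ahmed describes: feeding a human eye-gaze heatmap into a vision model alongside the camera frame. This is not the HAHD implementation; the model, layer sizes, and class names are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class GazeAugmentedHazardDetector(nn.Module):
    """Hypothetical sketch: fuse a human eye-gaze heatmap with the camera frame
    so a hazard classifier can weight regions that human drivers attend to."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # 3 RGB channels + 1 gaze-heatmap channel = 4 input channels
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_classes)  # e.g., hazard vs. no hazard

    def forward(self, frame: torch.Tensor, gaze_heatmap: torch.Tensor) -> torch.Tensor:
        # frame: (B, 3, H, W); gaze_heatmap: (B, 1, H, W), values in [0, 1]
        x = torch.cat([frame, gaze_heatmap], dim=1)
        return self.head(self.backbone(x).flatten(1))

# Toy usage: a batch of 2 frames with matching gaze maps
logits = GazeAugmentedHazardDetector()(torch.rand(2, 3, 224, 224), torch.rand(2, 1, 224, 224))
```

The only point of the sketch is that gaze becomes one more input signal the model can learn from, which is also what makes a "why was this labeled a hazard?" explanation easier to ground in human terms.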
Defining Inner and Outer Alignment
Lydia Kumar: Why do you think that's important?
Ahmed Boutar: AI is progressing at an incredible pace. It's actually insane how quickly it's progressing. But in AI alignment, there are two main things. There's the outer alignment part: if I'm training a model, I will give it an objective, and outer alignment is basically how well the model is able to follow that objective and how well that objective captures our goal. And inner alignment is ensuring that what the model has actually learned, what it actually wants to do, matches this outer objective. There is usually a misalignment between the two. Sometimes we don't know why, sometimes we know why, and even if we know why, sometimes we can't really fix it.
Ahmed Boutar: I'll give you an example. There's been some research where they created a simulation, gave an AI a set of tools, and told it, "Build me the structure that can move the fastest." What the AI did is it built the tallest structure possible, so that when it falls, it reaches the highest speed possible. So technically, there's alignment with the outer objective, which is "build me something that moves the fastest." But there is a misalignment with the inner objective, because we wanted something that can walk.
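Ahmed's "tallest structure" story is a classic case of objective misspecification. Here is a toy sketch (invented numbers, not the simulation he cites) showing how optimizing a stated proxy objective can diverge from the designers' real intent.

```python
import math

# Toy illustration of objective misspecification.
# The stated (outer) objective rewards peak speed; the designers actually wanted something that walks.

def peak_speed_when_falling(height_m: float) -> float:
    """Proxy objective: top speed reached by a rigid structure tipping over
    (roughly sqrt(3 * g * h) for a thin rod pivoting at its base)."""
    return math.sqrt(3 * 9.81 * height_m)

def walking_speed(height_m: float) -> float:
    """Intended objective: sustained walking speed; tall, top-heavy designs walk poorly."""
    return max(0.0, 2.0 - 0.5 * height_m)

candidates = [0.5, 1.0, 2.0, 5.0, 10.0]  # candidate structure heights in meters

best_by_proxy = max(candidates, key=peak_speed_when_falling)
best_by_intent = max(candidates, key=walking_speed)

print(f"Optimizing the stated objective picks a {best_by_proxy} m tower that just falls over.")
print(f"Optimizing the intended objective picks a {best_by_intent} m walker.")
```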
Lydia Kumar: Yeah. In that example specifically, it's really easy to see how the result feels dangerous and potentially bad for people. It's not achieving the vision that the human researchers had.
Ahmed Boutar: Yeah. I think it was Amazon that worked on a project where they tried to speed up their own hiring. They basically trained a model to predict whether a candidate would be a really good engineer. But given that the engineering domain is dominated by males, the model would predict that whenever you have a white male engineer between the age of XYZ, they're gonna be predicted as a good engineer. And everyone else who does not fit this general picture will not be predicted as a good engineer. That's the flaw in the training data: it was not diverse enough. And it tells you that we have to be very careful when developing AI. We need all these different, diverse perspectives, and very careful planning and design. Personally, my opinion is that just because we can develop something doesn't mean we should.
Policy and The Human Role: Transparency in AI Adoption
Navigating the Alignment Problem as a User
Lydia Kumar: Most people who are gonna listen to this podcast are not going to be AI developers, but they're going to be users. How do you navigate this alignment problem?
Ahmed Boutar: For starters, education, like knowing that this problem exists. Being educated about AI in general is very important, so that when we get an output from an AI model, we have that in mind and have the ability to question the output rather than taking it as ground truth.
Lydia Kumar: If you're an education leader and you want to use these AI tools that seem useful, what are some things that you think leaders in these systems or individual educators should be aware of and pay attention to?
Ahmed Boutar: Their goal is obviously to maximize their students' learning. While these LLMs can be very helpful, there are some things one should be mindful of, such as AI compliancy, the tendency of AI to be overly agreeable with whatever you're talking about. So there's definitely that when generating answers or a curriculum; there is an innate bias in LLMs toward being helpful and agreeable to the user. As educators, they should be mindful that they cannot outsource all these tasks and just sit back as a judge. I think educators can use these LLMs to customize content for different students, because everyone has a different type of intelligence. LLMs are a very useful tool, but they cannot replace educators; educators have to use them as a way to help them teach better.
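One practical way to push back on the "AI compliancy" Ahmed describes is to prompt the model as a skeptical reviewer of someone else's work rather than as your cheerleader. Below is a rough sketch, assuming a generic chat-style message format; the actual client call is left out because it varies by vendor.

```python
# Hedged sketch: blunt an LLM's tendency to agree by framing your own draft as someone
# else's work and asking for specific objections rather than an overall opinion.

def build_critique_prompt(draft: str) -> list[dict]:
    return [
        {"role": "system",
         "content": ("You are a skeptical reviewer. Do not compliment the text. "
                     "List concrete errors, gaps, and weak reasoning, each with a suggested fix.")},
        {"role": "user",
         "content": ("I read this lesson plan somewhere and I suspect parts of it are wrong. "
                     "What would you push back on, and why?\n\n" + draft)},
    ]

messages = build_critique_prompt("Photosynthesis happens only at night, so plants ...")
# response = client.chat.completions.create(model=..., messages=messages)  # vendor-specific call
```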
Governance and Guardrails
Lydia Kumar: I want to talk a little bit now about just guardrails or governance that folks could put in place.
Ahmed Boutar: I think those kinds of policies, where you're actively protecting what humans can do versus what AI can do and creating that distinction between the two, are such a good and important step. Policy ranges from what AI is allowed to do, what we're allowed to train on, and what we're allowed to produce, to how we do it when it comes to data centers, energy, and all that kind of stuff. Those are very important policies that we should consider.
Lydia Kumar: If you were developing an AI model in your work in alignment, do you think there's specific policies that you see that are really important for developing aligned AI systems?
Ahmed Boutar: I think it starts with transparency. If I'm developing an AI that decides whether or not I can be approved for a loan, I would like to know what factors went into it. Are we talking here about protected attributes? Being transparent about why the AI made such a prediction is extremely important. And I think also including diverse perspectives among AI researchers: if all of them belong to a specific group, they will not be able to even think about how other people could be impacted by such a model. The issue is, how do you balance the speed of progress against setting up guardrails without the two clashing?
Ahmed Boutar: Given the cost of developing these models, where you need huge data centers and a lot of compute, academia can barely keep up with this development at this point. So only a few companies are able to develop these models, and it's very important to have those kinds of diverse perspectives when developing them.
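As a concrete illustration of the transparency point above, here is a minimal sketch of one audit a team could run on a loan-approval model: compare approval rates across a protected attribute. The data and group labels are made up; a real audit would go much further (feature attributions, counterfactual tests, documentation of the factors used).

```python
from collections import defaultdict

# Minimal sketch of one transparency/fairness check: compare approval rates across a
# protected attribute. Invented data; a real audit would add feature attributions,
# counterfactual tests, and documentation of how the model reached its decisions.

def approval_rate_by_group(decisions: list[dict]) -> dict[str, float]:
    totals, approvals = defaultdict(int), defaultdict(int)
    for d in decisions:
        group = d["group"]              # e.g., a protected attribute such as gender
        totals[group] += 1
        approvals[group] += d["approved"]
    return {g: approvals[g] / totals[g] for g in totals}

decisions = [
    {"group": "A", "approved": 1}, {"group": "A", "approved": 1},
    {"group": "B", "approved": 0}, {"group": "B", "approved": 1},
]
print(approval_rate_by_group(decisions))  # {'A': 1.0, 'B': 0.5} -> a gap worth explaining
```

Even a check this small makes the transparency conversation concrete: if one group is approved half as often, someone has to be able to explain why.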
Curriculum Design: The Shift from Output to Interpretation
Lydia Kumar: In those spaces (as a TA at Duke), where do you see AI being potentially useful as someone who's trying to support the learning of others, and where do you see it potentially creating gaps or reducing learning?
Ahmed Boutar: I think it's actually a little bit harder as an educator now that LLMs and AI are being used by a lot of students. We saw that a lot of people would just take the output as is, which is definitely detrimental to learning. There's an MIT study that showed brain activity decreased in people using LLMs versus those not using them. If you're asking students to write code, it's okay for an LLM to write the code, but the interpretation needs to come in your own words. It needs to be from you, and that kind of thinking is something we should not outsource to LLMs.
Ahmed Boutar: In the CREATE Lab, under Professor Jon Reifschneider, we are building an AI-powered platform that lets you generate quizzes based on your lecture materials and will grade those quizzes for you. But then we realized that the learning happens only if you're able to restate your learning in your own words. So we transitioned from typed answers to having you speak your answers out loud. Now you have to actually sit down and, in your own words, talk it out, and I feel like that gives you a better understanding of where your mistakes could be. I think it's important for educators to be aware of that and find a way to let students do those tasks on their own. Educators can change the way their assignments are structured so they focus not just on applying whatever was taught, but on interpreting it. And that's where the human will never be replaced by a machine, I think.
The Ultimate Human Competency
Lydia Kumar: It's that ability to understand what's being generated or what you want to do, and then be able to talk about it with another person or explain it. And if you can't talk about what's been created, then you don't understand it.
Ahmed Boutar: And I think LLMs are impressive at writing, but some people have a way of writing that is so distinctive that when you read it, you will know it's not an LLM, because it's just so beautiful. That's one of the things an LLM cannot replace. Going back to educators: they deal with kids for years. The empathy, being able to read people, being able to tell when something is wrong with a student. An LLM will never be able to do that.
Lydia Kumar: What is a concern that you have and what is a hope that you have that you're really grappling with right now?
Ahmed Boutar: I'm a little bit concerned about the concentration of who can and cannot develop AI models. It could even increase the gap between people. But I'm also hopeful that it could go the other way, where later on it helps a lot of people. Take drug discovery: what happened with COVID was impressive. There's a company called InstaDeep that was able to predict the next COVID variant before it even happened. I'm very hopeful about AI development in spaces that don't necessarily relate to our daily use, like chatbots. With drug discovery and scientific discovery, learning more about science and the world around us is such an interesting thing. But it all comes down to fair and responsible use of AI.
Connect and Resources
LinkedIn: https://www.linkedin.com/in/ahmed-boutar-6124ab175/
Website: https://ahmedboutar.com/
Human-Aligned Hazardous Driving (HAHD) Project: Ahmed is involved in this research-driven initiative with Onyx AI LLC, focusing on integrating eye-tracking technology to improve machine learning models for hazard detection in autonomous driving systems. https://www.linkedin.com/posts/lennoxanderson_onyx-ai-activity-7289675591140659200-_dxv/
CREATE Lab at Duke University: Ahmed is a Graduate Research Assistant at the Center for Research & Engineering of AI Technology in Education, where he works on Generative AI and LLM platforms. https://sites.duke.edu/createcenter/
Duke University: https://pratt.duke.edu/
About the Guest
Ahmed Boutar is an Artificial Intelligence Master’s Student at Duke University’s Pratt School of Engineering. He is a Graduate Teaching Assistant for the Explainable AI course and a researcher focused on AI Alignment and human-centered design. His work includes the Human-Aligned Hazardous Driving (HAHD) project, which uses eye-gaze data to build safer autonomous vehicle models. He previously served as a Software Development Engineer Intern at Amazon. Ahmed's expertise centers on the practical application of AI ethics and system oversight.
-
Episode Transcript: Duke's Ahmed Boutar on AI Governance: Shifting Education from Output to Interpretation
Lydia Kumar: Welcome to Kinwise Conversations. Today we're talking with Ahmed Boutar, an Artificial Intelligence Master’s student at Duke University, and a researcher focused on one of the most critical issues in the field: AI alignment and governance. Originally from Tunisia, Ahmed brings a diverse background and a rigorous engineering focus to the ethics of AI. Ahmed is working on projects to create more human-aligned systems, like one that studies human eye-gaze data to help autonomous vehicles better identify hazards. If you've ever wondered why AI sometimes delivers unexpected, or even harmful, results, you're in the right place.
Lydia Kumar: Hi Ahmed. Thank you so much for being on the podcast today. And I am really interested in talking with you about AI and its implications. But before all of that, I want to give you a chance to introduce yourself to our listeners. I think it really helps people to have a sense of who they're hearing from and what your background is and, yeah, who you are.
Ahmed Boutar: Thank you so much Lydia. This is very exciting to do a podcast, my first podcast. So my name is Ahmed. I'm currently a master's student at Duke. I'm studying AI at the moment. And I come from Tunisia, a country in North Africa. I did my undergrad here in the US in computer science, studied Audio Music Engineering, something I'm very passionate about. And then I transitioned to AI after working a little bit, taking a gap year to think about some stuff and what I want to do in life. And AI was one of those things that I was like, this is really interesting to learn about. And that's how I came to be here, in a nutshell, pretty much.
Lydia Kumar: Thanks so much for sharing that. I'm curious because you say AI was something that was interesting and you wanted to learn about it. Why did—what was the draw? Why did you want to learn about AI in particular?
Ahmed Boutar: So when in undergrad I took this class called Data Mining, and my professor was impressive, an impressive researcher, and he was able to predict the results of the elections with a pretty good margin of error. And he was also able to predict the spread of flu in New York City, in Manhattan. And at the time I was 19 and I was like, this is insane. How can you do that with just looking at data and boom, you just have a result. So I thought it was really interesting, and then I kept taking some other classes relating to AI and I wanted to know more. Because I had a lot of different interests, like I mentioned earlier, like audio, music, engineering, sound design. And so I never really made up my mind on what I wanted to do specifically. And graduate school is an opportunity for you to deepen what you wanted to learn about, and that's why I decided to go to do AI. And I'm very grateful and lucky to get into Duke, to this program, where I met the professors here and the faculty here, and one of my mentors, Dr. Ben. And it was a really amazing opportunity to learn more about AI. I guess it's a great introduction to AI because learning about the field takes years and I don't think anyone is necessarily a really big expert as the field progresses so quickly.
Lydia Kumar: I feel like every—I subscribe to a lot of newsletters about AI and then they fill my inbox all the time, and it's an overwhelming amount of information. And that's just updates about what's happening primarily, or things people are researching or studying. And I can't even keep up with my newsletters that come in my inbox. I did subscribe to a lot of newsletters, but it's a lot of information.
Ahmed Boutar: I feel that every day I wake up and there's 10 emails from 10 different newsletters, and each one is an article, has a link to an article for 30 minutes. I'm like, oh my god, if I'm gonna read all of this, the day will be over and I didn't know anything yet.
Lydia Kumar: Right. So it's like, okay, maybe I'll just dabble a little bit every day. And I think you continue to learn, and I think everyone—I think we all have some responsibility to have a little bit of an understanding about how this technology works, but it is overwhelming to have a deep and ongoing understanding because the field changes so, so much.
Ahmed Boutar: Yeah. And I think this is why your podcast is such a great way to do this because for a lot of people listening, having something going on in the background as they're doing different tasks, like cooking or walking, they could definitely learn a lot about this field, or even if not a lot, just have a general understanding of what's going on, because some stuff is important about what's going on currently in AI.
Lydia Kumar: Yeah. From your perspective, what's something that is important that's going on in AI right now that you think people should be paying attention to?
Ahmed Boutar: That's a very good question. So there's obviously a lot of different things happening in AI that are very exciting. I remember in the spring I attended this talk by Dr. Yann LeCun. He's a professor at NYU, used to be the Chief AI Scientist at Meta. And he was talking about his research in World Models. And his argument is that currently LLMs (Large Language Models) that just deal with text and the way they work is just predicting the next token of what will be said, is not enough to capture the world. And a good analogy that he provides is a five-year-old kid has processed more information than the current best model that we have, because they use visuals, sound, all these things. And his research aims to predict the next state of the world given the parameters that we have at the moment. So some of his research, for example, they simulated these balls dropping like a pool table, and they will need to predict where those balls will be in the next state after a specific action is taken. And I thought it was very impressive. And if we're talking about AGI (Artificial General Intelligence), although I'm not necessarily sure I like the word yet, but if we talk about general intelligence, I think that's the closest thing we'll ever have to general intelligence, at least at the moment. And I think it's very impressive. It's an amazing way of dealing with research, where you've taken a model and you're given all these inputs, like visual, text, audio, and you're asking it to predict what's coming next in the world. Obviously the research is still nascent and it's very new and people are still working on it, but I think it's super, super interesting.
Lydia Kumar: I think as people, it can be really hard to make predictions just as an individual about what's gonna happen next, but when you have all these data points, then opportunities come alive. And this connects to your initial interest in AI when you were fascinated by your professor being able to predict the election results. So I see a connection there.
Ahmed Boutar: Yeah.
Lydia Kumar: So I know that you've been doing some research on aligning AI systems with human driving behavior. And I'm curious if you could just tell us a little bit more about your project and your research and how that works, and then I have some specific questions about that for you.
Ahmed Boutar: Yeah, so this project mainly started with my good friend, Lennox Anderson, who's also in the AI program. So Lennox is leading this project and his goal was basically if we include eye gaze data, where humans look at different objects, and you include that in a computer vision model, say like the one that Tesla uses, are we able to detect whether hazardous—are we able to detect hazards better or not? And the big question here is, obviously computer vision models have their flaws, and obviously you will not be able to detect every single hazard, but what we have already is pretty impressive. But if we include eye gaze data into this, are we able to do better? Are we able to create a more humanly interpretable or explainable model that tells you why a hazard has been specifically labeled as a hazard or not? This is the general gist of the research basically.
Lydia Kumar: That's really interesting because I think machines being able to know that a hazard exists is really a big part of being able to build self-driving cars, for them to be able to see, for the machine to be able to know what that is. Is that related? Is that one of the purposes behind the research?
Ahmed Boutar: Yes, to some extent, but mainly the goal here is aligning, is the human alignment part of it. How can we align these models to what humans would do, like would perceive in a sense? I mean, at the end of the day, these models will use some sort of parameters to predict whether something is a hazard or not a hazard, but for you as a human being who's taking all of these inputs at once and seeing everything that's happening in the world, you may have a different opinion than the Tesla. And the Tesla here, I'm using it as an example. But a Tesla could also detect something that you would not detect, because, for example, if it's not in your range of where you can see as a human being. And so the goal here is the human alignment part. How can we align it to human beings? Which is a huge subject and I think it's one of the most interesting fields for me personally in AI at the moment.
Lydia Kumar: Why do you think that's important?
Ahmed Boutar: I don't want to be one of those AI doomers who say AI is gonna destroy all of us, but AI is progressing at an incredible pace. It's actually insane how quickly it's progressing and how we're able to use it in all these different domains. But in AI alignment, just to give some context, there are two main things. There's the outer alignment part. If I'm training a model, I will give it an objective. And then outer alignment is basically how well is the model being able to follow that objective and how well that objective captures our goal. So, basically ensuring that the objective reflects the designer's actual goals. And inner alignment is ensuring that what the model has learned, what it actually wants to do, matches this outer objective. And there is usually a misalignment between both, at least for a lot of the cases. Sometimes we don't know why. Sometimes we know why. And even if we know why, sometimes we can't really fix it. So I'll give you an example. There's been this research that was done. They created a simulation and they gave an AI these tools and they told it, "Build me the structure that can move the fastest." Any structure, just boxes on top of each other, sticks, it doesn't matter, as long as it can move the fastest. So the AI here, what it did is it built the tallest structure possible so that when it falls, it reaches the highest speed possible. So technically, there's an alignment here with outer objective, which is build me something that moves the fastest. But there is a misalignment with the inner objective because we wanted something that can move, that can walk. The AI has achieved the outer objective by making it move really fast as it's falling down. It's this huge piece with a massive head. So when it falls, it falls really, really fast. But then it's not really what we humans or the researchers wanted done, because we wanted something that can walk, that can move fast, that can run fast, or whatever. And this is the general issue with AI alignment.
Lydia Kumar: Yeah. In that example specifically, it's really easy to see how something that maybe researchers could see this could be useful, but just a giant building falling down feels dangerous and potentially bad for people. It's not achieving the vision that the human researchers had. And it's also, if this was out in the world, it would be really dangerous.
Ahmed Boutar: Yeah. I mean it's like what, at the time, I think it was Amazon that tried to work on this project where they tried to speed up their own hiring. And they basically trained a model to predict whether this candidate will be a really good engineer. But given that the engineering domain is dominated by males, and a lot of times in the US with white males, the model would predict that whenever you have a white engineering male between the age of XYZ, they're gonna be predicted as a good engineer. And so everyone else that does not fit this general picture will not be predicted as a good engineer. And that's the flaw in the training data because it was not diverse enough. And this tells you that we have to be very careful when developing AIs, which we're getting better at. But we need all these different perspectives, diverse perspectives, and very careful planning and designing of AI. I think personally my opinion is it's not because we can develop something that we should do it. But that's very debatable because some people are very excited about developing awesome technology.
Lydia Kumar: And it's, I mean, it's very fun to think about all the things that you can do with AI that weren't possible. And I think as people, we just love to make stuff and figure out what's possible. And to your point, maybe just because we can do something doesn't mean we should, but I think it's hard to have that self-control when as a society, we, I think there's this desire to see what's possible and to explore. But our exploration throughout history has not always been beneficial to humanity in the long run.
Ahmed Boutar: Yeah. But the awesome part is that now the world is so well connected that everyone can have an opinion and people can talk about this so that these companies are becoming more mindful of what's happening and they will not deploy these models if they can cause different harm to people or if they can discriminate against people. Obviously a lot of work still needs to be done, but we're making some steps in the right direction.
Lydia Kumar: Something I'm curious about, most people who are gonna listen to this podcast are not going to be AI developers or have any part in creating the models, but they're going to be users. And so what as a user, how do you navigate this alignment problem?
Ahmed Boutar: That's a very good question. For starters education, like knowing that this problem exists. Being educated about AI in general I think is something very important so that when we get an output from an AI model. I guess most people will be interacting mainly with LLMs. I think those are the OpenAI's models, Anthropic, Perplexity, all these different ones. And in Europe, all of these are mainly being used—these are the models that are mainly being used by people. And I think just having that in mind and having the ability to question the output and not necessarily taking it as ground truth. Other than that, it's just educating yourself. I'm not really sure what else just a regular user can do, at least for now, but you never know. Maybe someone gets educated, they get these ideas, they pursue it, and they become someone who can figure out the solution to these problems. You never know. There's many amazing, talented individuals who just haven't had the chance to dabble with all these things yet, but I think education is definitely one of the primary things that someone can do.
Lydia Kumar: Yeah, that's a really good point. I'm curious, a lot of my listeners are educators and people who are either leading school districts or people who are in classrooms. And right now in education, there are a lot of AI tools being created for student learning because we—there is student learning data, that people can use to then predict what they should teach students or what students should learn next. And then there's questions about how to teach students and what age students need to be taught these things. Because kids start going to school when they're five years old and then continue through that. And so if you're an education leader and you want to use these AI tools that seem useful, what are some things that you think leaders in these systems or individual educators should be aware of and pay attention to when they're using maybe a tool that has an LLM working? So it's not a chatbot, but a generative AI running in the background to make this new type of software possible. Or just teaching students in general, or thinking about how learning happens.
Ahmed Boutar: I think first of all, I think educators. Their goal is obviously to maximize their students' learning, to ensure that every single student in the classroom has learned properly the material that was taught to them. And that's the general objective. And while these LLMs can be very helpful with that, and I think there are very useful tools in terms of learning. There are some things that one should be mindful of, such as the AI compliancy, which is the tendency of AI to be overly agreeable with whatever you're talking about. And I think most people have experienced this, where whenever you ask an LLM a question, they would respond in a very agreeable way, amicable way. And it's like, go you, absolutely! Even if you point out an error, like, "You are absolutely right!" And I'm like, "If I was right, why did you do it like this in the first place?" So I think there's definitely that when generating answers or a curriculum or any sort of educational content. There is definitely some innate bias in the LLMs to actually help the users because that's why they are, like, chatbots are just helpful chatbots. If not, then no one would use them. And so as educators, they should be mindful that they cannot necessarily outsource all these tasks and just be there as a judge to the different answers that they're receiving. And also, for example. If you're just given an AI the objective of, "I want to maximize the student's learnings," most of the time it's very hard to capture what learning means unless you're using different exams and stuff like that. So we're gonna be using some proxy metrics like engagement, increase the student satisfaction, the user satisfaction, increased this and that. But it may not necessarily correlate to the actual deep understanding of a subject specifically. And so I think educators can use these LLMs to customize the content to different students because between even you and I as students, we learn completely differently. Everyone has a different type of intelligence. Some people are visual learners, some people learn better by listening. And I think these LLMs provide us with that opportunity to cater the content to these students specifically. I think LLMs are a very useful tool, but educators need to also, you cannot replace them. They have to use them as a way to help them teach better and to cater to their students' needs in a better way.
Lydia Kumar: Earlier this year, maybe March, I was leading a professional development session for some educators, and we were just talking about some of the ways that AI could save them time in their practice, and there was definitely resistance in the space because they felt very concerned about AI creating the perception that teachers weren't useful anymore. I think in the United States in particular, educators haven't always been appreciated, especially in the K through 12 space, as professionals and the complexity of the job hasn't always been recognized. And because of that, these individuals were like, "Oh, I'm really,"—these teachers were very uncomfortable with the technology because they were like, "Is this gonna make people think that we're even less useful?" And I think, in the example you're giving, it's like, actually your expertise is even more useful, because if you just rely on a machine and the system, you could create outputs or create results that could potentially be harmful for students and not in their best interest. And so it's really important for our educators to bring their expertise to the table and be sharp about how they evaluate outputs, because while it's useful, it's not perfect.
Ahmed Boutar: Yes. 100% agree. And I think as humans, obviously I still do not know much and I do not know what I do not know, but from what I've learned so far is that as humans, we will not stop AI's progress. AI will keep progressing and it will keep advancing. And as humans, we mainly have two main options. It's either we adopt it, we try to increase our productivity and be better at what we are doing—there's always a way to do that—or else we try to regulate it somehow. And that's when you get into policy and people doing that kind of work that is currently really awesome, especially what's happening in Europe. But I think in the case of educators, there are so many different things that could take so much time that LLMs could make it really fast and really easy to make, even by learning how to prompt an LLM better, you can get a way better response and a completely different response. Some people have posted different articles, including, for example, Neila, who's an amazing AI researcher in AI alignment, where you prompt an LLM saying, "I've read this somewhere and this person claims this, this, and that. I think it's wrong. Can you point out why it's wrong and it's your own thoughts?" And now you can deal with the AI compliancy a little bit better, and so you can have someone judge your work as you're working through it, and maybe you can see gaps that you have not seen before. So in the case of educators, maybe they can see, "Okay, I should have taught this subject after this subject." Or maybe they can just take the LLM output and they can just take the specifics that they like and they augment their work instead of replacing it. And I think these fears are completely in their place because if you take the example of LLMs. They're fantastic. They do work that is incredible. Even now as an engineer, sometimes I'm like, "Wow, I think will I have a job in five years from now?" So I think it's a very fair fear, and yeah.
Lydia Kumar: Yeah, it's a fair fear, and I think that there are things, I think that there's aspects of work that humans are just always gonna be better at doing, but maybe we're still trying to figure out exactly what those things are, even though I think a lot of people have ideas about the aspects of work that humans are better at some things, and maybe machines are better at other things. I think my dishwasher is actually much better at washing the dishes than me, so it'll probably continue in that way. Okay, so right now you're working as a TA in some classes at Duke, and so from that perspective, you're a student, but you also have this eye into what it's like to be an educator. And so in those spaces, where do you see AI being potentially useful as someone who's trying to support the learning of others, and where do you see it potentially creating gaps or reducing learning from that educator lens than the student lens?
Ahmed Boutar: I think, so I think as an educator, I think it's actually a little bit harder as an educator now that LLMs and AI are being used by a lot of students. For example, a lot of times when I'm grading assignments, when I look at an answer, I'm like, "This is clearly an AI-generated answer." And when we talked with the professor in Dr. Ben's Explainable AI class, which is funny because this is about explainable AI. You see a lot of output that is being submitted, that is just purely LLM generated. And Dr. Ben has made the decision that you can use an LLM because it boosts productivity and it doesn't make sense for someone nowadays to just learn everything from scratch while you can have an LLM help you through the learning. But what we saw was a lot of people would just take it as is, which definitely is detrimental to learning. I mean, there is an MIT study that was published I think last summer that showed that there is—they just took, I mean, it's a very small study, I think it was only 54 participants, and they showed that the brain activity decreased with people using LLMs versus not using LLMs. And so because we're just outsourcing all this thinking to these machines. But when it comes to the students here, I feel like it's important for educators to be aware of this, and if you're asking them to write code, it's okay for an LLM to write code, but the interpretation needs to come in your own words. It needs to be from you, and that kind of thinking is something that we should not outsource to LLMs per se. Producing stuff, sure. But thinking about it, interpreting it, is where the learning happens. And we are currently working in the CREATE Lab under Professor Jon Reifschneider, who's the director of the AI program. And we are creating this platform that is AI powered, and it allows you to generate all these quizzes based on your lecture materials and it will grade these quizzes for you. So the students will fill out these quizzes, but then we realized this may not necessarily be enough, because sure, you can fill out this quiz and the LLM can give you your own feedback, and you can learn from that. Then after we surveyed some people and talked to them, we realized that the learning happens only if you're able to restate your learning in your own words. And so now we're still seeing how this works, but we transitioned from a regular answering to the questions to you having to speak out your answers yourself. So the LLM would just generate the question, it will read it for you, and now you have to actually sit down and with your own words, talk it out. And I feel like that gives you a better understanding of where your mistakes could be. Because if you're just sitting down, at least for me, if I'm sitting down, I'm doing a problem, now there's this default system in my mind, "Go to the LLM, go to ChatGPT, Claude, they have the answers." But if you sit down and try to think of something and try to formulate and have this mental struggle to come up with an answer and then get the feedback. I think that's where the learning could really be improved. And so currently in the day and age where all students will use LLMs, and I don't think it's wrong in any way. I think it's important for educators to be aware of that and find a way where they can let the students do those tasks on their own, have that experience. Because in the workforce, even when I worked at Amazon, there's a lot of code where we use an LLM to generate it and it's just fine.
But as engineers, we're gonna look at it and we are gonna ensure that it makes sense and that it works and that it's fine. And so I think educators need to do the same, where they can change the way their assignments are structured to not focus on just applying whatever they taught, but rather interpreting it. And that's where the human will never be replaced by a machine, I think.
Lydia Kumar: Right. It's that ability to understand what's being generated or what you want to do, and then be able to talk about it with another person or explain it. And if you can't talk about what's been created, then you don't understand it. And so this ability to just talk about what we're learning feels like—I think writing became a way for people to express what they had learned, and that feels like it's being replaced by people having to speak about what they've learned, because an LLM can't speak in your voice in person for you in the way that you could outsource your writing to an LLM.
Ahmed Boutar: Yeah. And I think LLMs are impressive in writing, but I think as much as we want to say, it's so easy now to generate amazing text, but I think some people have this way of writing that is so impressive that I think when you read it, you will know it's not an LLM because it's just so beautiful. And I think those are one of the things that an LLM cannot replace. And going back to educators, they're dealing with kids for years. I think the empathy and being able to read the people and being able to know when the student comes into class and you've seen them every day. You'll be able to tell if something is wrong with a student or not. Whereas an LLM will never be able to do that just because I think you're a human.
Lydia Kumar: Yeah. And as a human, we, I think there is something about being with another person or writing authentically or all of these things. I think there's something really, you can feel the humanity—maybe as people, there's this spiritual or some element to us that allows us to connect with people in a way that I don't know if a machine will be able to do. Or I hope that we get to hold onto that.
Ahmed Boutar: Yeah, I sure hope so too. I mean, you see a lot of issues rising up with people developing emotions for chatbots, and you can't really blame them because chatbots are so realistic. But I just hope we don't get to the point where AI becomes so good at it that you can't really tell the difference between a human and an AI. I mean, that's I think what makes us human is that the empathy, and when you sit down with a human, obviously the human touch and all of that, and I think I sure hope AI will never be able to catch up to that and replace it.
Lydia Kumar: I want to talk a little bit now about just guardrails or governance that folks could put in place. You mentioned policy and what's happening in Europe, so I thought maybe we could start with you just sharing some of the things that you see happening that are most exciting around this. And then we'll go from there.
Ahmed Boutar: I think there's been, in the past couple days there's been this huge issue happening. Was it with Hollywood, like in California, where some people—I always forget the names, there's just so much information coming—where they wanted to license the image of this AI actor. And California, I think, was it the government that banned it from happening and it's like, "It's never gonna happen again." And I think those kind of policies where you're actively protecting what humans can do versus what AI can do and creating this distinction between both is such a good step and an important step. And I think policy can range. Obviously, I don't know much about policy unless I read it, so I'm very reactive towards that. If I read it, I'm like, "This is good." If I read it, I could say, "This is bad." But the whole thing going from what AI is allowed to do, what are we allowed to train on, what are we allowed to produce, versus how do we do that when it comes to data centers, like energy and all that kind of stuff. I think those are very important policies that we should consider. So going to, for example, some of the most impressive things that I've seen when it comes to governance comes from Anthropic when they created this charter for AI governance. It's a series of points where they outline the code of conduct of Claude. And I think just having this general governance of what can we do, what can we not do, is really important going forward with all these companies and AI development in general. Because, like, building an AI to predict something that should not be predicted, for example. Like, we've had a lot of issues with recidivism, like AI that's predicting recidivism, and having some policies around that where you require transparency with these models that have such a high stake is extremely important in general. So those are the policies that I really like personally. Just transparency in general.
Lydia Kumar: Yeah, I feel like understanding how models work and what they're trained on and what they should and should not do, can help you make better choices as a user, going back to that piece. Are there specific things that if you were developing an AI model in your work in alignment, do you think there's specific policies that you see that are really important for developing aligned AI systems?
Ahmed Boutar: Yeah, that's a good question. Again, my opinion is limited because I'm still learning about the field. But I think starting with transparency. If I'm developing an AI that would tell me whether or not I can be approved for a loan. I would like to know what factors went into it. Are we talking here about protected attributes such as my gender, my age, my ethnicity? If those went into predicting whether or not I'm able to get a loan, then I should be able to say, "No, I don't like this. This is not right." So, being transparent on why the AI made such a prediction is extremely important. And I think also including diverse perspectives when it comes to AI researchers, when they're developing an AI. So for instance, if you take a company developing a model to, for example, in the case here of predicting a loan, if all of them belong to a specific group, they're all of the same gender and they're all of the same ethnicity, then they will not be able to even think about how other people could be impacted by such a model. And I think including all these diverse views and diverse opinions, either among the developers or having external overseers that are able to give their own inputs towards this is extremely important in the development. The issue is that how do you balance the speed of progress versus setting up guardrails without them necessarily clashing? And I think that's the big question of what's happening currently in the AI landscape, in my opinion.
Lydia Kumar: As you were talking, I was thinking about the small number of people who develop these models compared to the huge number of people who use them. And so we there's a limited supply of people who can develop this technology and there is an almost unlimited number of people who want to use it. And so creating systems that are usable and fair to the masses, when the people creating these models are a pretty small group in comparison. So it makes I think really important for what you were saying about diversity of thought and awareness, because there's real impacts for people. And then for all of us as users, I think understanding what is important for us to demand in terms of transparency, or how do we hold our own destiny, our own autonomy when there are systems who can make decisions like whether or not you get approved for a loan in a way that may or may not be fair or equitable.
Ahmed Boutar: Yes, 100% agree. And I think I read this in a book, it's called Supremacy. But they're talking about how, starting around 2012, when there was really this uptick in AI development due to the advancement in computer vision and the creation of a big dataset (this competition out of Stanford by Fei-Fei Li), something like 48% or 60% of AI was owned by a few companies, and now over 90% is owned by only a few of the companies developing these models. And because of the cost of developing these models, where you need huge data centers and a lot of compute, academia can barely keep up with this development at this point. So it's only a few companies that are able to develop these models, and it's very important to have those kinds of diverse perspectives when developing these models. And I think there's a lot of work to be done in that sense. So yeah, I completely agree with you and I hope AI can be fair going forward, and we can keep that in mind as we're developing these models.
Lydia Kumar: Yeah, I think in the education space it's really important too, because decisions could be made about what a student learns or how a student is graded that have impacts that may not necessarily be equitable. There's an older study I read about where automated AI grading of writing graded people differently, and it ended up being that certain writing styles were privileged while others weren't, and it wasn't necessarily that one was bad and one wasn't. It was just the way the AI model understood or had been trained to evaluate good or bad writing, but then a human grader would see something totally different. So if you think about college applications and you had an AI grading that, or evaluating that, it may not be the same as a human. And so then all of a sudden you're weighting toward a certain type of candidate that may not actually be as qualified or have written as well. They're just writing in a certain way that is appealing to the model.
Ahmed Boutar: Exactly. I mean, at the end of the day, if we're talking about LLMs, all they're doing is they're just using—they're really good statistical models and they will tell you what is the best token to predict what is the best next word with this, given all the context that I have. So let's take an extreme example of a student who's been homeschooled, doesn't really have access to internet and all these things to post his writing or her writing. And they're impressive at writing, but their style has never seen the world—the world hasn't seen their style. And these models have been trained on all these writing styles except for theirs. Now they write the most beautiful college essay or essay in general. This LLM that will grade this essay may not necessarily give it a good grade because it hasn't seen this style of writing before, so it doesn't fit its training data. So it has no way of really understanding the relationship between the words, and to it, it's just gonna be something that is not necessarily good writing. And so those are things that we should be mindful of and educators should not necessarily trust an LLM's judgment 100%. It's good to use it, but you cannot outsource the thinking to it at all. You just have to keep a balance between both. And that's why recently, not recently, I mean it's been a thing for a few years now, but a lot of people use this thing called LLM as a judge, where you take an LLM's output and you give it to another LLM, you tell it, "Judge it in a way," and now you have these LLMs judging each other to have the most fair answer. This is the best we can do right now, as far as I know.
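For context, the "LLM as a judge" pattern Ahmed mentions usually looks something like the sketch below: one model's output is handed to a second model along with an explicit rubric. The `generate` function is a placeholder, not a specific library API, and the rubric wording is invented.

```python
# Sketch of the "LLM as a judge" pattern: one model's output is graded by a second model
# against an explicit rubric. `generate` is a stand-in for whatever LLM client you use.

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in whichever LLM client you actually use")

def grade_essay(essay: str, rubric: str) -> str:
    judge_prompt = (
        "You are grading a student essay. Apply the rubric below literally, quote the "
        "passages that justify each score, and flag anything a human should re-check.\n\n"
        f"Rubric:\n{rubric}\n\nEssay:\n{essay}"
    )
    return generate(judge_prompt)

# feedback = grade_essay(student_essay, course_rubric)  # a human still reviews the result
```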
Lydia Kumar: Yeah. It's funny because I think when I, I feel like a lot of people I talk to about AI, just regular people, are just like, "Soon the students are gonna make AI essays that are gonna be judged, graded by AI, and we're gonna submit AI generated job applications that are gonna be judged by AI." And so there's this sense of, what happens when we're just—how do we stop that as people, or how do you move toward a world where we're still creating some balance there? And I think it is keeping the human involved, because I think there's examples even right now where we're seeing people removing themselves from the equation on both sides, and then we've sort of lost the purpose of what we're all showing up to do in the first place.
Ahmed Boutar: Yeah. I mean, the AI is advancing way too fast where we barely are able to keep up, but hopefully one day we'll find a good balance. We're all learning as humanity what an AI is doing. We don't really know how an LLM—why an LLM gives you a specific output. We don't know why hallucinations are a thing, and yet we're learning about it, we're learning how to treat it, we are learning how to behave and use it properly and responsibly. So I think it's an ongoing effort for sure. And I think having these questions and this medium, like this podcast where people can educate themselves and learn more, or allow people to ask more questions. Maybe we won't find the balance this year or the next one, and maybe someone will stumble upon these educational materials and decide to pursue this field, and somehow they'll be one of those people that introduce something seminal that would create a balance in itself. So you can only be optimistic here.
Lydia Kumar: Well, maybe that person will be you.
Ahmed Boutar: I sure hope so.
Lydia Kumar: Okay. I'm gonna ask you my last question. I always end on this, and that is for you, when you think about AI, what is a concern that you have and what is a hope that you have that you're really grappling with right now?
Ahmed Boutar: That's a very good question. I think I'm a little bit concerned when it comes to the concentration of who can develop AI models and who cannot develop AI models. And the people feeling taken aback by how fast it's evolving and how maybe they have no chance of entering into the space in general. And so it could even increase the gap between people. But I'm also hopeful in the sense that it could potentially be the other way around, where now it's increasing the gap and later on it will help a lot of people. Take the different drug discoveries. What happened with COVID was impressive. We were able to deal with a global pandemic within a few years, like two to three years. That was impressive. And that's all because we had an AI. There's a company called InstaDeep. It's based out of London. They were able to predict the next COVID variant before it even happened. And so people were able to proactively approach this stuff. So I'm very hopeful with the AI development in different spaces that do not necessarily relate to our daily use, like chatbots and stuff. But with drug discovery and scientific discovery, I think learning more about science and the world around us is such an interesting thing. And that's what we're really good at as humanity. And I think just thinking about that stuff makes me very optimistic, honestly. But it all comes down to a fair and a responsible use of AI. Yeah.
Lydia Kumar: Yeah, absolutely. There's so much potential, but there's also a really big need for us to be super, super careful too.
Ahmed Boutar: 100%, yes.
Lydia Kumar: That's a wrap on our conversation with Ahmed Boutar, a brilliant researcher who is asking the right questions about how we steer the future of AI. I have a few key takeaways from our discussion that I want to share.
You Can't Outsource the Thinking: The most critical role of an educator is to ensure that students move beyond simply generating output with an AI to interpreting it, explaining it, and wrestling with the material in their own words. That is where deep learning occurs.
The Problem is Alignment, Not Just Progress: Ahmed's work on AI alignment proves that the challenge isn't just making AI powerful, but making sure its objectives—both inner and outer—are safe, equitable, and truly reflective of human goals and values.
Demand Transparency and Diversity: To ensure fairness in systems that impact high-stakes decisions like loans or job applications, we, as users, must demand transparency in how models work. Developers in turn must prioritize diverse perspectives to prevent unintended harms and discriminations.
To learn more about Ahmed's work, check out the links in today's show notes to his LinkedIn and his research on Human-Aligned Hazardous Driving.
If your district or organization is ready to tackle these systemic issues and build a policy framework, consider the Kinwise AI Leadership Lab. This customized experience helps leaders align AI with their mission, draft actionable policies, and create a roadmap for responsible implementation.
And finally, if you found value in the podcast, the best way to support the show is to subscribe, leave a quick review or share the episode with a friend. It makes a huge difference.
Until next time, stay curious, stay grounded, and stay Kinwise.

