{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# 보상함수 예제" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "

\n", " Unit 03. 보상함수 예제\n", " Section 08. 보상 함수\n", "

" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "

\n", " PPT\n", "\n", "

" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "
\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "

\n", " Video\n", " \n", " 🚀 프리미엄 과정 \n", " \n", " \n", " 인프런\n", " \n", "

" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "
\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "```{important}\n", "- 영상은 **크롬 웹브라우저**를 사용할 것을 적극 권장합니다.\n", "- 영상은 **프리미엄 과정**에 등록된 구글 계정에서만 보입니다.\n", " - 프리미엄 과정에 등록되었는데도 영상이 보이지 않나요? \n", " - 비디오 재생문제 해결하기\n", "- **Q&A 댓글 달기** : 영상 우측 상단 `새 탭에서 보기` 버튼을 클릭하면 영상에 질문 댓글을 작성할 수 있습니다.\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Resources" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Example 1\n", "- Time trial - follow the center line\n", "```python3\n", "def reward_function(params):\n", " '''\n", " Example of rewarding the agent to follow center line\n", " '''\n", " \n", " # Read input parameters\n", " track_width = params['track_width']\n", " distance_from_center = params['distance_from_center']\n", " \n", " # Calculate 3 markers that are at varying distances away from the center line\n", " marker_1 = 0.1 * track_width\n", " marker_2 = 0.25 * track_width\n", " marker_3 = 0.5 * track_width\n", " \n", " # Give higher reward if the car is closer to center line and vice versa\n", " if distance_from_center <= marker_1:\n", " reward = 1.0\n", " elif distance_from_center <= marker_2:\n", " reward = 0.5\n", " elif distance_from_center <= marker_3:\n", " reward = 0.1\n", " else:\n", " reward = 1e-3 # likely crashed/ close to off track\n", " \n", " return float(reward)\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Example 2\n", "- Time trial - stay inside the two borders\n", "```python3\n", "def reward_function(params):\n", " '''\n", " Example of rewarding the agent to stay inside the two borders of the track\n", " '''\n", " \n", " # Read input parameters\n", " all_wheels_on_track = params['all_wheels_on_track']\n", " distance_from_center = params['distance_from_center']\n", " track_width = params['track_width']\n", " \n", " # Give a very low reward by default\n", " reward = 1e-3\n", "\n", " # Give a high 
reward if no wheels go off the track and\n", "    # the agent is somewhere in between the track borders\n", "    if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:\n", "        reward = 1.0\n", "\n", "    # Always return a float value\n", "    return float(reward)\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Example 3\n", "- Time trial - prevent zig-zag\n", "```python\n", "def reward_function(params):\n", "    '''\n", "    Example of penalizing steering, which helps mitigate zig-zag behavior\n", "    '''\n", "\n", "    # Read input parameters\n", "    distance_from_center = params['distance_from_center']\n", "    track_width = params['track_width']\n", "    abs_steering = abs(params['steering_angle'])  # Only need the absolute steering angle\n", "\n", "    # Calculate 3 markers that are farther and farther away from the center line\n", "    marker_1 = 0.1 * track_width\n", "    marker_2 = 0.25 * track_width\n", "    marker_3 = 0.5 * track_width\n", "\n", "    # Give a higher reward the closer the car is to the center line\n", "    if distance_from_center <= marker_1:\n", "        reward = 1.0\n", "    elif distance_from_center <= marker_2:\n", "        reward = 0.5\n", "    elif distance_from_center <= marker_3:\n", "        reward = 0.1\n", "    else:\n", "        reward = 1e-3  # likely crashed / close to off track\n", "\n", "    # Steering penalty threshold; change the number based on your action space setting\n", "    ABS_STEERING_THRESHOLD = 15\n", "\n", "    # Penalize the reward if the car is steering too much\n", "    if abs_steering > ABS_STEERING_THRESHOLD:\n", "        reward *= 0.8\n", "\n", "    return float(reward)\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Example 4\n", "- Object avoidance and head-to-head - stay in one lane without crashing (default for OA and h2h)\n", "- [Link](https://docs.aws.amazon.com/deepracer/latest/developerguide/deepracer-reward-function-examples.html#deepracer-reward-function-example-3)\n", "```python\n", "def 
reward_function(params):\n", "    '''\n", "    Example of rewarding the agent for staying inside the two borders\n", "    and penalizing getting too close to the objects in front\n", "    '''\n", "\n", "    all_wheels_on_track = params['all_wheels_on_track']\n", "    distance_from_center = params['distance_from_center']\n", "    track_width = params['track_width']\n", "    objects_distance = params['objects_distance']\n", "    _, next_object_index = params['closest_objects']\n", "    objects_left_of_center = params['objects_left_of_center']\n", "    is_left_of_center = params['is_left_of_center']\n", "\n", "    # Initialize the reward with a small number, but not zero,\n", "    # because zero means off-track or crashed\n", "    reward = 1e-3\n", "\n", "    # Reward the agent for staying inside the two borders of the track\n", "    if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:\n", "        reward_lane = 1.0\n", "    else:\n", "        reward_lane = 1e-3\n", "\n", "    # Penalize the agent for getting too close to the next object\n", "    reward_avoid = 1.0\n", "\n", "    # Distance to the next object\n", "    distance_closest_object = objects_distance[next_object_index]\n", "    # Decide whether the agent and the next object are in the same lane\n", "    is_same_lane = objects_left_of_center[next_object_index] == is_left_of_center\n", "\n", "    if is_same_lane:\n", "        if 0.5 <= distance_closest_object < 0.8:\n", "            reward_avoid *= 0.5\n", "        elif 0.3 <= distance_closest_object < 0.5:\n", "            reward_avoid *= 0.2\n", "        elif distance_closest_object < 0.3:\n", "            reward_avoid = 1e-3  # Likely crashed\n", "\n", "    # Calculate the reward by putting different weights on\n", "    # the two aspects above\n", "    reward += 1.0 * reward_lane + 4.0 * reward_avoid\n", "\n", "    # Always return a float value\n", "    return float(reward)\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": 
"python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 2 }