#1. Introduction
▶️ What is reinforcement learning
Each state is assigned a reward, either positive reinforcement (+n) or negative reinforcement (-m), so that the agent learns on its own to take good actions.
▶️ Mars rover example
(s, a, R(s), s') = state, action, reward, next state reached after taking the action
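As a rough illustration of the tuple above, one step of experience can be stored as a small record. The class name `Transition`, the field names, and the sample values are assumptions made for this sketch, not part of the original example.

```python
from typing import NamedTuple

class Transition(NamedTuple):
    """One step of experience: (s, a, R(s), s')."""
    state: int        # s: the state the agent is in
    action: str       # a: the action it takes
    reward: float     # R(s): the reward attached to state s
    next_state: int   # s': the state reached after the action

# Hypothetical values: the agent is in state 4, moves left, and ends up in state 3.
t = Transition(state=4, action="left", reward=0.0, next_state=3)
s, a, r, s_next = t  # unpacks in the same order as (s, a, R(s), s')
```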
▶️ The return in reinforcement learning
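- Discount factor (gamma): a factor between 0 and 1 that acts like a cost on each move (action), so rewards that take more steps to reach count for less. In finance, it reflects things like the declining value of money over time.
- The return differs depending on the state and on the actions taken from it, so it can be used to guide which action to choose.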
To summarize, the return in reinforcement learning is the sum of the rewards that the system gets,
weighted by the discount factor, where rewards in the far future are weighted by the discount factor raised to a higher power.
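The summary above can be sketched in a few lines of Python. The function name `discounted_return` and the reward sequence are hypothetical, chosen only to show how a later reward is multiplied by a higher power of gamma.

```python
def discounted_return(rewards, gamma):
    """Sum the rewards, weighting the reward at step t by gamma**t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Hypothetical sequence: three steps with no reward, then a reward of 100.
rewards = [0, 0, 0, 100]
print(discounted_return(rewards, gamma=0.5))  # 0.5**3 * 100 = 12.5
```

With gamma = 1 the same call returns the plain sum of rewards, so the discount factor is the only thing that makes later rewards count for less.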
▶️ Making decisions: Policies in reinforcement learning
Policy(pi)
pi(state) = action
Goal
Find a policy pi that tells you what action to take in every state so as to maximize the return
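A minimal sketch of a policy as a lookup table, assuming a made-up 6-state line world. The rewards, the terminal states, and the helper names `step` and `rollout_return` are illustrative assumptions, not the exact setup of the example above. It follows two fixed policies from the same start state and compares their returns.

```python
# Hypothetical line world: states 0..5, with rewards only at the two ends.
REWARDS = [100, 0, 0, 0, 0, 40]   # R(s) for each state (assumed values)
TERMINAL = {0, 5}                 # the episode ends in these states

def step(state, action):
    """Move one cell left or right, staying inside the world."""
    if action == "left":
        return max(state - 1, 0)
    return min(state + 1, len(REWARDS) - 1)

def rollout_return(policy, start, gamma, max_steps=100):
    """Follow policy from start and return the discounted sum of rewards."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(max_steps):    # guard against policies that never terminate
        total += discount * REWARDS[state]
        if state in TERMINAL:
            break
        state = step(state, policy[state])   # pi(state) = action
        discount *= gamma
    return total

# Two simple policies stored as dicts: state -> action.
go_left = {s: "left" for s in range(6)}
go_right = {s: "right" for s in range(6)}

print(rollout_return(go_left, start=3, gamma=0.5))   # 0.5**3 * 100 = 12.5
print(rollout_return(go_right, start=3, gamma=0.5))  # 0.5**2 * 40 = 10.0
```

From state 3 with gamma = 0.5, going left gives the higher return in this toy setup, so a return-maximizing policy would pick "left" there.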
Markov Decision Process (MDP)