If you want to develop your own MetaBBO-RL approach that fits into the MetaBox running logic, you should follow the protocol below for the Agent and the Optimizer.

Agent has the same definition as in the RL literature: it takes the state from the env as input and produces an action as output. To fit into MetaBox's pre-defined Trainer and Tester calling logic, the Agent must additionally provide a train_episode interface, which is called by the Trainer, and a rollout_episode interface, which is called by the Tester.

Optimizer is a component of the env in a MetaBBO task. It is controlled by the Agent and takes the action from the Agent to perform the corresponding change, such as adjusting hyper-parameters or selecting operators. To fit into the env calling logic, two interfaces are needed: init_population and update.
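Put together, the protocol implies the following control loop (an illustrative sketch, not MetaBox source code; agent and env stand for instances built from the classes defined below):

state = env.reset()                            # env calls optimizer.init_population internally
is_done = False
while not is_done:
    action = agent.get_action(state)           # the Agent maps state -> action
    state, reward, is_done = env.step(action)  # env calls optimizer.update internally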
Your agent should follow this template:
from agent.basic_agent import Basic_Agent
from agent.utils import save_class

class MyAgent(Basic_Agent):
    def __init__(self, config):
        """
        Parameter
        ----------
        config: An argparse.Namespace object for passing some core configurations such as max_learning_step.

        Must To Do
        ----------
        1. Save the model of the initialized agent, which will be used in "rollout" to study the training process.
        2. Initialize a counter to record the number of accumulated learned steps.
        3. Initialize a counter to record the current checkpoint of saving the agent.
        """
        super().__init__(config)
        self.config = config
        save_class(self.config.agent_save_dir, 'checkpoint0', self)  # save the model of the initialized agent
        self.learned_steps = 0   # record the number of accumulated learned steps
        self.cur_checkpoint = 1  # record the current checkpoint of saving the agent
        """
        Do whatever other setup is needed
        """

    def get_action(self, state):
        """
        Parameter
        ----------
        state: state features defined by the developer.

        Return
        ----------
        action: the action inferred from the state.
        """

    def train_episode(self, env):
        """ Called by Trainer.
            Optimize a problem instance in the training set until max_learning_step is reached or the convergence condition is satisfied.
            During every train_episode, you need to train your own network.

        Parameter
        ----------
        env: an environment consisting of a backbone optimizer and a problem sampled from the training set.

        Must To Do
        ----------
        1. Record the total reward.
        2. Record the current learning steps and check whether max_learning_step is reached.
        3. Save the agent model when a checkpoint arrives.

        Return
        ----------
        A boolean that is True when learned_steps reaches max_learning_step, otherwise False.
        A dict: {'normalizer': float,
                 'gbest': float,
                 'return': float,
                 'learn_steps': int}
        """
        state = env.reset()
        R = 0  # total reward
        """
        begin loop:
        """
        action = self.get_action(state)
        next_state, reward, is_done = env.step(action)  # feed the action to the environment
        R += reward  # accumulate reward
        """
        perform the update strategy of the agent, which is defined by you. Every time you update your agent, please increase self.learned_steps accordingly
        """
        # save the agent model when a checkpoint arrives
        if self.learned_steps >= (self.config.save_interval * self.cur_checkpoint):
            save_class(self.config.agent_save_dir, 'checkpoint' + str(self.cur_checkpoint), self)
            self.cur_checkpoint += 1
        state = next_state
        """
        check whether to finish the loop
        """
        return self.learned_steps >= self.config.max_learning_step, {'normalizer': env.optimizer.cost[0],
                                                                     'gbest': env.optimizer.cost[-1],
                                                                     'return': R,
                                                                     'learn_steps': self.learned_steps}

    def rollout_episode(self, env):
        """ Called by method rollout and Tester.test.

        Parameter
        ----------
        env: an environment consisting of a backbone optimizer and a problem sampled from the test set.

        Return
        ----------
        A dict: {'cost': list,
                 'fes': int,
                 'return': float}
        """
        state = env.reset()
        is_done = False
        R = 0  # total reward
        while not is_done:
            action = self.get_action(state)
            next_state, reward, is_done = env.step(action)  # feed the action to the environment
            R += reward  # accumulate reward
            state = next_state
        return {'cost': env.optimizer.cost, 'fes': env.optimizer.fes, 'return': R}
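To make the contract above concrete, here is a minimal sketch of how a trainer-style loop might drive train_episode until the learning budget is exhausted. make_env is a hypothetical helper (not part of MetaBox) standing in for the Trainer's own environment construction; the real Trainer also handles problem sampling, logging, and more:

agent = MyAgent(config)
exceeded = False
while not exceeded:
    env = make_env(config)  # hypothetical: wraps MyOptimizer and a sampled training problem
    exceeded, info = agent.train_episode(env)  # True once learned_steps >= max_learning_step
    print(info['learn_steps'], info['gbest'], info['return'])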
Your backbone optimizer should follow this template:
from optimizer.learnable_optimizer import Learnable_Optimizer

class MyOptimizer(Learnable_Optimizer):
    def __init__(self, config):
        """
        Parameter
        ----------
        config: An argparse.Namespace object for passing some core configurations such as maxFEs.
        """
        super().__init__(config)
        self.config = config
        """
        Do whatever other setup is needed
        """

    def init_population(self, problem):
        """ Called by method PBOEnv.reset.
            Initialize the population for optimization.

        Parameter
        ----------
        problem: a problem instance; you can call `problem.eval` to evaluate one solution.

        Must To Do
        ----------
        1. Initialize a counter named "fes" to record the number of function evaluations used.
        2. Initialize a list named "cost" to record the best cost at logpoints.
        3. Initialize a counter to record the current logpoint.

        Return
        ----------
        state: state features defined by the developer.
        """
        """
        Initialize the population, calculate the cost using method problem.eval, and refresh everything (such as records) related to the current population.
        """
        self.fes = self.population_size  # record the number of function evaluations used
        self.cost = [self.best_cost]     # record the best cost of the first generation
        self.cur_logpoint = 1            # record the current logpoint
        """
        calculate the state
        """
        return state

    def update(self, action, problem):
        """ Update the population using the action and the problem.
            Called by the environment's step method.

        Parameter
        ----------
        action: the action inferred by the agent.
        problem: a problem instance.

        Must To Do
        ----------
        1. Update the counter "fes".
        2. Update the list "cost" when a logpoint arrives.

        Return
        ----------
        state: the observation of the current population.
        reward: the reward obtained for taking the given action.
        is_done: whether the termination conditions are met.
        """
        """
        update the population using the given action and update self.fes
        """
        # append the best cost when a logpoint arrives
        if self.fes >= self.cur_logpoint * self.config.log_interval:
            self.cur_logpoint += 1
            self.cost.append(self.best_cost)
        """
        get state and reward, and check whether it is done
        """
        if is_done:
            if len(self.cost) >= self.config.n_logpoint + 1:
                self.cost[-1] = self.best_cost
            else:
                self.cost.append(self.best_cost)
        return state, reward, is_done
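As the docstrings state, PBOEnv.reset delegates to init_population and the environment's step delegates to update. A simplified, illustrative environment (not the actual PBOEnv implementation) shows how the two interfaces are wired together:

class SketchEnv:
    """Illustrative stand-in for MetaBox's PBOEnv."""

    def __init__(self, optimizer, problem):
        self.optimizer = optimizer
        self.problem = problem

    def reset(self):
        # produce the initial state from the freshly initialized population
        return self.optimizer.init_population(self.problem)

    def step(self, action):
        # returns (state, reward, is_done), exactly the tuple update() produces
        return self.optimizer.update(action, self.problem)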
By the way, if you are developing a classic optimizer, please refer to the example classic optimizer.
After that, you should put your implementation files in the directories src/agent/ and src/optimizer/ respectively. The file structure should then look like:
src
│
├─ agent
│   │
│   ├─ de_ddqn_agent.py
│   ├─ ...
│   ├─ rlepso_agent.py
│   └─ my_agent.py
│
└─ optimizer
    │
    ├─ de_ddqn_optimizer.py
    ├─ ...
    ├─ rlepso_optimizer.py
    └─ my_optimizer.py
In addition, you should register your own agent and backbone optimizer in the files src/agent/__init__.py and src/optimizer/__init__.py. For example, to register the class MyAgent defined above, add one line to the src/agent/__init__.py file as below:
from .my_agent import *
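Likewise, register MyOptimizer by adding the corresponding line to the src/optimizer/__init__.py file:

from .my_optimizer import *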
Meanwhile, you should also import your own agent and backbone optimizer in src/trainer.py and src/tester.py. Taking the trainer as an example, add the following lines to src/trainer.py:
...
# import your agent
from agent import (
    ...
    MyAgent
)
# import your optimizer
from optimizer import (
    ...
    MyOptimizer
)
The same should be done in src/tester.py.
As mentioned, four modes are available:

run_experiment your MetaBBO-RL optimizer.
The run_experiment mode implements a fully automated workflow. Assuming you have written an agent class named MyAgent and a backbone optimizer class named MyOptimizer, the entire train, rollout and test process can be triggered by running:
python main.py --run_experiment --train_agent MyAgent --train_optimizer MyOptimizer --t_optimizer_for_cp DEAP_DE JDE21 DEAP_CMAES Random_search
See Run Experiment for more details.
train your agent.
python main.py --train --train_agent MyAgent --train_optimizer MyOptimizer
Once you run the above command, a runName, generated from the run time and the benchmark suite, will appear on the command line.
See Training for more details.
rollout your agent models.
Fetch your 21 trained agent models named checkpointN.pkl from the directory src/agent_model/train/MyAgent/runName/ and move them to src/agent_model/rollout/MyAgent/. Then roll out the models on the training set using:
python main.py --rollout --agent_load_dir agent_model/rollout/ --agent_for_rollout MyAgent --optimizer_for_rollout MyOptimizer
When the rollout ends, check the result data in src/output/rollout/runName/rollout.pkl and pick the best model to test (see the sketch below for a quick way to inspect the file).
See Rollout for more details.
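If you prefer to inspect rollout.pkl programmatically before choosing a model, a plain pickle load is enough. The structure of the stored object is determined by MetaBox, so this sketch simply unpickles the file and prints what it finds (replace runName with your actual run name):

import pickle

with open('src/output/rollout/runName/rollout.pkl', 'rb') as f:
    results = pickle.load(f)
print(type(results))
print(results)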
test your MetaBBO-RL optimizer.
Move the best .pkl model file to the directory src/agent_model/test/ and rename it to MyAgent.pkl. Now test MyAgent on the test set against DEAP_DE, JDE21, DEAP_CMAES and Random_search:
python main.py --test --agent_load_dir agent_model/test/ --agent_for_cp MyAgent --l_optimizer_for_cp MyOptimizer --t_optimizer_for_cp DEAP_DE JDE21 DEAP_CMAES Random_search
See Testing for more details.
Notice that we record 21 checkpoint models over the whole training process. Rollout can help you pick the most suitable or best-performing model for testing and for computing the metrics.