Releases: rlberry-py/rlberry
v0.7.3
v0.7.2
Relax dependencies
v0.7.1
rlberry-v0.7.0
Release of version 0.7.0 of rlberry.
This is the first rlberry release since the major restructuring of rlberry into three repositories (PR #379):
rlberry (this repo): everything for RL that is not an agent or an environment, e.g. experiment management, parallelization, statistical tools, plotting...
rlberry-scool: repository for teaching materials, e.g. simplified algorithms for teaching, notebooks for tutorials for learning RL...
rlberry-research: repository of agents and environments used inside Inria Scool team
Changes since last version.
PR #397
- Automatic save after fit() in ExperimentManager
PR #396
- Improve coverage and fix version workflow
- Switch documentation from ReadTheDocs to GitHub Pages
PR #382
- Switch to Poetry
PR #376
- New plot_writer_data function that does not depend on seaborn and that can plot smoothed curves with confidence bands if scikit-fda is installed.
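The smoothing applied to training curves by such a plotting helper can be sketched with a plain centered rolling mean. This is a minimal stdlib stand-in for the idea, not rlberry's actual plot_writer_data implementation (which can use scikit-fda):

```python
from statistics import mean

def rolling_mean(values, window=3):
    """Centered rolling mean; edge points use a shrunken window.

    Illustrative stand-in for the kind of smoothing a plotting
    helper applies to noisy training curves (not rlberry code).
    """
    half = window // 2
    return [
        mean(values[max(0, i - half):i + half + 1])
        for i in range(len(values))
    ]
```

For example, a single spike in a flat curve is spread over its neighbors: `rolling_mean([0, 0, 3, 0, 0])` gives `[0, 1, 1, 1, 0]`.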
rlberry-v0.6.0
Release of version 0.6.0 of rlberry.
This is the last rlberry release before the major restructuring of rlberry into three repositories:
- rlberry: everything for RL that is not an agent or an environment, e.g. experiment management, parallelization, statistical tools, plotting...
- rlberry-scool: repository for teaching materials, e.g. simplified algorithms for teaching, notebooks for tutorials for learning RL...
- rlberry-research: repository of agents and environments used inside Inria Scool team
Changes since last version.
PR #276
- Non-adaptive multiple tests for agent comparison.
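To illustrate the "non-adaptive" idea (the rejection threshold is fixed in advance rather than data-dependent), here is the classic Bonferroni correction for comparing several agent pairs at once. This is a sketch of the general technique, not rlberry's actual comparison procedure:

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Non-adaptive multiple testing via Bonferroni correction.

    Each p-value is compared to alpha / n_tests, a threshold fixed
    before seeing the data, which controls the family-wise error
    rate at level alpha across all comparisons.
    """
    n = len(p_values)
    threshold = alpha / n
    return [p < threshold for p in p_values]
```

With three pairwise comparisons, `bonferroni_reject([0.001, 0.04, 0.2])` rejects only the first hypothesis, since the corrected threshold is 0.05 / 3 ≈ 0.0167.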
PR #365
- Fix Sphinx version to <7.
PR #350
- Rename AgentManager to ExperimentManager.
PR #326
- Moved SAC from experimental to torch agents. Tested and benchmarked.
PR #335
- Upgrade from Python 3.9 to Python 3.10
rlberry-v0.5.0
Release of version 0.5.0 of rlberry.
With this release, rlberry switches to gymnasium!
New in version 0.5.0:
- Merge the gymnasium branch into main, making gymnasium the default library for environments in rlberry.
Remark: for now, Stable-Baselines3 has no stable release compatible with gymnasium. To use Stable-Baselines3 with gymnasium, install the main branch from GitHub:
pip install git+https://github.com/DLR-RM/stable-baselines3
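The main user-visible change of the gymnasium switch is the API shape: reset() returns (observation, info) and step() returns a 5-tuple that separates terminated (reaching a terminal state) from truncated (e.g. a time limit). A toy, dependency-free environment illustrating that shape (not an rlberry or gymnasium class):

```python
class ToyCounterEnv:
    """Counts steps up to a horizon, using the gymnasium-style API."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}  # gymnasium: (observation, info)

    def step(self, action):
        self.t += 1
        terminated = False  # this toy task has no terminal state
        truncated = self.t >= self.horizon  # time-limit truncation
        return self.t, 1.0, terminated, truncated, {}  # 5-tuple
```

Under the old gym 0.21 API, step() returned a 4-tuple with a single done flag, which conflated the two stopping reasons.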
rlberry-v0.4.1
Release of version 0.4.1 of rlberry.
Before installing rlberry, please install the fork of gym 0.21: "gym[accept-rom-license] @ git+https://github.com/rlberry-py/gym_fix_021"
New in 0.4.1
PR #307
- Create a fork of gym 0.21 to handle backward-incompatible setuptools changes.
PR #306
- Add Q-learning agent `rlberry.agents.QLAgent` and SARSA agent `rlberry.agents.SARSAAgent`.
PR #298
- Move old scripts (jax agents, attention networks, old examples...) that we won't maintain from the main branch to an archive branch.
PR #277
- Add and update code to use "Atari games" environments
rlberry-v0.4.0
Release of version 0.4.0 of rlberry.
New in 0.4.0
PR #273
- Change the default behavior of plot_writer_data so that if seaborn version >= 0.12.0 is installed, a 90% percentile interval is used instead of the standard deviation.
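The motivation for this default: on skewed returns, a mean ± sd band can extend outside the observed range, while a percentile interval cannot. A stdlib sketch of both intervals (the coverage handling is illustrative, not seaborn's implementation):

```python
from statistics import mean, stdev, quantiles

def sd_interval(data):
    """Mean +/- one standard deviation."""
    m, s = mean(data), stdev(data)
    return (m - s, m + s)

def percentile_interval(data, coverage=90):
    """Central percentile interval, e.g. 5th-95th percentile for 90%."""
    qs = quantiles(data, n=100)  # qs[i] ~ the (i+1)-th percentile
    margin = (100 - coverage) // 2
    return (qs[margin - 1], qs[100 - margin - 1])
```

On a heavily skewed sample such as 95 ones and 5 hundreds, the lower bound of `sd_interval` falls below the minimum of the data, while `percentile_interval` stays inside the observed range.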
PR #269
- Add `rlberry.envs.PipelineEnv`, a simple way to define a pipeline of wrappers.
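The idea of a wrapper pipeline can be sketched as folding a list of (wrapper_class, kwargs) pairs over a base environment. The names and signature below are illustrative, not PipelineEnv's actual interface:

```python
class TagWrapper:
    """Toy wrapper: stores the inner env plus a keyword argument."""

    def __init__(self, env, tag=""):
        self.env, self.tag = env, tag

def apply_pipeline(base_env, wrappers):
    """Apply (wrapper_class, kwargs) pairs in order, innermost first."""
    env = base_env
    for wrapper_cls, kwargs in wrappers:
        env = wrapper_cls(env, **kwargs)
    return env
```

Writing the pipeline as data (a list of class/kwargs pairs) rather than nested constructor calls makes it easy to store the whole env specification in a config.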
PR #262
- PPO can now handle continuous actions.
- Implementation of Munchausen DQN in `rlberry.agents.torch.MDQNAgent`.
- Comparison of MDQN with the DQN agent in the long tests.
- Compress the pickles used to save the trained agents.
PR #235
- Implementation of the `rlberry.envs.SpringCartPole` environment, an RL environment featuring two cartpoles linked by a spring.
- Improve logging; the logging level can now be changed with `rlberry.utils.logging.set_level()`.
- Introduce smoothing in curves produced by plot_writer_data when only one seed is used.
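A set_level()-style helper maps directly onto the stdlib logging module: it looks up the library's logger by name and changes its level. The body below is an illustrative sketch, not rlberry's implementation:

```python
import logging

def set_level(level="INFO", logger_name="rlberry"):
    """Change the verbosity of a library logger by name (illustrative).

    `level` is a standard logging level name such as "DEBUG",
    "INFO" or "WARNING".
    """
    logging.getLogger(logger_name).setLevel(getattr(logging, level))
```

Because child loggers (e.g. "rlberry.agents") propagate to the "rlberry" logger, setting the level on the top-level name affects the whole library.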
PR #223
- Moved PPO from experimental to torch agents. Tested and benchmarked.
rlberry-v0.3.0
Release of version 0.3.0 of rlberry.
New in 0.3.0
PR #206
- Creation of a Deep RL tutorial, in the user guide.
PR #132
- New tracker class `rlberry.agents.bandit.tools.BanditTracker` to track statistics to be used in bandit algorithms.
PR #191
- Possibility to generate a profile with `rlberry.agents.manager.AgentManager`.
- Misc improvements on A2C.
- New StableBaselines3 wrapper `rlberry.agents.stable_baselines.StableBaselinesAgent` to import StableBaselines3 agents.
PR #119
- Improving documentation for `agents.torch.utils`
- New replay buffer `rlberry.agents.utils.replay.ReplayBuffer`, aiming to replace the code in utils/memories.py
- New DQN implementation, aiming to fix reproducibility and compatibility issues.
- Implements Q(lambda) in the DQN agent.
Feb 22, 2022 (PR #126)
- Setup `rlberry.__version__` (currently 0.3.0dev0)
- Record the rlberry version in an AgentManager attribute, used to check equality of AgentManagers
- Override the `__eq__` method of the AgentManager class.
Feb 14-15, 2022 (PR #97, #118)
- (feat) Add basic bandit environments and agents. See `rlberry.agents.bandits.IndexAgent` and `rlberry.envs.bandits.Bandit`.
- Thompson Sampling bandit algorithm with Gaussian or Beta prior.
- Base class for bandit algorithms with custom save & load functions (called `rlberry.agents.bandits.BanditWithSimplePolicy`)
- (fix) Fixed bug in `FiniteMDP.sample()`: the terminal state was being checked with `self.state` instead of the given `state`
- (feat) Option to use 'fork' or 'spawn' in `rlberry.manager.AgentManager`
- (feat) AgentManager output_dir now has a timestamp and a short ID by default.
- (feat) Gridworld can be constructed from a string layout
- (feat) `max_workers` argument for `rlberry.manager.AgentManager` to control the maximum number of processes/threads created by the `fit` method.
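The Beta-prior case of Thompson Sampling mentioned above, for Bernoulli rewards, fits in a few lines of stdlib Python. This is a sketch of the algorithm itself, not rlberry's agent code:

```python
import random

def thompson_step(successes, failures, rng=random):
    """One Thompson Sampling step for Bernoulli bandits, Beta prior.

    With a uniform Beta(1, 1) prior, arm i's posterior after s
    successes and f failures is Beta(1 + s, 1 + f). Draw one sample
    per arm and play the arm with the largest draw.
    """
    draws = [
        rng.betavariate(1 + s, 1 + f)
        for s, f in zip(successes, failures)
    ]
    return max(range(len(draws)), key=draws.__getitem__)
```

An arm with many observed successes has a posterior concentrated near 1 and is selected almost surely, while uncertain arms still get explored through the randomness of the draws.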
Feb 04, 2022
- Add `rlberry.manager.read_writer_data` to load an agent's writer data from pickle files and make it simpler to customize in `rlberry.manager.plot_writer_data`
- Fix bug: DQN should take a tuple as environment
- Add a quickstart tutorial in the docs (quick_start)
- Add the RLSVI algorithm (tabular): `rlberry.agents.RLSVIAgent`
- Add the Posterior Sampling for Reinforcement Learning (PSRL) agent for tabular MDPs: `rlberry.agents.PSRLAgent`
- Add a page in the docs to help contributors (contributing)
rlberry-v0.2.1
New in v0.2
Improved interface and tools for parallel execution (#50)
- `AgentStats` renamed to `AgentManager`.
- `AgentManager` can handle agents that cannot be pickled.
- The `Agent` interface requires an `eval()` method instead of `policy()` to handle more general agents (e.g. reward-free, POMDPs, etc.).
- Multi-processing and multi-threading are now done with `ProcessPoolExecutor` and `ThreadPoolExecutor` (allowing nested processes, for example). Processes are created with `spawn` (jax does not work with `fork`, see #51).
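The executor classes named above are Python stdlib primitives from concurrent.futures. A minimal sketch of both backends, where the worker function is a stand-in for fitting one agent:

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor

def fit_one(seed):
    """Stand-in for fitting one agent instance with a given seed."""
    return seed * seed

# Thread-based parallelism (shared memory, no pickling constraints):
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fit_one, range(4)))

# For process-based parallelism, an explicit 'spawn' start method can
# be obtained and passed to ProcessPoolExecutor via its mp_context
# argument (spawned workers are not launched in this sketch):
spawn_ctx = multiprocessing.get_context("spawn")
```

Unlike `fork`, `spawn` starts each worker from a fresh interpreter, which avoids inheriting state that libraries such as jax cannot handle.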
New experimental features (see #51, #62)
- JAX implementation of DQN and of a replay buffer using reverb.
- `rlberry.network`: server and client interfaces to exchange messages via sockets.
- `RemoteAgentManager` to train agents on a remote server and gather the results locally (using `rlberry.network`).
Logging and rendering:
- Data logging with a new `DefaultWriter`, and improved evaluation and plot methods in `rlberry.manager.evaluation`.
- Fix rendering bug with OpenGL (bf606b4).
Bug fixes.
New in v0.2.1 (#65)
Features:
- `Agent` and `AgentManager` both have a `unique_id` attribute (useful for creating unique output files/directories).
- `DefaultWriter` is now initialized in the base class `Agent` and (optionally) wraps a tensorboard `SummaryWriter`.
- `AgentManager` has an option `enable_tensorboard` that activates tensorboard logging in each of its `Agent`s (with their `writer` attribute). The tensorboard `log_dir`s are automatically assigned by `AgentManager`.
- `RemoteAgentManager` receives tensorboard data created on the server when the method `get_writer_data()` is called. This is done via a zip file transfer with `rlberry.network`.
- `BaseWrapper` and `gym_make` now have an option `wrap_spaces`. If set to `True`, this option converts `gym.spaces` to `rlberry.spaces`, which provides classes with better seeding (using numpy's `default_rng` instead of `RandomState`).
- `AgentManager`: new method `get_agent_instances()` that returns trained instances.
- `plot_writer_data`: possibility to set `xtag` (the tag used for the x-axis).
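The timestamp-plus-short-ID pattern behind unique output files and directories can be sketched with the stdlib; the path layout and names below are illustrative, not rlberry's format:

```python
import uuid
from datetime import datetime

def make_unique_output_dir(base="results", agent_name="my_agent"):
    """Timestamp plus a short random hex ID, so two runs never collide.

    The timestamp keeps directories sorted by creation time; the
    random suffix disambiguates runs started in the same second.
    """
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    short_id = uuid.uuid4().hex[:8]
    return f"{base}/{agent_name}/{stamp}_{short_id}"
```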
Bug fixes:
- Fixed agent initialization bug in `AgentHandler` (`eval_env` missing in kwargs for `agent_class`).