uav dqn gazebo .pdf

File information

Original filename: uav_dqn_gazebo.pdf

This PDF 1.4 document was generated by Google and was sent on 09/05/2017 at 06:51 from IP address 128.2.x.x. The current document download page has been viewed 438 times.
File size: 584 KB (1 page).
Privacy: public file


Video results: FPV of quadrotor flight

Shut up and show me the code

Autonomous Quadrotor Flight in Simulation using RL
Ratnesh Madaan, Dhruv Mauria Saxena, Rogério Bonatti, Shohin Mukherjee



➢ Despite advancements in sensing technologies, it
is difficult to develop robust systems by separating
perception and control.
➢ Learning to fly in the real world is impractical since
it is time-consuming and expensive.
➢ Learning to fly in simulation opens the possibility
of transferring learned policies to the real world.

➢ Develop an open-source Gazebo environment
integrated with Gym which can be used by the
community for reinforcement learning research.
➢ Train a deep Q-network capable of flying a drone
autonomously in the Gazebo environment.

➢ Environment consists of randomly placed
cylindrical obstacles, simulated and rendered in
➢ Position of cylinders changes for each episode.
➢ Quadrotor is equipped with a planar laser
rangefinder and a front-facing RGB-D camera.
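
The per-episode randomization of obstacles can be illustrated with a minimal sketch. This is not the Gazebo environment itself: the class `CylinderWorld`, the helper `sample_cylinders`, the arena size, the number of cylinders, and the obstacle-free start region are all hypothetical choices made for illustration. Only the idea that `reset()` draws a fresh cylinder layout for each episode comes from the description above.

```python
import math
import random


def sample_cylinders(n, half_extent, clear_radius, rng):
    """Sample n (x, y) cylinder centres uniformly in a square arena.

    Centres closer than clear_radius to the origin are rejected so that the
    (assumed) start pose of the quadrotor is never inside an obstacle.
    """
    cylinders = []
    while len(cylinders) < n:
        x = rng.uniform(-half_extent, half_extent)
        y = rng.uniform(-half_extent, half_extent)
        if math.hypot(x, y) >= clear_radius:
            cylinders.append((x, y))
    return cylinders


class CylinderWorld:
    """Hypothetical stand-in for the simulated world: tracks only the layout."""

    def __init__(self, n_cylinders=10, half_extent=20.0, clear_radius=2.0, seed=0):
        self.n_cylinders = n_cylinders      # assumed value
        self.half_extent = half_extent      # assumed arena half-width in metres
        self.clear_radius = clear_radius    # assumed obstacle-free start radius
        self.rng = random.Random(seed)
        self.cylinders = []

    def reset(self):
        """Start a new episode with a freshly sampled obstacle layout."""
        self.cylinders = sample_cylinders(
            self.n_cylinders, self.half_extent, self.clear_radius, self.rng
        )
        return list(self.cylinders)
```

Calling `reset()` twice on the same object yields two different layouts, which is the behaviour an agent sees across consecutive episodes.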

Partial Results
Graphs for grayscale, monocular camera images
Train Reward

Learning On Images
Network architecture:
➢ Input: 84 X 84 X 4 images
➢ Conv layer 1: 32 filters, 8 X 8, stride 4
➢ Conv layer 2: 64 filters, 4 X 4, stride 2
➢ Conv layer 3: 64 filters, 3 X 3, stride 1
➢ Fully-connected layer: 512 units
➢ Output: 9 units (corresponding to yaw angles)
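
As a sanity check on the layer sizes listed above, the feature-map shapes and parameter count can be worked out with a short script. It is a sketch under two assumptions that the poster does not state: the convolutions use no padding, and every layer has a bias term. The function names are illustrative only.

```python
def conv_out(size, kernel, stride):
    """Output width of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


def image_network_summary(side=84, channels=4, n_actions=9):
    """Return (conv output shapes, flattened size, total parameter count)."""
    conv_layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]  # (filters, kernel, stride)
    shapes = []
    params = 0
    in_channels = channels
    for filters, kernel, stride in conv_layers:
        side = conv_out(side, kernel, stride)
        # weights: kernel * kernel * in_channels per filter, plus one bias per filter
        params += kernel * kernel * in_channels * filters + filters
        shapes.append((side, side, filters))
        in_channels = filters
    flat = side * side * in_channels
    params += flat * 512 + 512                 # fully-connected layer, 512 units
    params += 512 * n_actions + n_actions      # output layer, one unit per action
    return shapes, flat, params


shapes, flat, params = image_network_summary()
print(shapes)   # [(20, 20, 32), (9, 9, 64), (7, 7, 64)]
print(flat)     # 3136
print(params)   # 1688745
```

Under these assumptions most of the roughly 1.7M parameters sit in the first fully-connected layer (3136 X 512 weights).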

Train Episode Length

Test Reward

Conclusion And Future Work
Learning On Laser Data
Network architecture:
➢ Input: 70 X 4 array
➢ Fully-connected layer: 512 units
➢ Fully-connected layer: 512 units
➢ Output: 9 units (number of actions)
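
A forward pass through a network of this shape can be sketched in plain Python. The poster does not state the activation function, the weight initialization, or what the two dimensions of the 70 X 4 input mean; the sketch assumes ReLU activations, small uniform random weights, and 4 stacked scans of 70 range readings each. All function names are hypothetical.

```python
import random


def make_layer(n_in, n_out, rng):
    """Create (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, relu):
    """Apply one dense layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def laser_q_values(scans, layers):
    """Flatten the stacked scans (4 x 70 -> 280 values) and return 9 Q-values."""
    x = [r for scan in scans for r in scan]
    for i, layer in enumerate(layers):
        # ReLU on the hidden layers only; the output layer stays linear
        x = dense(x, layer, relu=i < len(layers) - 1)
    return x


rng = random.Random(0)
sizes = [70 * 4, 512, 512, 9]
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
scans = [[rng.uniform(0.1, 10.0) for _ in range(70)] for _ in range(4)]
q_values = laser_q_values(scans, layers)
greedy_action = max(range(len(q_values)), key=q_values.__getitem__)
```

The greedy action is the index of the largest of the 9 outputs, matching the "number of actions" output layer.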

Learning Algorithm - DQN
➢ Q_{w^-}: Target Network
➢ Q_w: Online Network
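
The roles of the two networks can be sketched as follows, using the standard DQN update rather than anything specific to this poster. The target network supplies the bootstrapped value for the next state, the online network is the one being trained, and the target weights are periodically overwritten with a copy of the online weights. The discount factor of 0.99 and all function names are assumptions made for illustration.

```python
import copy


def td_target(reward, next_q_target, done, gamma=0.99):
    """y = r for terminal transitions, else r + gamma * max_a' Q_{w^-}(s', a').

    next_q_target holds the target network's Q-values for the next state.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def td_loss(q_online, action, target):
    """Squared TD error on the online network's value for the action taken."""
    return (q_online[action] - target) ** 2


def sync_target(online_weights):
    """Return a frozen copy of the online weights to use as w^-."""
    return copy.deepcopy(online_weights)
```

Keeping the target weights fixed between synchronizations means the regression target does not move with every gradient step on the online network.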

➢ With fewer than 1M iterations, we still could not
observe significant learning in the environments
with laser and depth images as inputs.
➢ Policies learned on depth images and laser data may
transfer more easily to real life.
➢ During the summer we plan to test real
quadcopters flying with the policies learned in

➢ [1] V. Mnih et al., "Playing Atari with deep
reinforcement learning," arXiv preprint arXiv:1312.5602
➢ [2] F. Sadeghi and S. Levine, "(CAD)²RL: Real single-image flight
without a single real image," arXiv preprint arXiv:1611.04201,
