Insper Computer Engineering students created a simulation environment and used reinforcement learning techniques to train drones to search for shipwrecked people. Supervised by professor Fabrício Barth, students Enrico Francesco Damiani, Leonardo Duarte Malta de Abreu, Luís Filipe Sanchez Carrete and Manuel Castanares developed the Final Engineering Project (PFE) in partnership with Embraer, under the mentorship of product development engineer José Fernando Basso Brancalion.

 

Embraer, one of the largest manufacturers of commercial jets in the world, requested the research partly out of interest in the control of autonomous drones, which can make decisions on certain tasks without human intervention, unlike the way these vehicles are usually operated. In the challenge proposed to the undergraduates, the drones navigate with the help of a map of the probabilities of the castaways’ locations. Roughly speaking, once an accident has been reported, the rescue team knows its coordinates and begins the search from that point.
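
To make the idea of a probability map concrete, here is a minimal sketch in Python. It assumes a square grid and a bell-shaped distribution centered on the reported accident coordinates; the function name, the grid size and the `spread` parameter are illustrative assumptions, not details taken from the students' simulator.

```python
import math


def probability_map(size, center, spread=2.0):
    """Build a size x size grid of probabilities peaked at `center`.

    Assumption: a Gaussian-shaped falloff around the accident point.
    The real project may model the probabilities differently.
    """
    cx, cy = center
    weights = [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * spread ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]
    total = sum(sum(row) for row in weights)
    # Normalize so that all cells together sum to 1.
    return [[w / total for w in row] for row in weights]


# Hypothetical accident reported at cell (x=4, y=4) of a 9 x 9 search area.
grid = probability_map(size=9, center=(4, 4))
most_likely = max(
    ((x, y) for y in range(9) for x in range(9)),
    key=lambda cell: grid[cell[1]][cell[0]],
)
```

In this sketch the most likely cell is the accident point itself, which matches the idea of starting the search from the reported coordinates.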

 

An alternative to using manned aircraft would be a swarm of drones, a group of small unmanned aircraft equipped with sensors and actuators that observe and interact with the search environment. Time is critical in these situations: the longer it takes to locate the shipwrecked people, the lower their chances of survival. A traditional approach is to establish a pre-configured behavior for drone movements, following zigzag, circular or parallel patterns.
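
As an illustration of such a pre-configured behavior, the sketch below generates a zigzag sweep over a rectangular grid, visiting every cell exactly once. The grid representation and the function name are assumptions made for this example only.

```python
def zigzag_path(rows, cols):
    """Return the cells of a rows x cols grid in zigzag (back-and-forth) order.

    Even rows are swept left to right, odd rows right to left, so the
    drone never has to jump across the search area between rows.
    """
    path = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


# Hypothetical 3 x 4 search area.
path = zigzag_path(3, 4)
```

A fixed pattern like this ignores where the castaway is most likely to be, which is the limitation that motivates the learned behavior described next.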

 

Fabrício Barth says that some researchers began to question whether there was a more efficient way: the drone itself could, given the scenario, determine the best behavior to follow. They then began experimenting with reinforcement learning, a machine learning training method in which the agent (in this case, the drone) receives a reward or a penalty for each attempt, depending on whether it succeeds.

 

The aerial vehicle can autonomously move (north, south, east and west, up and down) and observe a certain area (with cameras, for example). “But when it performs an action, the environment changes”, explains the professor. “The drone receives a new state and a reward, which can be positive, negative or neutral. In this problem of ours, if the drone performs a search in a region of the ocean and finds the castaway, it will receive a super positive reward because it managed to achieve the goal.”

 

When the drone performs movement actions and nothing happens, the reward is neutral. However, if it collides with another drone in the swarm during this movement, leaves the search area or runs out of battery, the reward is negative. “This is what is called a reinforcement function,” says Barth. Embraer’s idea is to experiment with the reinforcement learning process using shipwreck scenarios which simulate the movements of machines and sea currents, for example, in a computational environment. Thousands or millions of similar learning tests on real drones would be expensive and time-consuming.
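
The reward scheme described above can be sketched in a few lines of Python. The numeric values (+100 and -100) and the names `StepOutcome` and `reward` are assumptions chosen for illustration; the article does not state the values used in the project.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """What happened after one drone action (hypothetical structure)."""
    found_castaway: bool = False
    collided: bool = False
    left_search_area: bool = False
    battery_empty: bool = False


def reward(outcome, found_reward=100.0, penalty=-100.0):
    """Map the outcome of one action to a positive, negative or neutral reward."""
    if outcome.collided or outcome.left_search_area or outcome.battery_empty:
        return penalty        # negative: collision, out of bounds or no battery
    if outcome.found_castaway:
        return found_reward   # strongly positive: the goal was achieved
    return 0.0                # neutral: the drone moved and nothing happened


neutral = reward(StepOutcome())
success = reward(StepOutcome(found_castaway=True))
crash = reward(StepOutcome(collided=True))
```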

 

A simulator of their own

 

The students researched similar projects and found simulators for different problems, but none for rescuing shipwrecked people. They decided to implement their own simulator in Python and make it public, following the PettingZoo library standard, which should make it easier for other interested parties to adopt. All documentation is available online. “If someone researches the search for castaways using reinforcement learning, this library developed by the students will be available for use”, says Barth.

 

The second product delivered by the Final Engineering Project was the implementation of the multi-agent environment, that is, several drones working in cooperation and trained using reinforcement learning. Configurations with one, two and four aircraft were simulated. In the research, each complete search for the castaway represents an episode. The report shows that, as the number of episodes increases, the agents accumulate experience: the drones try new paths and learn through reinforcement.
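
The article does not say which algorithm the group used. As one possible illustration of how experience accumulates across episodes, the sketch below shows a single-agent tabular Q-learning update, a standard reinforcement learning technique swapped in here for clarity. The action names, the learning rate `alpha` and the discount factor `gamma` are assumptions.

```python
from collections import defaultdict

# Hypothetical action set for one drone on a 2D grid.
ACTIONS = ("north", "south", "east", "west", "search")


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update and return the new value estimate.

    q maps (state, action) pairs to the estimated long-term reward.
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)  # every estimate starts at zero
# The drone searches cell (0, 0) and finds the castaway (reward 100).
first = q_update(q, state=(0, 0), action="search", reward=100.0, next_state=(0, 0))
# Repeating the same experience raises the estimate further.
second = q_update(q, state=(0, 0), action="search", reward=100.0, next_state=(0, 0))
```

Each update nudges the estimate toward the observed reward, so actions that led to finding the castaway in earlier episodes become more likely to be chosen in later ones.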

 

The project opens up possibilities for several future projects, according to Barth. Future groups could use maritime data to improve the simulation of the castaway’s movements in the current and refine the probability matrix, test other reinforcement learning algorithms, or apply the process to real drones and observe the model’s behavior. “This is not an easy subject, and the group managed to organize itself, divide the work, interact periodically with the client and present the expected result”, says the professor.

 

Group in tune

 

The rapport among the four students helped their performance. Luís Filipe Carrete and Manuel Castanares studied at the same school and remained in contact in the Computer Engineering program. Leonardo Malta and Enrico Damiani work at the same company and knew each other's working methods well. The four had already worked together on previous projects in Insper courses. “I was lucky to be in this group, as we have a lot of respect for each other,” says Carrete. “I believe this was very helpful for our dynamics, as we were close and were not ashamed to express our opinions regarding the development of the PFE. Our relationship and friendship left a mark on the work, and without it our project would not be the same.”

 

Much of the knowledge necessary to advance the project was acquired in the classroom, says Castanares. Some specific technical aspects of the project, such as the production of the multi-agent reinforcement learning algorithm, came from external learning. “With the help of our supervisor, we had to do research before developing the algorithm,” explains Castanares. “Once the research was done, we made a first iteration of the algorithm. Afterwards, there were several iterations until we achieved the expected result.”

 

Malta says that the solution was in line with the goals set at the beginning of the PFE. “After collaborating with engineer José Fernando, from Embraer, to define the expectations of the project, our objective was to investigate the potential use of drones controlled by an artificial intelligence system, together with a probability matrix, to solve the search for castaways”, he says. “During the course of the work, we managed to achieve this goal, in addition to building a simulated environment to reproduce this scenario.”

 

To meet the established deadlines, the students opted for some simplifications that, according to them, could be developed further with more time. “In addition, we encountered challenges that caused delays in some deliveries, and with an extension of the development period, we would have the opportunity to improve on these simplifications”, says Malta. “Despite the setbacks faced and the simplifications adopted, we managed to meet the requirements and, in general, achieved the initial expectations established.”

 

The frequency of meetings with Embraer was decided at the beginning of the project. Every 15 days, the research progress was presented to the mentor at the company. “We also exchanged emails frequently and, whenever necessary, they replied with diligence and great enthusiasm, really helping us”, says Damiani. “In addition to these meetings held every two weeks, in order to show what we were doing and align the next steps, we had two more meetings with several members of the company, in which we made our intermediate and final presentations.”

 

As usual in Insper’s Final Engineering Project, the project will continue with another group in the second half of 2023. “We are available to assist and answer questions from the students who will undertake the continuation of the project”, says Malta. The report was published in an article repository and, now, the group is looking for opportunities to participate in science fairs to publicize the project in the field of study of artificial intelligence.



