SIPL Projects

Machine & Deep Learning

Classify ECG Time Series Using Wavelet Analysis and Deep Learning
2024
Student/s: Doron Hanuka (Part A+B), Coral Kashti (Part A only)
Supervisor/s: Dr. Meir Bar-Zohar
The goal of this work is to develop a system that classifies ECG signals into two categories: arrhythmias (ARR) and normal sinus rhythm (NSR). Upon receiving an ECG signal from a subject, the system operates as follows: The temporal signal is divided into windows, resulting in a time series of windows. A Wavelet Transform is applied to each window to obtain a time-frequency representation for the time segment within the window. Features are extracted from the windows using a convolutional network trained for this task, yielding a time series of features. Predictions are made on this time series using an LSTM network, providing a prediction of the subject's cardiac condition.
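Below is a minimal sketch of the described pipeline (windowing, CWT scalograms, a per-window CNN, and an LSTM head). The window length, wavelet choice, and layer sizes are illustrative assumptions, not the project's exact configuration.

```python
# Sketch of the described pipeline: window -> CWT -> CNN features -> LSTM.
# All sizes and the Morlet wavelet are illustrative assumptions.
import numpy as np
import pywt
import torch
import torch.nn as nn

def ecg_to_scalograms(signal, fs=128, win_sec=4, scales=np.arange(1, 65)):
    """Split a 1-D ECG signal into windows and compute a CWT scalogram per window."""
    win = fs * win_sec
    n_windows = len(signal) // win
    scalograms = []
    for i in range(n_windows):
        seg = signal[i * win:(i + 1) * win]
        coeffs, _ = pywt.cwt(seg, scales, 'morl')      # (n_scales, win)
        scalograms.append(np.abs(coeffs).astype(np.float32))
    return np.stack(scalograms)                        # (T, n_scales, win)

class WindowCNN(nn.Module):
    """Per-window feature extractor (stand-in for the trained CNN)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
    def forward(self, x):
        return self.net(x)

class ECGClassifier(nn.Module):
    """LSTM over the window-feature time series -> ARR / NSR logits."""
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.cnn = WindowCNN(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)
    def forward(self, scalos):                         # (B, T, n_scales, win)
        B, T = scalos.shape[:2]
        feats = self.cnn(scalos.flatten(0, 1).unsqueeze(1)).view(B, T, -1)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])                        # logits for {ARR, NSR}
```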
Estimating BMI from 2D Image
2024
Student/s: Tzvi Tal Noy, Ido Sagi
Supervisor/s: Nurit Spingarn
Body Mass Index (BMI) is a crucial index that gives a quantitative assessment of whether a person is at a normal weight, underweight, or overweight. The index is calculated from height and weight data. The purpose of our project is to estimate a person's BMI from a single 2-dimensional image. This is a complex task because visual inspection of the image is not sensitive to the distance of the subject from the camera or the angle of the shot. To approach this task, we relied on previous works in the field, on their results, and on the dataset published with them.
Biometric Authentication Using PPG Signals
2024
Student/s: Inbal Ben Yehuda, Shany Danino
Supervisor/s: Yair Moshe
Today, the challenges of security and information safety are substantial, necessitating the development of high-quality and reliable verification methods. The use of biometric authentication methods is expanding as they provide secure and convenient means of verification. In this project, we explore the potential of using the PPG signal as a unique biometric authentication method. This signal represents changes in blood volume during cardiac cycles. Each individual’s PPG signal is influenced by a unique combination of physiological characteristics. This uniqueness allows us to use the PPG signal as a sort of "fingerprint" to identify the person.
Music Genre Classifier Using Deep Learning Networks
2024
Student/s: Ilay Yavlovich, Amit Karp
Supervisor/s: Hadas Ofir
In this work, we created a model based on deep neural networks to classify music genres. During the process, we segmented each song into excerpts and fed them into the model for training, validation, and testing. We used single-genre songs from the MTG-Jamendo database, which is divided into genres in an unbalanced manner (the number of songs per genre varies significantly). Therefore, we chose to work only with the ten largest genres and used different weighting schemes in hopes of improving the results.
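One common weighting scheme that fits this description is inverse-frequency class weighting in the loss; the sketch below illustrates it with made-up genre counts (the real MTG-Jamendo counts differ).

```python
# Hedged example of inverse-frequency class weights for an unbalanced
# 10-genre problem. The counts below are illustrative, not the real ones.
import torch
import torch.nn as nn

genre_counts = torch.tensor([9000., 6200., 4100., 3500., 2800.,
                             2100., 1700., 1300., 1100., 900.])   # 10 largest genres
weights = genre_counts.sum() / (len(genre_counts) * genre_counts) # inverse frequency
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 10)           # model outputs for a batch of excerpts
labels = torch.randint(0, 10, (8,))
loss = criterion(logits, labels)      # rare genres contribute more per sample
```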
Depth-Based Semantic Segmentation for Four-Legged Robot
2024
Student/s: Shany Cohen, Tal Sonis
Supervisor/s: Yair Moshe
Collaborator: RAFAEL
This work’s goal is to enable maneuvering abilities for a four-legged robot in an indoor environment by employing semantic segmentation. The segmentation is performed using deep learning, based on low-resolution grayscale and depth images captured by a Pico Flexx camera mounted atop the robot. While most existing semantic segmentation methods rely on RGB and depth images, there are no pre-trained models specifically designed for grayscale images. In the project, we adapted an architecture intended for semantic segmentation using RGB and depth images, leveraging transfer learning to tailor it to our specific requirements.
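A typical way to adapt an RGB-pretrained backbone to grayscale + depth input is to rebuild its first convolution for two channels and reuse the averaged pretrained weights. The sketch below shows this common transfer-learning heuristic; it is not necessarily the project's exact adaptation, and ResNet-18 stands in for the actual segmentation encoder.

```python
# Sketch: adapting an RGB-pretrained encoder to 2-channel (grayscale + depth)
# input. Averaging the pretrained RGB filters is a common heuristic.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
old_conv = backbone.conv1                          # Conv2d(3, 64, 7, stride=2, padding=3)
new_conv = nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    mean_w = old_conv.weight.mean(dim=1, keepdim=True)  # average over RGB channels
    new_conv.weight.copy_(mean_w.repeat(1, 2, 1, 1))    # reuse for both input channels
backbone.conv1 = new_conv

x = torch.randn(1, 2, 224, 224)                    # [grayscale, depth] stacked
logits = backbone(x)   # in practice the classifier head is replaced by a segmentation decoder
```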
Emotional Speech Synthesis
2024
Student/s: Sagi Eyal, Loren Tzveniashvily
Supervisor/s: Yair Moshe
Collaborator: Elbit Systems
The goal of this work was to perform emotional speech synthesis. First, we experimented with the emotional voice conversion approach, where the system receives two voice signals and transfers the emotion from one recording to the other. Later in the project, we focused on the emotional text-to-speech approach, where the system receives the transcription of the sentence we want to synthesize along with the desired emotion and generates a recording of that sentence with the given emotion. As a first step, we reproduced the results of the EmoSpeech system, which converts text to emotional speech quickly and with high quality.
Characterizing Pedestrians in Parks
2024
Student/s: Shany Zehavy, Adi Levy
Supervisor/s: Ori Bryt
This work aims to address the pressing need for high-quality public open spaces in urban environments, with a focus on leveraging computer vision and deep learning techniques. The COVID-19 pandemic has emphasized the importance of public open spaces in enhancing the well-being and quality of life for city dwellers. It has become evident that these spaces serve as vital elements in urban landscapes and play a significant role in promoting physical and mental health, social interactions, and overall community resilience.
Deep Learning for Multiple Virus Detection Tests Using Sparse Genome Reads
2024
Student/s: Eran Yermiyahu, Michal Maymon
Supervisor/s: Zuher Jahshan
This work focuses on the identification of diverse respiratory diseases through the utilization of advanced machine learning tools and neural networks applied to genomic sequences. The primary objective of our study is to develop a rapid and cost-effective diagnostic tool capable of detecting a range of respiratory illnesses and identifying the different variants of each disease. The urgency for accurate diagnosis of various respiratory diseases has become paramount, particularly considering the ongoing global COVID-19 pandemic. Additionally, the presence of comorbidities significantly heightens the risk of life-threatening complications.
Calibration of Deep Neural Networks
2024
Student/s: Yonatan Leibovich, Avichay Ashur
Supervisor/s: Yair Moshe
Deep Neural Networks (DNNs) are learned functions that consist of multiple layers between the input and output layers. These layers consist of neurons that are connected to each other, transmitting information from their input to their output. A widespread use of DNNs is to classify complex data by learning from a set of labeled examples. It has been shown that DNNs suffer from miscalibration, i.e., a misalignment between predicted probabilities and actual outcomes. For example, if we have 100 samples that the DNN is 90% confident about, we expect the network to correctly classify 90 of these samples and make mistakes in 10 of them.
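A standard way to quantify this miscalibration is the Expected Calibration Error (ECE): bin predictions by confidence and compare each bin's average confidence to its accuracy. A minimal sketch:

```python
# Minimal Expected Calibration Error (ECE) computation: bin predictions by
# confidence, then average the |accuracy - confidence| gap over the bins,
# weighted by bin population.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap          # weight by fraction of samples in bin
    return ece

# A network that is 90% confident but only 70% correct is miscalibrated:
print(expected_calibration_error([0.9] * 100, [1] * 70 + [0] * 30))  # ~0.2
```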
Image-Guided Image Generation Using Stable Diffusion and CLIP
2024
Student/s: Ido Blayberg, Ohad Amsalem
Supervisor/s: Noam Elata
In recent years, AI-driven image editing has emerged as a promising field with numerous applications. This work explores the capabilities of generative diffusion models for image editing guided by reference images. We focus on leveraging a set of images that outline the desired editing features and applying them to a target image. By experimenting with various hyper-parameters, modifying core components of the diffusion model, and integrating the CLIP model, we demonstrate various improvements in image editing performance.
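As a hedged illustration of the CLIP component, the sketch below embeds the reference images and scores a candidate edit by cosine similarity to their mean embedding. How this score is fed back into the diffusion sampler is project-specific, and the file names are placeholders.

```python
# Sketch: score a candidate edited image against a set of reference images
# using CLIP (https://github.com/openai/CLIP). File names are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

ref_paths = ["ref1.jpg", "ref2.jpg"]              # images outlining the desired edit
refs = torch.stack([preprocess(Image.open(p)) for p in ref_paths]).to(device)
with torch.no_grad():
    ref_emb = model.encode_image(refs).float()
    ref_emb = ref_emb / ref_emb.norm(dim=-1, keepdim=True)
    target = ref_emb.mean(dim=0, keepdim=True)    # mean reference embedding

def clip_guidance_score(candidate):
    """candidate: preprocessed image batch (1, 3, 224, 224)."""
    emb = model.encode_image(candidate).float()
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return (emb @ target.T).squeeze()             # higher = closer to references
```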
SAR Target Classification Using Deep Learning
2024
Student/s: Avichay Ashur
Supervisor/s: Dr. Meir Bar-Zohar
This work's goal is to develop a classifier based on a convolutional neural network to classify Synthetic Aperture Radar (SAR) targets using deep learning. Deep learning is a powerful technique that can be used to train robust classifiers and has shown its effectiveness in diverse areas ranging from image analysis to natural language processing. These developments hold huge potential for SAR data analysis and SAR technology in general, a potential that is slowly being realized. A major task for SAR-related algorithms has long been object detection and classification, known as automatic target recognition (ATR).
Digitizing The Yerushalmi Catalogue
2024
Student/s: Rami Halabi, Salah Abbas
Supervisor/s: Ori Bryt
Joseph Yerushalmi, a librarian at the University of Haifa Library, created a catalogue with around 65,000 records on paper cards. The catalogue contains articles from the 1940s to the 1970s, focusing on individuals such as artists, writers, philosophers, intellectuals, and historical figures. The collection also includes reviews of books and literary works. To preserve this valuable catalogue, digitization is needed. The project is divided into two parts. The first part is to detect text regions, which means classifying each region with its appropriate label: Title, Author, Text, or Other.
Bass Generation Based on Vocals via Deep Learning
2024
Student/s: Dror Tiferet and Rom Ben Anat
Supervisor/s: Hila Manor & Gal Gershler
This work aims to create a bass accompaniment track for a solo vocal track. This is achieved using a machine learning model trained on a comprehensive dataset of bass and vocal tracks that sound good together. The system first processes the vocal track and converts it into a spectrogram, a graphical representation of the signal's frequency spectrum over time. A generative diffusion model then produces a corresponding bass track, with the vocal spectrogram serving as a conditioning input to the model. Throughout the project, an extensive literature review was conducted to select appropriate models, including MelGAN and HiFi-GAN.
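The first processing step might look like the sketch below, which converts a vocal track to a log-mel spectrogram with librosa; the parameter values are illustrative defaults, not the project's configuration, and the file name is a placeholder.

```python
# Sketch of the first step: vocal track -> log-mel spectrogram.
import librosa
import numpy as np

y, sr = librosa.load("vocals.wav", sr=22050)          # mono vocal track (placeholder path)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=80)
log_mel = librosa.power_to_db(mel, ref=np.max)        # (80, n_frames), dB scale
# log_mel would serve as the conditioning input to the diffusion model;
# a neural vocoder (e.g., MelGAN / HiFi-GAN) inverts generated spectrograms
# back to waveforms.
```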
Advanced analysis methods for dynamics of functional connectivity in the brain during learning
2024
Student/s: Gali Eytan and Ariel Engelman
Supervisor/s: Dr. Hadas Benisty
Recent studies have shown that motor learning entails the dynamic reorganization of functional connectivity in the brain’s neural networks. This work investigates these dynamics by analyzing the layer 2-3 pyramidal neurons of the motor cortex in mice during motor task learning, drawing inspiration from related studies on VTA (ventral-tegmental area) dopaminergic projections and their influence on network plasticity. Both prior research and this work employ the diffusion map algorithm with Riemannian distances to effectively reduce the dimensionality of neural activity correlations.
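For reference, a basic diffusion map can be sketched as follows. This toy version uses plain Euclidean distances where the project uses Riemannian distances between correlation matrices.

```python
# Compact diffusion-map sketch: Gaussian affinity over samples, row-normalized
# into a Markov matrix, embedded with its leading non-trivial eigenvectors.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def diffusion_map(X, n_components=2, eps=None):
    """X: (n_samples, n_features), e.g., vectorized correlation patterns."""
    D = squareform(pdist(X))                   # pairwise Euclidean distances
    if eps is None:
        eps = np.median(D) ** 2                # common bandwidth heuristic
    K = np.exp(-D ** 2 / eps)                  # Gaussian affinity kernel
    P = K / K.sum(axis=1, keepdims=True)       # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # skip the trivial constant eigenvector (eigenvalue 1)
    return vecs[:, 1:n_components + 1] * vals[1:n_components + 1]

embedding = diffusion_map(np.random.rand(100, 50))    # toy input
```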
Depth Maps Quality Assessment Using Deep Features
2023
Student/s: Amit Shpigelman, Simcha Lipner
Supervisor/s: Ori Bryt
Within the field of Image Quality Assessment (IQA), there are a few main methods of producing an evaluation. One such method, which this project focuses on, is a perceptual index: a measure of a photo's visual quality, i.e., what looks good to human eyes. Our goal is to create such a measure that can assist deep neural networks in performing various tasks on depth images. Example tasks include classification, denoising, compression, and reconstruction. Our work includes an attempt to create such a measure using the responses of a DNN we designed for different datasets.
Image Reconstruction from Deep Diffractive Neural Network
2023
Student/s: Iggy Segev Gal, Tamar Sde Chen
Supervisor/s: Matan Kleiner
Deep diffractive neural networks have emerged as a promising framework that combines the speed and energy efficiency of optical computing with the power of deep learning. This has opened new possibilities for optical computing suited to machine learning tasks and all-optical sensors. One proposed application of this framework is the design of a diffractive camera that preserves privacy by only imaging target classes while optically erasing all other unwanted classes. In this work, we investigated whether this camera design truly erases the information in unwanted class data. We used K-NN to achieve up to 94% accuracy in classifying optically erased images.
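The probing experiment can be sketched as follows: fit a K-NN classifier directly on the "erased" output images and check whether class information survives. The file names below are placeholders for the actual data.

```python
# Sketch: probe "optically erased" images with K-NN. Paths are placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

y = np.load("labels.npy")                              # class labels of erased inputs
X = np.load("erased_images.npy").reshape(len(y), -1)   # flattened camera outputs
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)
print("accuracy on 'erased' images:", knn.score(X_te, y_te))
# A high score (the project reached up to 94%) indicates the class
# information was not truly erased.
```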
Identification of Dairy Cows
2023
Student/s: Kfir Bendic, Itzhak Mandelman
Supervisor/s: Ido Cohen
Classifying dairy cows is a critical operation for dairy farms. The primary goal of dairy farms is to maximize milk production, which is achieved by monitoring various aspects of each cow, including milk yield, health status, estrus time, and other characteristics. Therefore, the foremost objective of a dairy farm is to establish a reliable method for identifying each cow accurately. Currently, common methods for cow identification rely on permanent measures such as ear or back tattooing, as well as the use of ear tags equipped with radio frequency identification (RFID) technology. However, these methods have limitations as they can fade, fall off, or break over time.
Acoustic Scene Classification
2023
Student/s: Shira Lifshitz, Ellinor Elimeleh
Supervisor/s: Dr. Meir Bar-Zohar
This work deals with acoustic scene classification on a dataset published in the DCASE2017 challenge. The goal is to achieve better performance than that presented in the challenge, using neural networks and mel-spectrogram features. We present the processing of the dataset, the classifier and models, and the selected hyperparameters. The best performance was obtained using mel-spectrogram features, an EfficientNet V2 S neural network, and a MiniNet network as a selection algorithm. An accuracy of 83.33% was achieved, which is higher than the baseline performance against which we compare our results.
Classification of Heart Sounds Using Deep Convolutional Networks
2023
Student/s: Shlomi Zvenyashvili, Arik Berenshtein
Supervisor/s: Dr. Meir Bar-Zohar
Cardiovascular disease is a leading cause of death globally, with over 17 million deaths each year according to the World Health Organization (WHO). Accurate classification of heart sounds is crucial for early detection and effective management of heart conditions. However, this task is challenging due to the complexity of heart sound data, which includes variations caused by low-quality recordings and differing physiological conditions. Robust and efficient models are needed for handling such diverse data and improving diagnostic accuracy. In this work, we propose a machine learning-based solution using deep convolutional networks.
Recognizing Autism in Mice by Analyzing Their Squeaks
2022
Student/s: Itamar Ginsberg, Alon Schreuer
Supervisor/s: Dr. Dror Lederman, Prof. Hava Golan
Diagnosis of autism at an early age is an extensive area of research, as it has a massive impact on the ability to treat and aid those suffering from the syndrome. So far, diagnosis has been based on professional behavioral observation, a flawed tool: it is subjective and imprecise, and it is only effective at a late developmental stage (ages 4-5). The goal of this work is to develop a diagnostic-assist tool for classifying mice into two categories, mice with symptoms of ASD (Autism Spectrum Disorder) and mice without such symptoms, based on recordings of their squeaks.
Deep Learning Based Target Cancellation for Speech Dereverberation
2022
Student/s: Neriya Golan, Mikhail Klinov
Supervisor/s: Yair Moshe, Baruch Berdugo
Background noise and reverberation can degrade the quality of speech signals and reduce their intelligibility. Reverberations also reduce the performance of important systems such as hearing aids or voice recognition applications. There are a variety of classic methods for dereverberation of speech signals, but their performance is usually unsatisfactory and not generalizable. In light of this, there has been an increase in recent years in research on dereverberation using modern methods based on deep learning.
Textual Explorable Super Resolution
2022
Student/s: Noam Elata, Rotem Idelson
Supervisor/s: Tomer Michaeli, Yuval Bahat
In this work, we developed an explorable Super-Resolution model, which generates a high-resolution image that is both consistent with the original low-resolution image and consistent with the semantic information desired by the user. The control of the image exploration is obtained by using a text prompt that is processed for its semantic information using the CLIP network. We investigated several methods of performing this task. We first attempted expanding an existing explorable Super-Resolution network to optimize over the semantic information in the text.
Image Colorization for Thermal Mobile Camera Images
2022
Student/s: Idan Friedman, Tomer Lotan
Supervisor/s: Ori Bryt
Thermal image colorization is a topic that is gaining momentum in the world of artificial intelligence. In recent years, with a significant improvement in tools and with the algorithmic development of deep learning, the world of computer vision has achieved impressive results in everything related to image processing and analysis. A significant development that has led to this rapid progress is the Generative Adversarial Network (GAN). Networks of this type make it possible to generate new data based on the characteristics of existing data.
Speech-to-Singing Conversion Using Deep Learning
2022
Student/s: Omri Jurim, Ohad Mochly
Supervisor/s: Yair Moshe, Gal Greshler
The purpose of this work is to develop an algorithm for converting speech to singing using deep learning methods. Such a system can help memorize various short texts, like phone numbers and lists, and can also be used for entertainment. There are research papers on the subject based on classical signal processing methods as well as works based on deep learning, but so far (at the time of this project) no results have been achieved that preserve the speech content so that it remains intelligible and natural-sounding while converting it to the desired melody.
Voice Disorder Detection via Deep Learning
2022
Student/s: Yiftach Edelstein, Chen Katzir
Supervisor/s: Hadas Ofir, Dr. Ariel Roitman
The project deals with the diagnosis of various voice pathologies related to the throat and vocal cords. Today, these can only be diagnosed by a long, multi-stage process that includes listening to the patient's voice by an otolaryngologist and then an invasive examination using special equipment. We assume that voice recordings of the subjects contain plenty of information about those pathologies, and we therefore wish to use them to design a simpler diagnosis procedure based on machine learning algorithms.
Prediction of Anesthesia Depth based on EEG Signals
2022
Student/s: Nadav David, Isaac Ben-David
Supervisor/s: Hadas Ofir, Ya-Wei Lin
Collaborator: NervIO
Spinal surgery is a high-risk procedure with severe potential complications, including paralysis and permanent sensory loss. Most of these complications are preventable or can be mitigated using Intra-Operative Neuromonitoring (IONM). The field of IONM is rather new, but it is rapidly becoming a standard of care in neurosurgery, orthopedics, and ENT (ear, nose, throat) procedures. During neuromonitoring of a case, relevant bio-signals are recorded and processed prior to and during the surgery, from which neurophysiologists can detect pending neurological insults. EEG is one of the most important bio-signals in neuromonitoring, allowing assessment of the depth of anesthesia.
Image Denoising Using CNN Autoencoder
2021
Student/s: Avihu Amar, Gil Barum
Supervisor/s: Dr. Meir Bar-Zohar
In this work, we show a practical solution for image denoising using a CNN autoencoder. The network we built is easy to implement and provides relatively high performance compared to classic methods like BM3D, and even compared to other, more complex networks. The network is also very flexible and can be adjusted to match the memory capacity of the graphics cards available for training. We show how we take a relatively simple design and improve it by using custom performance metrics designed to evaluate images, replacing standard layers like MaxPool and UpSampling with convolutional layers, and implementing and comparing custom loss functions.
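A sketch in the spirit of the described design, with strided convolutions replacing MaxPool and transposed convolutions replacing UpSampling (layer sizes are illustrative, not the project's):

```python
# Denoising CNN autoencoder sketch: strided convs for learned downsampling,
# transposed convs for learned upsampling.
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),   # replaces MaxPool
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # replaces UpSampling
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid())
    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAutoencoder()
clean = torch.rand(8, 1, 64, 64)
noisy = torch.clamp(clean + 0.1 * torch.randn_like(clean), 0, 1)
denoised = model(noisy)        # train against clean with, e.g., MSE or an image-aware loss
```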
Error Resilient Real AdaBoost
2021
Student/s: Asaf Goren, Da-El Klang
Supervisor/s: Yuval Ben-Hur
AdaBoost is a binary classification algorithm that combines several weak classifiers into one strong classifier. The algorithm achieves relatively good results, even with nearly random base classifiers. Ever since it was published, many variants of the algorithm have been developed for different specific cases. In this project, we focus on a specific version of the algorithm, Real AdaBoost, in which the output of each weak classifier is a real number. Each number represents the confidence level of the classifier in the specific classification decision, and the final classification result is the sum of the outputs of all the classifiers.
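For concreteness, here is a compact sketch of Real AdaBoost with decision stumps, following the standard formulation in which each weak learner outputs the real-valued confidence h(x) = 0.5 * log(p / (1 - p)) and the final score is the sum of these outputs:

```python
# Real AdaBoost sketch with decision stumps. Labels y are in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def real_adaboost(X, y, n_rounds=50, eps=1e-9):
    w = np.full(len(y), 1.0 / len(y))            # uniform sample weights
    learners = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        p = np.clip(stump.predict_proba(X)[:, 1], eps, 1 - eps)  # P(y=+1 | x)
        h = 0.5 * np.log(p / (1 - p))            # real-valued confidence output
        w *= np.exp(-y * h)                      # re-weight toward hard samples
        w /= w.sum()
        learners.append(stump)
    return learners

def predict(learners, X, eps=1e-9):
    score = np.zeros(len(X))
    for stump in learners:                       # final score: sum of confidences
        p = np.clip(stump.predict_proba(X)[:, 1], eps, 1 - eps)
        score += 0.5 * np.log(p / (1 - p))
    return np.sign(score)
```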
Ultrasonic Water Meter Calibration by Deep Learning
2021
Student/s: Tamir Bitton
Supervisor/s: Hadas Ofir
Collaborator: Arad Technologies
Water meter calibration is an essential process for maintaining the performance of a meter, but it is a complicated one. The process contains several stages, with each stage including the sampling of many measurements from the uncalibrated meters and an error calculation for each measurement. Consequently, the calibration process is very slow and expensive. The goal of this project is to significantly shorten the calibration time using a deep learning based method for predicting the results, while maintaining a certain bound on the prediction error. The system uses a dataset provided by ARAD Technologies that contains calibration factors of different water meters.
Image Manipulation with GANs Spatial Control (Award-Winning Project)
2021
Student/s: Karin Jakoel, Liron Efraim
Supervisor/s: Tamar Rott
We suggest a new approach that enables spatial editing and manipulation of images using Generative Adversarial Networks (GANs). Though many tasks have been solved utilizing the powerful abilities of GANs, this is the first time that spatial control has been suggested. This ability is made possible by a test-time spatial normalization that uses the trained model as-is and does not require any fine-tuning. Our method is therefore significantly faster and requires no further training. We demonstrate the new approach on the tasks of class hybridization and saliency manipulation.
Creating Image Segmentation Maps Using GANs
2021
Student/s: Inbal Aharoni, Shani Israelov
Supervisor/s: Idan Kligvasser
The use of GANs has drastically affected low-level vision and graphics, particularly tasks related to image creation and image-to-image translation. Despite all the latest developments, the training process is still unstable. Given a semantic segmentation map, in which each pixel is tagged with the class it represents, we can use a GAN to produce images based on this map and hope to reach a more stable model. Building on the success of GANs, we produced segmentation maps; with these maps and the help of a generative model, we can gain a semantic understanding of the dataset and even create completely new scenes.
SinGAN for Temporal Super-Resolution
2021
Student/s: Tomer Arama, Itay Shemer
Supervisor/s: Tamar Rott Shaham
Super-resolution in images and video is a complex task that draws on an array of different perceptual abilities, from object recognition to motion flow recognition. The SinGAN architecture showed that state-of-the-art super-resolution from a single training image (without priors) is possible. TSR is an architecture that performs temporal super-resolution on videos and showed state-of-the-art performance on a single training video. In this project, we modified SinGAN's architecture and explored its ability to generalize its super-resolution capabilities to 3D data (videos); the main difference from TSR's architecture is our use of GANs and the adversarial training scheme.
A Random-Projection Based Approach for Generative Modelling
2021
Student/s: Elad David
Supervisor/s: Prof. Tomer Michaeli
Generative models have been widely studied in recent years using large and costly DNN-based models. Yet, results still have much room for improvement in terms of both accuracy and runtime. In our work, we aim to tackle the generative modeling problem using a different, computationally lighter approach, based on an iterative fitting process between marginals of the source and target distributions. Intuitively, one can think of this process as an analogue of tomography, where each direction of observation adds information about the object's density. In this report, we formulate the underlying theory, demonstrate the algorithm's performance, and analyze its abilities and weaknesses.
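An illustrative toy version of this marginal-fitting idea: repeatedly pick a random direction and move the source samples so that their 1-D projection matches the target's (by rank matching, i.e., 1-D optimal transport). This is a generic sliced-transport sketch under simplifying assumptions, not the project's exact algorithm.

```python
# Iterative fitting of random 1-D marginals between source and target samples.
import numpy as np

def iterative_marginal_fit(source, target, n_iters=500, step=0.5, seed=0):
    """source, target: (n, d) arrays with the same n (else interpolate quantiles)."""
    rng = np.random.default_rng(seed)
    x = source.copy()
    for _ in range(n_iters):
        v = rng.normal(size=x.shape[1])
        v /= np.linalg.norm(v)                     # random unit direction
        px, pt = x @ v, target @ v                 # project both sets to 1-D
        ranks = px.argsort().argsort()             # rank of each source sample
        matched = np.sort(pt)[ranks]               # same-rank target projection
        x += step * (matched - px)[:, None] * v    # pull the marginals together
    return x

# Toy usage: map a standard Gaussian blob onto a shifted, stretched one.
src = np.random.default_rng(1).normal(size=(1000, 2))
tgt = np.random.default_rng(2).normal(size=(1000, 2)) * [2.0, 0.5] + [3.0, -1.0]
gen = iterative_marginal_fit(src, tgt)
```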
Features Extraction for Classification of Dolphin Sounds
2021
Student/s: Harel Plut, Or Cohen
Supervisor/s: Dr. Roee Diamant
Collaborator: ANL Haifa
With the large increase in human marine activity, our rivers and seas have become populated with boats and ships projecting acoustic emissions of extremely high power that often affect areas of 20 square km and more. The underwater radiated noise (URN) from large ships can exceed 100 PSI and is wideband, such that even at distances of several kilometres from the vessel, the acoustic pressure level is still high. While there is evidence of a clear disturbance impact on the hearing and behavior of marine mammals, there is still no systematic proof of the extent of this effect.
Generative Deep Features (Award-Winning Project)
2021
Student/s: Hila Manor, Da-El Klang
Supervisor/s: Tamar Rott Shaham
The goal of this work is to research the capability of generating a completely new image with the same visual content as a single given natural image, using unsupervised learning of a deep neural network without the use of a GAN. This project is based on the work presented in the paper "SinGAN: Learning a Generative Model from a Single Natural Image" (Rott Shaham et al.). Papers published in the last couple of years have already established the connection between the deep features of classification networks and the semantic content of images, such that we can define the visual content of an image by the statistics of its deep features.
Deep Learning Based Image Processing for a Smartphone Camera
2021
Student/s: Alexey Golub, Yanay Dado
Supervisor/s: Dr. Meir Bar-Zohar
In the first part of the project, our focus was on the PyNET network. This network was designed to replace the full ISP pipeline, which is responsible for the conversion of the raw information detected by a digital camera sensor (known as a Bayer image or a RAW image) into the color image seen on the screen (of the DSLR camera, of the smartphone, etc.). Specifically, we tested different loss functions in order to improve PyNET's performance. In the second part of the project, we explored additional ways to improve this performance.
Voice DeepFake
2021
Student/s: Idan Roth, Zahi Cohen
Supervisor/s: Yair Moshe
The goal of this work is to design a method for performing voice conversion between two speakers. The method employs deep learning techniques, particularly an autoencoder architecture, to convert the source speaker's voice into the target speaker's voice while preserving the source speaker's linguistic content. The baseline model architecture is VC-AGAIN. This model uses a one-shot approach: it is sufficient to receive, at the inference stage, a single speech signal from each of the source and target speakers, on whom the system has not been trained, in order to perform voice conversion.
Seeing Sound: Estimating Image From Sound
2021
Student/s: Sagy Gersh, Yahav Vinokur
Supervisor/s: Tamar Rott Shaham, Idan Kligvassser
The goal of this work is to train a neural network to reconstruct an image of the source of an input audio signal. Under the assumption that the audio signal contains enough features of the image that created it, we used an audio classifier to extract those features and transform them into a feature vector, from which we reconstruct the audio source image using a GAN. The transformation was achieved using a simple deep neural network, which successfully reconstructed images in a small domain (image-audio pairs from only two classes of musical instruments) in both training and test cases.
Gunshot Detection in Video Games (Award-Winning Project)
2020
Student/s: Amit Ben Aroush, Asaf Arad
Supervisor/s: Hadas Ofir
Collaborator: Waves Audio
The project's goal is to build an automated system for real-time acoustic detection of gunshots in computer game scenarios, using deep learning. The system uses a neural network to detect gunshots. The development process included a few stages. First, we constructed the dataset; during this work we tested several features and chose those that best separated audio segments containing gunshots from those that do not. The second stage was to find a network suitable for the needs of the project, train it using our dataset, and perform real-time tests to detect gunshots. This solution was compared to traditional classification methods.
Deep Image Interpolation (Award-Winning Project)
2020
Student/s: Navve Wasserman, Noam Rotstein
Supervisor/s: Tomer Michaeli
Images that describe the real world are naturally continuous functions that lose a significant amount of information when transferred to the discrete digital world. Therefore, the ability to perform various actions on the digital image is required in order to complete missing information, improve the quality of the digital image, and preserve its natural appearance and properties. The classic method that is still used today in a wide variety of applications is interpolation. In this project, we present a new method for interpolation using neural networks. The method uses a neural network to estimate the continuous function that describes each image.
Early Detection of Cancer Using Thermal Video Analysis (Award-Winning Project)
2019
Student/s: Idan Barazani
Supervisor/s: Aviad Levis, Ori Bryt
Collaborator: HT Bioimaging
Cancer is a major challenge to modern medicine. The disease has many victims everywhere in the world, and therefore substantial efforts and resources are invested in the attempt to eradicate it. As part of the characterization of diseases in general, and cancer in particular, early detection has the potential to increase the patient's chances of recovery. The primary goal of the project is early identification of external cancer (tongue/cheek/lip) using the cooling and heating patterns of these biological tissues.
Speaker Diarization using Deep Learning (Award-Winning Project)
2019
Student/s: Matanel Yaacov, Shay Avig
Supervisor/s: Nurit Spingarn
Speaker diarization is the process of dividing a given sound segment or audio stream into segments based on speaker identity. It is designed to answer the question "Who spoke and when?" and can be useful in many cases where it is important to know the speaker's identity, for example, phone calls, radio interviews, podcasts, and even emergencies where recordings from the scene are investigated (black boxes in aircraft, etc.). Until today, speaker diarization has mostly been implemented with classical algorithms from audio signal processing.
Physics Classroom Augmented Reality with Your Smartphone Part B (Award-Winning Project)
2019
Student/s: Georgee Tsintsadze, Yonatan Sackstein
Supervisor/s: Yair Moshe
The project Physics Classroom Augmented Reality with Your Smartphone is the second project with the same goal as the previous one: creating an Android app that, given a photo of a drawing of a physical system, creates a running simulation of that physical system. This project uses classic image processing algorithms and animation programming tools. It builds on a previous project that detects, classifies, and localizes objects in an image. The first stage of the project was to create an application for presenting a simple animation of a physical interaction.
Deep Learning for Physics Classroom Augmented Reality App (Award-Winning Project)
2019
Student/s: Tom Kratter, Yonatan Sackstein
Supervisor/s: Yair Moshe
The project Deep Learning for Classroom Augmented Reality Android App is a second project with the same goal as the previous one: creating an Android app that, given an image of a drawing of a physical system, creates a running simulation of that physical system. The goal of this project, similar to that of the previous project (which did not succeed), and as part of the overall solution, is to classify and localize different objects in the drawing of the physical system. Our project attempts (and usually succeeds) to do so using deep learning algorithms, as opposed to the previous project, which tried and did not manage to do so using classic image processing algorithms.
Efficient Deep Learning for Pedestrian Traffic Light Recognition (Award-Winning Project)
2019
Student/s: Roni Ash, Dolev Ofri
Supervisor/s: Yair Moshe
Crossing a road is a dangerous activity for pedestrians, and therefore pedestrian crossings and intersections often include pedestrian-directed traffic lights. These traffic lights may be accompanied by audio signals to aid the visually impaired. In many cases, when such an audio signal is not available, a visually impaired pedestrian cannot cross the road without help. In this project, we propose a technique that may help visually impaired people by detecting pedestrian traffic lights and their state (walk/don't walk) from video taken with a mobile phone camera.
Optimizing Mutual Information in Deep Neural Networks
2018
Student/s: Adar Elad, Doron Haviv
Supervisor/s: Prof. Tomer Michaeli
The recently proposed information bottleneck (IB) theory of deep nets suggests that during training, each layer attempts to maximize its mutual information (MI) with the target labels (so as to allow good prediction accuracy), while minimizing its MI with the input (leading to effective compression and thus good generalization). To date, evidence of this phenomenon has been indirect and aroused controversy due to theoretical and practical complications. In particular, it has been pointed out that the MI with the input is theoretically infinite in many cases of interest, and that the MI with the target is fundamentally difficult to estimate in high dimensions.
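For reference, the standard IB objective can be written as a Lagrangian, where T is a layer's representation of the input X and Y are the target labels:

```latex
% Information Bottleneck Lagrangian (Tishby et al.):
% minimize compression I(X;T) while preserving label information I(T;Y).
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} = I(X;T) - \beta\, I(T;Y), \qquad \beta > 0,
% where \beta trades off compression against prediction accuracy.
```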
From Deep Features To Image Restoration (Award-Winning Project)
2018
Student/s: Ori Sztyglic
Supervisor/s: Tamar Rott, Idan Kligvasser
In recent years, the use of deep features as an image perceptual descriptor has become very popular, mainly for measuring the perceptual similarity between two images. In the field of image restoration, this has proved very useful for tasks such as super-resolution and style transfer. In this project, we suggest a different direction: rather than using deep features as a similarity measure, we suggest using them to construct a natural image prior. This can be done by learning the statistics of natural images' deep features. Using this prior, we can gain from both worlds: the deep one and the "classic" one.
Video Classification Using Deep Learning
2018
Student/s: Ifat Abramovich, Tomer Ben-Yehuda
Supervisor/s: Dr. Rami Cohen
Much recent advancement in computer vision is attributed to large datasets and the ability to use them to train deep neural networks. In 2016, Google announced the publication of YouTube-8M, a public dataset containing about 8 million tagged videos. In this project, we used this dataset to train several deep neural networks for tagging videos in a variety of categories. In the first stage, we downloaded 5000 videos for 5 different categories. Next, we trained two deep networks, with slightly different architectures, to tag a video into one of the five categories. One network uses the LSTM architecture and the other uses the BiLSTM architecture.
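The two compared architectures can be sketched as follows; the feature dimensionality matches YouTube-8M's precomputed 1024-D frame features, while the hidden size and classification from the last time step are illustrative choices.

```python
# Sketch of the compared LSTM vs. BiLSTM taggers over per-frame features.
import torch
import torch.nn as nn

class VideoTagger(nn.Module):
    def __init__(self, feat_dim=1024, hidden=256, n_classes=5, bidirectional=False):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True,
                           bidirectional=bidirectional)
        self.head = nn.Linear(hidden * (2 if bidirectional else 1), n_classes)
    def forward(self, frames):                   # (B, T, feat_dim)
        out, _ = self.rnn(frames)
        return self.head(out[:, -1])             # classify from the last time step

lstm_model = VideoTagger(bidirectional=False)
bilstm_model = VideoTagger(bidirectional=True)
logits = bilstm_model(torch.randn(4, 300, 1024))  # 4 videos, 300 frames each
```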
Pedestrian Traffic Light Recognition for the Visually Impaired Using Deep Learning
2018
Student/s: Idan Friedman, Jonathan Brokman
Supervisor/s: Yair Moshe
This project is part of a series of projects carried out in SIPL dedicated to creating an Android application that assists visually impaired people with pedestrian traffic lights. The current project consists of two parts: 1. Recognition of pedestrian traffic lights in a single image taken with a mobile phone from a pedestrian perspective. We use the Faster R-CNN object detector with transfer learning on more than 900 pedestrian traffic light images and achieve 98% accuracy. 2. Using the recognition module from part 1 along with object tracking to detect light switches from red to green or vice versa, for improved recognition robustness. For this aim, we use the KCF object tracker.
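A hedged sketch of the two components, using torchvision's Faster R-CNN and OpenCV's KCF tracker; the file paths, box coordinates, and two-class setup are illustrative assumptions.

```python
# Sketch: (1) fine-tune Faster R-CNN for one traffic-light class,
# (2) track a detected light between frames with KCF.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
import cv2   # requires opencv-contrib-python for the KCF tracker

# 1) Detector: replace the box head for (background + traffic light) classes.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)
# ... fine-tune on the annotated pedestrian traffic-light images ...

# 2) Tracker: follow a detected light box across frames for temporal robustness.
# (In some OpenCV versions this is cv2.legacy.TrackerKCF_create.)
tracker = cv2.TrackerKCF_create()
frame0 = cv2.imread("frame0.jpg")                # placeholder frames
tracker.init(frame0, (100, 50, 40, 80))          # (x, y, w, h) from the detector
ok, box = tracker.update(cv2.imread("frame1.jpg"))
```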
Advanced Framework For Deep Reinforcement Learning (Award-Winning Project)
2015
Student/s: Shai Rozenberg, Nadav Bhonker
Supervisor/s: Itay Hubara
This project is based on previous work done by Google DeepMind, in which reinforcement learning was used to teach a computer to play computer games on an Atari 2600 game console, which was popular in the 70s and 80s. In our project, we build a more advanced learning environment that supports a more advanced game console, the Super Nintendo Entertainment System (SNES), and thereby more complex and stochastic computer games. With the proper modifications to the algorithm, we improve the human-like behavior and decision process of the computer.