SIPL Projects

Audio & Speech Signals

Classification of ultrasonic vocalizations in mice using Deep Neural Networks to predict autistic behavior

2025

Student/s: Matan Levy and Tali Leiba

Supervisor/s: Dror Lederman

This work focused on analyzing and classifying ultrasonic vocalizations (USVs) of mice using deep neural networks from the PANN family, aiming to predict behaviors associated with autism. The primary objective was to evaluate the model's ability to differentiate between mice from the WT (control group) and HT (research group with a genetic mutation linked to autism). As part of the work, an automated process was developed for USV data analysis, which included: • Data preparation: Organizing and structuring files, using advanced augmentation tools, and addressing challenges such as significant data imbalance between groups.

Mark of Award this Project

How Does a Lexical Stress Look like?

2025

Student/s: Itai Allouche and Itay Asael

Supervisor/s: Rotem Rousso & Prof. Yossi Keshet

Lexical stress plays a crucial role in distinguishing word meanings and grammatical functions, particularly in minimal pairs (e.g., PREsent vs. presENT). The aim is to propose a classifier for detecting the stressed syllable and understanding the acoustic features underlying the decision. Disyllabic stress minimal word pairs and non-minimal word pairs (e.g., WALlet vs. extEND) were selected, and these words were located in several speech corpora using a forced aligner. A part-of-speech tagging system was used to label each minimal pair’s words as either a noun, which is associated with stress on the first syllable, or a verb, which is associated with stress on the last syllable.

Enhancing Speech-to-Text Models with Speech Signal Augmentations

2025

Student/s: Nitzan Alt and Gal Raviv

Automatic Speech Recognition (ASR) systems have seen major improvements through deep learning, yet they remain sensitive to variations in speaking rate, background noise, and speaker identity, factors that commonly degrade real-world performance. To address this, data augmentation is widely used to increase model robustness. This project evaluates the effectiveness of ScalerGAN, a generative augmentation technique, compared to classical fixed transformations such as time warping. ScalerGAN leverages a Generative Adversarial Network (GAN) to create realistic, faster-speaking mel-spectrograms aimed at enriching the training data with more natural variability.

Voice interference detection

2025

Student/s: Or Beyar

Supervisor/s: Nimrod Peleg & Roni Chernyak

Dysphonia is the impairment of voice production as diagnosed by a clinician, often used interchangeably with the complaint of hoarseness, which is a symptom of altered voice quality. While many patients experience dysphonia as a natural part of the aging process, it can be a symptom of a serious underlying condition. The goal of this project is to continue the work of previous projects and improve the classification results by pre-processing the data differently than before. An algorithm that determines whether the patient is healthy or suffers from various voice disorders was developed.

Speaker Coach: A Real-Time System for Public Speaking Feedback

2025

Student/s: Amit Belzer, Ravid Goldenberg

Supervisor/s: Ori Bryt & Erez Shalev

The project "Speaker Training" focuses on developing a system for ranking speakers based on audio recordings. The goal is to create a neural network that can identify and rank different speakers based on acoustic features, using the TEDLIUM dataset. This dataset contains recordings from various speakers in different languages, allowing the network to learn the unique characteristics of each speaker. The process includes stages of audio signal processing, extracting features such as frequency parameters, speech rate, and other acoustic traits, followed by training the neural network to identify the speaker and provide a ranking.

MWA – Multilingual Word Aligner

2025

Student/s: Roy Weber, Meidan Zehavi

Supervisor/s: Rotem Rousso & Prof. Yossi Keshet

We present the Multilingual Word Aligner (MWA), a new open-source multilingual model for aligning word boundary timestamps to a given audio recording. We composed a new embedding and developed an architecture for accurate multilingual word alignment that reached state-of-the-art performance on the well-known TIMIT and Buckeye benchmarks. We evaluated Transformer, Conformer and VGG as sequence boundary detection models in our experiments, ultimately selecting Conformer for its superior performance. Finally, dynamic programming was used to perform the final alignment, refining the boundaries and ensuring accurate word segmentation.

Low Latency Voice Conversion

2024

Student/s: Lior Bashari, Yonatan Kleerekoper

Supervisor/s: Yair Moshe

Voice Conversion (VC) involves modifying one or more aspects of a speech signal while preserving linguistic information. Deep learning-based voice conversion is a relatively new area that focuses mainly on improving quality but often suffers from high latency due to sequential computation and high computational complexity. The project's goal is to develop a deep learning-based VC system with a latency of up to 400 milliseconds, suitable for real-time applications. We propose an approach based on the low-latency QuickVC by Guo et al. Our solution uses 5-second windows with a 250-millisecond delay on the first window, enabling real-time processing while maintaining high quality.

Estimating breathing clinical data using a smart stethoscope

2024

Student/s: Ness Alkobi, Gal Epshtein

Supervisor/s: Yehonatan-Itay Segman & Hadas Ofir

The goal of this work is to estimate the breathing cycle, with an emphasis on inhalation and exhalation, from recordings made by a smart stethoscope from Sanolla. Additionally, this estimation is intended to aid in identifying potential lung diseases. In the initial and primary stage of the project, we used filters and various techniques to remove noise from the signal obtained from the stethoscope’s accelerometer. The clean signal was used to estimate the breathing cycle and identify patterns that provided us with information about the inhalation and exhalation process.

COVID-19 Detection by Cough Sound Classification

2024

Student/s: Kevin Benhamou, Benjamin Amsellem

Supervisor/s: Yair Moshe

In 2020, COVID-19, a highly contagious respiratory disease, began spreading globally. Before effective vaccines were developed, the primary strategy to control the outbreak was to quickly identify and isolate infected individuals to prevent them from transmitting the virus to others. To facilitate this, we sought in this project to develop a tool that could efficiently detect the presence of the coronavirus through analysis of patient cough sounds recorded on smartphones. A major challenge was obtaining relevant training data, so we used data from similar projects conducted at other universities.

Emotional Speech Synthesis

2024

Student/s: Sagi Eyal, Loren Tzveniashvily

Supervisor/s: Yair Moshe

The goal of this work was to perform emotional speech synthesis. First, we experimented with the emotional voice conversion approach, where the system receives two voice signals and transfers the emotion from one recording to another. Later in the project, we focused on the emotional text-to-speech approach, where the system receives the transcription of the sentence we want to synthesize and the desired emotion and generates a recording of the desired sentence with the given emotion. For this approach, we began by reproducing the results of the EmoSpeech system, which converts text to emotional speech in a fast and high-quality manner.

Automatic Speech Recognition for Torah Reading with Cantillation Marks

2024

Student/s: Aviv Shem-Tov and Ori Levi

Supervisor/s: Oren Mishali & Nimrod Peleg

This work aims to develop a speech-to-text model that recognizes Torah readings with cantillation marks (Trop) and transcribes the verses accurately, including the cantillations. This model will enable the detection of reading errors and suggest corrections, thereby improving the accuracy of Torah readings. The project focuses on developing a system capable of listening to Torah readings, identifying the spoken text, and generating an accurate transcription, including the cantillation marks. These marks are special symbols accompanying the biblical text that indicate pronunciation, intonation, and word emphasis.

Mark of Award this Project

RTF Estimation Using Riemannian Geometry for Speech Enhancement in the Presence of Interferences

2024

Student/s: Or Ronai and Yuval Sitton

Supervisor/s: Amitay Bar & Prof. Ronen Talmon

We address the problem of multichannel audio signal enhancement in reverberant environments with interfering sources. We propose an approach that leverages the Riemannian geometry of the spatial correlation matrices of the received signals to estimate the relative transfer function (RTF) of the desired source. Specifically, we compute the spatial correlation matrices in short-time segments, and subsequently, their Riemannian mean, which preserves shared spectral components while attenuating unshared ones. This enables an effective intermittent interference rejection, leading to accurate RTF estimation.
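
For readers unfamiliar with the Riemannian mean mentioned above, one common definition is sketched here. It assumes the affine-invariant metric on Hermitian positive-definite matrices; the abstract does not state which metric the authors use, and the symbols $\Gamma_i$ (the short-time spatial correlation matrices) and $N$ (the number of segments) are introduced only for illustration.

```latex
d_R(\Gamma_1, \Gamma_2)
  = \left\| \log\!\left( \Gamma_1^{-1/2}\, \Gamma_2\, \Gamma_1^{-1/2} \right) \right\|_F ,
\qquad
\bar{\Gamma}
  = \arg\min_{\Gamma \succ 0} \; \sum_{i=1}^{N} d_R^{2}\!\left( \Gamma, \Gamma_i \right)
```

Under this definition the mean is the matrix that minimizes the sum of squared geodesic distances to all segments, rather than their entry-wise average.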

Location Estimation of Illegal Firework Launch Source Using Localized Microphone Array

2024

Student/s: Adam Antoshin and Roy Shpilberg

Supervisor/s: Hadas Ofir

In this work an acoustic-based algorithm for locating illegal firework launches was developed. Given the sounds from the firework’s launch flight and explosion, which are recorded by a microphone array, the launch point must be located. The main challenge is that, unlike existing geopositioning algorithms, which assume that the distance between the microphones in the array is of the same order of magnitude as the distance from the microphones to the source, in our case, the distance between the microphones is negligible compared to the distance from the array to the source. In addition, the result must be calculated within a few seconds.

Read Up: Echo Reading Voice Analysis for Enhanced Literacy

2024

Student/s: Ory Schaul and Asaf Meseri

Supervisor/s: Hadas Ofir

This work focused on creating a platform for the Koren Center, which supports children, teens, and adults with reading and writing difficulties. The Koren Center’s methodology involves students reading aloud short text segments displayed on a screen, while the teacher listens and marks the errors for himself. Our goal was to develop a system capable of detecting reading and pronunciation errors autonomously, without requiring a human supervisor. The main errors defined by the Koren Center are: "omission of words", "change of word order" and "pronunciation errors".

Speech Enhancement for Augmented Reality

2024

Student/s: Or Norman and Leah Gasman

Supervisor/s: Ariel Frank

This work focuses on solving the cocktail party problem, a well-known challenge in the field of signal processing. The goal of the project is to identify and enhance the audio of a specific conversation from among multiple conversations in the same space and in the presence of background noise. This is particularly relevant for applications such as hearing aids, video calls in noisy environments, and automatic speech recognition systems. We implemented the solution by creating a beamformer to enhance noisy speech signals captured by a wearable microphone array. The MVDR Beamformer allows us to focus on the desired source while suppressing background noise.
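
As a rough illustration of the MVDR idea, the sketch below computes the textbook weights w = R⁻¹d / (dᴴR⁻¹d) for a single frequency bin of a two-microphone array. The two-microphone setup, the covariance matrix `R`, and the steering vector `d` are hypothetical values chosen for the example, not details of the project's wearable array.

```python
import cmath

def mvdr_weights_2mic(R, d):
    """Textbook MVDR weights for one frequency bin of a 2-microphone array.

    R: 2x2 Hermitian noise covariance matrix (nested lists of complex numbers).
    d: steering vector toward the desired source (list of 2 complex numbers).
    """
    (a, b), (c, e) = R
    det = a * e - b * c
    # Closed-form inverse of the 2x2 matrix, applied directly to d.
    rinv_d = [(e * d[0] - b * d[1]) / det, (-c * d[0] + a * d[1]) / det]
    # Dividing by d^H R^{-1} d enforces unit gain toward the desired source.
    denom = d[0].conjugate() * rinv_d[0] + d[1].conjugate() * rinv_d[1]
    return [x / denom for x in rinv_d]

def beamform(w, x):
    """Beamformer output y = w^H x for one time-frequency bin."""
    return w[0].conjugate() * x[0] + w[1].conjugate() * x[1]

# Hypothetical example values (assumptions, not measured data).
R = [[2.0 + 0j, 0.5 + 0.5j],
     [0.5 - 0.5j, 1.0 + 0j]]
d = [1.0 + 0j, cmath.exp(-1j * cmath.pi / 4)]

w = mvdr_weights_2mic(R, d)
gain_toward_target = beamform(w, d)  # distortionless constraint: close to 1
```

With an identity covariance matrix the weights reduce to a scaled copy of the steering vector, i.e. a plain delay-and-sum beamformer.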

Acoustic Tracking Drone

2024

Student/s: David Molin and David Cojocaru

Supervisor/s: Hadas Ofir

The goal of this work was to develop an algorithm for acoustic detection, localization, and tracking of motorized targets, with the intention of later implementing it on a drone that can actively track a detected target. For the project, a small microphone array is mounted on a drone and picks up the sound coming from the target. The localization algorithm finds the most likely direction of the target relative to the drone, based on the time differences between the microphones.
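
To make the time-difference idea concrete, here is a minimal sketch for a single pair of microphones under a far-field (plane-wave) assumption. The function name, the 0.2 m spacing, and the speed-of-sound constant are assumptions for illustration; the project's actual array geometry and estimator are not described in the abstract.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature

def direction_from_tdoa(delay_s: float, mic_spacing_m: float) -> float:
    """Far-field angle of arrival in degrees, measured from broadside.

    delay_s is the arrival-time difference between the two microphones.
    """
    # Path-length difference implied by the delay, as a fraction of the spacing.
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))

# Hypothetical example: 0.2 m spacing, delay equal to half the maximum possible.
angle = direction_from_tdoa(delay_s=0.1 / SPEED_OF_SOUND, mic_spacing_m=0.2)
```

A single pair only resolves the angle up to a front/back ambiguity, which is one reason practical systems combine several microphone pairs.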

Synthetic Speech Attribution (2022 IEEE Signal Processing Cup)

2023

Student/s: Rotem Rousso, Matan Millionschik, Yael Hamo, Adir Cohen-Nissan

Supervisor/s: Yair Moshe, Pavel Lifshits

This report describes Team SIPL's solution to the 2022 Signal Processing Cup challenge. We developed a method that, given an audio recording of a synthetically generated speech track, can detect which method among a list of candidates has been used to synthesize the speech, and can also accommodate unknown speech synthesis algorithms. Our solution relies on speech signal analysis using signal processing and machine learning techniques, particularly deep neural networks. Using an ensemble of features and classifiers allows our method to achieve high performance and to be robust to noise. Another strategy we use for noise robustness is data augmentation for training with noisy audio tracks.

A System for Spatial Hearing with 3D speakers in Acoustic Room

2023

Student/s: Alon Barash

Supervisor/s: Nimrod Peleg, Joseph Attias

The project's goal was to prepare a system infrastructure to enable audiologists and hearing researchers to run clinical experiments of spatial hearing in a special acoustic room with 17 speakers. The project combined hardware and software, with the following parts: calibration for all speakers, an upgrade of the old control system, software modules that control each speaker, software to enable running experiments with different stimuli and noises, software for GitHub integration into the project, and software for automatic report generation (used also for experiment reproduction).

Countermeasures Against Speech Manipulation Attacks

2023

Student/s: Maayan Lifshitz, Ayala Luz

Supervisor/s: Yael Segal

With the expansion of neural network usage, systems based on them have become targets for various manipulation attacks. One of the common types of such attacks is adversarial attacks, which involve adding noise to the incoming signal to the system in order to produce a false outcome (adversary noise addition). This project focuses on adversarial attacks on speech signals in speech classification systems. As part of the research, neural networks based on the VGG model were trained on two types of speech signal datasets: the first one containing words, and the second containing vowels. Attacks of varying intensities were applied to the input signals, causing the network to make mistakes.
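
The effect of attack intensity can be illustrated with a toy example. The sketch below applies a fast-gradient-sign-style perturbation to a hypothetical linear classifier; this is a stand-in chosen for brevity, since the abstract names neither the attack used nor any model details beyond VGG. All weights, inputs, and function names are invented for illustration.

```python
def sign(v: float) -> int:
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def predict(w, b, x) -> int:
    """Toy linear classifier: class +1 if w.x + b >= 0, else -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def sign_gradient_attack(w, x, label, eps):
    """Perturb x by eps per feature in the direction that lowers the true-class score.

    For a linear score the gradient with respect to the input is simply w.
    """
    return [xi - label * eps * sign(wi) for wi, xi in zip(w, x)]

# Hypothetical weights and input (assumptions for illustration only).
w, b = [0.5, -1.0, 0.25], 0.0
x = [1.0, 0.2, 0.4]
clean = predict(w, b, x)                                       # original decision
weak = predict(w, b, sign_gradient_attack(w, x, clean, 0.1))   # small perturbation
strong = predict(w, b, sign_gradient_attack(w, x, clean, 0.3)) # larger perturbation
```

In this toy setting the weak perturbation leaves the decision unchanged while the stronger one flips it, mirroring the idea of applying attacks of varying intensities.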

Audio Classification using Transformers

2023

Student/s: Matan Millionschik, Michael Berko

Supervisor/s: Yael Segal

In the last decade, deep learning has been expanding and taking over many areas of signal processing of different kinds - image, audio and text among others. With a set of diverse architectures such as neural networks, convolutional networks and lately, transformers, deep learning showcases better results than seen with classical methods in many signal processing tasks in general, and audio processing specifically. In the last few years, convolutional architectures rule the audio world especially in classification, emotion detection and feature extraction. Similar to the computer vision area, the learned audio features can be optimized on a broad spectrum of datasets and labels.

Acoustic Scene Classification

2023

Student/s: Shira Lifshitz, Ellinor Elimeleh

Supervisor/s: Dr. Meir Bar-Zohar

This work deals with acoustic scene classification on a dataset published in the DCASE2017 challenge. The goal is to achieve better performance than the performance presented in the challenge, using neural networks and mel-spectrogram features. We present the processing of the dataset, the classifier and models, and the selected hyperparameters. The best performance was obtained using mel-spectrogram features, an EfficientNet V2 S neural network, and a MiniNet net as selection algorithm. Accuracy of 83.33% was achieved, which is higher than the performance to which we compare the results.

Classification of Heart Sounds Using Deep Convolutional Networks

2023

Student/s: Shlomi Zvenyashvili, Arik Berenshtein

Supervisor/s: Dr. Meir Bar-Zohar

Cardiovascular disease is a leading cause of death globally, with over 17 million deaths each year according to the World Health Organization (WHO). Accurate classification of heart sounds is crucial for early detection and effective management of heart conditions. However, this task is challenging due to the complexity of heart sound data, which includes variations caused by low-quality recordings and differing physiological conditions. Robust and efficient models are needed for handling such diverse data and improving diagnostic accuracy. In this work, we propose a machine learning-based solution using Deep Convolutional Networks.

Recognizing Autism in Mice by Analyzing Their Squeaks

2022

Student/s: Itamar Ginsberg, Alon Schreuer

Supervisor/s: Dr. Dror Lederman, Prof. Hava Golan

Diagnosis of autism at an early age is an extensive area of research, as it has a massive impact on the ability to treat and aid those suffering from the syndrome. So far, diagnosis has been based on professional behavioral observation, a flawed tool both because it is subjective and imprecise and because it is only effective at a late developmental stage (age 4-5 years). The goal of this work is to develop a diagnostic-assist tool for classifying mice into two categories: mice with symptoms of ASD (Autism Spectrum Disorder) and mice without such symptoms, based on recordings of their squeaks.

Deep Learning Based Target Cancellation for Speech Dereverberation

2022

Student/s: Neriya Golan, Mikhail Klinov

Supervisor/s: Yair Moshe, Baruch Berdugo

Background noise and reverberation can degrade the quality of speech signals and reduce their intelligibility. Reverberations also reduce the performance of important systems such as hearing aids or voice recognition applications. There are a variety of classic methods for dereverberation of speech signals, but their performance is usually unsatisfactory and not generalizable. In light of this, there has been an increase in recent years in research on dereverberation using modern methods based on deep learning.

A System for Real-Time Deep Speech Denoising

2022

Student/s: Wajd Boulos, Saba Saba

Supervisor/s: Hadas Ofir

Recorded speech is often mixed with a variety of background noises, such as a leaf blower, washing machine, dog barking, baby crying, kitchen noises, etc. Background noise significantly degrades the quality and intelligibility of the perceived speech. In this project, we present a real-time, single-channel deep learning solution for audio denoising, based on recurrent neural networks. Three algorithms were chosen for this purpose: DCCRN, FullSubnet, and one that we proposed, which is a mix of the two: Complex-FullSubnet. We've trained and integrated all three algorithms into a real-time speech denoising infrastructure and adjusted them to run in a real-time environment.

Audio Signals Dereverberation Algorithms Performance Evaluation

2022

Student/s: Nitzan Yehezkel, Nadav Reichler

Supervisor/s: Hadas Ofir

Speech signal processing technologies play an important role in our daily lives, with the focus being on improving signal quality by reducing noise and reverberation. When an audio signal is received by a microphone array, two types of signals are added to it and corrupt its quality: noise (statistically independent) and reverberation (statistically dependent). However, most existing dereverberation applications show reliable performance only when the microphone is placed near the speaker. In addition, finding practical algorithms that can reduce reverberation in real time remains one of the most difficult challenges in the field.

Ultrasonic User Authentication With Smartphones

2022

Student/s: Ofir Ben Yosef, Neta Gevirtzer

Supervisor/s: Alon Eilam

The goal of this work is to implement a smart user authentication solution that requires no contact between the user and the workstation. The solution includes two applications. The first is an Android app on the user's smartphone, which indicates the presence of the user near the workstation. The second is a Windows app that serves as a dynamic lock screen for the workstation. The solution uses an audio signal consisting of a combination of chosen frequencies, which forms a symbol. A sequence of four symbols is the password for locking/unlocking the workstation. The password played by the mobile phone is encrypted and generated randomly for each user.
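
A symbol built from a combination of frequencies can be sketched as a sum of tones, with the receiver checking which tones are present. The example below uses the Goertzel algorithm for detection. The sample rate, the four candidate frequencies, the 20 ms symbol length, and the detection threshold are all assumptions for illustration and are not taken from the project; encryption and the four-symbol password sequence are omitted.

```python
import math

SAMPLE_RATE = 48_000                             # Hz, assumed
CANDIDATES = [18_000, 18_500, 19_000, 19_500]    # Hz, hypothetical tone set
SYMBOL_SAMPLES = 960                             # 20 ms per symbol at 48 kHz

def synthesize(freqs, n=SYMBOL_SAMPLES):
    """Build one symbol as the normalized sum of sinusoids at the given frequencies."""
    scale = max(len(freqs), 1)
    return [sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f in freqs) / scale
            for t in range(n)]

def goertzel_power(samples, freq):
    """Signal power at a single frequency, computed with the Goertzel recurrence."""
    n = len(samples)
    k = round(n * freq / SAMPLE_RATE)            # nearest DFT bin
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def detect(samples, threshold_ratio=0.1):
    """Return the candidate frequencies whose power is close to the strongest one."""
    powers = {f: goertzel_power(samples, f) for f in CANDIDATES}
    peak = max(powers.values())
    return sorted(f for f, p in powers.items() if p > threshold_ratio * peak)

decoded = detect(synthesize([18_000, 19_000]))
```

The symbol length is chosen so that every candidate frequency falls exactly on a DFT bin, which keeps the tones from leaking into one another in this idealized, noise-free setting.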

Presenter Coach

2022

Student/s: Amit Zach, Hadar Horn

Supervisor/s: Hadas Ofir

In this work we design a tool to help people become more professional presenters. The tool will be based on an algorithm that receives a presenter's audio recording as input, extracts features from it, and gives the presenter recommendations and advice for improving the quality of their presentation, based on the extracted features. Examples of those features are pitch, pace, sentence and break lengths, a technical quality measure of the recording, etc. Those parameters will serve as reliable indicators of the quality of the presentation.

Speech-to-Singing Conversion Using Deep Learning

2022

Student/s: Omri Jurim, Ohad Mochly

Supervisor/s: Yair Moshe, Gal Greshler

The purpose of this work is to develop an algorithm for converting speech to singing using deep learning methods. The system can help with memorizing various short texts such as phone numbers and lists, and can also be used for entertainment. There are research papers on the subject that are based on classical signal processing methods, as well as works that are based on deep learning, but so far (while working on this project) no results have been achieved that preserve speech content so that it is understandable and human-sounding, along with converting it to the desired melody.

Voice Disorder Detection via Deep Learning

2022

Student/s: Yiftach Edelstein, Chen Katzir

Supervisor/s: Hadas Ofir, Dr. Ariel Roitman

The project deals with the diagnosis of various voice pathologies related to the throat and vocal cords which today can only be diagnosed by a long and multi-stage process that includes listening to the patient's voice by an otolaryngologist specialist and then an invasive examination using special equipment. We assume that there is plenty of information about those pathologies in the voice recordings of the subjects, and therefore we wish to use them to design a simpler diagnosis procedure that is based on machine-learning algorithms.

Audio-Visual Voice Activity Detection and Localization Using Deep Correlated Representations

2022

Student/s: Kfir Bendic, Itzhak Mandelman

Supervisor/s: Ofir Lindenbaum

One of the problems in performing signal processing operations on sound clips stems from noise added to the measurement device. Noise can drastically damage the performance of accurate analysis of an audio signal. One method to deal with this problem is to use multimodal observations so that one of the modalities is not affected by the noise of the other. An example is a video source independent of noise added to the audio, thus unaffected by this noise. This way, one can try to extract information lost in the audio due to the noise, using the video. The purpose of this project is to perform spatial and temporal detection and recognition of speech, both in audio and video.

Seeing Sound: Estimating Image From Sound

2022

Student/s: Sagy Gersh, Yahav Vinokur

Supervisor/s: Yair Moshe

The goal of this work is to train a deep neural network so that it can receive an audio signal as input and output a reconstructed image of the source that produced that audio signal. Under the assumption that an audio signal contains spatial properties of the object that produced it, we tried to use an audio classifier to extract these properties and transform them into a feature vector, from which we can reconstruct the image of the source using a deep network with the GAN architecture. This project is a follow-up to a previous project with the same goal.

Acoustic Vehicle Localization

2021

Student/s: Nadav Abayov, Gefen Levite

Supervisor/s: Hadas Ofir

While on a busy street or a rural road, pedestrians wearing headphones are exposed to danger from passing traffic. This project is a first stage in developing a system that will detect an approaching vehicle, alert the user to the danger of a vehicle in the vicinity, and also report its direction and distance. We dealt with determining the direction of the vehicle relative to the pedestrian along its trajectory, based on the time differences between microphones. Next, we experimented with estimating the distance of the vehicle using machine learning. Finally, we built an algorithm and a demo system that simulate data processing from microphones in real time.

Indoor/Outdoor Classification of Voice for Mobile Devices

2021

Student/s: Gabriel Mannes, Odelia Longini

Supervisor/s: Ori Bryt

This project's goal is to classify a two-way radio recording into one of two classes: indoor or outdoor recording. A literature review led us to choose a ResNet-based neural network that was designed for solving a similar problem. The system transforms audio signals to log-mel spectrograms, and the result is then classified by the network. Since the database we have is too small, another goal was defined for the project: training on a bigger dataset and performing inference on the small one.

Speech Enhancement Evaluation Using Speech Recognition algorithm

2021

Student/s: Yotam Elia

Supervisor/s: Hadas Ofir, Baruch Berdugo

In the project we created a tool for evaluating speech denoising algorithms. The aim of the project is to evaluate speech enhancement algorithms using automatic speech recognition algorithms. The problem of evaluating the speech enhancement algorithm is mainly based on the difference between a noise-free signal and an intelligible signal for hearing. Since our goal is to test the quality of improvement in terms of human understanding, the choice of speech recognition algorithms is natural. The project had three stages: 1) Getting familiar with the tools I would need to test differently: Baidu, OM-LSA, DTLN.

User Specific Speech Recognition For Controlling a 3D Printed Prosthetic Hand

2021

Student/s: Noa Tykochinsky, Itay Wengrowicz

Supervisor/s: Shunit Polinsky

In this work we provide a voice-control algorithm for a 3D printed prosthetic hand. The goal was to create a cheap and accessible algorithm that recognizes and verifies the voice of the prosthetic hand's user. The system recognizes the words the user is saying in real time and can detect the activation words and keywords that represent the hand movement. The entire processing time, from the moment the audio input is received until the hand moves, is about 1.5 seconds, and the risk of a false-positive result is less than 2%.

Domain Adaptation for Mobile Device Acoustic Based Proximity Sensor

2021

Student/s: Niv Menashe

Supervisor/s: Pavel Lifshits

Nowadays, almost every person in the western world owns a relatively new smartphone. Every smartphone is equipped with an infrared photoelectric proximity sensor, which is placed next to the phone's speaker and used for turning off the smartphone's screen when the sensor is blocked. For example, when receiving a phone call, a person places the smartphone beside their ear, thus blocking the proximity sensor, which turns off the screen. We propose a different approach: an acoustic-based proximity sensor, a convenient method that does not require a dedicated sensor or any additional hardware.

Features Extraction for Classification of Dolphin Sounds

2021

Student/s: Harel Plut, Or Cohen

Supervisor/s: Dr. Roee Diamant

With the large increase in human marine activity, our rivers and seas have become populated with boats and ships projecting acoustic emissions of extremely high power that often affect areas of up to 20 square km and more. The underwater radiated noise (URN) level from large ships can exceed 100 PSI and is wideband, such that even at distances of several kilometres from the vessel, the acoustic pressure level is still high. While there is evidence of a clear disturbance impact on the hearing and behavior of marine mammals, there is still no systematic proof of the extent of this effect.

Classification of Dolphin Whistles

2021

Student/s: Jonathan Masin, Racheli Katz

Supervisor/s: Dr. Roee Diamant

With the large increase in human marine activity, our seas have become populated with boats and ships projecting acoustic emissions of extremely high power that often affect areas of up to 20 square km and more. The underwater radiated noise (URN) level from large ships can exceed 100 PSI and is wideband, such that even at distances of several kilometres from the vessel, the acoustic pressure level is still high with a clear disturbance impact on the hearing and behaviour of marine fauna.

Otoacoustic Emissions (OAE) as a Tool for Early Autism Diagnosis

2021

Student/s: Amit Shpigelman, Simcha Lipner

Supervisor/s: Barr Morgenstein

One of the main challenges in diagnosing autism is the lack of a structured method of diagnosis. Today, autism is diagnosed by observing the subjects' behaviors, which develop at a late stage, around the age of 4. This late diagnosis takes its toll, mainly a late start of treatment and increased difficulty in integrating into general society. SensPD, the company with which we worked during the project, has a theory that autism can be diagnosed using a cutting-edge method.

Acoustic Fence Using Multi-Microphone Speaker Separation

2021

Student/s: Tomer Fait, Orel Ben-Reuven

Supervisor/s: Amir Ivry

The goal of an acoustic fencing algorithm is to separate speakers by their physical location in space. In this project, we examine an algorithm which solves this problem, define suitable performance criteria, and test the algorithm in varied environments, both simulated and real. The real recordings were acquired by us with suitable acoustic equipment. We examine a speech separation algorithm based on spectral masking inferred from the speakers' directions. The algorithm assumes the existence of a dominant speaker in each time-frequency (TF) bin and classifies these bins by employing a deep convolutional neural network.
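
The masking step described above can be pictured with a small sketch. This is an illustration under stated assumptions, not the project's implementation: the per-bin direction classes, which the project obtains from a convolutional neural network, are supplied here directly as a hypothetical input, and the STFT is represented as a plain list of frames. The function names `fence_mask` and `apply_mask` are made up for this example.

```python
def fence_mask(bin_classes, kept_classes):
    """Build a binary mask: 1.0 where the bin's direction class is kept, else 0.0.

    bin_classes is a list of frames, each a list of integer direction labels,
    one per time-frequency bin (hypothetical stand-in for a classifier output).
    """
    return [[1.0 if c in kept_classes else 0.0 for c in frame]
            for frame in bin_classes]


def apply_mask(stft, mask):
    """Multiply each complex STFT bin by its mask value."""
    return [[m * x for x, m in zip(frame, mask_frame)]
            for frame, mask_frame in zip(stft, mask)]


# Toy mixture with 2 frames and 2 frequency bins per frame.
mixture = [[1 + 1j, 2 + 0j],
           [0.5j, -1 + 0j]]
# Hypothetical labels: class 0 = inside the fence, class 1 = outside.
labels = [[0, 1],
          [1, 0]]

inside = apply_mask(mixture, fence_mask(labels, {0}))
```

Because each bin is assumed to have one dominant speaker, a binary mask is enough in this sketch; a soft mask would be a natural variation if the classifier returned probabilities.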

Voice DeepFake

2021

Student/s: Idan Roth, Zahi Cohen

Supervisor/s: Yair Moshe

The goal of this work is to design a method for performing voice conversion between two speakers. The method employs deep learning techniques, particularly an autoencoder architecture, to convert the source speaker's voice into the target speaker's voice while preserving the source speaker's linguistic content. The baseline model architecture is VC-AGAIN. This model uses a one-shot approach: at the inference stage, a single speech signal from each of the source and target speakers, on whom the system has not been trained, is sufficient to perform voice conversion.

Mark of Award this Project

Speaker Localization Inside a Car Using a Microphone Array

2020

Student/s: Or Streicher, Adir Goldovsky

Supervisor/s: Ori Kats

We present the results of a final project intended to estimate a speaker's position inside a vehicle using a semi-supervised learning algorithm. The dataset used to create and validate the proposed solution consisted of labeled recordings sampled by an array of microphones positioned at the front of the vehicle. The recordings were labeled by different parameters (such as the head angle of the speaker, the presence of another person in the vehicle, etc.), while the main parameter was the speaker position.

Mark of Award this Project

Iterative adaptive estimation of underwater channel transfer function based on soft information using turbo equalization

2020

Student/s: Asaf Gendler, Nadav Shalev

Supervisor/s: Kobi Bucris

Underwater acoustic communication has attracted rising interest in recent years as a result of the increasing use of autonomous underwater vehicles. Underwater communication poses a difficult challenge for several reasons, such as inter-symbol interference (ISI), Doppler and time-variant channels, in addition to a lack of research compared to RF. In order to overcome the channel distortion problems, it is common to use equalizers to diminish the channel effect. The state-of-the-art equalizer today is a decision feedback equalizer (DFE) followed by a standard decoding scheme.

Mark of Award this Project

Proximity Sensor for Smartphones based on Acoustic Measurements

2019

Student/s: Pavel Lifshits

Supervisor/s: Andy Rodan, Zacharie Cohen

Modern mobile phones are equipped with an infra-red photoelectric proximity sensor, most commonly used to turn off the touch screen during a phone call to prevent accidental touches when the user's face or ear is detected in proximity to the screen. In this project we propose to achieve the sensing functionality without using a special sensor. Specifically, we use the already existing speaker and microphones for proximity sensing, without interfering with their originally intended operation. We build our method on the observation that the transfer function from the mobile phone speaker to the microphones varies as a function of objects located in the vicinity of the mobile phone.
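
The observation above can be pictured as a comparison between a stored open-air response and a live measurement. The sketch below is only an illustration, not the project's method: it assumes a probe signal made of a few tones, estimates the speaker-to-microphone gain at each tone with a single-bin DFT, and flags proximity when the gains deviate from the baseline by more than a threshold. The probe frequencies, sample rate, gains, threshold and all function names are hypothetical.

```python
import cmath
import math


def make_signal(freqs, gains, sample_rate, n_samples):
    """Sum of sine tones with the given per-tone gains (synthetic test signal)."""
    return [sum(g * math.sin(2 * math.pi * f * n / sample_rate)
                for f, g in zip(freqs, gains))
            for n in range(n_samples)]


def tone_response(played, recorded, freq_hz, sample_rate):
    """Estimate the complex gain recorded/played at one frequency (single-bin DFT)."""
    w = -2j * math.pi * freq_hz / sample_rate
    p = sum(x * cmath.exp(w * n) for n, x in enumerate(played))
    r = sum(x * cmath.exp(w * n) for n, x in enumerate(recorded))
    return r / p


def response_change_db(baseline, current):
    """Mean absolute change in gain magnitude, in dB, across the probe tones."""
    diffs = [abs(20 * math.log10(abs(c) / abs(b)))
             for b, c in zip(baseline, current)]
    return sum(diffs) / len(diffs)


def is_near(baseline, current, threshold_db=3.0):
    """Flag proximity when the response moved more than threshold_db (assumed value)."""
    return response_change_db(baseline, current) > threshold_db


# Synthetic demonstration: 10 ms at 16 kHz, so every probe tone completes whole cycles.
RATE, N = 16000, 160
FREQS = [1000, 2000, 3000]
played = make_signal(FREQS, [1.0, 1.0, 1.0], RATE, N)
open_air = make_signal(FREQS, [0.5, 0.5, 0.5], RATE, N)   # made-up open-air gains
blocked = make_signal(FREQS, [0.5, 0.2, 0.9], RATE, N)    # made-up gains with an object nearby

baseline = [tone_response(played, open_air, f, RATE) for f in FREQS]
measured_open = [tone_response(played, open_air, f, RATE) for f in FREQS]
measured_blocked = [tone_response(played, blocked, f, RATE) for f in FREQS]
```

In this toy setup `is_near(baseline, measured_open)` stays false while `is_near(baseline, measured_blocked)` becomes true. A real device would also have to cope with ambient noise and with audio that is already playing, which this sketch ignores.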

Mark of Award this Project

Speaker Diarization using Deep Learning

2019

Student/s: Matanel Yaacov, Shay Avig

Supervisor/s: Nurit Spingarn

Speaker diarization is the process of dividing a given sound segment or audio stream into segments based on the speaker's identity. It is designed to answer the question "Who spoke and when?" and can be useful in many cases where it is important to know the speaker's identity, for example phone calls, radio interviews, podcasts, and even emergencies where recordings from the scene are investigated (black boxes in aircraft, etc.). Speaker diarization is the well-known method for segmenting audio by speaker identity, and until today it has been implemented by classical algorithms from audio signal processing.

Acoustic 3D Positioning of Smartphones in Motion

2019

Student/s: Guy Dascalu, Omer Movshovits

Supervisor/s: Alon Eilam

In recent years, as the use of mobile smartphones, personal assistants and other IoT devices grows, there is an increasing demand for positioning systems that provide a reliable and accurate location in areas where the Global Positioning System (GPS) cannot work. Places such as office buildings, museums, parking lots, airports and shopping malls all suffer from the limitation that satellite signals do not pass through metal and concrete walls. Some methods for positioning in such environments make use of electromagnetic waves such as Bluetooth or Wi-Fi, while others take a different approach in which acoustic, often ultrasonic, waves are used.

Mark of Award this Project

Robust Automatic Detector And Feature Extractor For Dolphin Whistles

2019

Student/s: Guy Shkury, Yoel Bud

Supervisor/s: Roee Diamant

A key element in dolphin conservation efforts is population estimation in their natural environment. A common method for mapping dolphin presence is the detection of their vocalizations. In this paper, we propose a novel detection technique for dolphin whistles, referred to as ECV (Entropy, Correlation, and Viterbi algorithm). ECV is a robust detector of low complexity that automatically detects dolphin whistles and extracts their spectral features, using a single receiver with only a few system parameters. The method employs a chain of decisions based on spectral entropy and time-domain correlation, followed by a constrained Viterbi algorithm to extract the whistles' features.
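
Spectral entropy, the first decision stage named above, can be illustrated in a few lines. The sketch below uses the generic textbook definition (Shannon entropy of the normalized power spectrum) and is not the ECV implementation; the frame length, the test signals and the function names are assumptions made for this example. A narrowband, whistle-like tone concentrates its power in one bin and scores near 0, while a broadband click spreads power evenly and scores near 1.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum of a real frame, computed with a direct DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0:
        return 0.0
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return entropy / math.log2(len(power))


FRAME_LEN = 64
# Pure tone completing 8 whole cycles in the frame: a stand-in for a whistle.
tone = [math.sin(2 * math.pi * 8 * i / FRAME_LEN) for i in range(FRAME_LEN)]
# Unit impulse: a stand-in for a broadband click with a flat spectrum.
click = [1.0] + [0.0] * (FRAME_LEN - 1)

tone_entropy = spectral_entropy(tone)
click_entropy = spectral_entropy(click)
```

A detector built on this measure would compare the entropy of each frame against a threshold; how ECV chooses that threshold and combines it with the correlation and Viterbi stages is not covered by this sketch.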

Mark of Award this Project

Acoustic positioning with unsynchronized sound sources

2018

Student/s: Guy Feferman, Michal Blatt

Supervisor/s: Alon Eilam, Guy Shofen

In recent years, cellular phones have been widely used for GPS-based navigation. There is a growing demand to provide navigation with cellular phones in areas where GPS signals can't be received, such as airports, hospitals, shopping centers and underground parking lots. Furthermore, better-than-GPS accuracy is required for locating products in a supermarket, or for finding rental car pickup spots or a hotel room door in a corridor. In this project we created an indoor positioning system for cellular phones, based on unsynchronized acoustic signals.

Mark of Award this Project

Audio retrieval by voice imitation

2018

Student/s: Mohamad Khatib, Samah Khawaled

Supervisor/s: Hadas Benisty

Searching existing sound search systems with textual keywords is problematic, because keywords cannot accurately describe the exact sound content the user is looking for. Recently, an innovative direction has been proposed for sound search systems: search by imitation. The goal of our project is to design a technique for searching sounds in a database given an audio signal that is an imitation of the desired sound. The user imitates a sound (for example, a cat sound or the sound of a landing plane) and the system returns the most likely audio from a library that contains the dataset. This system provides innovative tools for interaction with computerized systems.

Mark of Award this Project

Siren Detection Algorithm in Noisy Environment for The Hearing Impaired

2017

Student/s: Ariel Yeshurun, Dean Carmel

Supervisor/s: Yair Moshe

People with hearing disabilities experience many difficulties in everyday life that affect them and their surroundings. Technological development has helped them in many areas, but it has made driving an even harder experience. They can't hear noises and beeping, but most importantly they can't hear approaching emergency vehicles. This inability makes them a safety hazard both to themselves and to their surroundings, because they can accidentally cause a roadblock or even an accident. There isn't a uniform standard for sirens today, and there isn't an algorithm that can detect sirens from different countries.

Speaker Diarization Using Dimension Reduction

2016

Student/s: Lee Twito, Ori Shahar

Supervisor/s: Nurit Spingarn

Diarization is a well-known problem in the world of speech recognition and speech processing. Our project's goal is speaker diarization in recorded conversations. We try a new approach to solving this problem, using a dimension reduction algorithm, locally linear embedding (LLE). The results are compared to a well-known method for solving this problem, the bottom-up algorithm. We tested our method on merged TIMIT files and on recordings we made ourselves.

Mark of Award this Project

Audio QR Over Streaming Media

2016

Student/s: Gal Binyamin, Itai Dagan

Supervisor/s: Alon Eilam

We describe a system that delivers a website address to a cellular phone by encoding inaudible binary data in an analogue audio signal, which is received by the microphone of the cellular phone. This is an alternative to encoding a website address in a QR code label, which is scanned by the cellular phone's camera. Data embedding in the audio signal is done by modifying the phase of the signal's modulated complex lapped transform (MCLT) coefficients, while the perceived quality of the embedded audio signal remains the same as that of the original audio signal. A whole system was implemented and tested both in simulation and in reality. The data rate achieved in reality at a distance of 1.
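
The idea of carrying bits in coefficient phase can be sketched on its own. The example below is an assumption-laden illustration, not the system described above: it does not compute an MCLT, it operates on a plain list of complex numbers standing in for transform coefficients, and the mapping of bit 1 to a phase of +π/2 and bit 0 to −π/2 is a choice made for this example. Magnitudes are left unchanged, which loosely mirrors the aim of keeping the perceived audio unchanged. The function names are hypothetical.

```python
import cmath
import math


def embed_bits(coeffs, bits):
    """Return a copy of coeffs where the first len(bits) phases carry the bits.

    Bit 1 -> phase +pi/2, bit 0 -> phase -pi/2 (assumed mapping); each
    coefficient keeps its original magnitude.
    """
    if len(bits) > len(coeffs):
        raise ValueError("more bits than coefficients")
    out = list(coeffs)
    for i, bit in enumerate(bits):
        phase = math.pi / 2 if bit else -math.pi / 2
        out[i] = cmath.rect(abs(coeffs[i]), phase)
    return out


def extract_bits(coeffs, n_bits):
    """Recover bits from the sign of the imaginary part of each coefficient."""
    return [1 if c.imag > 0 else 0 for c in coeffs[:n_bits]]


# Made-up coefficients and payload for demonstration.
original = [1 + 2j, -3 + 0.5j, 0.2 - 0.7j, -1 - 1j]
payload = [1, 0, 0, 1]

marked = embed_bits(original, payload)
recovered = extract_bits(marked, len(payload))
```

Deciding on the sign of the imaginary part leaves a margin of a quarter turn in phase on either side, which is one reason a two-level phase mapping is a convenient starting point for a noisy acoustic channel.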

Distress Situation Detection in Speech

2016

Student/s: Yehav Alkaher, Osher Dahan

Supervisor/s: Yair Moshe

When a person is in a distress situation, there are signs which are reflected in their speech or in the audio of their surroundings. The project deals with speaker-independent distress detection in the speech of a single speaker. The solution involves the extraction of relevant features from the speech signal and a comparison between the different methods of extraction. A distress situation is defined by the discrete emotions of anger and fear. The project concludes with the classification of distress in speech with 91% accuracy, using the Berlin Emotional Speech Database in the German language.

Cry-based Detection of Developmental Disorders in Infants

2015

Student/s: Amit Oren, Avi Matzliach

Supervisor/s: Rami Cohen

Developmental disorders are a group of neurological conditions originating in childhood that involve serious impairments in various areas (language, learning, motor skills). These conditions also comprise Autism Spectrum Disorders. As of 2008, approximately 15% of children in the United States have been diagnosed with some sort of developmental disorder, in comparison to only 12.8% in 1997 [1]. Early detection of developmental disorders is crucial, as it enables early intervention (e.g. speech therapy, occupational therapy), which may reduce neurological and functional deficits in infants.

Fax In Your Pocket

2014

Student/s: Smadar Shapira

Supervisor/s: Alon Eilam & Pavel Lifshits

...

Mark of Award this Project

Non-Coherent Multichannel Speech Enhancement in Non-Stationary Noise Environments

2010

Student/s: Nir Kahana, Liav Levi

Supervisor/s: Ronen Talmon

...

Mark of Award this Project

People Metering Using Mobile Devices

2010

Student/s: Oded Yeruhami, Yuval Bahat

Supervisor/s: Rafi Steinberg

...

Mark of Award this Project

Temporal Decomposition of Speech

2006

Student/s: Erez Cohen, Ronen Krupnik

Supervisor/s: Guy Narkiss

...

Mark of Award this Project

Speech Bandwidth Extension

2004

Student/s: Eran Borstain, Aviram Shmueli

Supervisor/s: Ariel Sagi

...

Mark of Award this Project

Detection of Spectral Signature in SONAR Signal, Part A+B

2004

Student/s: Ronen Peisakh, Amir Chasdai

Supervisor/s: Erez Sabbag

...

Mark of Award this Project

Voice Morphing

2002

Student/s: Gidon Porat

Supervisor/s: Yizhar Lavner

...

Mark of Award this Project

HMM Based Speech Recognition System

2002

Student/s: Tal Levinstein, Boaz Shoval

Supervisor/s: Anelia Baruch-Somekh

...

Mark of Award this Project

Real-Time Digital Watermarking System for Audio Signals Using Perceptual Masking

2001

Student/s: Yuval Cassuto, Michael Lustig

Supervisor/s: Shay Mizrachi

Recent developments in the field of digital media raise the issue of copyright protection. Digital watermarking offers a solution to copyright violation problems. The watermark is a signature, embedded within the data of the original signal, which in addition to being inaudible to the human ear, should also be statistically undetectable, and resistant to any attempts to remove it. In addition, the watermark should be able to resolve multiple ownership claims (known as the deadlock problem), which is achieved by using the original signal (i.e., the unsigned signal) in the signature detection process.

Mark of Award this Project

Development of a New Algorithm for Voice Modification

2001

Student/s: Assaf Rubin, Michael Kats

Supervisor/s: Yizhar Lavner

...

Mark of Award this Project

Real-Time Embedding of Digital Watermarking for Audio Signal

2000

Student/s: George Leifman, Eran Borenstein, Tal Mizrahi

Supervisor/s: Shay Mizrachi

...

Mark of Award this Project

Real Time Implementation of Low Bit-Rate Speech Compression on Sharc DSP

2000

Student/s: Arthur Kol, Yaniv Shaked

Supervisor/s: Ronen Mayrench

...

Mark of Award this Project

Widening of Band-Width in Telephony

1996

Student/s: Leonid Sandomirsky, Alex Simchovich

Supervisor/s: Guy Cohen

...

Speech Sounds Playing in PC Platform

1990

Student/s: Gal Havshush, Eyal Ovadya

Supervisor/s: Guy Cohen, Amnon Yonash

...

Speech Identification Based on DSP56000

1989

Student/s: Ronen Korman, Eyal Virjansky

Supervisor/s: Gal Ben-David

...

Filtering of Adaptive Speech Signals from Noise

1989

Student/s: Itzhak Ben-Basat, Ilan Herbst

Supervisor/s: Aharon Satt, Alberto Berstein

...