SIPL Projects

Audio & Speech Signals

Low Latency Voice Conversion
2024
Student/s: Lior Bashari, Yonatan Kleerekoper
Supervisor/s: Yair Moshe
Collaborator: Elbit Systems
Voice Conversion (VC) involves modifying one or more aspects of a speech signal while preserving its linguistic information. Deep learning-based voice conversion is a relatively new area that focuses mainly on improving quality but often suffers from high latency due to sequential computation and high computational complexity. The project's goal is to develop a deep learning-based VC system with latency of up to 400 milliseconds, suitable for real-time applications. We propose a low-latency approach based on QuickVC by Guo et al. Our solution uses 5-second windows with a 250-millisecond delay on the first window, enabling real-time processing while maintaining high quality.
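The buffering scheme can be sketched as follows. This is only a minimal illustration of windowed streaming with a 5-second context and 250-millisecond hops; the sample rate, hop size, and the placeholder conversion function are assumptions, not the project's actual implementation:

```python
import numpy as np

SR = 16000                 # assumed sample rate
WINDOW = 5 * SR            # 5-second analysis window (as in the abstract)
HOP = int(0.25 * SR)       # 250 ms of new audio per step

def convert(window):
    """Placeholder for the (hypothetical) voice-conversion model.
    Here it simply returns the newest hop unchanged."""
    return window[-HOP:]

def stream(chunks):
    """Feed 250 ms chunks into a rolling 5-second window and emit converted audio."""
    buffer = np.zeros(WINDOW, dtype=np.float32)
    for chunk in chunks:                                # each chunk holds HOP samples
        buffer = np.concatenate([buffer[HOP:], chunk])  # slide the window
        yield convert(buffer)                           # algorithmic delay of about one hop
```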
Estimating breathing clinical data using a smart stethoscope
2024
Student/s: Ness Alkobi, Gal Epshtein
Supervisor/s: Yehonatan-Itay Segman & Hadas Ofir
Collaborator: Sanolla
The goal of this work is to estimate the breathing cycle, with an emphasis on the inhalation and exhalation phases, from recordings made by a smart stethoscope from Sanolla. Additionally, this estimation is intended to aid in identifying potential lung diseases. In the initial and primary stage of the project, we used filters and various techniques to remove noise from the signal obtained from the stethoscope's accelerometer. The cleaned signal was used to estimate the breathing cycle and to identify patterns that provided information about the inhalation and exhalation process.
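A minimal sketch of this kind of pipeline is shown below; the band-pass range (0.1 to 1 Hz), filter order, and peak-picking rule are assumptions for illustration and not the project's actual parameters:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, find_peaks

def breathing_cycle(accel, fs):
    """Estimate breathing-cycle timing from a stethoscope accelerometer trace."""
    b, a = butter(4, [0.1, 1.0], btype="bandpass", fs=fs)  # typical breathing rates (assumed)
    clean = filtfilt(b, a, accel)                          # noise-reduced signal
    envelope = np.abs(hilbert(clean))                      # breathing-effort envelope
    peaks, _ = find_peaks(envelope, distance=fs)           # at least 1 s between breaths
    return clean, peaks / fs                               # peak times in seconds
```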
COVID-19 Detection by Cough Sound Classification
2024
Student/s: Kevin Benhamou, Benjamin Amsellem
Supervisor/s: Yair Moshe
In 2020, the COVID-19 pandemic, a highly contagious respiratory disease, began spreading globally. Before effective vaccines were developed, the primary strategy to control the outbreak was to quickly identify and isolate infected individuals to prevent them from transmitting the virus to others. To facilitate this, we sought in this project to develop a tool that could efficiently detect the presence of the coronavirus through analysis of patient cough sounds recorded on smartphones. A major challenge was obtaining relevant training data, so we used data from similar projects conducted at other universities.
Emotional Speech Synthesis
2024
Student/s: Sagi Eyal, Loren Tzveniashvily
Supervisor/s: Yair Moshe
Collaborator: Elbit Systems
The goal of this work was to perform emotional speech synthesis. First, we experimented with the emotional voice conversion approach, where the system receives two voice signals and transfers the emotion from one recording to the other. Later in the project, we focused on the emotional text-to-speech approach, where the system receives the transcription of the sentence we want to synthesize and the desired emotion, and generates a recording of that sentence with the given emotion. As a first step, we reproduced the results of the EmoSpeech system, which converts text to emotional speech in a fast and high-quality manner.
Automatic Speech Recognition for Torah Reading with Cantillation Marks
2024
Student/s: Aviv Shem-Tov and Ori Levi
Supervisor/s: Oren Mishali & Nimrod Peleg
This work aims to develop a speech-to-text model that recognizes Torah readings with cantillation marks (Trop) and transcribes the verses accurately, including the cantillations. This model will enable the detection of reading errors and suggest corrections, thereby improving the accuracy of Torah readings. The project focuses on developing a system capable of listening to Torah readings, identifying the spoken text, and generating an accurate transcription, including the cantillation marks. These marks are special symbols accompanying the biblical text that indicate pronunciation, intonation, and word emphasis.
RTF Estimation Using Riemannian Geometry for Speech Enhancement in the Presence of Interferences
2024
Student/s: Or Ronai and Yuval Sitton
Supervisor/s: Amitay Bar & Prof. Ronen Talmon
We address the problem of multichannel audio signal enhancement in reverberant environments with interfering sources. We propose an approach that leverages the Riemannian geometry of the spatial correlation matrices of the received signals to estimate the relative transfer function (RTF) of the desired source. Specifically, we compute the spatial correlation matrices in short-time segments, and subsequently, their Riemannian mean, which preserves shared spectral components while attenuating unshared ones. This enables an effective intermittent interference rejection, leading to accurate RTF estimation.
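A compact sketch of the two main steps, computing segment-wise spatial correlation matrices, averaging them with a Riemannian (Karcher) mean, and reading an RTF estimate off the result, is given below. The iteration count, the use of the leading eigenvector, and the reference-microphone normalization are common conventions assumed here, not necessarily the project's exact estimator:

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm, inv

def spatial_corr(segments):
    """Spatial correlation matrix (channels x channels) per short-time segment."""
    return [seg @ seg.conj().T / seg.shape[1] for seg in segments]

def riemannian_mean(mats, iters=20):
    """Karcher (Riemannian) mean of SPD matrices via fixed-point iteration."""
    M = sum(mats) / len(mats)                      # start from the Euclidean mean
    for _ in range(iters):
        M_half = sqrtm(M)
        M_ihalf = inv(M_half)
        T = sum(logm(M_ihalf @ C @ M_ihalf) for C in mats) / len(mats)
        M = M_half @ expm(T) @ M_half              # map back to the SPD manifold
    return M

def rtf_estimate(R, ref=0):
    """RTF from the leading eigenvector, normalized by a reference microphone."""
    w, V = np.linalg.eigh(R)
    v = V[:, np.argmax(w)]
    return v / v[ref]
```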
Synthetic Speech Attribution (2022 IEEE Signal Processing Cup)
2023
Student/s: Rotem Rousso, Matan Millionschik, Yael Hamo, Adir Cohen-Nissan
Supervisor/s: Yair Moshe, Pavel Lifshits
Collaborator: IEEE
This report describes Team SIPL's solution to the 2022 Signal Processing Cup challenge. We developed a method that, given an audio recording of a synthetically generated speech track, can detect which method among a list of candidates has been used to synthesize the speech, and can also accommodate unknown speech synthesis algorithms. Our solution relies on speech signal analysis using signal processing and machine learning techniques, particularly deep neural networks. Using an ensemble of features and classifiers allows our method to achieve high performance and to be robust to noise. Another strategy we use for noise robustness is data augmentation for training with noisy audio tracks.
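The noise-augmentation idea can be illustrated with a short mixing routine; the SNR range and noise sources used by the team are not stated, so this is only a generic sketch:

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix a noise track into a speech track at a target SNR (in dB)."""
    noise = np.resize(noise, speech.shape)                 # tile/crop noise to speech length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise                           # augmented training example
```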
A System for Spatial Hearing with 3D Speakers in an Acoustic Room
2023
Student/s: Alon Barash
Supervisor/s: Nimrod Peleg, Joseph Attias
Collaborator: AudioNeuro Lab
The project's goal was to prepare a system infrastructure that enables audiologists and hearing researchers to run clinical experiments on spatial hearing in a special acoustic room with 17 speakers. The project combined hardware and software, with the following parts: calibration of all speakers, an upgrade of the old control system, software modules that control each speaker, software for running experiments with different stimuli and noises, GitHub integration for the project, and software for automatic report generation (also used for experiment reproduction).
Countermeasures Against Speech Manipulation Attacks
2023
Student/s: Maayan Lifshitz, Ayala Luz
Supervisor/s: Yael Segal
With the expansion of neural network usage, systems based on them have become targets for various manipulation attacks. One common type is the adversarial attack, which involves adding noise to the system's input signal in order to produce a false outcome. This project focuses on adversarial attacks on speech signals in speech classification systems. As part of the research, neural networks based on the VGG model were trained on two types of speech signal datasets: the first containing words and the second containing vowels. Attacks of varying intensities were applied to the input signals, causing the network to make mistakes.
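The abstract does not name the specific attack, so as an illustration, below is a minimal sketch of one standard additive-noise attack (FGSM) on a PyTorch classifier:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """Fast Gradient Sign Method: add a small signed-gradient perturbation
    to the input so that the classifier's loss increases (a common adversarial attack)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()   # adversarial example
```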
Audio Classification using Transformers
2023
Student/s: Matan Millionschik, Michael Berko
Supervisor/s: Yael Segal
In the last decade, deep learning has been expanding and taking over many areas of signal processing of different kinds, including image, audio, and text. With a set of diverse architectures such as fully connected networks, convolutional networks, and, lately, transformers, deep learning achieves better results than classical methods in many signal processing tasks in general and in audio processing specifically. In the last few years, convolutional architectures have dominated the audio world, especially in classification, emotion detection, and feature extraction. Similarly to computer vision, the learned audio features can be optimized on a broad spectrum of datasets and labels.
Acoustic Scene Classification
2023
Student/s: Shira Lifshitz, Ellinor Elimeleh
Supervisor/s: Dr. Meir Bar-Zohar
This work deals with acoustic scene classification on a dataset published in the DCASE2017 challenge. The goal is to achieve better performance than that presented in the challenge, using neural networks and mel-spectrogram features. We present the processing of the dataset, the classifiers and models, and the selected hyperparameters. The best performance was obtained using mel-spectrogram features, an EfficientNet V2 S neural network, and a MiniNet network as a selection algorithm. An accuracy of 83.33% was achieved, which is higher than the baseline performance to which we compare our results.
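A sketch of the feature-extraction step is given below; the sample rate and number of mel bands are assumptions (DCASE2017 audio is 44.1 kHz, but the project's exact settings are not listed):

```python
import librosa
import numpy as np

def log_mel(path, sr=44100, n_mels=128):
    """Log-mel spectrogram used as the input image for the CNN classifier."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)
```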
Classification of Heart Sounds Using Deep Convolutional Networks
2023
Student/s: Shlomi Zvenyashvili, Arik Berenshtein
Supervisor/s: Dr. Meir Bar-Zohar
Cardiovascular disease is a leading cause of death globally, with over 17 million deaths each year according to the World Health Organization (WHO). Accurate classification of heart sounds is crucial for early detection and effective management of heart conditions. However, this task is challenging due to the complexity of heart sound data, which includes variations caused by low-quality recordings and differing physiological conditions. Robust and efficient models are needed for handling such diverse data and improving diagnostic accuracy. In this work, we propose a machine learning-based solution using deep convolutional networks.
Recognizing Autism in Mice by Analyzing Their Squeaks
2022
Student/s: Itamar Ginsberg, Alon Schreuer
Supervisor/s: Dr. Dror Lederman, Prof. Hava Golan
Diagnosis of autism at an early age is an extensive area of research, as it has a massive impact on the ability to treat and aid those suffering from the syndrome. So far, diagnosis has been based on professional behavioral observation, a flawed tool since it is subjective and imprecise, and also because it is only effective at a late developmental stage (age 4-5 years). The goal of this work is to develop a diagnostic-assist tool for classifying mice into two categories, mice with symptoms of ASD (Autism Spectrum Disorder) and mice without such symptoms, based on recordings of their squeaks.
Deep Learning Based Target Cancellation for Speech Dereverberation
2022
Student/s: Neriya Golan, Mikhail Klinov
Supervisor/s: Yair Moshe, Baruch Berdugo
Background noise and reverberation can degrade the quality of speech signals and reduce their intelligibility. Reverberations also reduce the performance of important systems such as hearing aids or voice recognition applications. There are a variety of classic methods for dereverberation of speech signals, but their performance is usually unsatisfactory and not generalizable. In light of this, there has been an increase in recent years in research on dereverberation using modern methods based on deep learning.
A System for Real-Time Deep Speech Denoising
2022
Student/s: Wajd Boulos, Saba Saba
Supervisor/s: Hadas Ofir
Recorded speech is often mixed with a variety of background noises, such as a leaf blower, washing machine, dog barking, baby crying, kitchen noises, etc. Background noise significantly degrades the quality and intelligibility of the perceived speech. In this project, we present a real-time, single-channel audio denoising deep learning solution based on recurrent neural networks. Three algorithms were chosen for this purpose: DCCRN, FullSubNet, and one that we proposed, Complex-FullSubNet, which is a mix of the two. We've trained and integrated all three algorithms into a real-time speech denoising infrastructure and adjusted them to run in a real-time environment.
Performance Evaluation of Audio Signal Dereverberation Algorithms
2022
Student/s: Nitzan Yehezkel, Nadav Reichler
Supervisor/s: Hadas Ofir
Speech signal processing technologies play an important role in our daily lives, with the focus being on improving signal quality by reducing noise and reverberations. When an audio signal is received by a microphone array, two types of signals are added to it which corrupt its quality: noise (statistically independent) and reverberations (statistically dependent). However, most of the existing applications for dereverberation show reliable performance only when the microphone is placed near the speaker. In addition, finding practical algorithms that can reduce reverberations in real time remains one of the most difficult challenges of the field.
Ultrasonic User Authentication With Smartphones
2022
Student/s: Ofir Ben Yosef, Neta Gevirtzer
Supervisor/s: Alon Eilam
The goal of this work is to implement a solution for smart user authentication without the user touching the workstation. The solution includes two applications: the first is an Android app on the user's smartphone which indicates the presence of the user near the workstation; the second is a Windows app which serves as a dynamic lock screen for the workstation. The solution is implemented using an audio signal consisting of a combination of chosen frequencies, which creates a symbol. A sequence of four symbols is the password for locking/unlocking the workstation. The password played by the mobile phone is encrypted and generated randomly for each user.
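A toy version of the symbol generation is sketched below; the carrier frequencies, symbol duration, and the mapping from the encrypted password to symbols are hypothetical placeholders, not the project's actual frequency plan:

```python
import numpy as np

SR = 48000
SYMBOL_SEC = 0.2                                   # assumed symbol duration
FREQS = [17000, 17500, 18000, 18500, 19000]        # hypothetical near-ultrasonic carriers

def make_symbol(active):
    """One symbol: the sum of the chosen carrier frequencies."""
    t = np.arange(int(SR * SYMBOL_SEC)) / SR
    tones = [np.sin(2 * np.pi * FREQS[i] * t) for i in active]
    return sum(tones) / max(len(tones), 1)

# A four-symbol "password" (illustrative indices only)
password = np.concatenate([make_symbol(s) for s in ([0, 2], [1, 4], [3], [0, 1, 2])])
```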
Presenter Coach
2022
Student/s: Amit Zach, Hadar Horn
Supervisor/s: Hadas Ofir
In this work, we design a tool that will assist people in becoming more professional presenters. The tool is based on an algorithm that receives a presenter's audio recording as input, extracts features from it, and provides the presenter with recommendations and advice to improve the quality of their presentation, based on the extracted features. Examples of such features are pitch, pace, sentence and pause lengths, a technical quality measure of the recording, etc. These parameters serve as reliable indicators of the quality of the presentation.
Speech-to-Singing Conversion Using Deep Learning
2022
Student/s: Omri Jurim, Ohad Mochly
Supervisor/s: Yair Moshe, Gal Greshler
The purpose of this work is to develop an algorithm for converting speech to singing using deep learning methods. Such a system can help memorize various short texts like phone numbers and lists, and can also be used for entertainment. There are research papers on the subject that are based on classical signal processing methods, as well as works based on deep learning, but so far (while working on this project) no results have been achieved that preserve the speech content so that it remains intelligible and natural-sounding while converting it to the desired melody.
Voice Disorder Detection via Deep Learning
2022
Student/s: Yiftach Edelstein, Chen Katzir
Supervisor/s: Hadas Ofir, Dr. Ariel Roitman
The project deals with the diagnosis of various voice pathologies related to the throat and vocal cords, which today can only be diagnosed by a long, multi-stage process that includes listening to the patient's voice by an otolaryngologist and then an invasive examination using special equipment. We assume that there is plenty of information about these pathologies in the voice recordings of the subjects, and therefore we wish to use them to design a simpler diagnosis procedure based on machine learning algorithms.
Audio-Visual Voice Activity Detection and Localization Using Deep Correlated Representations
2022
Student/s: Kfir Bendic, Itzhak Mandelman
Supervisor/s: Ofir Lindenbaum
One of the problems in performing signal processing operations on sound clips stems from noise added by the measurement device. Noise can drastically damage the performance of accurate analysis of an audio signal. One method to deal with this problem is to use multimodal observations, so that one of the modalities is not affected by the noise of the other. An example is a video source that is independent of noise added to the audio and thus unaffected by it. This way, one can try to use the video to extract information lost in the audio due to the noise. The purpose of this project is to perform spatial and temporal detection and recognition of speech, in both audio and video.
Seeing Sound: Estimating Image From Sound
2022
Student/s: Sagy Gersh, Yahav Vinokur
Supervisor/s: Yair Moshe
The goal of this work is to train a deep neural network so that it can receive an audio signal as input and output a reconstructed image of the source from which that audio signal was produced. Under the assumption that an audio signal contains spatial properties of the object that produced it, we tried to use an audio classifier to extract these properties and transform them into a feature vector, from which we reconstruct an image of the source using a deep network with a GAN architecture. This project is a follow-up to a previous project with the same goal.
Acoustic Vehicle Localization
2021
Student/s: Nadav Abayov, Gefen Levite
Supervisor/s: Hadas Ofir
While on a busy street or a rural road, pedestrians wearing headphones are exposed to danger from passing traffic. This project is a first stage in developing a system that will detect an approaching vehicle, alert the user to the danger of a vehicle in the vicinity, and also indicate its direction and distance. We dealt with determining the direction of the vehicle relative to the pedestrian along its trajectory, based on the time differences between microphones. Next, we experimented with estimating the distance of the vehicle using machine learning. Finally, we built an algorithm and a demo system that simulate real-time processing of data from the microphones.
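The direction step can be illustrated with a plain two-microphone cross-correlation; the project may use a different estimator (e.g. GCC-PHAT) and more microphones, so the microphone spacing and bearing convention below are assumptions:

```python
import numpy as np
from scipy.signal import correlate

def tdoa_bearing(left, right, fs, mic_dist):
    """Bearing of a sound source from the time difference between two microphones."""
    c = 343.0                                      # speed of sound [m/s]
    xcorr = correlate(left, right, mode="full")
    lag = np.argmax(xcorr) - (len(right) - 1)      # delay in samples
    tau = lag / fs                                 # delay in seconds
    sin_theta = np.clip(c * tau / mic_dist, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))        # angle relative to broadside
```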
Indoor/Outdoor Classification of Voice for Mobile Devices
2021
Student/s: Gabriel Mannes, Odelia Longini
Supervisor/s: Ori Bryt
Collaborator: RAFAEL
This project's goal is to classify a two-way radio recording into one of two classes: indoor or outdoor recording. A literature review led us to choose a ResNet-based neural network that was designed for solving a similar problem. The system transforms audio signals into log-mel spectrograms, and the result is then classified by the network. Since the dataset we have is too small, another goal was defined for the project: training on a bigger dataset and performing inference on the small one.
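A minimal transfer-learning sketch in this spirit is shown below; the abstract only says "a ResNet-based network", so the specific ResNet variant, pretrained weights, and input size are assumptions for illustration:

```python
import torch
import torchvision

# Two-class (indoor/outdoor) head on a pretrained ResNet backbone (assumed variant).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 2)

# Log-mel spectrograms replicated to 3 channels and resized to 224x224 "images".
spectrograms = torch.randn(8, 3, 224, 224)           # dummy batch for illustration
logits = model(spectrograms)                          # fine-tune with cross-entropy on the big dataset
```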
Speech Enhancement Evaluation Using a Speech Recognition Algorithm
2021
Student/s: Yotam Elia
Supervisor/s: Hadas Ofir, Baruch Berdugo
In this project, we created a tool for evaluating speech denoising algorithms. The aim of the project is to evaluate speech enhancement algorithms using automatic speech recognition algorithms. The difficulty in evaluating a speech enhancement algorithm lies mainly in the difference between a noise-free signal and a signal that is intelligible to a listener. Since our goal is to test the quality of the enhancement in terms of human understanding, the choice of speech recognition algorithms is natural. The project had three stages, the first of which was getting familiar with the tools needed for testing: Baidu, OM-LSA, and DTLN.
User-Specific Speech Recognition for Controlling a 3D-Printed Prosthetic Hand
2021
Student/s: Noa Tykochinsky, Itay Wengrowicz
Supervisor/s: Shunit Polinsky
Collaborator: Haifa-3D
In this work, we provide a solution for a voice-control algorithm for a 3D-printed prosthetic hand. The goal was to create a cheap and accessible solution: an algorithm which recognizes and verifies the voice of the prosthetic hand user. The system recognizes the words the user is saying in real time and detects the activation words and keywords which represent the hand movement. The entire processing time, from the moment the audio input is received until the hand moves, is about 1.5 seconds, and the risk of a false-positive result is less than 2%.
Domain Adaptation for Mobile Device Acoustic Based Proximity Sensor
2021
Student/s: Niv Menashe
Supervisor/s: Pavel Lifshits
Nowadays, almost every person in the Western world owns a relatively new smartphone. Every smartphone is equipped with an infra-red photoelectric proximity sensor, placed next to the phone's speaker and used for turning off the smartphone's screen when the sensor is blocked. For example, when receiving a phone call, the user places the smartphone beside their ear, thus blocking the proximity sensor and turning off the screen. We propose a different approach: an acoustic-based proximity sensor, a convenient method that does not require a dedicated sensor or any additional hardware.
Feature Extraction for Classification of Dolphin Sounds
2021
Student/s: Harel Plut, Or Cohen
Supervisor/s: Dr. Roee Diamant
Collaborator: ANL Haifa
With the large increase in human marine activity, our rivers and seas have become populated with boats and ships projecting acoustic emissions of extremely high power that often affect areas of 20 square km and more. The underwater radiated noise (URN) level from large ships can exceed 100 PSI and is wideband, such that even at distances of several kilometres from the vessel, the acoustic pressure level is still high. While there is evidence for a clear disturbance impact on the hearing and behavior of marine mammals, there is still no systematic proof of the extent of this effect.
Classification of Dolphin Whistles
2021
Student/s: Jonathan Masin, Racheli Katz
Supervisor/s: Dr. Roee Diamant
Collaborator: ANL Haifa
With the large increase in human marine activity, our seas have become populated with boats and ships projecting acoustic emissions of extremely high power that often affect areas of 20 square km and more. The underwater radiated noise (URN) level from large ships can exceed 100 PSI and is wideband, such that even at distances of several kilometres from the vessel, the acoustic pressure level is still high, with a clear disturbance impact on the hearing and behaviour of marine fauna.
Otoacoustic Emissions (OAE) as a Tool for Early Autism Diagnosis
2021
Student/s: Amit Shpigelman, Simcha Lipner
Supervisor/s: Barr Morgenstein
Collaborator: SensPD
One of the main challenges with diagnosing autism is the lack of a structured method of diagnosis. Today, autism is diagnosed by observing behaviors of the subjects, which develop at a late stage, around the age of 4. This late diagnosis has its toll, mainly a late start of treatment and increased difficulty in integrating into general society. SensPD, the company we worked with during the project, has a theory that autism can be diagnosed using a cutting-edge method.
Acoustic Fence Using Multi-Microphone Speaker Separation
2021
Student/s: Tomer Fait, Orel Ben-Reuven
Supervisor/s: Amir Ivry
Collaborator: STEM Audio
The goal of an acoustic fencing algorithm is to separate speakers by their physical location in space. In this project, we examine an algorithm which solves this problem, define suitable performance criteria, and test the algorithm in varied environments, both simulated and real. The real recordings were acquired by us with suitable acoustic equipment. We examine a speech separation algorithm based on spectral masking inferred from the speaker's direction. The algorithm assumes the existence of a dominant speaker in each time-frequency (TF) bin and classifies these bins by employing a deep convolutional neural network.
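Once such a mask has been inferred, re-synthesizing the target speaker is a standard STFT masking operation, as in the sketch below; the mask here is just an input array per TF bin (producing it is the job of the CNN described above), and the frame length is an assumption:

```python
import numpy as np
from scipy.signal import stft, istft

def apply_tf_mask(mix, mask, fs, nperseg=512):
    """Keep only the time-frequency bins assigned to the desired direction."""
    f, t, X = stft(mix, fs=fs, nperseg=nperseg)       # mixture spectrogram
    _, y = istft(X * mask, fs=fs, nperseg=nperseg)    # masked re-synthesis
    return y
```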
Voice DeepFake
2021
Student/s: Idan Roth, Zahi Cohen
Supervisor/s: Yair Moshe
The goal of this work is to design a method for performing voice conversion between two speakers. The method employs deep learning techniques, particularly an autoencoder architecture, to convert the source speaker's voice into the target speaker's voice while preserving the source speaker's linguistic content. The baseline model architecture is VC-AGAIN. This model uses a one-shot approach: it is sufficient to receive, at the inference stage, a single speech signal from the source and target speakers, on whom the system has not been trained, in order to perform voice conversion.
Speaker Localization Inside a Car Using a Microphone Array (Awarded Project)
2020
Student/s: Or Streicher, Adir Goldovsky
Supervisor/s: Ori Kats
Collaborator: Alango
We present the results of a final project intended to estimate a speaker's position inside a vehicle using a semi-supervised learning algorithm. The dataset used to create and validate the proposed solution consisted of labeled recordings sampled by an array of microphones positioned at the front of the vehicle. The recordings were labeled by different parameters (such as the head angle of the speaker, the presence of another person in the vehicle, etc.), while the main parameter was the speaker's position.
Iterative Adaptive Estimation of Underwater Channel Transfer Function Based on Soft Information Using Turbo Equalization (Awarded Project)
2020
Student/s: Asaf Gendler, Nadav Shalev
Supervisor/s: Kobi Bucris
Collaborator: RAFAEL
Underwater acoustic communication has attracted rising interest in recent years as a result of the increasing use of autonomous underwater vehicles. Underwater communication poses a difficult challenge for several reasons, such as ISI, Doppler, and time-variant channels, in addition to a lack of research compared to RF. In order to overcome the channel distortion problems, it is common to use equalizers to diminish the channel effect. The state-of-the-art equalizer today is a DFE followed by a standard decoding scheme.
Proximity Sensor for Smartphones Based on Acoustic Measurements (Awarded Project)
2019
Student/s: Pavel Lifshits
Supervisor/s: Andy Rodan, Zacharie Cohen
Modern mobile phones are equipped with an infra-red photoelectric proximity sensor, most commonly used to turn off the touch screen during a phone call to prevent accidental touches when the user's face/ear is detected in proximity to the screen. In this project, we propose to achieve the sensing functionality without using a special sensor. Specifically, we use the already existing speaker and microphones for proximity sensing, without interfering with their originally intended operation. We build our method on the observation that the transfer function from the mobile phone's speaker to the microphones varies as a function of objects located in the vicinity of the mobile phone.
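A textbook cross-spectral estimate of this transfer function is sketched below; the actual probe signal, segment length, and decision rule used in the project are not described here, so these values are placeholders:

```python
import numpy as np
from scipy.signal import csd, welch

def transfer_function(probe, recorded, fs, nperseg=1024):
    """Estimate the speaker-to-microphone transfer function H(f) = Sxy / Sxx."""
    f, Sxy = csd(probe, recorded, fs=fs, nperseg=nperseg)   # cross-spectrum
    _, Sxx = welch(probe, fs=fs, nperseg=nperseg)           # probe auto-spectrum
    return f, Sxy / (Sxx + 1e-12)

# A nearby object (e.g. the user's ear) changes |H(f)|, so comparing the current
# estimate to a free-field reference yields a proximity decision.
```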
Speaker Diarization Using Deep Learning (Awarded Project)
2019
Student/s: Matanel Yaacov, Shay Avig
Supervisor/s: Nurit Spingarn
Speaker diarization is the process of dividing a given sound segment or audio stream into segments based on speaker identity. This method is designed to answer the question "Who spoke when?" and can be useful in many cases where it is important to know the speaker's identity, for example phone calls, radio interviews, podcasts, and even emergencies where recordings from the scene are investigated (black boxes in aircraft, etc.). Until today, speaker diarization has mostly been implemented with classical audio signal processing algorithms.
Acoustic 3D Positioning of Smartphones in Motion
2019
Student/s: Guy Dascalu, Omer Movshovits
Supervisor/s: Alon Eilam
Collaborator: Sonarax
In recent years, as the use of mobile smartphones, personal assistants and other IoT devices is growing, there is an increasing demand for positioning systems that provide a reliable and accurate location in areas where Global Positioning System (GPS) cannot work. Places such as office buildings, museums, parking lots, airports and shopping malls all suffer from the limitation of satellite signals that do not pass through metal and concrete walls. Some methods for positioning in such environments make use of electromagnetic waves such as Bluetooth or Wi-Fi, while others utilize a different approach where acoustic, often ultrasonic waves are used.
Robust Automatic Detector and Feature Extractor for Dolphin Whistles (Awarded Project)
2019
Student/s: Guy Shkury, Yoel Bud
Supervisor/s: Roee Diamant
A key element in dolphin conservation efforts is population estimation in their natural environment. A common method for mapping dolphin presence is the detection of their vocalizations. In this work, we propose a novel detection technique for dolphin whistles, referred to as ECV (Entropy, Correlation, and Viterbi algorithm). ECV is a robust detector of low complexity that automatically detects dolphin whistles and extracts their spectral features, using a single receiver with only a few system parameters. The method employs a chain of decisions based on spectral entropy and time-domain correlation, followed by a constrained Viterbi algorithm to extract the whistles' features.
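The first decision stage can be illustrated with a per-frame spectral-entropy computation: tonal whistles concentrate energy in few bins (low entropy), while sea noise is broadband (high entropy). The FFT length below is an assumption, and the correlation and Viterbi stages are omitted:

```python
import numpy as np
from scipy.signal import stft

def spectral_entropy(x, fs, nperseg=1024):
    """Spectral entropy per STFT frame (low values suggest a tonal whistle)."""
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    P = np.abs(Z) ** 2
    P = P / (P.sum(axis=0, keepdims=True) + 1e-12)          # per-frame spectral distribution
    return -(P * np.log2(P + 1e-12)).sum(axis=0)            # entropy per frame
```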
Acoustic Positioning with Unsynchronized Sound Sources (Awarded Project)
2018
Student/s: Guy Feferman, Michal Blatt
Supervisor/s: Alon Eilam, Guy Shofen
Collaborator: Sonarax
In recent years, cellular phones have been widely used for GPS-based navigation. There is a growing demand to provide navigation with cellular phones in areas where GPS signals can't be received, such as airports, hospitals, shopping centers, and underground parking lots. Furthermore, it is required to provide better-than-GPS accuracy for locating products in a supermarket, finding rental car pickup spots, or finding a hotel room door in a corridor. In this project, we created an indoor positioning system for cellular phones, based on unsynchronized acoustic signals.
Audio Retrieval by Voice Imitation (Awarded Project)
2018
Student/s: Mohamad Khatib, Samah Khawaled
Supervisor/s: Hadas Benisty
Searching for sounds with textual keywords in existing sound search systems is problematic because keywords cannot accurately describe the exact sound content the user is looking for. Recently, an innovative direction has been proposed for sound search systems: search by imitation. The goal of our project is to design a technique for searching sounds in a database given an audio signal that is an imitation of the desired sound. The user imitates a sound (for example, a cat's meow or the sound of a landing plane), and the system returns the most likely audio from a library that contains the dataset. This system provides innovative tools for interaction with computerized systems.
Siren Detection Algorithm in a Noisy Environment for the Hearing Impaired (Awarded Project)
2017
Student/s: Ariel Yeshurun, Dean Carmel
Supervisor/s: Yair Moshe
People with hearing disabilities experience many difficulties in everyday life that affect them and their surroundings. Technological developments have helped them in many areas, but driving remains an even harder experience. They can't hear noises and beeping but, most importantly, they can't hear approaching emergency vehicles. This inability makes them a safety hazard both to themselves and to their surroundings, because they can accidentally cause a roadblock or even an accident. There is no uniform standard for sirens today, and there is no algorithm that can detect sirens from different countries.
Speaker Diarization Using Dimension Reduction
2016
Student/s: Lee Twito, Ori Shahar
Supervisor/s: Nurit Spingarn
The diarization problem is a well-known problem in the world of speech recognition and speech processing. Our project's goal is speaker diarization in recorded conversations. We try a new approach for solving this problem, using a dimension reduction algorithm (LLE). The results are compared to a well-known method for solving this problem using a bottom-up algorithm. We tested our method on merged TIMIT files and on recordings we made ourselves.
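A minimal sketch of this kind of pipeline, using scikit-learn's LLE followed by clustering of the embedded frames, is shown below; the frame features, neighbour count, and clustering step are assumptions, since the report does not spell them out:

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.cluster import KMeans

def diarize(features, n_speakers=2):
    """Assign a speaker label to each frame of a (frames x feature_dim) matrix."""
    emb = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(features)
    return KMeans(n_clusters=n_speakers, n_init=10).fit_predict(emb)   # label per frame
```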
Audio QR Over Streaming Media (Awarded Project)
2016
Student/s: Gal Binyamin, Itai Dagan
Supervisor/s: Alon Eilam
Collaborator: Prontoly
We describe a system that delivers a website address to a cellular phone by encoding inaudible binary data in an analogue audio signal, which is received by the microphone of the cellular phone. This is an alternative to encoding a website address in a QR code label, which is scanned by the cellular phone's camera. Data embedding in the audio signal is done by modifying the phase of the signal's modulated complex lapped transform (MCLT) coefficients, while the perceived quality of the embedded audio signal remains the same as that of the original audio signal. A whole system was implemented and tested both in simulation and in reality. The data rate achieved in reality at a distance of 1.
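scipy has no MCLT, so the toy sketch below illustrates the phase-embedding idea on STFT coefficients instead; the frequency bin, frame length, and one-bit-per-frame rate are arbitrary choices, and this is an illustration of the idea rather than the implemented system:

```python
import numpy as np
from scipy.signal import stft, istft

def embed_bits(audio, bits, fs, bin_idx=200, nperseg=1024):
    """Force the phase of one high-frequency bin to 0 or pi per frame (one bit per frame)."""
    f, t, Z = stft(audio, fs=fs, nperseg=nperseg)
    for k, b in enumerate(bits[:Z.shape[1]]):
        mag = np.abs(Z[bin_idx, k])
        Z[bin_idx, k] = mag if b == 0 else -mag        # phase 0 encodes bit 0, phase pi encodes bit 1
    _, y = istft(Z, fs=fs, nperseg=nperseg)
    return y
```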
Distress Situation Detection in Speech
2016
Student/s: Yehav Alkaher, Osher Dahan
Supervisor/s: Yair Moshe
Collaborator: cMeSafe
When a person is in a distress situation, there are signs which are reflected in their speech or in the audio of their surroundings. The project deals with speaker-independent distress detection in the speech of a single speaker. The solution involves the extraction of relevant features from the speech signal and a comparison between the different methods of extraction. A distress situation is defined by the discrete emotions of anger and fear. The project concludes with the classification of distress in speech with 91% accuracy, using the Berlin Emotional Speech Database in the German language.
Cry-based Detection of Developmental Disorders in Infants
2015
Student/s: Amit Oren, Avi Matzliach
Supervisor/s: Rami Cohen
Developmental disorders are a group of neurological conditions originating in childhood that involve serious impairments in various areas (language, learning, motor skills). These conditions also comprise Autism Spectrum Disorders. As of 2008, approximately 15% of children in the United States had been diagnosed with some sort of developmental disorder, in comparison to only 12.8% in 1997 [1]. Early detection of developmental disorders is crucial, as it enables early intervention (e.g., speech therapy, occupational therapy), which may reduce neurological and functional deficits in infants.
Fax In Your Pocket
2014
Student/s: Smadar Shapira
Supervisor/s: Alon Eilam & Pavel Lifshits
Real-Time Digital Watermarking System for Audio Signals Using Perceptual Masking (Awarded Project)
2011
Student/s: Yuval Cassuto, Michael Lustig
Supervisor/s: Shay Mizrachi
Recent development in the field of digital media raises the issue of copyright protection. Digital watermarking offers a solution to copyright violation problems. The watermark is a signature, embedded within the data of the original signal, which in addition to being inaudible to the human ear, should also be statistically undetectable, and resistant to any attempts to remove it. In addition, the watermark should be able to resolve multiple ownership claims (known as the deadlock problem), which is achieved by using the original signal (i.e., the unsigned signal) in the signature detection process.
Non-Coherent Multichannel Speech Enhancement in Non-Stationary Noise Environments (Awarded Project)
2010
Student/s: Nir Kahana, Liav Levi
Supervisor/s: Ronen Talmon
Collaborator: Israel Police
People Metering Using Mobile Devices (Awarded Project)
2010
Student/s: Oded Yeruhami, Yuval Bahat
Supervisor/s: Rafi Steinberg
Temporal Decomposition of Speech (Awarded Project)
2006
Student/s: Erez Cohen, Ronen Krupnik
Supervisor/s: Guy Narkiss
Collaborator: Tadiran Communication
Speech Bandwidth Extension (Awarded Project)
2004
Student/s: Eran Borstain, Aviram Shmueli
Supervisor/s: Ariel Sagi
Detection of Spectral Signature in SONAR Signal, Part A+B (Awarded Project)
2004
Student/s: Ronen Peisakh, Amir Chasdai
Supervisor/s: Erez Sabbag
Collaborator: MAFAT
Voice Morphing (Awarded Project)
2002
Student/s: Gidon Porat
Supervisor/s: Yizhar Lavner
HMM-Based Speech Recognition System (Awarded Project)
2002
Student/s: Tal Levinstein, Boaz Shoval
Supervisor/s: Anelia Baruch-Somekh
Development of a New Algorithm for Voice Modification (Awarded Project)
2001
Student/s: Assaf Rubin, Michael Kats
Supervisor/s: Yizhar Lavner
Real-Time Embedding of Digital Watermarking for Audio Signals (Awarded Project)
2000
Student/s: George Leifman, Eran Borenstein, Tal Mizrahi
Supervisor/s: Shay Mizrachi
Real-Time Implementation of Low Bit-Rate Speech Compression on a SHARC DSP (Awarded Project)
2000
Student/s: Arthur Kol, Yaniv Shaked
Supervisor/s: Ronen Mayrench
Widening of Bandwidth in Telephony (Awarded Project)
1996
Student/s: Leonid Sandomirsky, Alex Simchovich
Supervisor/s: Guy Cohen
Collaborator: IC-COM
Speech Sounds Playing in PC Platform
1990
Student/s: Gal Havshush, Eyal Ovadya
Supervisor/s: Guy Cohen, Amnon Yonash
Collaborators: Algorithms Research, Quazar
Speech Identification Based on DSP56000
1989
Student/s: Ronen Korman, Eyal Virjansky
Supervisor/s: Gal Ben-David
Adaptive Filtering of Speech Signals from Noise
1989
Student/s: Itzhak Ben-Basat, Ilan Herbst
Supervisor/s: Aharon Satt, Alberto Berstein
Collaborator: DSP Group