Itay Laish scite author profile

Itay Laish

5 Publications

62 Citation Statements Received

94 Citation Statements Given

Affiliations

Google (United States)

Publications

Customization scenarios for de-identification of clinical notes

Hartman, Howell, Dean, et al. 2020

BMC Med Inform Decis Mak

Background: Automated machine-learning systems can de-identify electronic medical records, including free-text clinical notes. Using such systems would greatly increase the amount of data available to researchers, yet their deployment has been limited by uncertainty about how they perform when applied to new datasets.

Objective: We present practical options for clinical note de-identification, assessing the performance of machine-learning systems ranging from off-the-shelf to fully customized.

Methods: We implement a state-of-the-art machine-learning de-identification system, training and testing it on pairs of datasets that match the deployment scenarios. We use clinical notes from two i2b2 competition corpora, the PhysioNet Gold Standard corpus, and parts of the MIMIC-III dataset.

Results: Fully customized systems remove 97-99% of personally identifying information. The performance of off-the-shelf systems varies by dataset but is mostly above 90%. Providing a small labeled dataset or a large unlabeled dataset enables fine-tuning that improves performance over off-the-shelf systems.

Conclusion: Health organizations should be aware of the levels of customization available when selecting a de-identification deployment solution, so that they can choose the one that best matches their resources and target performance level.

Learning and Evaluating a Differentially Private Pre-trained Language Model

Hoory, Feder, Tendler, et al. 2021

Contextual language models have led to significantly better results, especially when pre-trained on the same data as the downstream task. While this additional pre-training usually improves performance, it can lead to information leakage and therefore put at risk the privacy of individuals mentioned in the training data. One way to guarantee the privacy of such individuals is to train a differentially private language model, but this usually comes at the expense of model performance. Moreover, without differentially private vocabulary training, the vocabulary cannot be adapted to the new data, which may further degrade results. In this work we bridge these gaps and provide guidance to future researchers and practitioners on how to improve privacy while maintaining good model performance. We introduce a novel differentially private word-piece algorithm, which allows training a tailored, domain-specific vocabulary while maintaining privacy. We then experiment with entity extraction tasks on clinical notes and demonstrate how to train a differentially private pre-trained language model (i.e., BERT) with a privacy guarantee of ε = 1.1 and only a small degradation in performance. Finally, since it is hard to tell from a privacy parameter alone what effect training had on the learned representation, we present experiments showing that the trained model does not memorize private information.
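For reference, the privacy guarantee reported in this abstract is the parameter ε of differential privacy. The standard (ε, δ) definition is given below; the abstract quotes only ε, so whether a nonzero δ was also used is not stated here. A randomized training mechanism M is (ε, δ)-differentially private if, for every pair of neighboring datasets D and D' that differ in one individual's record:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
\quad \text{for all sets of possible outputs } S.
```

Smaller ε means a stronger guarantee. With ε = 1.1, e^{1.1} ≈ 3.0, so adding or removing any one person's record can change the probability of any training outcome by at most a factor of about 3 (up to the additive δ).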

Audio De-identification: A New Entity Recognition Task

Cohn, Laish, Beryozkin, et al. 2019

Preprint

Audio De-identification - a New Entity Recognition Task

Cohn, Laish, Beryozkin, et al. 2019

Named Entity Recognition (NER) has mostly been studied in the context of written text. NER is an important step in the de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans containing personal information should be redacted, analogous to redacting sensitive character spans in de-ID of written text. The application of NER to audio de-identification has yet to be fully investigated. To this end, we define the task of audio de-ID, in which audio spans containing entity mentions must be detected. We then present our pipeline for this task, which combines Automatic Speech Recognition (ASR), NER on the transcript text, and text-to-audio alignment. Finally, we introduce a novel metric for audio de-ID and a new evaluation benchmark consisting of a large labeled segment of the Switchboard and Fisher audio datasets, and report our pipeline's results on it.
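The abstract names the three pipeline stages but not how entities found in the transcript are mapped back onto the audio. Below is a minimal sketch of that last step only, not the authors' implementation. It assumes (hypothetically) that the ASR system returns per-word start and end times and that the NER model returns character spans over a space-joined transcript; the names `Word`, `build_transcript`, and `char_spans_to_audio` are illustrative, and the ASR and NER models themselves are out of scope.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One ASR output token with its (assumed) start/end time in seconds."""
    text: str
    start: float
    end: float


def build_transcript(words):
    """Join ASR words with single spaces; return the text and each word's [begin, end) char offsets."""
    offsets, pos = [], 0
    for w in words:
        offsets.append((pos, pos + len(w.text)))
        pos += len(w.text) + 1  # +1 for the joining space
    return " ".join(w.text for w in words), offsets


def char_spans_to_audio(words, offsets, entity_spans):
    """Map NER character spans [begin, end) to merged (start_s, end_s) intervals to redact."""
    intervals = []
    for begin, end in entity_spans:
        # A word is covered if its character range overlaps the entity span.
        hit = [w for w, (b, e) in zip(words, offsets) if b < end and e > begin]
        if hit:
            intervals.append((hit[0].start, hit[-1].end))
    # Merge overlapping or touching intervals so each audio region is muted once.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    words = [Word("my", 0.0, 0.2), Word("name", 0.2, 0.5), Word("is", 0.5, 0.6),
             Word("john", 0.6, 0.9), Word("smith", 0.9, 1.3)]
    text, offsets = build_transcript(words)
    begin = text.index("john smith")
    # Hypothetical NER output: one person-name span over the transcript text.
    print(char_spans_to_audio(words, offsets, [(begin, begin + len("john smith"))]))
    # prints [(0.6, 1.3)]
```

Merging touching intervals is one simple design choice here; it keeps a multi-word name as a single redaction region rather than several adjacent ones.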

Efficient Dynamic Approximate Distance Oracles for Vertex-Labeled Planar Graphs

Laish and Mozes 2017

Preprint
