HHSCyberSecurityBreaches    R Documentation

Cybersecurity breaches reported to the US Department of Health and Human Services


Since October 2009, organizations in the U.S. that store data on human health have been required to report any incident that compromises the confidentiality of 500 or more patients or human subjects (45 C.F.R. 164.408). These reports are publicly available. HHSCyberSecurityBreaches was downloaded from the Office for Civil Rights of the U.S. Department of Health and Human Services on 2015-02-26.




A data frame containing 1151 observations of 9 variables:


A character vector identifying the organization involved in the breach.


A factor giving the two-letter abbreviation of the US state or territory where the breach occurred. This has 52 levels for the 50 states plus the District of Columbia (DC) and Puerto Rico (PR).
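A minimal sketch of the expected 52-level set, assuming it consists of R's built-in `state.abb` (the 50 state abbreviations) plus "DC" and "PR". The vector `states` is hypothetical toy data standing in for this column, not values taken from the dataset.

```r
# Expected level set: the 50 states (built-in state.abb) plus DC and Puerto Rico
expected.levels <- sort(c(state.abb, "DC", "PR"))
length(expected.levels)   # 52

# Hypothetical toy values standing in for the State column
states <- factor(c("CA", "TX", "PR", "DC"), levels = expected.levels)
nlevels(states)           # 52
all(!is.na(states))       # TRUE: every toy value matches one of the levels
```

Declaring the full level set up front keeps all 52 levels in `table()` output, even for states with no reported breaches.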


A factor giving the organization type of the covered entity, with levels "Business Associate", "Health Plan", "Healthcare Clearing House", and "Healthcare Provider".


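An integer giving the number of individuals whose records were compromised in the breach. This is always 500 or greater: U.S. law requires each breach affecting 500 or more individuals to be reported to HHS individually, and only these are posted publicly. Smaller breaches are reported to HHS annually and do not appear in these data.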
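The effect of the 500-individual threshold on this column can be illustrated with hypothetical toy breach sizes (not values from the dataset). Because only reports at or above the threshold are included, the minimum of the column can never fall below 500.

```r
# Hypothetical breach sizes, some below the public-posting threshold
sizes <- c(120L, 500L, 499L, 2300L, 75000L)

# Keep only breaches affecting 500 or more individuals
posted <- sizes[sizes >= 500L]
posted        # 500 2300 75000
min(posted)   # 500
```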


Date when the breach was reported.
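Since the report date is stored as a Date, calendar summaries need only base R. A minimal sketch counting reports per year, using hypothetical report dates rather than values from the dataset:

```r
# Hypothetical report dates
report.dates <- as.Date(c("2009-10-21", "2010-03-15", "2010-11-02", "2014-12-31"))

# Count reports per calendar year
per.year <- table(format(report.dates, "%Y"))
per.year   # counts: 2009 = 1, 2010 = 2, 2014 = 1
```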


A factor giving one of 29 different combinations of 7 basic breach types, separated by ", ": "Hacking/IT Incident", "Improper Disposal", "Loss", "Other", "Theft", "Unauthorized Access/Disclosure", and "Unknown".
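Each level of this factor is a comma-separated combination of basic types. A minimal sketch of recovering the basic types with `strsplit()` and `trimws()`, using hypothetical toy values (`tb.toy` is illustrative, not from the dataset). The trailing blank in the third value imitates a stray space that would otherwise create a spurious extra type.

```r
# Hypothetical Type.of.Breach values (combinations separated by ", ")
tb.toy <- c("Theft", "Loss, Theft", "Hacking/IT Incident, Theft ", "Other")

# Split each combination into its basic types and strip stray blanks
tb.parts <- lapply(strsplit(tb.toy, ", "), trimws)

# Number of basic types in each report
lengths(tb.parts)                  # 1 2 2 1

# How often each basic type occurs across all reports
type.counts <- table(unlist(tb.parts))
type.counts                        # "Theft" 3 times; the other three types once each
```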


A factor giving one of 47 different combinations of 8 different location categories: "Desktop Computer", "Electronic Medical Record", "Email", "Laptop", "Network Server", "Other", "Other Portable Electronic Device", and "Paper/Films".
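Counting the possible combinations shows how few of them occur. With k categories, and treating each combination as an unordered set, there are 2^k - 1 nonempty subsets. The helper `n.combinations` below is hypothetical, written only for this check.

```r
# Nonempty subsets of k categories: sum over subset sizes 1..k of choose(k, size)
n.combinations <- function(k) sum(choose(k, seq_len(k)))

n.combinations(8)   # 255 = 2^8 - 1 possible location combinations (47 observed)
n.combinations(7)   # 127 = 2^7 - 1 possible breach-type combinations (29 observed)
```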


Logical = (Covered.Entity.Type == "Business Associate")
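A minimal sketch of deriving such a logical column from the factor, using hypothetical toy values of the organization type (`cet` and `ba` are illustrative names, not columns of the dataset):

```r
# Hypothetical toy values of Covered.Entity.Type, with the four documented levels
cet <- factor(c("Healthcare Provider", "Business Associate", "Health Plan"),
              levels = c("Business Associate", "Health Plan",
                         "Healthcare Clearing House", "Healthcare Provider"))

# TRUE exactly when the covered entity is a business associate
ba <- cet == "Business Associate"
ba   # FALSE TRUE FALSE
```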


A character vector giving a narrative description of the incident.


This contains the breach report data downloaded 2015-02-26 from the US Department of Health and Human Services. It catalogues reports starting 2009-10-21. Earlier downloads included a few breaches from before 2009, when the law was enacted (these were reported inconsistently), and gave the date each breach occurred in addition to the date it was reported.

The following corrections were made to the file:


"Breaches Affecting 500 or More Individuals" downloaded from the Office for Civil Rights of the U.S. Department of Health and Human Services, 2015-02-26

See breaches for an earlier download of these data. Since then, the exact reporting requirements, and even the number and definitions of the variables in the data frame, have changed.


## 1.  mean(Individuals.Affected)
mean(HHSCyberSecurityBreaches$Individuals.Affected)
## 2.  Basic Breach Types
tb <- as.character(HHSCyberSecurityBreaches$Type.of.Breach)
tb. <- strsplit(tb, ', ')
table(unlist(tb.))
# 8 levels, but two are the same apart from
# a trailing blank; trimws() merges them into 7.
table(trimws(unlist(tb.)))
## 3.  Location.of.Breached.Information
lb <- as.character(HHSCyberSecurityBreaches[[
      'Location.of.Breached.Information']])
lb. <- strsplit(lb, ', ')
# 8 levels
table(unlist(lb.))
# Number of location categories per report:
table(sapply(lb., length))
#   1    2    3    4    5    6    7    8
#1007  119   13    8    1    1    1    1
# All 8 levels together are observed once.
# There are 2^8 - 1 = 255 possible nonempty
# combinations, of which 47 actually occur in these data.
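For modeling, the list of location categories per report can be turned into a 0/1 indicator matrix with one row per report and one column per basic category. A minimal sketch on hypothetical toy values (`lb.toy` is illustrative, not from the dataset):

```r
# Hypothetical Location.of.Breached.Information values
lb.toy <- c("Laptop", "Email, Laptop", "Paper/Films", "Email, Network Server")
lb.parts <- strsplit(lb.toy, ", ")

# One column per basic category, one row per report
categories <- sort(unique(unlist(lb.parts)))
indicators <- t(vapply(lb.parts,
                       function(x) as.integer(categories %in% x),
                       integer(length(categories))))
colnames(indicators) <- categories

rowSums(indicators)   # categories per report: 1 2 1 2
colSums(indicators)   # Email 2, Laptop 2, Network Server 1, Paper/Films 1
```

Using `vapply()` with an explicit integer template guarantees a numeric matrix of fixed width even if some report lists no category.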