Using R to convert SPSS data files for Mplus

For the structural equation modeling that I do, I typically use Mplus. When I need to write a script for a Monte Carlo simulation, or for additional data analysis and visualization, I use Matlab. One of the challenges is that I am usually not the one who does the actual data collection. Most data files I analyze come to me as .sav or .por files, which are native to SPSS. The good thing about this is that when SPSS is used for data entry, the metadata fields are usually pretty well populated. That helps considerably in understanding what each variable represents and how the data was collected. The problem is that these are SPSS-specific formats, and other analysis tools have their own ways of reading and storing data. Mplus is particularly fussy. Data must be stored in a plain-text ASCII file with values delimited by spaces, commas, or tabs, every value must be numeric, and even a single row of column headers will cause Mplus to fail when reading the file. So, how do we resolve this situation?
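To make those requirements concrete, here is a rough check written in R, like the rest of this post. It is only a sketch under my own assumptions: the function name looks_mplus_ready() is hypothetical, and it tests just two things (no header row, and every entry parses as a number), not everything Mplus validates when it reads a file.

```r
# Hypothetical helper: a rough test of whether a delimited text file looks
# like something Mplus can read in free format. It only checks that every
# entry is numeric, which also catches a header row of variable names.
looks_mplus_ready <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]      # ignore blank lines
  if (length(lines) == 0) return(FALSE)      # an empty file is not usable
  # Split every record on commas, tabs, and/or spaces
  fields <- unlist(strsplit(trimws(lines), "[[:space:],]+"))
  # as.numeric() turns text such as "UID" or "NA" into NA (with a warning)
  nums <- suppressWarnings(as.numeric(fields))
  !anyNA(nums)
}
```

A file whose first line is `UID	Time	Happy` fails this check, while the same data written without that line passes.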

Before I continue, I should note that this certainly isn’t the only approach. In fact, Christian Geiser gives a detailed example in his book “Data Analysis with Mplus” of how to convert files using SPSS so they can be read by Mplus. In my situation, however, I want to do it without SPSS. Although I technically could use SPSS, it is inconvenient for me because I use Linux, and SPSS runs only on Windows.[1] I do have a virtual machine that I could use, but what I was looking for here was an option that works on my native operating system. Enter R.

R is great because it’s open-source and incredibly easy to install. On Debian-based systems such as Ubuntu, a simple

$ sudo aptitude install r-base-core

at the command-line will do the trick. Then, you can launch it with

$ R

There are multiple R packages on CRAN that support this conversion. I call the Hmisc package directly, but its spss.get() function is a wrapper around read.spss() from the foreign package, which does the heavy lifting. I’ll show the conversion of a .por (SPSS portable) file here, because that’s what I last used; .sav files are read the same way. The Hmisc package can be installed (again at the command line) with

$ sudo aptitude install r-cran-hmisc
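Since read.spss() from foreign is what actually reads the file, it can also be called without Hmisc (foreign is one of R's recommended packages, so it is often already installed). The sketch below is my own, and the helper names read_spss_plain() and label_table() are hypothetical. The second helper pulls the SPSS variable labels, which read.spss() attaches to its result as a "variable.labels" attribute, into a small name/label table that is handy for documenting an exported file.

```r
# Hypothetical wrapper: read a .sav or .por file directly with
# foreign::read.spss(), the function Hmisc::spss.get() builds on.
read_spss_plain <- function(path) {
  foreign::read.spss(path,
                     to.data.frame    = TRUE,   # data.frame instead of a list
                     use.value.labels = FALSE)  # keep numeric codes, no factors
}

# Hypothetical helper: collect column positions, names, and the SPSS
# variable labels stored in the "variable.labels" attribute.
label_table <- function(df) {
  labs  <- attr(df, "variable.labels")
  label <- rep("", ncol(df))                   # blank when no label exists
  hit   <- names(df) %in% names(labs)
  label[hit] <- labs[names(df)[hit]]
  data.frame(position = seq_along(df),
             name     = names(df),
             label    = label,
             stringsAsFactors = FALSE)
}
```

Usage would look like `label_table(read_spss_plain("./inputFile.por"))`; it is worth checking on your own files that the labels come through as expected.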

The example below shows the steps that are usually required: importing the data, selecting a subset of it, setting missing data identifiers that Mplus understands, and writing the data out to file in the appropriate format.

# First, load the Hmisc library (it reads SPSS files via the foreign package).
library(Hmisc)

# Second, read in the file. use.value.labels = FALSE keeps the original
# numeric codes; otherwise labelled variables become factors, and their
# internal codes (1, 2, ...) may not match the values stored in SPSS.
myData <- spss.get('./inputFile.por', use.value.labels = FALSE)

# Third, construct a new dataset with just the columns we want.
subds <- data.frame(UID       = as.character(myData[, 1]),
                    Time      = myData[, 5],
                    Happy     = myData[, 33],
                    Energetic = myData[, 35],
                    Cheerful  = myData[, 37])

# Fourth, save the data to file the way Mplus likes it: tab-separated,
# no header, no row names, no quotes, and NA written as -9999.
write.table(subds, "subds.dat", sep = "\t", na = "-9999",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Finally, create a metadata file listing the column order,
# since subds.dat itself has no header row.
writeLines(c('The columns in the file "subds.dat" are, in order:',
             colnames(subds)),
           "subds.dat_metadata.txt")
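Because the script above needs a real SPSS file, the export step can be tried out first on a small made-up data frame. In this sketch the column names and values are invented for illustration, and it selects columns by name rather than by position, which is my own choice rather than what the script above does.

```r
# Made-up stand-in for the data frame spss.get() would return;
# the column names and values are invented for illustration.
myData <- data.frame(UID     = c(101, 102, 103),
                     Comment = c("ok", "late", "ok"),
                     Time    = c(1, 2, NA),
                     Happy   = c(4, NA, 2),
                     stringsAsFactors = FALSE)

# Keep only the numeric columns Mplus needs, selected by name
subds <- myData[, c("UID", "Time", "Happy")]

# Headerless, unquoted, tab-delimited output with NA coded as -9999
out <- tempfile(fileext = ".dat")
write.table(subds, out, sep = "\t", na = "-9999",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Show the lines Mplus would read
print(readLines(out))
```

Selecting by name keeps working if the column order in the SPSS file changes, whereas a position such as myData[,33] would silently pick up a different variable.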

Of course, you will have to modify the above code to reflect the name and path of your input file, the variables you want to export, etc. The code above can be run either interactively from the R console or with Rscript. Once I get the bugs worked out of the conversion routine, I like to use Rscript. Assuming your conversion script is called convert.R, it can be called with:

$ Rscript convert.R
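If you convert several files, one option is to pass the file names on the command line instead of editing the script each time. This is a sketch using base R's commandArgs(trailingOnly = TRUE); the helper get_paths() and its default file names are hypothetical, not part of the script above.

```r
# Hypothetical helper for a convert.R called as
#   Rscript convert.R input.por [output.dat]
get_paths <- function(args = commandArgs(trailingOnly = TRUE)) {
  # First argument: the SPSS input file (default matches the script above)
  input <- if (length(args) >= 1) args[1] else "./inputFile.por"
  # Second argument: the output file; otherwise reuse the input's base name
  output <- if (length(args) >= 2) args[2] else
    paste0(tools::file_path_sans_ext(basename(input)), ".dat")
  list(input = input, output = output)
}

paths <- get_paths()
message("Input: ", paths$input, "  Output: ", paths$output)
# ...the read, subset, and write steps would use paths$input and paths$output
```

With this in place, `$ Rscript convert.R survey.por` would write survey.dat next to where the script is run.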

If you have any questions regarding the implementation, feel free to post them in the comments section!

  1. This is not technically true. SPSS supports Red Hat and Debian operating systems, but I feel like getting SPSS (or almost any proprietary software) onto my machine is like pulling teeth. There is a lot to it: you have to purchase a separate license from the company, get a separate download, and install it differently than you’re used to on Windows. In short, it’s a pain for IT, so asking for native Linux software can burn through your social capital with IT pretty quickly.