Paul Johnson
"Summarizing Runs of an Agent-Based Model"
Pols 909 "Agent-Based Modeling"
April 29, 2003

The newest version of this writeup is saved in:

http://lark.cc.ku.edu/~pauljohn/ps909/Agent-Based_Models/summarizingRuns.txt

With simulations, it is often necessary to run models over and over and then compare the results. I compare many runs for each combination of parameter settings in order to find out if the luck of the draw--pure randomness--is having a significant impact. That is also a way to assess "path dependence". Then I also compare outcomes across parameter settings to find out if parameters have an impact on averages across runs.

How to do this?

First, in the ModelSwarm I typically use some plain old C code with fprintf to write summary measures from the model out to a file. I described that here:

http://lark.cc.ku.edu/~pauljohn/ps909/Agent-Based_Models/fprintfnote.txt

Swarm can also output data from EZGraphs in HDF5 format, and that works great too, especially if you want to make plots of time series. The artificial stock market (ASM: http://ArtStkMkt.sourceforge.net) shows how to do both of these approaches.

In my opinion model (a research project), the runs generate a stream of files, one for each run, and they are named like this:

DataCulture1
DataCulture2
DataCulture3
...

If I run the model for many parameter combinations, then there are many directories full of these files.

Just so you can experiment, I copied 10 of these output files from one directory up here:

http://lark.cc.ku.edu/~pauljohn/ps909/Agent-Based_Models/DataCulture

The other scripts and programs I describe below are in there.

***Task 1*** Summarize/Compare outcomes across runs

The last line in each of these files can be thought of as a "snapshot" at termination time, so often the first task is to collect up those last lines and then build summaries of them. Here is a script I wrote that you run in the directory where the DataCulture files are.
It will grab the first line out of the first data file and use that to create a row of variable names, and then it will tack on the last line from each file to make a summary data set. Here is the program "grabLastLine", which you can put in a file, make executable, and then run.

---------------------------------------
#!/bin/sh
# grabLastLine
# Usage: ./grabLastLine [outputfile]

echo "the input was $1"
if [ -z "$1" ]; then
    echo "-z says there is no input 1=$1"
    fn="lastline.dat"
else
    fn="$1"
fi
echo "Output name is $fn"

# the first line of the first data file supplies the variable names
sed 1q DataCulture1 > "$fn"

# append the last line (the snapshot at termination) of each run
for x in DataCulture*
do
    tail -n 1 "$x" >> "$fn"
done
exit 0
---------------------------------------

If you don't want the output file called "lastline.dat", you can give a command line argument and it will use that argument as the file name. If you have many directories of runs, I suggest you give your file a meaningful name, like "model2b04.dat". Do that by typing:

./grabLastLine model2b04.dat

You have to edit this grabLastLine program if your data output files are not called DataCultureX. It is easy to see where. Change DataCulture to the root name of your output files in both places: the "sed 1q DataCulture1" line and the "for x in DataCulture*" loop.

If you copy all of your *.dat files into a single "analysis" directory, you can use the statistical program R to make an easy summary of each file.

Here is an R program I have used in the past. It will, one by one, grab all the *.dat files, and for each one it will create an output file that gives the mean and standard deviation of the variables. It reads the first line in the data file to find the variable names, and then those summaries are dumped out. This ignores non-numeric variables. I've been using this script for 2 or 3 years; several users on the R list gave me good tips on controlling the output. The output *.dat.summary file can easily be loaded into a word processor or spreadsheet.
Here is the file "Summary.R"

---------------------------------------
# the pattern is a regular expression: every file whose name ends in ".dat"
myDat <- list.files(pattern = "\\.dat$")
# if you want a particular file summarized, say "Mod7.dat", replace the previous with
# myDat <- list.files(pattern = "Mod7.dat")

createSummary <- function(dsName) {
  data <- read.table(dsName, header = TRUE, as.is = TRUE)
  # keep only the numeric columns; drop = FALSE keeps a data frame
  # even if there is just one numeric column
  numericData <- data[, sapply(data, is.numeric), drop = FALSE]
  means <- sapply(numericData, mean)
  sds <- sapply(numericData, sd)
  newOutput <- rbind(mean = means, sd = sds)
  newOutput <- round(newOutput, digits = 6)
  outputdsname <- paste(dsName, ".summary", sep = "")
  # one row per variable: name, mean, sd, separated by tabs
  write.table(t(newOutput), file = outputdsname, quote = FALSE,
              sep = "\t", col.names = FALSE)
}

processData <- function(dat) {
  for (i in seq_along(dat)) {
    createSummary(dat[[i]])
  }
}

processData(myDat)
------------------------------------------------

That takes each *.dat file in your current directory and creates a *.dat.summary file for it. It has one row per variable, with columns for the mean and the standard deviation. You can see in the "createSummary" function where you can add other numbers if you want.

To use that program is very simple. You can type it in from the command line of R if you want to. But it is easier to just read in the file of commands. Just copy "Summary.R" into the directory where you collected up the *.dat files, then start R

$ R

And after R starts, you read in ("source") the program

> source("Summary.R")

That reads in and runs the program, and the *.summary files should have popped up already in your directory. Then quit R with

> q()

and answer "n" unless you want to save the workspace. You probably don't want that. Note that the summary files are still there after you quit R.

***Task 2*** Detailed analysis of a particular run

Sometimes we want more detailed analysis of a particular run of the model. Suppose we want a time-series plot of several variables, and we want to see that plot for each of the many data sets. It turns out R is ideal for that procedure, because it makes it very easy to load and unload datasets.
I realize that my R code is not clever, but it works, and so I'm not apologizing for it.

There is an example of one of these pictures here:

http://lark.cc.ku.edu/~pauljohn/ResearchPapers/APSA01/apsa0105.gif

I have many different pictures scattered throughout the various reports on that opinion model.

Here is an R program I run on the DataCulture* datasets to make pictures like that.

----------------------------------
# "RHuckFigure2.R"
run <- 0   # number of the run most recently plotted

# read the data file for one run, e.g. "DataCulture1"
updateDataSet <- function(run) {
  dsname <- paste("DataCulture", run, sep = "")
  read.table(dsname, header = TRUE, as.is = TRUE)
}

# draw the first series, then add the others to the same axes
buildGraph <- function(data) {
  plot(data$acquaint ~ data$T, type = "l", ylim = c(0, 1),
       ylab = "average proportion", xlab = "PERIOD", lty = 1, main = "")
  lines(data$T, data$harmony, lty = 2)
  lines(data$T, data$identical, lty = 3)
  lines(data$T, data$totalEntropy, lty = 4)
  # want a legend?
  # legend(max(data$T)/2, 0.2, c("Acquaintance","Harmonious","Identical","Entropy"), lty=1:4, xjust=1, yjust=1)
  # if you want to interact, do this:
  # legend(locator(1), c("Acquaintance","Harmonious","Identical","Entropy"), lty=1:4)
  # or do this to place the labels, one after the other
  text(locator(4), c("Acquaintance","Harmony","Identical","Entropy"))
}

# draw the current run again
redraw <- function() {
  buildGraph(updateDataSet(run))
}

# move on to the next run and draw it
new <- function() {
  run <<- run + 1
  buildGraph(updateDataSet(run))
}

new()
----------------------------------------

This will cycle through the DataCulture files, one per call to new(), and make a plot for each one. It lays the lines for the several variables onto the same "piece of paper". It begins with DataCulture1 and counts up to the last DataCulture* file. This one is a little tricky because I used R's "locator()" function.
That is a special thing for annotating plots (putting text in), and you might comment it out with a pound sign if it bugs you. It has the effect that, after the plot is created--the lines are drawn--R stops and waits for you to click 4 times to tell it where to put the words. You have to click 4 times because, in this example, it wants you to place 4 words: Acquaintance, Harmony, Identical, Entropy. There's a comment in the code that says you can use the automatically generated R legend instead.

To use that program, I copy the program RHuckFigure2.R into the directory where I have all the DataCulture files, I start R

$ R

and then I read in that code

> source("RHuckFigure2.R")

Then you see a picture, and when you want the next one, type

> new()

This code produces a plot that overlays several variables on the same time axis.
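
A note on Task 1 when there are many directories of runs: the grabLastLine step can be repeated for each directory with a small shell loop. What follows is only a sketch, and it rests on assumptions that are not part of the scripts above. The function names collect_last_lines and collect_all_runs are made up for illustration, each parameter combination is assumed to sit in its own directory of DataCulture files, and each summary file is simply named after its directory.

```shell
#!/bin/sh
# Sketch only: one summary file per run directory.
# Assumed layout (hypothetical): RUNDIR/DataCulture1, RUNDIR/DataCulture2, ...

# collect_last_lines DIR OUTFILE
# Writes the header (first line of the first DataCulture file found in DIR),
# then the last line of every DataCulture* file in DIR, to OUTFILE.
collect_last_lines() {
    dir=$1
    out=$2
    first=1
    for f in "$dir"/DataCulture*
    do
        [ -f "$f" ] || continue        # the pattern matched nothing
        if [ "$first" -eq 1 ]; then
            sed 1q "$f" > "$out"       # variable names
            first=0
        fi
        tail -n 1 "$f" >> "$out"       # snapshot at termination
    done
}

# collect_all_runs ANALYSISDIR RUNDIR...
# Creates ANALYSISDIR if needed and writes ANALYSISDIR/<rundir name>.dat
# for each run directory given.
collect_all_runs() {
    dest=$1
    shift
    mkdir -p "$dest"
    for d in "$@"
    do
        [ -d "$d" ] || continue        # skip anything that is not a directory
        name=$(basename "$d")
        collect_last_lines "$d" "$dest/$name.dat"
    done
}
```

For example, with two hypothetical run directories called model2a and model2b04, typing "collect_all_runs analysis model2a model2b04" after loading these functions would leave analysis/model2a.dat and analysis/model2b04.dat, which is the layout Summary.R expects. The shell sorts DataCulture10 ahead of DataCulture2, so the rows are not in run-number order; that does not matter for means and standard deviations.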