Book datasets
All datasets used in the book are described in Appendix A of the book. Some datasets are provided by R packages and are therefore not provided separately as data files on this companion website. The other data files used in the code examples throughout the book are either my own or openly available data, provided as is or converted to suitable formats for your convenience. These files are provided on this companion website and described below.
Network databases
In addition to the data files provided here, data collections from several research institutions or collectives are publicly available. The following are links to some of these collections. Please let me know if you are aware of any other collections that may be useful to readers:
- http://moreno.ss.uci.edu/data.html
- http://konect.uni-koblenz.de/networks/
- http://networkdata.ics.uci.edu/index.html
- http://www.casos.cs.cmu.edu/computational_tools/datasets/index.html
- https://snap.stanford.edu/data/
- https://sparse.tamu.edu/Pajek
Descriptions of data files provided
Most of the datasets used in the book are those discussed in the primary reference book by Wasserman and Faust (Wasserman and Faust 1994). They are explained in detail below. You may download most of them from the book website at http://appliedsna.mgencer.com/data/
Padgett’s Florentine families
This dataset was compiled by John F. Padgett (2010). He collected the relationships between medieval Florentine families. Among these relational datasets, the one used in the book concerns marriages among the families. The data file available here in Pajek format (link) provides the marriage network data and another one provides the wealth vector of families (link).
Once you save the data files on your own computer under a directory named “data” within your working directory in R, you can read the dataset and wealth information as follows, using the read.graph() function from the igraph library:
require(igraph)
g <- read.graph("data/PadgettMarital.net", format="pajek")
wealthData <- read.csv("data/PadgettWealth.csv",header=TRUE)
Krackhardt’s network of high-tech managers
Compiled by David Krackhardt, the dataset shows the advisory relationships among 21 managers working at a high-technology machinery manufacturing company (Krackhardt 1987). This dataset is included in NetData, an R package, so it is not provided as a separate data file here. The data can be loaded as follows:
require(NetData) #https://cran.r-project.org/web/packages/NetData/NetData.pdf
data(kracknets)
Please see Appendix A of the book for converting and using this data set.
International trade
This dataset shows the import/export relations between 24 prominent countries. It is one of the basic datasets used in the Wasserman and Faust book (Wasserman and Faust 1994). The original data can be found at http://vlado.fmf.uni-lj.si/pub/networks/data/WaFa/default.htm. A copy is provided here for your convenience (link). The data format is suitable for the network package but not for igraph, so you can import the data as follows:
require(network)
require(sna)
n<-read.paj("data/CountriesTrade.net")
n1 <- n$networks$ws0
# uncomment following line to plot
#plot.network(n1,displaylabels=T)
After that, you will need to convert the object before you can use it with the igraph package. The intergraph package provides functions for such conversions:
require(intergraph)
require(igraph)
g<-asIgraph(n1)
#uncomment the following line to plot
#plot(g)
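The same conversion can be tried without the data file. The following is a minimal sketch on a made-up three-node network (the matrix is invented for illustration and is not part of the trade data). It assumes the network, intergraph and igraph packages are installed, and that asIgraph() stores the node labels in a vertex attribute called vertex.names rather than name, which is worth checking with summary(g) on your own installation:

```r
library(network)
library(intergraph)
library(igraph)

# Toy directed adjacency matrix (hypothetical data, for illustration only)
m <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0), nrow = 3, byrow = TRUE)

# Build a network object, then convert it to an igraph object
toy_net <- network(m, directed = TRUE)
toy_g <- asIgraph(toy_net)

# Inspect the result; the attribute list shows where labels ended up
summary(toy_g)
vcount(toy_g)
ecount(toy_g)
```

If the labels are indeed stored in vertex.names, they can be passed to the plot explicitly, for example plot(toy_g, vertex.label = V(toy_g)$vertex.names).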
Zachary’s Karate club dataset
The data comes from an anthropological study by Wayne Zachary (Zachary 1977). It includes several relations between 34 members of a karate club at a university. Zachary studied a social duality that he labels conflict and fission. There were two factions in this social group. Zachary observed the group over some time and found that the ties within factions became more intense (fission), whereas whatever ties existed between members of different factions vanished over time (conflict).
This dataset is available as an igraph object from an R package called igraphdata, so it is not provided here as a separate file. It is an undirected, weighted network. It can be loaded and used as follows:
require(igraphdata)
data(karate)
require(igraph)
summary(karate)
help(karate) # Please read the explanations for the dataset
plot(karate)
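Since the dataset is weighted and records two factions, it may help to look at those attributes directly. The following is a minimal sketch, assuming the vertex attribute is named Faction and the edge attribute is named weight, as listed by summary(karate); if your version of igraphdata uses different names, adjust accordingly:

```r
library(igraph)
library(igraphdata)
data(karate)

# Number of members in each faction (attribute name is an assumption)
faction <- V(karate)$Faction
table(faction)

# Distribution of tie weights
summary(E(karate)$weight)

# Flag ties that connect members of different factions
el <- ends(karate, E(karate), names = FALSE)
cross <- faction[el[, 1]] != faction[el[, 2]]
sum(cross)    # number of between-faction ties
mean(cross)   # share of all ties that cross factions
```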
Coleman’s highschool boys’ friendship network (longitudinal)
This is a dataset collected by Coleman (Coleman 1964). It shows the friendship ties between high school boys as assessed by the question: “What fellows here in school do you go around with most often?”. The measurement was taken twice, in 1957 and 1958, and the data shows the change in the friendship network between the two years. The dataset is provided in both the ggraph and sna packages, so it is not provided here as a separate file. Note that the two packages use different data object classes, but both can be converted to igraph:
require(ggraph)
data(highschool)
class(highschool)
## [1] "data.frame"
library(igraph)
g1 <- graph_from_data_frame(highschool)
#plot(g1)
require(sna)
data(coleman)
class(coleman)
## [1] "array"
g1957<-graph_from_adjacency_matrix(coleman[1,,])
#graph for 1957 converted to igraph
plot(g1957)
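Because the array holds one adjacency matrix per measurement, the second slice can be converted in the same way and the two years compared. The following is a minimal sketch on a small made-up array that stands in for coleman (the ties are invented and the names toy, g_w1 and g_w2 are hypothetical); it assumes, as in coleman[1,,], that the first index selects the measurement:

```r
library(igraph)

# Toy stand-in for the coleman array: 2 waves x 3 actors x 3 actors
actors <- c("a", "b", "c")
toy <- array(0, dim = c(2, 3, 3),
             dimnames = list(c("wave1", "wave2"), actors, actors))
toy["wave1", "a", "b"] <- 1
toy["wave1", "b", "c"] <- 1
toy["wave2", "a", "b"] <- 1
toy["wave2", "c", "a"] <- 1

# One directed igraph object per wave
g_w1 <- graph_from_adjacency_matrix(toy[1, , ], mode = "directed")
g_w2 <- graph_from_adjacency_matrix(toy[2, , ], mode = "directed")

# Ties present in the first wave but absent in the second
dropped <- difference(g_w1, g_w2)
ecount(g_w1)
ecount(g_w2)
ecount(dropped)
```

With the real data, replacing toy by coleman gives one graph per year, and the same difference() call lists the friendships reported in 1957 but not in 1958.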
Victor Hugo’s Les Miserables characters
This is a dataset about the “scene co-appearance” network of the characters in Victor Hugo’s famous novel “Les Miserables”. The original data is retrieved from the Stanford network database (Knuth 1993). This network has 77 actors/nodes that appear throughout this voluminous novel. The data is provided in GML format (link). It can be loaded and used as follows:
require(igraph)
g <- read.graph("data/lesmiserables.gml", format="gml")
gu<-simplify(as.undirected(g)) #preferably
plot(gu)
US air traffic dataset
This is a relatively large network example. The air traffic network data comes from the Koblenz Network Collection (KONECT). It contains data about the airline connections between US airports, with 1227 airports in the data. See: http://konect.uni-koblenz.de/networks/maayan-faa. A copy of the dataset files is provided here for your convenience (link). Use as follows:
require(igraph)
g<-read.graph("data/maayan-faa/out.maayan-faa",format="edgelist")
gu<-simplify(as.undirected(g)) #preferably
plot(gu)
References
Coleman, J. S. (1964). Introduction to Mathematical Sociology. New York: Glencoe.
Cox, T. F., & Cox, M. A. (2000). Multidimensional Scaling. Chapman Hall/CRC.
Knuth, D. E. (1993). Les miserables: coappearance network of characters in the novel Les Miserables. The Stanford GraphBase: A Platform for Combinatorial Computing.
Krackhardt, D. (1987). Cognitive social structures. Social Networks 9 (2), 109–34. https://doi.org/10.1016/0378-8733(87)90009-8.
Padgett, J. F. (2010). Open Elite? Social Mobility, Marriage, and Family in Florence, 1282–1494. Renaissance Quarterly 63 (2), 357–411. https://doi.org/10.1086/655230.
Wasserman, S., & Faust, K. (1994). Social Network Analysis: Methods and Applications (Vol. 8). Cambridge University Press.
Zachary, W. W. (1977). An Information Flow Model for Conflict and Fission in Small Groups. Journal of Anthropological Research 33 (4), 452–73.