Social Data Analytics 501 @ PSU
https://burtmonroe.github.io/SoDA501
COVID-19 REMOTE LEARNING 2020
- Synchronous meetings will occur via Zoom at the regularly scheduled course time.
- Slides, recordings, lecture notes, and readings will be available via the course Canvas site (as will grades).
- Tutorials / notebooks / exercise solutions will generally still be hosted here.
- Course discussion chat is hosted on Slack, under team name SODA-501-2020.
Syllabus
Guest Speakers & Talks of interest (Spring 2020)
- Jan 28: Nilam Ram (Human Development & Family Studies) - “Screenomics: A Framework to Capture and Analyze Personal Life Experiences and the Ways that Technology Shapes Them.”
- Jan 31 (FRIDAY): C-SoDA Research Roundup
- Feb 4: Kari Lock Morgan (Statistics) - “Innovations in Research Design for Causal Inference.”
- Feb 11: Timothy Brick (Human Development & Family Studies) - “Associations Between Slow- and Fast-Timescale Indicators of Emotional Functioning.”
- Feb 14 (FRIDAY): C-SoDA Speaker: Alexandra Chouldechova (Statistics & Public Policy, CMU)
- Feb 18: Suzanna Linn (Political Science) - “Automated Text Classification of News Articles: A Practical Guide.”
- Feb 25: (SPEAKER POSTPONED)
- Mar 2 (MONDAY): C-SoDA Speaker: Adriana Crespo Tenorio (Facebook)
- Mar 3: Guangqing Chi (Rural Sociology) - “The Representativeness of Twitter Data”
- Mar 10: SPRING BREAK
- I ANTICIPATE ALL OF THE FOLLOWING TO BE CANCELED DUE TO COVID-19 SHIFT TO REMOTE DELIVERY
- Mar 24: Bruce Desmarais (Political Science) - “Latent Space Networks.”
- Mar 27 (FRIDAY): C-SoDA Speaker: Sandra González-Bailón (Communication, UPenn)
- Apr 7: Luke Glowacki (Anthropology) - “Surveying Nomadic Pastoralists”
- Apr 14: Lee Giles (IST / Informatics / CSE) - “Deep Learning and Automata”
- Apr 21: Soundar Kumara (Industrial & Mechanical Eng) - “Uncovering the Effect of Dominant Attributes on Community Topology: A Case of Facebook Networks”
- May 2 (SATURDAY): New Faces in Political Methodology Conference
Guest Speakers & Talks of interest (Spring 2019)
- Jan 28 (MONDAY): C-SoDA Open House / Lightning Talks / Posters
- Jan 29: Sarah Rajtmajer (IST / Informatics / Rock Ethics) - “Beyond the Crisis: Research Ethics in the Age of Open Data”
- Feb 5: Scott Yabiku (Sociology) (2:40 start) - “Comparing Modes of Retrospective Activity Space Data Collection.”
- Feb 12: (CLASS CANCELED BY WEATHER)
- Feb 19: Kenneth (Ting-Hao) Huang (IST / Informatics) - “Crowdsourcing and Crowd-AI Systems.”
- Feb 21 (THURSDAY): PRI Innovative Methods Working Group Speaker, Scott Yabiku - “Concepts for Big Data Analysis and Examples from Penn State’s Advanced CyberInfrastructure (ICS-ACI).”
- Feb 25 (MONDAY): C-SoDA Speaker, Ben Hansen (Statistics, Michigan) - “Infinite Regression Discontinuity”
- Feb 26 (AFTER CLASS): BERD Recent Topics in Research Methods Seminar Series, Nilam Ram (HDFS) - “Analysis of Experience Sampling and Ecological Momentary Assessment Data, Part 2: Using Multilevel Models of Intraindividual Covariation.” (4:00)
- Mar 5: SPRING BREAK
- Mar 19: Anthony Robinson (Geography) (1:00 start) - “Exploring the Presence of Absence in Big Social Data.”
- Apr 1 (MONDAY): C-SoDA Speaker, Amelia Hoover Green (Politics, Drexel) / ICS Symposium Day
- Apr 9: Corina Graif (Criminology / Sociology) - “Crime, Neighborhoods, Networks, and Modern Urban Data.”
- Apr 10 (WEDNESDAY): C-SoDA Speaker, Yu-Ru Lin (Computing & Information, Pittsburgh)
- Apr 16: Amulya Yadav (IST / Informatics) - “AI for Social Good.”
- Apr 25 (THURSDAY): Clogg Lecture, Michael Sobel
- Apr 27 (SATURDAY): New Faces in Political Methodology Conference
Guest Speakers (Spring 2018)
- Feb 1: Bing Pan (RPTM), “Big Data and Forecasting in Tourism.”
- Feb 8: Clio Andris (GEOG), “What AirBnB and Yelp Can Teach Us about Human Behavior in Cities.”
- Feb 15: Conrad Tucker (IE), “Cybersecurity Policies and their Impact on Dynamic Data Driven Application Systems.” (postponed)
- Feb 22: David Reitter (IST), “Alignment in Web-Based Dialogue: Studies in Big Data Computational Linguistics.”
- Mar 15: Daniel DellaPosta (SOC), “Network Closure and Integration in the Mid-20th Century Mafia.”
- Mar 29: Naomi Altman (STAT), “Generalizing Principal Components Analysis.”
- Apr 5: Prasenjit Mitra (IST), “Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages.”
- Apr 19: (Speaker postponed)
Materials (Slides, Tutorials, Examples, Explainers)
- Social Data Analytics Glossary
- Social Scientific Concepts in the Data Layer
- Measurement validity and reliability
- Observational data, unobtrusive / nonreactive measures, latent variables
- Sampling and surveys
- Causal inference, experiments, observational designs
- Human subjects research
- “Big Data,” “Data Science,” “Analytics,” and “Social Data Analytics”
- Ethics and scientific responsibility (privacy, bias, transparency, reproducibility)
- Data wrangling and manipulation
- Data formats, open data, APIs
- Web scraping
- Locality sensitive hashing
- Data compression, dimensionality reduction
- Record linkage, entity resolution
- Split-Apply-Combine and Map-Reduce
- Regularization, shrinkage, priors (Primer)
- Kernels, convolution, smoothing
- Data representation and interpretation
- Vector space models, information retrieval, embeddings
- Similarity, distance, and similar measures
- Eigenvectors (Eigendecomposition, Geometry, PCA, Network Centrality, PageRank, Markov Chains)
- Matrix decompositions, latent variable measurement
- Social Data Structures and Channels
- Text
- Networks
- Space and Time
- Spatial Data
- Time Series Concepts (Primer)
- Hierarchy
- Tools
- Bash, Git, cluster computing (ACI, XSEDE), cloud computing (AWS, Azure, Jetstream)
- R
- Python
- “Big data” tools (Hadoop, Spark, …)
- Deep learning tools (TensorFlow, H2O, Keras, …)
- Advanced data science tools (Scala, Julia, Haskell, Clojure, …)
- Glossary