Sparklyr Cheat Sheet



Below is a list of contributors to this blog.

Name | Role | Bio
Joseph Rickert | Ambassador at Large | Joseph is RStudio’s “Ambassador at Large” for all things R and is the chief editor of the R Views blog. He works with the rest of the RStudio team and the R Consortium to promote open source activities, the R language, and the R Community. Joseph also represents RStudio on the R Consortium board of directors.
Mine Çetinkaya-Rundel | Data Scientist and Professional Educator | Mine is Professional Educator at RStudio and Assistant Professor of the Practice at Duke University. Her work focuses on innovation in statistics pedagogy, with an emphasis on computation, reproducible research, open-source education, and student-centered learning. She is the author of three open-source introductory statistics textbooks as part of the OpenIntro project and teaches the popular Statistics with R MOOC on Coursera.
Jonathan Regenstein | Enterprise Advocate | Jonathan studied International Relations as an undergraduate at Harvard, worked in finance at JP Morgan, and then did graduate work in Political Economy at Emory University before joining RStudio.
Sean Lopp | Solutions Engineer | Sean leads teams to create useful, enjoyable products. Before RStudio he was a data scientist and worked on alternative vehicle models at NREL and on infant sleep dynamics; he originally studied mathematics. He lives outside Denver, CO and skis and bikes with his family.
Nathan Stephens | Director of Solutions Engineering | Nathan has a background in analytic solutions and consulting. He has experience building data science teams, architecting analytic infrastructure, and delivering innovative data products. He is a longtime user of R.
Edgar Ruiz | Solutions Engineer | Edgar has a background in deploying enterprise reporting and business intelligence solutions. He is the author of multiple articles and blog posts sharing analytics insights and server infrastructure for data science. Recently, Edgar authored the “Data Science on Spark using sparklyr” cheat sheet.
James Blair | Solutions Engineer | James holds a master’s degree in data science from the University of the Pacific and works as a solutions engineer. His past consulting work centered around helping businesses derive insight from data assets by leveraging R. Outside of R and data science, James’s interests include spending time with his wife and children, cooking, camping, cycling, racquetball, and exquisite food. Also, he never turns down a funnel cake.
Andrie de Vries | Solutions Engineer | Andrie started using R in 2009 for market research statistics. He is a regular contributor to StackOverflow and co-author of “R for Dummies”. He has contributed several R packages to CRAN, including miniCRAN, checkpoint, ggdendro, sss, and surveydata, and regularly speaks at industry events and R user groups. He is a qualified yoga teacher and continues to study yoga therapy annually in Chennai, India.
Greg Wilson | Data Scientist and Professional Educator | Greg Wilson has been a programmer, a teacher, and an author, and is now combining all three roles as a data scientist and professional educator at RStudio. He was the co-founder of Software Carpentry, a non-profit organization that teaches basic computing skills to researchers, and co-editor of “Beautiful Code”, “Making Software”, and “The Architecture of Open Source Applications”. In his spare time, he writes children’s books and is learning to play the cello.
Max Kuhn | Software Engineer | Max Kuhn is a software engineer at RStudio, where he is working on improving R’s modeling capabilities. He was previously a Director of Nonclinical Statistics at Pfizer Global R&D in Connecticut and applied models in the pharmaceutical and diagnostic industries for over 18 years. Max has a Ph.D. in Biostatistics. He is the author of a number of R packages for techniques in machine learning and reproducible research and is an Associate Editor for the Journal of Statistical Software. He and Kjell Johnson wrote the book Applied Predictive Modeling, which won the Ziegel award from the American Statistical Association, which recognizes the best book reviewed in Technometrics in 2015. Their new book, Feature Engineering and Selection, was released in 2019.
Alex Gold | Solutions Engineer | Alex is a longtime data nerd who has worked on economic policy research, electoral politics, and healthcare at various times. He enjoys cooking and practicing martial arts and handstands in his spare time.
Cole Arendt | Solutions Engineer | Cole is a solutions engineer with a background in mathematics and big data. He has architected and managed analytic frameworks for reporting that use R, SAS, and Tableau. He has a diverse set of skills and interests that include soccer, philosophy, economics, and open source software. Cole lives in Raleigh, NC with his wife and two young children.
Garrett Grolemund | Data Scientist and Professional Educator | Garrett specializes in teaching, data science, and teaching data science. He has a PhD in Statistics, wrote the popular lubridate package, invented the RStudio cheatsheets, and has (co)authored three books: Hands-On Programming with R, R for Data Science, and R Markdown: The Definitive Guide.
Hadrien Dykiel | Customer Success Representative | Hadrien is an avid adventurer who grew up training horses. He continues to ride on weekends, along with trail running and working towards his private pilot’s license. He fell in love with data science and R after college and has worked in various industries including tech and insurance. He is now a member of the customer success team at RStudio.
Kelly O’Briant | Solutions Engineer | Kelly O’Briant is a solutions engineer at RStudio interested in configuration and workflow management, with a passion for R administration.
Brian Law | Customer Success | Brian helps people improve their work lives by getting more out of the RStudio tools. He has a Ph.D. in Political Science, a love of the Detroit Red Wings, and several raincoats that are necessary in the Pacific Northwest.

Once you’ve gotten started learning R, you can expand your skills by exploring R’s many specialized capabilities. Here are six areas that people with some R experience find particularly rewarding to learn.

  • Grab some cheat sheets. No one can possibly remember all the functions and arguments for every R package, which is why cheat sheets were invented. RStudio publishes a free collection of cheat sheets for the most popular R features and packages to help jog your memory. If you decide you’d like to collect them all, you can clone the cheat sheet GitHub repository.

  • Learn to get help. Everyone gets stuck. Learning where and how to ask for R help is a powerful skill to hone. The Tidyverse site offers some expert advice on how to help others help you. One package you’ll grow to love is reprex, which creates reproducible R code examples. Read through the reprex articles, such as Magic reprex and Using datapasta with reprex, which use animated GIFs to illustrate each step. Where to ask for help? The RStudio Community is a warm and welcoming online discussion forum where you can ask (and answer!) questions about using R.

  • Improve your visualizations. You may already know how to create a basic plot using ggplot2, but can you build one that makes your audience go “Wow”? You can start by expanding your knowledge of the Grammar of Graphics and ggplot2 by reading Hadley Wickham’s (2016) book, ggplot2: Elegant Graphics for Data Analysis. Paper and Kindle versions of the second edition are available on Amazon. The third edition is in progress and can be viewed for free online, with the source files on GitHub. If you’d like Hadley to personally explain his philosophy of using ggplot2 in his data science work, check out his talk from OpenVisConf 2017, The Role of Visualization in Exploratory Data Analysis. Bookmark the updated R Graphics Cookbook by Winston Chang (2018) too; it is filled with recipes that tackle specific ggplot2 problems.

  • Develop interactive applications with htmlwidgets and Shiny. One concrete way to communicate your analyses better is to make your visualizations interactive. You can learn how to add browser-based interactivity to your graphics with just a few lines of code at www.htmlwidgets.org. If your interactivity requires R code running on a server, learn how to write Shiny applications at shiny.rstudio.com, or follow along as Wickham (2020) writes the new Mastering Shiny book. Both approaches can be integrated with R Markdown to create polished interactive dashboards using the flexdashboard package.

  • Simplify your model explorations with tidymodels. Much of data science involves modeling, but each modeling package seems to invent its own interface and arguments. Enter tidymodels, a meta-package for modeling and analysis that shares the underlying design philosophy, grammar, and data structures of the tidyverse. If you have previously used caret for a uniform modeling interface, the tidymodels package parsnip is its more up-to-date successor. While this project is still under development, it promises to dramatically simplify model exploration. RStudio’s Edgar Ruiz wrote up A Gentle Introduction to tidymodels to get you started.

  • Explore other specialized packages. R attracts data scientists because of its more than 13,000 packages that address nearly every use case. If you’re interested in genomics, you’ll want to spend some time learning the Bioconductor collection of packages. If you’re working with Big Data on Spark clusters, check out sparklyr. If you want to dive into finance, you’ll probably want to start with quantmod. To find out which packages to explore, we recommend topic-based package catalogs such as Awesome R or the CRAN task views.

Books & packages referenced

Bryan, Jennifer, Jim Hester, David Robinson, and Hadley Wickham. 2019. Reprex: Prepare Reproducible Example Code via the Clipboard. https://CRAN.R-project.org/package=reprex.

Chang, Winston. 2018. R Graphics Cookbook: Practical Recipes for Visualizing Data. O’Reilly Media. https://r-graphics.org/.

Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2019. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.

Iannone, Richard, JJ Allaire, and Barbara Borges. 2018. Flexdashboard: R Markdown Format for Flexible Dashboards. https://CRAN.R-project.org/package=flexdashboard.

Kuhn, Max, and Davis Vaughan. 2018. Parsnip: A Common API to Modeling and Analysis Functions. https://CRAN.R-project.org/package=parsnip.

Kuhn, Max, and Hadley Wickham. 2018. Tidymodels: Easily Install and Load the ’Tidymodels’ Packages. https://CRAN.R-project.org/package=tidymodels.

Ryan, Jeffrey A., and Joshua M. Ulrich. 2018. Quantmod: Quantitative Financial Modelling Framework. https://CRAN.R-project.org/package=quantmod.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer. https://ggplot2-book.org/.

———. 2020. Mastering Shiny. O’Reilly Media. https://mastering-shiny.org/.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, and Hiroaki Yutani. 2019. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics.