Appendix A: GitHub

Although GitHub is typically used for collaborative software development, it is also a useful platform for organizing, storing, and sharing the code of your analytics projects. If you are unfamiliar with Git or GitHub, the steps below will help you get started.

Initiate a new repository

  1. Log in to your GitHub account and click on the plus sign in the upper right corner. From the drop-down menu select New repository.
  2. Give your repository a name, for example, bigdatastat. Then, click on the big green button, Create repository. You have just created a new repository.
  3. Open RStudio, and navigate to the place on your hard disk where you want to keep the local copy of your repository.
  4. Then create the local repository as suggested by GitHub (see the page shown right after you have clicked on Create repository: “…or create a new repository on the command line”). In order to do so, you have to switch to the Terminal window in RStudio and type (or copy and paste) the commands as given by GitHub. This should look similar to the following code chunk:
echo "# bigdatastat" >> README.md
git init
git add README.md
git commit -m "first commit"
git remote add origin \
https://github.com/YOUR-GITHUB-ACCOUNTNAME/bigdatastat.git
git push -u origin master

Remember to replace YOUR-GITHUB-ACCOUNTNAME with your GitHub account name before running the code above.

  5. Refresh the page of your newly created GitHub repository. You should now see the result of your first commit.
  6. Open README.md in RStudio, and add a few words describing what this repository is all about.
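
Depending on the Git version and its configuration (for example, the init.defaultBranch setting), git init may name the initial branch main instead of master. In that case git push -u origin master fails, because no local branch called master exists. The following is a minimal sketch of one way to avoid hard-coding the branch name; the helper names current_branch and push_current_branch are hypothetical and not part of Git or of the commands suggested by GitHub.

```shell
# Hypothetical helper: print the name of the branch that is currently
# checked out (this also works before the first commit has been made).
current_branch() {
  git symbolic-ref --short HEAD
}

# Hypothetical helper: push the current branch to the remote called
# "origin" and record it as upstream, so that later calls to plain
# "git push" and "git pull" know where to go.
push_current_branch() {
  git push -u origin "$(current_branch)"
}
```

After git commit -m "first commit" and git remote add origin, calling push_current_branch would take the place of git push -u origin master.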

Clone this book’s repository

  1. In RStudio, navigate to a folder on your hard disk where you want to keep a local copy of this book’s GitHub repository.
  2. Open a new browser window, and go to https://github.com/umatter/BigData.
  3. Click on Clone or download (labeled Code in newer versions of the GitHub interface) and copy the link.
  4. In RStudio, switch to the Terminal, and type the following command (pasting the copied link):
git clone https://github.com/umatter/BigData.git

You now have a local copy of the repository that is linked to the one on GitHub. To see it, change to the newly created directory, which contains the local copy of the repository:

cd BigData

Whenever the book’s repository on GitHub is updated, you can update your local copy with:

git pull

(Make sure you are in the BigData folder when running git pull.)
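
If you prefer not to change directories first, git -C runs a Git command inside a given folder. The sketch below wraps the update in a hypothetical helper, update_local_copy (not part of Git), that first reports how many new commits the remote has and then pulls them. It assumes the local copy has no commits of its own: the --ff-only option makes git pull stop rather than create a merge if that assumption does not hold.

```shell
# Hypothetical helper: update the local copy located in directory $1
# (defaults to the current directory).
update_local_copy() {
  repo=${1:-.}
  # Download new commits from the remote without changing local files.
  git -C "$repo" fetch --quiet || return 1
  # Count the commits that exist on the tracked remote branch but not locally.
  behind=$(git -C "$repo" rev-list --count "HEAD..@{u}") || return 1
  echo "$behind new commit(s) on the remote"
  # Fast-forward the local branch; refuse to merge if histories diverged.
  git -C "$repo" pull --ff-only --quiet
}
```

For example, update_local_copy BigData can be run from the folder that contains the BigData directory.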

Fork this book’s repository

  1. Go to https://github.com/umatter/BigData, and click on the ‘Fork’ button in the upper-right corner (follow the instructions).

  2. Clone the forked repository (see the cloning of a repository above for details). Assuming you called your forked repository BigData-forked, you run the following command in the terminal (replacing <yourgithubusername>):

git clone https://github.com/<yourgithubusername>/BigData-forked.git
  3. Switch into the newly created directory:
cd BigData-forked
  4. Set a remote connection to the original repository:
git remote add upstream https://github.com/umatter/BigData.git

You can verify the remotes of your local clone of your forked repository as follows:

git remote -v

You should see something like:

origin  https://github.com/<yourgithubusername>/BigData-forked.git (fetch)
origin  https://github.com/<yourgithubusername>/BigData-forked.git (push)
upstream    https://github.com/umatter/BigData.git (fetch)
upstream    https://github.com/umatter/BigData.git (push)
  5. Fetch changes from the original repository. Suppose new material has been added to the original book repository, and you want to merge it into your forked repository. To do so, first fetch the changes from the original repository:
git fetch upstream
  6. Make sure you are on the master branch of your local repository:
git checkout master
  7. Merge the changes fetched from the original repo with the master of your (local clone of the) forked repository:
git merge upstream/master
  8. Push the changes to your forked repository on GitHub:
git push

Now your forked repo on GitHub also contains the commits (changes) in the original repository. If you make changes to the files in your forked repo, you can add, commit, and push them as in any repository. Example: open README.md in a text editor (e.g. RStudio), add # HELLO WORLD to the last line of README.md, and save the changes. Then:

git add README.md
git commit -m "hello world"
git push
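
Because the fetch, checkout, merge, and push steps for synchronizing a fork are always the same, they can be collected in a small shell function. The following is only a sketch: sync_fork is a hypothetical name, and the function assumes that the remotes are called origin and upstream as set up above and that the branch has the same name in both repositories (master unless you pass another name).

```shell
# Hypothetical helper: bring branch $1 (default: master) of the fork
# up to date with the same branch of the original repository.
sync_fork() {
  branch=${1:-master}
  # Each step runs only if the previous one succeeded.
  git fetch upstream &&
  git checkout "$branch" &&
  git merge "upstream/$branch" &&
  git push origin "$branch"
}
```

Run sync_fork inside the BigData-forked directory. If the merge stops because of conflicts, the push is skipped, and you can resolve the conflicts and push manually.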