Davida Rosenstrauch

Dec 14, 2020


How to Stop GitHub From Making You Cry

The first thing I learned as a data science bootcamp student was how to use Git and GitHub. And then I cried. Working off of the anxiety I felt starting this daunting program, I had done prep work on Python, statistics, linear algebra, basic model-building, and more, but on Day 1 we were learning something I’d literally never seen before. This certainly didn’t help with said anxiety.

Since then, I’ve figured out enough about GitHub to get by, and thankfully I haven’t cried over it again, but there is still plenty that trips me up about it. Some of the frustration has also come from feeling like I shouldn’t “waste my time” figuring out GitHub details when there’s so much “actual” data science I need to learn to keep up with my program. Why should I be fiddling around with commits and repos when I came here to learn how to build predictive models?

Sharing your work is a huge part of being a data scientist. Whether it’s in the context of working on a project with a partner, sharing projects with a current or potential employer, or organizing work for your own benefit, knowing how to maintain and update a GitHub account is a key component of functioning as a data scientist within the larger field.

I’ve walked through below the basic process of forking, cloning, and editing a repository, highlighting a few snags that, until looking into them for this blog, kept tripping me up. Selfishly, I plan to use this as my own cheat sheet to return to next time I need a reminder of how to fix something. But on a more generous note, I also hope it helps anyone out there who has been faced with similar frustrations in their work. Please note that this is by no means comprehensive, and if you already have a good basic understanding of GitHub, you may want to be on your merry way, but if you’re new, or, like me, still get a little tripped up when even the slightest thing goes a way you weren’t expecting, I hope this is helpful.

Forking and Cloning a Repository

When: You’ll use this technique any time you want to start with a repo someone else has begun, but you want to make your own separate changes without affecting the original.

How:

· Once you’re in the repo you want, click “Fork”. This will create a separate, identical repo under your own name in your own account.

· After a few seconds, you’ll be automatically redirected to said personal repo copy.

· Great! Now you have a repo under your own account that has all the same stuff as the original one. But how do you download it locally? First, in your terminal, make sure you are in the local folder into which you want to download the repo. I’m not going to go into detail here on how to navigate using your terminal, but there are lots of great resources out there to help with this!

Once you’re in the right folder, copy the URL of your forked repo, either straight from the address bar or by clicking “Code” and copying the URL that pops up. Then, in your terminal, type git clone followed by the pasted URL. Once you press enter, all of the repo’s contents will be on your local computer in a new folder named after the repo.
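Here is a minimal sketch of that cloning step. A real clone needs the URL of your own fork, so this sketch makes one big assumption: a local bare repository stands in for the fork on GitHub. The names fork.git, projects, and my-repo are made up for illustration.

```shell
# Scratch area so the sketch doesn't touch your real folders.
workdir="$(mktemp -d)"
cd "$workdir"

# ASSUMPTION: a local bare repo stands in for your fork on GitHub.
# In real life, fork_url is the URL you copied from the "Code" button.
git init --quiet --bare fork.git
fork_url="$workdir/fork.git"

# Move into the folder where the download should live, then clone.
# (2>/dev/null only hides git's warning that the stand-in repo is empty.)
mkdir projects
cd projects
git clone --quiet "$fork_url" my-repo 2>/dev/null

# The clone is a folder, and "origin" points back at the fork.
cd my-repo
git remote -v
```

If you leave off the folder name at the end (my-repo here), git names the folder after the repo itself.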

Adding, Committing, and Pushing Updates to your Repositories

Don’t be like Stevie. Always commit.

When: Any time you want to update anything in your repository. This process not only updates your repo, but it also represents a snapshot of your most recent changes. This way anyone looking at your repo can see your process. This is helpful for people to become familiar with your work process, or, if you’re working as part of a group, for your partner(s) to see what you changed at a glance. Keep this in mind when we’re talking about commit messages later on.

On a recent project, my instructor rightfully pointed out that I had uploaded all of my files instead of properly adding, committing, and pushing to the repo. Sure, this is handy if you’re, oh, I don’t know, working on minimal sleep a few hours before your project is due, not very adept at GitHub processes, and desperate for any method that will give you a submittable project in as little time as possible, but it does take away from anyone else’s ability to see your work process. And frankly, once you get the hang of it, it really is a lot easier.

How:

Pick this up either right after you’ve completed the forking/cloning process (after all, the whole point of forking and cloning your repo was to be able to make updates without affecting the original one), or after you’ve created a new repo for yourself from scratch. Once you’ve updated your cloned files, or created some new ones, it is time to get those changes into your repo. This actually takes place in 3 steps: adding, committing, and pushing. Adding stages your changes in preparation for committing. Committing records the staged changes as a new snapshot in your local copy of the repo, on top of the previous one. Finally, pushing uploads your local commits to GitHub so your repo there is up-to-date with your most recent local copy.

My bootcamp coach shared an analogy that I’m probably misremembering in detail (sorry, Justin), but that really helped me understand the difference between these 3 elements. Think of your “add” action as buying a train ticket, committing as arriving at the station, and pushing as actually getting on the train. If you skip one of those elements, you’re not getting to your destination, and they each bring you one step closer to that goal.

Looks like someone forgot to push.

In terms of actual code, you can do all of these actions in three separate steps (replace <file> with the name of the file you changed):

git add <file>
git commit -m "commit message"
git push

“Commit message” refers to a note that accompanies the commit and describes what you changed in it. Writing a clear one matters, because these messages are how you (and anyone else reading your repo) keep track of what changed and when.
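To make the three steps concrete, here is a hedged end-to-end sketch you can run in a scratch folder. It assumes a local bare repository (remote.git) as a stand-in for your repo on GitHub, and the file name, author name, and email are placeholders rather than anything from a real project.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# ASSUMPTION: a local bare repo stands in for the repo on GitHub.
git init --quiet --bare remote.git
git clone --quiet remote.git my-repo 2>/dev/null   # hides the "empty repository" warning
cd my-repo

# Git needs an author for commits; these values are placeholders.
git config user.name "Example Student"
git config user.email "student@example.com"

# Make a change: here, a brand-new file.
echo "print('hello, model')" > analysis.py

git add analysis.py                          # 1. stage the change
git commit --quiet -m "Add analysis script"  # 2. snapshot it locally
git push --quiet -u origin HEAD              # 3. upload it to the remote

# The newest commit message, as recorded on the remote side.
git --git-dir="$workdir/remote.git" log --all -1 --format=%s
```

The -u origin HEAD part is there to be safe because the stand-in remote starts out empty. In a repo cloned from GitHub that already has commits, plain git push is normally enough.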

You can also combine the adding and committing actions and condense to 2 lines:

git commit -am "commit message"
git push

This is a handy shortcut, but I’ve run into a few snags with it: the -a flag only picks up changes to files Git is already tracking, so a brand-new file gets skipped and never gets added, committed, or pushed back up to GitHub from my local drive. I’ve gotten into the habit of starting each commit with

git add .

The period tells Git to add everything that has been created or updated in your current folder and any folders inside it, so run it from the top level of your repo to catch everything. This has been particularly helpful when I’ve created a new file within the repo as opposed to simply editing an existing one.
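To see where the shortcut falls short, here is a small sketch in a throwaway repo (the file names and the author identity are placeholders). It edits a file Git already tracks, creates a brand-new file, and then compares what git commit -am picks up with what git add . picks up.

```shell
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet demo
cd demo
git config user.name "Example Student"      # placeholder identity
git config user.email "student@example.com"

# First commit, so that notes.txt becomes a tracked file.
echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# Edit the tracked file AND create a brand-new one.
echo "second draft" >> notes.txt
echo "print('hello')" > new_script.py

# The shortcut commits the edit to notes.txt but skips the new file.
git commit --quiet -am "Update notes"
git status --short      # prints "?? new_script.py" (still untracked)

# "git add ." catches the new file too.
git add .
git commit --quiet -m "Add new script"
git status --short      # prints nothing: everything is committed
```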

And voila! If you look back at your repo on GitHub, you’ll see not only that your files have been updated, but that your commit message is right there on the repo’s main page, next to any files or folders that were affected in your last commit.
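You can read the same commit messages from your terminal, too. This is only a sketch in a throwaway repo, with made-up file contents, messages, and author identity, but git log --oneline itself works the same way in any repo.

```shell
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet demo
cd demo
git config user.name "Example Student"      # placeholder identity
git config user.email "student@example.com"

echo "step 1" > notebook.txt
git add notebook.txt
git commit --quiet -m "Load and clean the data"

echo "step 2" >> notebook.txt
git add notebook.txt
git commit --quiet -m "Add baseline model"

# One line per commit, newest first: a short hash plus the commit message.
git log --oneline
```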

Checking Status

When: Any time you’re unsure of the status of any file in your repo.

How:

git status

That’s it!

For some more detail, this will tell you the status of your locally-saved files as compared to your last commit, and of your local commits as compared to your repo on GitHub. It may tell you some files have been updated (and will list them), which means you’ll need to add/commit/push. Or it may tell you that you’ve done one or two of those actions, but you still need to remember to actually push back up to your repo. Or maybe it will tell you that everything you have saved locally is perfectly up-to-date with your repo, which means there’s nothing to push back up. It’s a handy tool at any point during your project to keep track of what you have saved locally and what is out there for the world to see on your repo.
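As a rough walk-through of those situations, the sketch below runs git status at each stage of the add/commit/push cycle. It assumes a local bare repository (remote.git) as a stand-in for GitHub, and the file name and author identity are placeholders. The --short flag prints a compact two-letter code for each file instead of the full report.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# ASSUMPTION: a local bare repo stands in for the repo on GitHub.
git init --quiet --bare remote.git
git clone --quiet remote.git my-repo 2>/dev/null   # hides the "empty repository" warning
cd my-repo
git config user.name "Example Student"      # placeholder identity
git config user.email "student@example.com"

echo "x = 1" > model.py
git status --short      # "?? model.py": new file, not yet added

git add model.py
git status --short      # "A  model.py": added (staged), not yet committed

git commit --quiet -m "Add model"
git push --quiet -u origin HEAD

# Change the file and commit again, but "forget" to push.
echo "x = 2" > model.py
git commit --quiet -am "Tune model"
git status              # reports that your branch is ahead of origin by 1 commit

git push --quiet
git status              # reports that the branch is up to date and there is nothing to commit
```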

There is a lot more that is worth covering as related to GitHub, and plenty that I still haven’t figured out for myself. But these basics are a really helpful foundation for getting started, and should make everything that comes next a little easier. I’ve technically known all of these elements for a while now, but they didn’t really click for me until I looked through them in detail while writing this blog and realized that wait, I wasn’t really adding properly before my commits, or right, you still have to push after you’ve committed. I’m looking forward (ok, maybe that’s optimistic) to the next time I run into a GitHub snag (and it won’t be long, I’m sure), because I’m really excited to figure out the solution and learn another element of such an important component of data science.

You, not crying
