Archive

All published posts

gatsby-remark-embedder

Inspired by Discourse’s link expansion, I am rolling out expansions for one-line links on the blog waylonwalker. Through a response from Kyle Mathews to my tweet, I was able to find a Gatsby plugin, gatsby-remark-embedder, that expands one-line links into social cards for popular platforms like Twitter and YouTube.

https://twitter.com/kylemathews/status/1329817928666005504

This covers a couple of use cases I have with very little effort.

...

1 min read

Expand One Line Links

I wanted a super simple way to cross-link blog posts that requires as little effort as possible, yet still looks good in vanilla markdown in GitHub. I have been using a snippet that puts HTML into the markdown. While this works, it’s more manual/difficult for me, does not look the best, and does not read well as

The new card should be fully automated to expand with title, description, and cover image. Bonus if I am able to attach a comment behind it.

If you can call it a card 🤣. This card was just an image wrapped in an anchor tag and a paragraph tag. I found this was the most consistent way to get an image narrower and centered in both GitHub and dev.to.

...

Find and Replace in the Terminal.

grepr

```bash
grepr() { grep -iRl "$1" | xargs sed -i "s/$1/$2/g"; }
```

grepd

```bash
grepd() { grep -iRl "$1" | xargs sed -i "/^$1/d"; }
```

CocSearch

:CocSearch published: false -g *.md

Resume Tips

- customize for the job
  - Why are you a good fit?
  - What will you bring to the role?
  - Give real outcomes
- give real experience
- Stop tech vomiting
- if you link to GitHub
  - Make a profile readme
  - Guide me to your best work
  - have some activity
- if you link to LinkedIn
  - Provide some benefit that is not on your resume
- Have a logical flow of experience (don't make me hunt for past experience)
- Keep it under 2 pages
- Who you know.
- Reference real experience
  - Deployed 12 data pipelines with over 500 nodes to process 200GB of data at a Fortune 100 company
  - vs
  - Knowledge of Data Engineering methodology with python EC2
- Don't be so fluffy

1 min read

Codeit Bro Interview

use this profile image

Please share your professional role as a data scientist? [Also feel free to share about your personal projects, publications, etc.]

I graduated with a Mechanical Engineering Degree 8 years ago. Much of my work early in my career was wrapped around analyzing larger datasets for my group to understand quality, drive changes to improve quality or prove that quality was already good.

...

7 min read

reasons-to-kedro

There are many reasons that you should be using kedro. If you are on a team of Data Scientists/Data Engineers processing DataFrames from many data sources, you should be considering a pipeline framework. Kedro is a great option that provides many benefits for teams to collaborate, develop, and deploy data pipelines.

What is Kedro

Kedro makes it super easy to get started with their cli that utilizes cookiecutter under the hood.

...

Reasons to Kedro

- collaboration
  - Sharable catalog
  - small nodes over monolithic notebooks
- catalog
  - easily load anything without needing to run
  - No need to write read/write code
- pipeline
  - No need to keep execution order in your head
  - easily run a slice of a pipeline
- plugins
  - pip install
  - make your own
- hooks
- flexible
- expandable
- cli

Reasons Not to Kedro

- Already utilizing another DAG framework
- Data is not in a widely supported format
- Micro short-lived project
- Large Project / Deadline
  - Use a lower profile project to learn first
- Team not willing to change
- Need minimal dependencies
- God Project - kedro owns everything??

1 min read

What's New in Kedro 0.16.6

Kedro 0.16.6 is out! Let’s take a look through the release notes.

It is really exciting to see more deployment options coming from the kedro team. It really shows the power of the framework. The power of some of these orchestration options is incredible.

Most of them hinge on a sweet combination of the kedro cli, a docker image, and the pipeline knowing your nodes’ dependencies.

...

A brain dump of stories

I started making stories as kind of a brain dump a few times per day and posting them to [LinkedIn](https://www.linkedin.com/in/waylonwalker/). Here are the last 11 days of stories.

I store all the stories on my website with the hopes of doing something with them on my own platform eventually. For now it makes it easy to make these posts.

cd static/stories
ls | xargs -I {} echo '![](https://waylonwalker.com/stories/{})'

Stories 10-10-2020 - 10-21-2020

Fix git commit author

I was 20 commits into a Hacktoberfest PR when I suddenly realized that they all had my work email on them instead of my personal email 😱. This is the story of how I corrected my email address on 19 individual commits after already submitting a PR.

stop the bleeding

Before anything else set the email correctly!
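
A minimal sketch of that first step, assuming the fix should apply to a single repository rather than globally. `set_repo_email` and the address are hypothetical names used only for illustration; `git config user.email` and `git -C` are standard git.

```shell
# Hypothetical helper: set the commit email for one repository only,
# then print the value git will use for new commits there.
set_repo_email() {
    repo_dir=$1
    email=$2
    git -C "$repo_dir" config user.email "$email"
    git -C "$repo_dir" config user.email
}

# Example (placeholder address): set_repo_email . "me@example.com"
```

This only affects future commits; commits that already exist keep their old author until they are rewritten.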

...

3 min read
git

Designing a "Router" for kedro

I released a router-like plugin for kedro back in April 2020. This was not the first design; the idea actually came from one of the QB folks who taught me kedro nearly a year before. We were assembling our pipelines with something called nodes_global. It worked fairly well but did have some issues around being set as a global variable.

But…

One thing in particular that it did not lend itself well to was creating a packageable pipeline that I could pip install and append into any of my existing pipelines. This is something I am still trying to work out; maybe I don’t need it. I think I have it working for our internal pipelines and it seems like the way to go, but we don’t necessarily end up using it.

...

4 min read

Reclaim memory usage in Jupyter

Today I ran into an issue where we had a one-off script that just needed to work, but it was just chewing through memory like nothing.

It started with a colleague asking me, “How do I clear the memory in a Jupyter notebook?” These are the steps we took to debug the issue and free up some memory in their notebook.

How do I clear the memory in a Jupyter notebook?

...

3 min read

Strip Trailing Whitespace from Git projects

A common linting error thrown by various linters is for trailing whitespace. I most often use flake8. I generally have [pre-commit](https://waylonwalker.com/pre-commit-is-awesome) hooks set up to strip this, but sometimes I run into situations where I jump into a project without it, and my editor lights up with errors. A simple fix is to run this one-liner.

git grep -I --name-only -z -e '' | xargs -0 sed -i -e 's/[ \t]\+\(\r\?\)$/\1/'
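
To see what the `sed` expression in the one-liner does on its own, here is a small sketch that runs it over sample text instead of a repository. `strip_trailing` is a hypothetical name, and the escapes `\t`, `\+`, and `\?` assume GNU sed, as the one-liner itself does.

```shell
# Hypothetical wrapper around the same sed expression used in the one-liner.
#   [ \t]\+    one or more trailing spaces or tabs
#   \(\r\?\)   an optional carriage return, captured so CRLF endings survive
strip_trailing() {
    sed -e 's/[ \t]\+\(\r\?\)$/\1/'
}

# The first sample line ends in two spaces and a tab; the second is clean.
printf 'foo  \t\nbar\n' | strip_trailing
```

The trailing blanks after `foo` are removed, and `bar` passes through unchanged.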

pre-commit article

...

Chrome Extensions I use

There are many useful Chrome extensions out there. I probably have way too many installed; here are four that I am currently using.

This post was inspired by Chris over at daily-dev-tips.

Love it or hate it, passwords are hard to manage. Everyone needs a password manager to avoid the dreaded password reuse, and to be able to quickly rotate them with a service. I use LastPass, thus its browser extension is my most used extension.

...

2 min read

Creating Reusable Bash Scripts

Bash is a language that is quite useful for automation no matter what language you write in. Bash can do so many powerful system-level tasks. Even if you are on Windows, these days you are likely to come across bash inside a cloud VM, Continuous Integration, or even inside of Docker.

I have three techniques that help me write more composable bash scripts.

Break scripts down into reusable components
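
As one possible illustration of this idea, a script can be split into small functions that each do one thing and are then composed. This is a sketch, not the post's own example; the function names `log`, `require_cmd`, `slugify`, and `main` are hypothetical.

```shell
# Hypothetical helpers; each does one small job and can be reused.

# Print a message with a level prefix to stderr.
log() {
    printf '[%s] %s\n' "$1" "$2" >&2
}

# Succeed only if the named command is on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || {
        log ERROR "missing command: $1"
        return 1
    }
}

# Turn a title into a lowercase, dash-separated slug.
slugify() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | tr -cs 'a-z0-9\n' '-' \
        | sed 's/^-//; s/-$//'
}

# Compose the pieces: check dependencies, then do the work.
main() {
    require_cmd tr && require_cmd sed || return 1
    slugify "$1"
}

main "Creating Reusable Bash Scripts"
```

The final line is only a demonstration call; drop it if the file is meant to be sourced from other scripts.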

...

Three things to Automate with Python using Pandas

Here are three things that I see my non-programming counterparts doing every single day. These really sum up so much of what folks do within an office. So many of us dabble in or become power users of spreadsheets without knowing there is an alternative out there that can save us time, automate boring things, and allow us to open up our minds for the part where we add value: thinking about the data.

Let’s face it, stitching together spreadsheets is zero value add by itself, but if you can see something in the data and take action on it, this can be a huge value add to your company. Learning just a bit of python will help focus more of your attention on “value add operations” and leave the mundane stuff to your computer.

I see this one all the time. One team gets a spreadsheet from another team once per month and they need to stitch all the pieces together. Excel really opens the door for some nasty hidden bugs in your manually stitched together data. It also takes time out of your day that you don’t need to spend.

...

4 min read

How to Install miniconda on linux (from the command line only)

miniconda is a Python distribution from Continuum. It’s a slimmed-down version of their very popular Anaconda distribution. It comes with its own environment manager and has eased the install process for many that do not have a way to compile c-extensions. It made it much easier to install the data science stack on Windows a few years ago. These days Windows is much better at compiling c-extensions than it was back then. I still like its environment manager, which installs to a global directory rather than a local directory for your project.

Installing miniconda on Linux can be a bit tricky the first time you do it completely from the terminal. The following snippet will create a directory to install miniconda into, download the latest python 3 based install script for Linux 64 bit, run the install script, delete the install script, then add a conda initialize to your bash or zsh shell. After doing this you can restart your shell and conda will be ready to go.

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh

Options

The miniconda.sh script comes with a few basic options. Most notably we used -b to be able...

...

How to crush amazing posts on DEV

This post was inspired by a comment I left on @dsteenman’s post.

{% post dsteenman/how-long-should-a-blogpost-be-2k6n %}

Most of the time I prefer short, as I am more likely to read the whole thing. If it’s set up as a series, I am more likely to work my way through the whole series in a matter of a few sessions. Just my preference.

...