The coding mindset for data scientists: How to think like a programmer, even if you don’t write code

Written by Nazly Santos | Oct 2, 2023 11:06:35 AM

Whether you’re an experienced data scientist, a newcomer to this ever-growing field, or someone in an unrelated role who often works with data analysis, you have probably faced the same pivotal question: “Should I learn to code?” Let’s find out the answer!

I am a computer scientist, and I started learning to code at 17. After obtaining my engineering diploma, my career shifted and I found myself in the world of data.

I can tell you from my experience that coding is not strictly necessary for a data science role. In my first data role, I did not write a single line of code and spent my days checking data sources and manually inspecting text files. However, my familiarity with databases and SQL allowed me to automate some tasks and replace Excel file manipulation by handling the information directly where it was stored.

My team was also used to analysing data directly in Excel, with formulas and pivot tables, and to visualising it there too. Working in Excel was convenient, and there was resistance to change. Some senior analysts used the no-code tool IBM SPSS Modeler.
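For readers who have never seen it, this is roughly what replacing a pivot table with a query can look like. It is a minimal sketch that uses Python's built-in `sqlite3` module so it runs on its own. The `sales` table, its columns and the sample rows are hypothetical. In a real role you would connect to the database where your data actually lives.

```python
import sqlite3


def total_sales_by_region(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Sum sales per region inside the database, as a pivot table would."""
    query = """
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
        ORDER BY region
    """
    return conn.execute(query).fetchall()


# Hypothetical in-memory database standing in for the real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.5)],
)

print(total_sales_by_region(conn))  # [('North', 150.5), ('South', 80.0)]
```

The summary is computed where the data is stored, so nothing has to be exported to a spreadsheet and re-pivoted by hand each time the data changes.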

So, back to the question, “Should I learn to code?” The simple answer is yes. But let’s understand exactly what, why, and how.

Coding and data science

We know coding is hard and that mastering it is a long, intensive journey. Here, though, we want to show how embracing coding can quickly and significantly transform the way we solve problems and overcome obstacles.

Let’s start by asking one more question, a better one: “Do I already know how to code?”
If your answer is no, think again. Have you ever created formulas in Excel to sum and count values? Well, there you were, technically coding.

You were telling the computer what to do, using a language it understands. That is what coding is: using a specific language to give a computer instructions. Each language has its own constraints and syntax.
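To make the comparison concrete, the Excel formulas `=SUM(B2:B5)` and `=COUNTIF(A2:A5, "North")` express the same instructions as the short Python sketch below. The region names and amounts are made-up sample data that stand in for two spreadsheet columns.

```python
# Sample data standing in for two spreadsheet columns (A: region, B: amount).
regions = ["North", "South", "North", "East"]
amounts = [120, 80, 30, 45]

# Equivalent of =SUM(B2:B5)
total = sum(amounts)

# Equivalent of =COUNTIF(A2:A5, "North")
north_count = regions.count("North")

print(total, north_count)  # 275 2
```

Only the syntax is different. In both cases you name the data, choose an operation, and let the computer carry it out.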

And although becoming a senior developer takes years of training, learning even the basics of coding can pay off more quickly than one might imagine. 

Setting aside the fact that data scientists with coding skills are more likely to be hired and paid more, basic coding skills help with everyday work:

  • Automation: Simple scripts can automate repetitive tasks, saving countless hours.
  • Data Manipulation: Extracting, transforming, and loading (ETL) data becomes more straightforward when you know how to code. You can tailor-make solutions to unique problems.
  • Advanced Analytics: With a coding foundation, advanced statistical techniques and machine learning algorithms become more accessible.
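For automation and ETL, one small script can do both jobs. The minimal sketch below assumes a hypothetical folder of monthly CSV exports, each with `customer` and `amount` columns. It extracts every file, tidies the values, and loads them into one SQLite table, which removes the need to copy and paste each month by hand. The file names, columns and sample rows are all illustrative.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_monthly_exports(folder: Path, conn: sqlite3.Connection) -> int:
    """Extract every CSV file in `folder`, tidy each row, and load it into one table.

    Returns the number of rows loaded.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL)")
    loaded = 0
    for path in sorted(folder.glob("*.csv")):  # Extract: every export, in name order
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                customer = row["customer"].strip().title()  # Transform: tidy names
                amount = float(row["amount"])  # Transform: text to number
                conn.execute(  # Load: append the cleaned row to the table
                    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
                    (customer, amount),
                )
                loaded += 1
    conn.commit()
    return loaded


# Demo: two hypothetical monthly exports written to a temporary folder.
conn = sqlite3.connect(":memory:")
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "2023-08.csv").write_text("customer,amount\n  alice ,10.5\nBOB,4\n")
    (folder / "2023-09.csv").write_text("customer,amount\ncarol,7.25\n")
    rows_loaded = load_monthly_exports(folder, conn)

print(rows_loaded)  # 3
print(conn.execute("SELECT customer, amount FROM orders ORDER BY customer").fetchall())
# [('Alice', 10.5), ('Bob', 4.0), ('Carol', 7.25)]
```

When next month's export arrives, you run the same script again instead of repeating the manual steps.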

On top of that, and perhaps most importantly, learning to code makes it easier for data scientists to communicate with developers, and it enables them to turn innovative ideas into tangible solutions without always depending on external software or tools.

Beyond automation

Coding isn't just about automating tasks or writing scripts. The key is adopting a distinct, powerful coding mindset.

The coding mindset is a game-changer in data-centric roles.

It encourages critical thinking, perseverance, attention to detail, and creativity by using programming methodology as a framework for problem-solving.

Treating challenges as puzzles to solve and obstacles as opportunities for innovation helps you understand the data analytics process, including its algorithms and data manipulation steps. It also helps you solve problems, work more efficiently, and produce more customised results.

Taking your first steps in the world of coding

Just like learning a new spoken language, learning to code is a marathon, not a sprint. It takes time and practice to master a programming language because you are training your brain to think differently. As a Spanish speaker, for instance, I have been learning English for more than 20 years, and I am still discovering more complex grammar rules and new ways to convey the same message to English speakers.

Learning to code is a journey, not a destination. Enjoy the process, and don't be afraid to make mistakes. The important thing is to learn from your mistakes and keep practising.

 

[Figure: what you can expect your coding learning curve to look like]

Classic coding: the old-school way

So now that we understand what coding is all about and the tight relationship between coding and data science, what do you actually need to learn?

The most in-demand programming languages for data scientists are Python, R, and SQL.

Python is a general-purpose language known for its simplicity and versatility. It is widely used for data cleaning, analysis, machine learning, and visualisation.
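Here is a small taste of data cleaning and analysis in Python. To keep it self-contained it uses only the standard library, although in practice many people would reach for a library such as pandas. The survey answers are hypothetical.

```python
from statistics import mean

# Hypothetical raw survey answers: ages arrive as text, with gaps and stray spaces.
raw_ages = [" 34", "29 ", "", "n/a", "41", "29"]


def clean_ages(values: list[str]) -> list[int]:
    """Keep only entries that are whole numbers, converted to int."""
    cleaned = []
    for value in values:
        value = value.strip()
        if value.isdigit():  # skips blanks and placeholders such as "n/a"
            cleaned.append(int(value))
    return cleaned


ages = clean_ages(raw_ages)
print(ages)        # [34, 29, 41, 29]
print(mean(ages))  # 33.25
```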

R is a language specifically designed for statistical computing and data visualisation. It is popular for its statistical capabilities and ability to create clear and informative data visuals.

SQL is a query language that is used to interact with relational databases. It is a relatively simple language to learn, but it can be very powerful for querying and manipulating large datasets.
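Even a tiny query shows SQL's declarative style. You describe the result you want (join, filter, group, sort), and the database works out how to produce it. The sketch runs the SQL through Python's built-in `sqlite3` module so it is self-contained. The `customers` and `orders` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'ES'), (2, 'Ben', 'UK'), (3, 'Chloe', 'ES');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 2, 20.0), (3, 1, 25.0), (4, 3, 10.0);
""")

# Total spend per Spanish customer, largest first.
query = """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.country = 'ES'
    GROUP BY c.name
    ORDER BY total DESC
"""
result = conn.execute(query).fetchall()
print(result)  # [('Ana', 75.0), ('Chloe', 10.0)]
```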

 

 

Some tips on choosing a programming language to start with:

  • First, I recommend starting with only one language.
  • Consider your career goals. If you are interested in working in a specific industry, such as finance or healthcare, you may want to choose a language popular in that industry.
  • Consider the learning curve. Some languages are easier to learn than others. Python, R and SQL all have large communities, so you will find plenty of learning resources for each; check which one best suits your background.
  • Basic SQL is often considered a must-have in data roles. It serves a different purpose from Python or R, but learning it will still help you build a coding mindset.

Low-code coding: Democratising code

Low-code software development is a visual approach that makes coding more accessible and user-friendly, democratising digital creation and making tech development more inclusive. Instead of traditional manual coding, low-code platforms use drag-and-drop interfaces and visual builders, allowing a broader audience to participate.

While low-code doesn't replace classic code in terms of depth and flexibility, it provides a valuable opportunity for those seeking to harness the power of technology without any prior programming knowledge and offers a great stepping stone into the world of code.

Here are a few of the many low-code tools available for data scientists today:

KNIME is a visual programming environment for data analysis and machine learning. It offers a variety of nodes for data cleaning, data manipulation, and model building.

Dataiku is a low-code environment for data preparation, machine learning, and data visualisation. It is known for its user-friendly interface and its extensive library of pre-built components.

PyCaret is an open-source, low-code machine learning (ML) library in Python. It provides a unified interface for data preparation, model selection, hyperparameter tuning, and model evaluation. Because it supports many ML algorithms behind that one interface, it is approachable for data scientists of all levels.

The coding mindset can be applied even if you don't write traditional code:

Decomposition: Even in a drag-and-drop environment, approach challenges by breaking them down into smaller, manageable components. For instance, if you're designing a complex workflow, rather than viewing it as one massive task, segment it into individual processes or stages. This decomposition technique helps ensure every part of your solution is effective and efficient.
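Decomposition looks the same in code as it does on a canvas of nodes: each stage becomes a small, named step that you can check on its own. In this sketch, the three functions are hypothetical stand-ins for three boxes in a low-code workflow.

```python
def load(lines: list[str]) -> list[dict]:
    """Stage 1: parse 'name, score' lines into records."""
    records = []
    for line in lines:
        name, score = line.split(",")
        records.append({"name": name.strip(), "score": int(score)})
    return records


def filter_passing(records: list[dict], threshold: int = 50) -> list[dict]:
    """Stage 2: keep only records at or above the threshold."""
    return [r for r in records if r["score"] >= threshold]


def summarise(records: list[dict]) -> dict:
    """Stage 3: count the records and average their scores."""
    count = len(records)
    average = sum(r["score"] for r in records) / count if count else 0.0
    return {"count": count, "average": average}


# The whole workflow is just the stages chained in order.
raw = ["Ana, 72", "Ben, 45", "Chloe, 88"]
report = summarise(filter_passing(load(raw)))
print(report)  # {'count': 2, 'average': 80.0}
```

Because each stage is separate, you can test one step, such as the filter, without running the rest of the workflow.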

 

Logical reasoning: Ensure that your processes and workflows in these platforms follow a logical order. Just as in coding, each step should follow naturally from the previous one. Always ask, "What needs to happen first? What comes next?"
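The question "What needs to happen first?" can also be answered mechanically. Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) orders steps so that each one comes after the steps it depends on. The workflow steps below are hypothetical.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it (hypothetical workflow).
dependencies = {
    "clean data": {"load data"},
    "join tables": {"clean data", "load lookup table"},
    "build report": {"join tables"},
}

# static_order() yields every step after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Steps with no dependency between them, such as the two loading steps here, can come in either order. If the steps contain a circular dependency, `static_order()` raises `graphlib.CycleError`, which is a useful sign that the workflow's logic needs rethinking.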

 

Iteration: Embrace the process of iteration. Build a version of your solution, test it, gather feedback, and refine it. While no-code/low-code platforms might allow for faster implementation, the principle of refining and improving remains the same.
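In code, iteration often means writing a first version, testing it against examples, and refining it when a test fails. In this hypothetical sketch, a first parser for amounts breaks on values such as "1,200". The second version handles thousands separators. It assumes the comma is a thousands separator and not a decimal mark, which would not hold for every locale.

```python
def parse_amount_v1(text: str) -> float:
    """First attempt: assumes plain numbers such as '250' or '99.5'."""
    return float(text)


def parse_amount_v2(text: str) -> float:
    """Refined after testing: also strips spaces and comma thousands separators."""
    return float(text.strip().replace(",", ""))


examples = ["250", "99.5", "1,200", " 3,000.75 "]

# Test the first version: it raises ValueError on values containing commas.
for value in examples:
    try:
        parse_amount_v1(value)
    except ValueError:
        print(f"v1 failed on {value!r}")

# Test the refined version against the same examples.
print([parse_amount_v2(v) for v in examples])  # [250.0, 99.5, 1200.0, 3000.75]
```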

 

Creativity: One of the beautiful things about no-code/low-code platforms is that they often allow for quick prototyping. Use this to your advantage to explore different solutions, layouts, or workflows. Remember, in coding and in these platforms, there's often more than one way to achieve an outcome.
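The example below shows "more than one way" to reach the same outcome. It counts how often each category appears twice: once with an explicit loop and once with `collections.Counter` from the standard library. The category values are hypothetical.

```python
from collections import Counter

categories = ["web", "store", "web", "phone", "web", "store"]

# Approach 1: an explicit loop with a dictionary.
counts_loop: dict[str, int] = {}
for category in categories:
    counts_loop[category] = counts_loop.get(category, 0) + 1

# Approach 2: let Counter do the tallying.
counts_counter = Counter(categories)

print(counts_loop)                    # {'web': 3, 'store': 2, 'phone': 1}
print(counts_loop == counts_counter)  # True
```

The loop makes every step visible. `Counter` is shorter and also offers helpers such as `most_common()`. Both are valid, and choosing between them is part of the creative side of coding.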

 

🗓️ Optional deep-dive to review later: Check out Luke Barousse and Alex Freberg, two of my favourite data analytics YouTubers. Both have great videos on the relationship between coding and data science!

Wrapping up

In data science, coding isn't just about writing lines of code. It's about a way of thinking that can improve your analytical skills, boost your creativity, and help you solve problems better. Even if you don't code every day, having a coding mindset can be incredibly helpful for data analysis, communication, and innovation.

Whether you learn a traditional programming language like Python, R, or SQL or start with a more accessible low-code platform, the key is to learn how to use logical reasoning, decomposition, iterative processes, and a little creativity to build effective solutions. It’s not just about the language you speak, but the mindset you develop.

Learning a new spoken language opens up new cultures, perspectives, and opportunities, as does learning to code. It gives you access to a universe of digital possibilities, allowing data scientists and anyone who works with data to communicate fluently with technology and innovate in ways that were never before possible.