Getting Started with Data Science Projects Using VS Code and Anaconda

June 05, 2024

Hey everyone,

In this post, we'll dive into the practical aspects of setting up your environment for data science projects using VS Code and Anaconda. We'll cover how to create and manage environments, install necessary packages, and run Python scripts or Jupyter notebooks within VS Code. Let's get started!

Setting Up Your Environment

Step 1: Install Anaconda and VS Code

First, ensure you have installed Anaconda and VS Code. If you haven't done this yet, you can download Anaconda from its official website and follow the installation instructions. Similarly, download and install VS Code.

Step 2: Create a Python Environment

A dedicated environment keeps this project's packages isolated from your other projects, which avoids version conflicts between them. Follow these steps:

Open a Terminal in VS Code: Go to the Terminal menu and select "New Terminal".
Create the Environment: Run the following command to create a new environment with Python 3.10:
```
conda create -n venv python=3.10
```
Activate the Environment: Activate the environment with:
```
conda activate venv
```
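To confirm that the activation worked, a small check like the one below can be run with python from the same terminal. It is only a sketch: the function name describe_environment and its parameters are made up for this post, and it assumes the environment is called venv as in the commands above.

```python
import os
import sys


def describe_environment(expected_env="venv", min_python=(3, 10)):
    """Summarize which conda environment and Python version are active."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated;
    # the variable is absent if no conda environment is active in this shell.
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    return {
        "active_env": active_env,
        "env_matches": active_env == expected_env,
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= min_python,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If env_matches comes back False, the terminal is most likely still using the base environment or a system Python, and running conda activate venv again is worth a try.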

Step 3: Install Required Packages

Create a file named requirements.txt and list the packages you need:

scikit-learn
pandas
numpy
matplotlib

With the environment still activated, install the packages using the following command:

pip install -r requirements.txt
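Once the install finishes, it can help to check that the packages are actually visible to the interpreter. The sketch below uses only the standard library. The helper name check_packages is an assumption for illustration, and note that it takes import names, which can differ from the names in requirements.txt (scikit-learn is imported as sklearn).

```python
from importlib import util


def check_packages(import_names):
    """Map each top-level import name to True if it can be found."""
    # find_spec returns None when a top-level module is not installed.
    return {name: util.find_spec(name) is not None for name in import_names}


if __name__ == "__main__":
    for name, found in check_packages(["sklearn", "pandas", "numpy"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A MISSING entry usually means pip ran in a different environment than the one the script is using.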

Working with VS Code

Step 1: Open a Jupyter Notebook

You can open Jupyter notebooks directly in VS Code. Here’s how:

Create a Jupyter Notebook: Click on the "New File" icon and save the file with a .ipynb extension.
Select the Kernel: If prompted, select the Python environment you created (venv).

Step 2: Run Python Scripts

You can also run Python scripts in VS Code. Here’s an example:

Create a Python File: Click on the "New File" icon and save the file with a .py extension (e.g., test.py).

Write Some Code:

import pandas as pd
import numpy as np

print("Pandas and NumPy are installed and working!")

Run the Script: Open a terminal and run the script using:
```
python test.py
```

Using Jupyter Notebooks in VS Code

Step 1: Install IPyKernel

VS Code's Jupyter extension runs notebook cells through the ipykernel package, so install it in the activated environment:

pip install ipykernel
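After installing ipykernel and picking a kernel, a cell like the following can confirm that the notebook is running on the interpreter you expect. This is a rough sketch: kernel_report is a hypothetical helper, and the check assumes a named conda environment called venv, whose install prefix usually ends with the environment name.

```python
import sys
from importlib import util
from pathlib import Path


def kernel_report(expected_env="venv"):
    """Report which interpreter is running this code."""
    prefix = Path(sys.prefix)
    return {
        "executable": sys.executable,
        # For a named conda environment, sys.prefix usually ends with the
        # environment name (for example .../envs/venv).
        "prefix_name": prefix.name,
        "looks_like_expected_env": prefix.name == expected_env,
        "ipykernel_available": util.find_spec("ipykernel") is not None,
    }


# In a notebook, the last expression of a cell is displayed automatically.
kernel_report()
```

If looks_like_expected_env is False, use the kernel picker in the top-right corner of the notebook to switch to the venv interpreter.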

Step 2: Create and Run a Notebook

Create a New Notebook: In VS Code, create a new file with a .ipynb extension.

Write Some Code:

import pandas as pd
import numpy as np

# Create a DataFrame
df = pd.DataFrame({
    'A': np.random.rand(5),
    'B': np.random.rand(5)
})

df

Run the Notebook: Click the "Run" button to execute the cells.

Example: Linear Regression

Let's walk through an example of creating and running a linear regression model using scikit-learn.

Step 1: Create a New Notebook

Create a new Jupyter notebook file named linear_regression.ipynb.

Step 2: Write the Code

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate some sample data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Convert to DataFrame
data = pd.DataFrame(np.hstack([X, y]), columns=['X', 'y'])

# Split the data
X_train, X_test, y_train, y_test = train_test_split(data[['X']], data['y'], test_size=0.2, random_state=0)

# Create the model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Plot the results
import matplotlib.pyplot as plt

plt.scatter(X_test, y_test, color='red', label='Actual')
plt.plot(X_test, y_pred, color='blue', label='Predicted')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
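For a single feature, what LinearRegression fits can be cross-checked with the closed-form least-squares formulas. The sketch below is not how scikit-learn does it internally. It is a plain-Python illustration with made-up helper names (fit_line, mse_manual), and it uses Python's random module instead of NumPy, so the generated numbers will not match the notebook's exactly.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mse_manual(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Same recipe as the sample data above: y = 4 + 3x plus Gaussian noise.
    rng = random.Random(0)
    xs = [2 * rng.random() for _ in range(100)]
    ys = [4 + 3 * x + rng.gauss(0, 1) for x in xs]

    intercept, slope = fit_line(xs, ys)
    predictions = [intercept + slope * x for x in xs]
    print(f"intercept: {intercept:.3f}, slope: {slope:.3f}")
    print(f"MSE on the same data: {mse_manual(ys, predictions):.3f}")
```

The fitted intercept and slope should land near the 4 and 3 used to generate the data, and you can compare them with model.intercept_ and model.coef_ from the scikit-learn model in the notebook.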

Step 3: Run the Notebook

Click the "Run" button to execute the cells and observe the output.

Conclusion

In this post, we covered the essentials of setting up your environment for data science projects using VS Code and Anaconda. We walked through creating and activating a conda environment, installing packages, running Python scripts and Jupyter notebooks, and fitting a simple linear regression model with scikit-learn. With this setup in place, you have a working base for your own data science projects. Stay tuned for more tutorials and examples as we continue to explore the world of data science.

Thank you for reading, and happy coding!

Revanth Tech Trends