Concept:
Combining datasets is a common task in data analysis. In Pandas, the two primary methods are \texttt{merge()} and \texttt{join()}, which allow combining data based on keys or indices, similar to SQL joins.
Step 1: {\color{red}Using Merge Operation}
The \texttt{merge()} function combines datasets based on one or more common columns:
- Similar to SQL joins
- Allows inner, left, right, and outer joins
Example:
\begin{verbatim}
import pandas as pd
df1 = pd.DataFrame({'id': [1, 2], 'name': ['A', 'B']})
df2 = pd.DataFrame({'id': [1, 2], 'score': [90, 85]})
merged = pd.merge(df1, df2, on='id', how='inner')
\end{verbatim}
Step 2: {\color{red}Types of Merge Joins}
- Inner Join: Common rows only
- Left Join: All rows from left dataset
- Right Join: All rows from right dataset
- Outer Join: All rows from both datasets
Step 3: {\color{red}Using Join Operation}
The \texttt{join()} method combines datasets based on index alignment:
- Default is left join
- Useful when indices represent relationships
Example:
\begin{verbatim}
df1 = df1.set_index('id')
df2 = df2.set_index('id')
joined = df1.join(df2)
\end{verbatim}
Step 4: {\color{red}Key Differences}
- merge(): Column-based joining, more flexible
- join(): Index-based joining, simpler syntax
Step 5: {\color{red}When to Use Which}
- Use \texttt{merge()} for relational-style joins
- Use \texttt{join()} when working with indexed data