Difference between loc and iloc
In data analysis with pandas, two commonly used indexers are `loc` and `iloc`. Both access data in a DataFrame, but they differ in how rows and columns are specified. Understanding the difference between `loc` and `iloc` is crucial for efficient data manipulation and analysis.
What is loc?
The `loc` indexer is used for label-based indexing. It lets you access data in a DataFrame by specifying row labels and column labels, and it also accepts boolean conditions. Although it is often called a function, `loc` is an indexer attribute used with square brackets, not called with parentheses.
For example, consider the following DataFrame:
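```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)

# loc looks up index labels, so make the names the row index first
df_by_name = df.set_index('Name')
print(df_by_name.loc['Alice', 'City'])
```
Output:
```
New York
```
In this example, `df` has the default integer index (0 to 3), so the names are first made the row labels with `set_index('Name')`. `df_by_name.loc['Alice', 'City']` then returns the value of the 'City' column for the row with the label 'Alice'.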
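Label-based selection also covers conditions and label slices. The following is a minimal sketch that rebuilds the same data with 'Name' as the index; the variable names `people`, `older`, and `subset` are illustrative and not part of the original example.

```python
import pandas as pd

# Same data as above, indexed by name so the row labels are meaningful.
people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "David"],
     "Age": [25, 30, 35, 40],
     "City": ["New York", "Los Angeles", "Chicago", "Houston"]}
).set_index("Name")

# Boolean condition: rows where Age is greater than 30, City column only.
older = people.loc[people["Age"] > 30, "City"]

# Label slice: with loc, both endpoints are included.
subset = people.loc["Bob":"Charlie", ["Age", "City"]]

print(older.tolist())         # ['Chicago', 'Houston']
print(subset.index.tolist())  # ['Bob', 'Charlie']
```

Note that the label slice `"Bob":"Charlie"` returns both rows, unlike ordinary Python slicing, which drops the end point.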
What is iloc?
In contrast, the `iloc` indexer is used for integer-position-based indexing. It lets you access data in a DataFrame by specifying the zero-based integer positions of the rows and columns. It is particularly useful when you want to select data by where it sits in the DataFrame rather than by its labels.
Continuing with the previous example, let’s see how to use `iloc`:
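```python
print(df.iloc[1, 1])
```
Output:
```
30
```
In this example, `df.iloc[1, 1]` returns the value in the second row (position 1) and the second column (position 1), which is Bob's 'Age'. Positions start at 0, so `df.iloc[1, 2]` would instead return 'Los Angeles' from the 'City' column.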
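Positional selection extends to slices, lists of positions, and negative positions. The sketch below assumes the same data with the default integer index; the names `df_pos`, `first_two`, `last_row`, and `picked` are illustrative.

```python
import pandas as pd

# Same data as above, with the default integer index.
df_pos = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "David"],
     "Age": [25, 30, 35, 40],
     "City": ["New York", "Los Angeles", "Chicago", "Houston"]}
)

# Position slice: the end position is excluded, as in ordinary Python slicing.
first_two = df_pos.iloc[0:2, 0:2]

# Negative positions count from the end.
last_row = df_pos.iloc[-1]

# Lists of positions pick specific rows and columns.
picked = df_pos.iloc[[0, 2], [0, 2]]

print(first_two.columns.tolist())  # ['Name', 'Age']
print(last_row["Name"])            # David
print(picked.values.tolist())      # [['Alice', 'New York'], ['Charlie', 'Chicago']]
```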
Key Differences between loc and iloc
1. Label-based vs. Integer-location based: The primary difference between `loc` and `iloc` is that `loc` uses labels for indexing, while `iloc` uses integer positions.
2. Order of Arguments and Slicing: Both indexers take the row selector first and the column selector second; with `loc` these are labels, and with `iloc` they are integer positions. Slices behave differently: a `loc` slice includes its end label, while an `iloc` slice excludes its end position.
3. Use Cases: `loc` is generally used when you have specific labels and want to select data based on those labels. `iloc` is used when you have specific positions and want to select data based on those positions.
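4. Indexing: `loc` can select by the labels of a multi-level index (MultiIndex), for example with a tuple of labels. `iloc` still works on such a DataFrame, but only by integer position; it ignores the level labels.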
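The differences listed above can be sketched on a small DataFrame whose integer labels deliberately differ from its positions, plus a two-level index. All data and variable names here are made up for illustration.

```python
import pandas as pd

# Integer labels that do not match positions make the difference visible.
frame = pd.DataFrame({"value": ["a", "b", "c"]}, index=[10, 20, 30])

by_label = frame.loc[10, "value"]   # row labelled 10
by_position = frame.iloc[1, 0]      # second row, first column

# Slices: loc includes the end label, iloc excludes the end position.
loc_slice = frame.loc[10:20]        # rows labelled 10 and 20
iloc_slice = frame.iloc[0:1]        # only the first row

# A two-level index: loc takes a tuple of labels, iloc takes a position.
multi = pd.DataFrame(
    {"sales": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_tuples(
        [("east", 2023), ("east", 2024), ("west", 2023), ("west", 2024)],
        names=["region", "year"],
    ),
)
by_level_labels = multi.loc[("west", 2023), "sales"]
by_row_position = multi.iloc[2, 0]

print(by_label, by_position)              # a b
print(len(loc_slice), len(iloc_slice))    # 2 1
print(by_level_labels, by_row_position)   # 3 3
```

When the index is the default 0, 1, 2, ... the two indexers often return the same rows for single integers, which is why the distinction is easy to miss until the index changes.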
Understanding the difference between `loc` and `iloc` is essential for efficient data manipulation and analysis in pandas. By choosing the appropriate indexer for the task, labels with `loc` or positions with `iloc`, you can access and modify the data in your DataFrame reliably.