How to Find the Longest Sequence in a Collection in Python
Finding the longest sequence in a collection of data is a classic algorithmic problem. "Longest sequence" can refer to multiple things: the longest consecutive run of numbers (e.g., 1, 2, 3, 4 in [10, 1, 2, 3, 4, 20]), the longest increasing subsequence, or simply the longest repeating element.
This guide explores the most common interpretation: Finding the longest consecutive sequence in an unsorted array, and also covers finding the longest increasing subsequence.
Scenario 1: Longest Consecutive Sequence (Unsorted)
Problem: Given [100, 4, 200, 1, 3, 2], the longest consecutive elements sequence is [1, 2, 3, 4]. The length is 4.
Efficient Solution (O(N)): Use a set for instant lookups. Iterate through the set, and for every number x that is the start of a sequence (meaning x-1 is not in the set), count upwards.
def longest_consecutive(nums):
if not nums:
return 0
num_set = set(nums)
longest_streak = 0
for num in num_set:
# Only start counting if 'num' is the start of a sequence
if num - 1 not in num_set:
current_num = num
current_streak = 1
# Count upwards
while current_num + 1 in num_set:
current_num += 1
current_streak += 1
longest_streak = max(longest_streak, current_streak)
return longest_streak
# Example
data = [100, 4, 200, 1, 3, 2]
print(f"Longest consecutive length: {longest_consecutive(data)}")
Output:
Longest consecutive length: 4
Scenario 2: Longest Increasing Subsequence
Problem: Given [10, 9, 2, 5, 3, 7, 101, 18], the longest increasing subsequence is [2, 3, 7, 101] (or [2, 5, 7, 18], etc).
Dynamic Programming Solution (O(N^2)):
For each element, find the longest subsequence ending at that element by checking all previous elements.
def length_of_lis(nums):
if not nums:
return 0
# dp[i] stores the length of the longest increasing subsequence ending at index i
dp = [1] * len(nums)
for i in range(len(nums)):
for j in range(i):
if nums[i] > nums[j]:
dp[i] = max(dp[i], dp[j] + 1)
return max(dp)
data = [10, 9, 2, 5, 3, 7, 101, 18]
print(f"Longest Increasing Subsequence Length: {length_of_lis(data)}")
Output:
Longest Increasing Subsequence Length: 4
There is a more optimized O(N log N) solution using binary search (bisect module) for extremely large lists.
Scenario 3: Longest Run of Identical Elements
Problem: Given [1, 1, 0, 1, 1, 1, 0], finding the longest run of 1s (length 3). This is common in signal processing.
Solution: itertools.groupby is perfect for this.
from itertools import groupby
data = [1, 1, 0, 1, 1, 1, 0, 1]
# Find the max length of consecutive identical items
max_len = 0
max_val = None
for key, group in groupby(data):
# Convert iterator to list to get length
length = len(list(group))
if length > max_len:
max_len = length
max_val = key
print(f"Longest run is {max_len} consecutive '{max_val}'s")
Output:
Longest run is 3 consecutive '1's
Conclusion
To find the longest sequence in Python:
- Consecutive Integers (Unsorted): Use a
setand iterate logic (if num-1 not in set). - Increasing Subsequence: Use Dynamic Programming (
dparray). - Consecutive Identical Items: Use
itertools.groupby().