How to Scrape TikTok with Python
TikTok is a dynamic, JavaScript-heavy platform. Most of the data you see in the browser isn't in the page's initial HTML, so a plain HTTP request returns only partial information: no video list, no profile stats, sometimes not even a username.
To collect complete data, you need to look deeper — into scripts, API endpoints, and JSON responses that TikTok loads behind the scenes.
In this guide, we’ll walk you through several ways to extract profile data, video lists, comments, and search results. We’ll also explain how to handle common issues such as IP limits, expired links, and 403 errors.
TikTok Page Structure
The first idea that comes to mind when scraping a TikTok page is to find CSS selectors or XPath expressions and extract data with them:

But that data is incomplete, and extracting it this way is extremely time-consuming. There’s an easier, faster approach: in the HTML code, you’ll find a script with the ID __UNIVERSAL_DATA_FOR_REHYDRATION__ that contains all profile data in a structured JSON format.

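In the page source, that script looks roughly like this (a simplified sketch; exact attributes may vary):

```html
<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" type="application/json">
  {"__DEFAULT_SCOPE__": {"webapp.user-detail": { ... }}}
</script>
```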
This works for profile data and individual video pages. For other pages, it’s often simpler to find the right endpoint that returns the required data:

Scraping TikTok Profiles
Using standard parsing from the profile page, you can only extract part of the data:

Instead, we can use the script content, which contains much more detailed, actionable information:
| Field | Description |
|---|---|
| user.uniqueId | Public username (handle) |
| user.nickname | Display name |
| user.signature | Bio or profile description |
| user.avatarLarger | URL of the large profile picture |
| user.avatarMedium | URL of the medium profile picture |
| user.avatarThumb | URL of the thumbnail profile picture |
| user.verified | Boolean; indicates if the account is verified |
| followerCount | Number of followers |
| followingCount | Number of accounts the user follows |
| heart | Total likes received |
| videoCount | Number of videos posted |
| title | Page title for sharing |
| desc | Page description for sharing |
The JSON includes extra metadata you can ignore. Apply filtering logic to extract only relevant fields before saving results.
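For instance, a minimal filter might keep only the table's fields. This is a hedged sketch; the key paths follow the sample output shown later in this section:

```python
def pick_profile_fields(data: dict) -> dict:
    # `data` is the parsed JSON from the rehydration script
    info = (
        data.get("__DEFAULT_SCOPE__", {})
        .get("webapp.user-detail", {})
        .get("userInfo", {})
    )
    user = info.get("user", {})
    stats = info.get("stats", {})
    return {
        "uniqueId": user.get("uniqueId"),
        "nickname": user.get("nickname"),
        "signature": user.get("signature"),
        "verified": user.get("verified"),
        "followerCount": stats.get("followerCount"),
        "followingCount": stats.get("followingCount"),
        "heart": stats.get("heart"),
        "videoCount": stats.get("videoCount"),
    }
```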
Install and import the required libraries:
- Requests for fetching the page content
- BeautifulSoup for parsing the HTML and finding the script
- json for processing the data
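The first two can be installed with pip (json ships with the Python standard library):

```bash
pip install requests beautifulsoup4
```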
Before running the script, update the target URL. The following example uses the HasData Web Scraping API, which automatically handles proxies, rate limits, and CAPTCHAs, bypassing TikTok's native limit of roughly 100 requests per hour. You don't need to rotate IPs or manage sessions manually; HasData does it for you:
import requests
import json
from bs4 import BeautifulSoup
url = "https://www.tiktok.com/@funny_funny66066"
# Get your API key at https://hasdata.com/sign-up
api_key = "YOUR-API-KEY"
api_url = "https://api.hasdata.com/scrape/web"
payload = json.dumps({
"url": url,
"proxyType": "residential",
"proxyCountry": "DE",
"blockResources": False,
"wait": 1000,
"jsRendering": True
})
headers = {
"Content-Type": "application/json",
"x-api-key": api_key
}
# Send the request to HasData API
response = requests.post(api_url, headers=headers, data=payload)
# Parse the response as JSON
data = response.json()
# Extract content from the response
content_html = data.get("content")
# Parse HTML
soup = BeautifulSoup(content_html, "html.parser")
# Find the script with the JSON data
script = soup.find("script", id="__UNIVERSAL_DATA_FOR_REHYDRATION__")
if script:
data = json.loads(script.string)
# Extract user details from __DEFAULT_SCOPE__
user_detail = data.get("__DEFAULT_SCOPE__", {}).get("webapp.user-detail")
# Save only the user-detail section
with open("tiktok_user_detail.json", "w", encoding="utf-8") as f:
        json.dump(user_detail, f, ensure_ascii=False, indent=4)

The output is a JSON file containing data like this:
{
"userInfo": {
"user": {
"uniqueId": "funny_funny66066",
"nickname": "Animals Funny Videos",
"signature": "Follow me and good-looking things with you\nhttp//: guxw.net",
"avatarLarger": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7343166051334488106~tplv-tiktokx-cropcenter:1080:1080.jpeg",
"avatarMedium": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7343166051334488106~tplv-tiktokx-cropcenter:720:720.jpeg",
"avatarThumb": "https://p16-sign-va.tiktokcdn.com/tos-maliva-avt-0068/7343166051334488106~tplv-tiktokx-cropcenter:100:100.jpeg",
"verified": false,
"createTime": 1666686070,
"secUid": "MS4wLjABAAAAyl5blLK97VXJ0vZ_MQ7y5tnFfgEvYxYdleqw1-JAnwRcoyAGGpkkXxUvlQMwG5Xc",
"privateAccount": false,
"language": "en"
},
"stats": {
"followerCount": 18600,
"followingCount": 808,
"heart": 126300,
"videoCount": 186,
"friendCount": 345
}
},
"shareMeta": {
"title": "Animals Funny Videos on TikTok",
"desc": "@funny_funny66066 18.6k Followers, 808 Following, 126.3k Likes - Watch awesome short videos created by Animals Funny Videos"
}
}

If the profile contains contact details, they appear in the signature field. Since that field holds the entire bio text, use a regular expression to isolate the email address from the rest of it:
import json
import re
# Load JSON from file
with open("tiktok_user_detail.json", "r", encoding="utf-8") as f:
data = json.load(f)
# Extract signature text
signature = data.get("userInfo", {}).get("user", {}).get("signature", "")
# Find email with regex
email = re.search(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", signature)
print(email.group() if email else "No email found")

Keep in mind that users rarely include contact details, so don't rely on this field being present.
Scraping TikTok Videos
This part is a bit more complex, since the video list isn’t included in the same JSON we parsed earlier. Still, we can use that data to get videos from the profile.
Get the video list
To get the video list, you can use either a) a browser automation library that scrolls through the page like a real user:

Or b) an endpoint that returns all needed data:
https://www.tiktok.com/api/post/item_list/

Using an endpoint involves two main steps:
- Get the secUid value from the JSON we extracted in the previous section.
- Use it to call the endpoint and retrieve the videos.
Note that the API returns no more than 35 videos per request. To get more, use the cursor parameter (the position or token of the last video) to load the next page. You can also check the hasMore parameter: if it's true, there are more videos available.
Finally, extract and save the clean, actionable data from the returned JSON, excluding metadata and unnecessary clutter.
import requests
import json
from bs4 import BeautifulSoup
import time
profile_url = "https://www.tiktok.com/@funny_funny66066" # put the username page here
api_base = "https://www.tiktok.com/api/post/item_list/"
headers = {
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36",
"accept": "*/*",
"referer": profile_url,
}
# fetch profile page and extract secUid
r = requests.get(profile_url, headers=headers, timeout=15)
r.raise_for_status()
soup = BeautifulSoup(r.text, "html.parser")
script = soup.find("script", id="__UNIVERSAL_DATA_FOR_REHYDRATION__")
if not script or not script.string:
raise SystemExit("Could not find rehydration script on profile page.")
raw = json.loads(script.string)
secuid = (
raw.get("__DEFAULT_SCOPE__", {})
.get("webapp.user-detail", {})
.get("userInfo", {})
.get("user", {})
.get("secUid")
)
if not secuid:
raise SystemExit("secUid not found in page JSON. Page structure may differ.")
print("secUid:", secuid)
# paginated fetch using secUid
result = []
cursor = 0
has_more = True
while has_more:
params = {
"aid": "1988",
"count": "35",
"cursor": cursor,
"device_platform": "web_pc",
"secUid": secuid,
}
resp = requests.get(api_base, headers=headers, params=params, timeout=15)
# handle non-json responses
try:
data = resp.json()
except ValueError:
print("Non-JSON response, status:", resp.status_code)
break
for item in data.get("itemList", []):
# collect only the fields we need, in a clean shape
video = {
"profileURL": profile_url,
"id": item.get("id"),
"desc": item.get("desc"),
"createTime": item.get("createTime"),
"duration": item.get("video", {}).get("duration"),
"cover": item.get("video", {}).get("cover"),
"originCover": item.get("video", {}).get("originCover"),
"dynamicCover": item.get("video", {}).get("dynamicCover"),
"playAddr": item.get("video", {}).get("playAddr"),
"downloadAddr": item.get("video", {}).get("downloadAddr"),
"videoID": item.get("video", {}).get("videoID"),
"width": item.get("video", {}).get("width"),
"height": item.get("video", {}).get("height"),
"size": item.get("video", {}).get("size"),
"definition": item.get("video", {}).get("definition"),
"ratio": item.get("video", {}).get("ratio"),
"videoQuality": item.get("video", {}).get("videoQuality"),
"stats": {
"playCount": item.get("stats", {}).get("playCount"),
"diggCount": item.get("stats", {}).get("diggCount"),
"commentCount": item.get("stats", {}).get("commentCount"),
"shareCount": item.get("stats", {}).get("shareCount"),
"collectCount": item.get("stats", {}).get("collectCount")
},
"music": {
"authorName": item.get("music", {}).get("authorName"),
"title": item.get("music", {}).get("title"),
"playUrl": item.get("music", {}).get("playUrl"),
"duration": item.get("music", {}).get("duration")
},
"textExtra": item.get("textExtra", []),
"challenges": [ch.get("title") for ch in item.get("challenges", [])]
}
result.append(video)
# pagination tokens
cursor = data.get("cursor", 0)
has_more = bool(data.get("hasMore", False))
time.sleep(0.5)
print("Collected videos:", len(result))
# Save cleaned output
with open("videos_clean_full.json", "w", encoding="utf-8") as f:
    json.dump(result, f, ensure_ascii=False, indent=2)

Download TikTok Videos
One of the fields in the API response (and in our saved JSON) is the download link, downloadAddr.
We could download videos directly during data extraction, but to keep things organized, we’ll do that in a separate script.
We'll use the links from the saved JSON file. Keep in mind that downloadAddr links in TikTok's API responses expire quickly, typically within an hour; expired links return 403 Access Denied errors.
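If you want to test whether a saved link is still alive before downloading, a cheap check is a HEAD request. This is a hedged sketch, since some CDN nodes may not serve HEAD the same way as GET:

```python
import requests

def link_is_fresh(url: str) -> bool:
    # Expired downloadAddr links typically come back as 403
    try:
        resp = requests.head(
            url,
            headers={"referer": "https://www.tiktok.com"},
            timeout=10,
            allow_redirects=True,
        )
        return resp.status_code == 200
    except requests.RequestException:
        return False
```

The download script itself: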
import requests
import json
import os
# Folder for downloaded videos
download_folder = "tiktok_videos"
os.makedirs(download_folder, exist_ok=True)
# Download a single video from URL
def download_video(url, filename):
try:
# Stream download to avoid memory overload
headers = {
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36",
"accept": "*/*",
"referer": "https://www.tiktok.com",
}
response = requests.get(url, headers=headers, stream=True)
response.raise_for_status()
filepath = os.path.join(download_folder, filename)
with open(filepath, "wb") as f:
for chunk in response.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
print(f"[OK] {filename} downloaded")
except Exception as e:
print(f"[ERROR] {filename} failed: {e}")
# Load JSON with video info
with open("videos_clean_full.json", "r", encoding="utf-8") as f:
videos = json.load(f)
# Iterate over videos and download
for video in videos:
download_url = video.get("downloadAddr")
video_id = video.get("id")
if download_url:
filename = f"{video_id}.mp4"
download_video(download_url, filename)
else:
print(f"[WARN] No download link for video {video_id}")Scraping TikTok Comments
The next step is extracting comments from videos.

Use TikTok's comment API endpoint, https://www.tiktok.com/api/comment/list/, to extract comments efficiently. Note that TikTok caps comment scraping at about 5,000 comments per video; pagination stops afterward. The endpoint pages through results with a cursor parameter, sketched below; the full extraction script follows it.
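A minimal pagination sketch (the has_more and cursor field names are assumptions carried over from the video endpoint's scheme; confirm them against a live response before relying on this):

```python
import requests

headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
vid = "7518505042194861325"  # one of the video IDs used below

cursor = 0
all_comments = []
while True:
    url = (
        "https://www.tiktok.com/api/comment/list/"
        f"?aid=1988&aweme_id={vid}&count=50&cursor={cursor}"
    )
    data = requests.get(url, headers=headers, timeout=15).json()
    batch = data.get("comments") or []
    all_comments.extend(batch)
    # assumed pagination fields, mirroring the video endpoint
    if not batch or not data.get("has_more"):
        break
    cursor = data.get("cursor", cursor + len(batch))
```

With that in mind, the full script below keeps things simple and fetches a single batch of comments per video: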
import requests
import json
# list of TikTok video IDs to fetch comments from
video_ids = ["7518505042194861325", "7501766490077678891"]
# number of comments to fetch per video
count = 30
# standard desktop browser User-Agent to avoid blocking
headers = {
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/141.0.0.0 Safari/537.36"
}
comments = []
# iterate through each video ID and fetch comments
for vid in video_ids:
url = f"https://www.tiktok.com/api/comment/list/?aid=1988&aweme_id={vid}&count={count}"
resp = requests.get(url, headers=headers)
# attempt to parse the JSON response
try:
data = resp.json()
except ValueError:
print(f"Non-JSON response for video {vid}, status:", resp.status_code)
continue
# loop through each comment object
for c in data.get("comments", []):
comments.append(
{
"video_id": c.get("aweme_id"),
"video_desc": c["share_info"].get("desc"),
"video_title": c["share_info"].get("title"),
"video_url": c["share_info"].get("url"),
"comment_id": c.get("cid"),
"text": c.get("text"),
"likes": c.get("digg_count"),
"language": c.get("comment_language"),
"timestamp": c.get("create_time"),
"author": {
"name": c["user"].get("nickname"),
"username": c["user"].get("unique_id"),
"user_id": c["user"].get("uid"),
"avatar": (
c["user"]["avatar_thumb"]["url_list"][0]
if c["user"].get("avatar_thumb")
else None
),
},
"comment_url": c["share_info"].get("url"),
}
)
# save all collected comments into a JSON file
with open("comments.json", "w", encoding="utf-8") as f:
json.dump(comments, f, ensure_ascii=False, indent=2)
print(f"Saved {len(comments)} comments to comments.json")Scraping TikTok Search
Search scraping is the hardest part, because TikTok dynamically generates headers. Endpoints like api/search/general/full/ or api/search/item/full/ require valid, time-sensitive headers.
To capture the responses, drive a browser with an automation framework (e.g., Playwright) and intercept the API calls directly.
import json
from playwright.sync_api import sync_playwright
keyword = "cat funny video"
search_url = f"https://www.tiktok.com/search?q={keyword}"
def extract_useful_data(api_response):
# Extract only the fields we care about from TikTok API
useful = []
data_list = api_response.get("data", [])
for entry in data_list:
item = entry.get("item", {})
video = item.get("video", {})
author = item.get("author", {})
music = item.get("music", {})
stats = item.get("stats", {})
hashtags = [
tag.get("hashtagName")
for tag in item.get("textExtra", [])
if tag.get("hashtagName")
]
# Build simplified dict with relevant info
useful.append({
"id": item.get("id"),
"desc": item.get("desc"),
"createTime": item.get("createTime"),
"video_url": video.get("playAddr"),
"cover": video.get("cover"),
"author_name": author.get("nickname"),
"author_id": author.get("id"),
"author_uniqueId": author.get("uniqueId"),
"author_avatar": author.get("avatarThumb"),
"music_title": music.get("title"),
"music_author": music.get("authorName"),
"hashtags": hashtags,
"stats": {
"likes": stats.get("diggCount"),
"comments": stats.get("commentCount"),
"shares": stats.get("shareCount"),
"plays": stats.get("playCount"),
"favorites": stats.get("collectCount")
}
})
return useful
def run(playwright):
# Launch browser with visible window for debugging
browser = playwright.chromium.launch(headless=False)
context = browser.new_context()
page = context.new_page()
api_response = None
# Listen for all network responses and capture TikTok search API
def handle_response(response):
nonlocal api_response
if "api/search/general/full/" in response.url: # check for API endpoint
try:
api_response = response.json() # store JSON for later parsing
except Exception:
pass # ignore invalid responses
page.on("response", handle_response)
# Open TikTok search page
page.goto(search_url, timeout=60000)
page.wait_for_timeout(10000) # wait a bit for the API to respond
if api_response:
# Parse API response into simplified structure
useful_data = extract_useful_data(api_response)
# Save to JSON file
with open("search_clean.json", "w", encoding="utf-8") as f:
json.dump(useful_data, f, ensure_ascii=False, indent=2)
# Close browser cleanly
browser.close()
# Start Playwright and run scraping
with sync_playwright() as playwright:
    run(playwright)

If you prefer to stick with the Requests library instead of using Playwright, you can copy the headers manually from your browser's DevTools and reuse them for a while, until they expire.
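A rough sketch of that approach; the keyword parameter name and the extra signed values are assumptions, so copy whatever your browser actually sends:

```python
import requests

# Placeholder values copied from the browser's DevTools Network tab.
# The required signed headers and cookies (e.g., msToken) vary; inspect
# a real search request to confirm, and expect everything to expire quickly.
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "referer": "https://www.tiktok.com/search?q=cat%20funny%20video",
    # paste the remaining headers and cookies from your own browser here
}
params = {"aid": "1988", "keyword": "cat funny video"}  # parameter names assumed

resp = requests.get(
    "https://www.tiktok.com/api/search/general/full/",
    headers=headers,
    params=params,
    timeout=15,
)
print(resp.status_code)
```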
Final Thoughts
Now you have a working setup for scraping TikTok profiles and related data. You can improve it by adding error handling, managing rate limits, or handling CAPTCHAs to avoid being blocked.
The gist is that most of the data is already accessible via embedded JSON and API endpoints, so scraping mostly comes down to collecting it reliably and integrating it with your other tools.
All examples are available in the repository.