{"version":"1.0","provider_name":"Portfolio Oscar Javier Munoz Marciales","provider_url":"https:\/\/artechsano.pro","author_name":"oscarxxi","author_url":"https:\/\/artechsano.pro\/author\/oscarxxi\/","title":"Getting Started with Pandas: A Beginner\u2019s Guide - Portfolio Oscar Javier Munoz Marciales","type":"rich","width":600,"height":338,"html":"<blockquote class=\"wp-embedded-content\" data-secret=\"IyBzJsnaJl\"><a href=\"https:\/\/artechsano.pro\/articles\/getting-started-with-pandas-a-beginners-guide\/\">Getting Started with Pandas: A Beginner\u2019s Guide<\/a><\/blockquote><iframe sandbox=\"allow-scripts\" security=\"restricted\" src=\"https:\/\/artechsano.pro\/articles\/getting-started-with-pandas-a-beginners-guide\/embed\/#?secret=IyBzJsnaJl\" width=\"600\" height=\"338\" title=\"&#8220;Getting Started with Pandas: A Beginner\u2019s Guide&#8221; &#8212; Portfolio Oscar Javier Munoz Marciales\" data-secret=\"IyBzJsnaJl\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" class=\"wp-embedded-content\"><\/iframe><script type=\"text\/javascript\">\n\/* <![CDATA[ *\/\n\/*! 
This file is auto-generated *\/\n!function(d,l){\"use strict\";l.querySelector&&d.addEventListener&&\"undefined\"!=typeof URL&&(d.wp=d.wp||{},d.wp.receiveEmbedMessage||(d.wp.receiveEmbedMessage=function(e){var t=e.data;if((t||t.secret||t.message||t.value)&&!\/[^a-zA-Z0-9]\/.test(t.secret)){for(var s,r,n,a=l.querySelectorAll('iframe[data-secret=\"'+t.secret+'\"]'),o=l.querySelectorAll('blockquote[data-secret=\"'+t.secret+'\"]'),c=new RegExp(\"^https?:$\",\"i\"),i=0;i<o.length;i++)o[i].style.display=\"none\";for(i=0;i<a.length;i++)s=a[i],e.source===s.contentWindow&&(s.removeAttribute(\"style\"),\"height\"===t.message?(1e3<(r=parseInt(t.value,10))?r=1e3:~~r<200&&(r=200),s.height=r):\"link\"===t.message&&(r=new URL(s.getAttribute(\"src\")),n=new URL(t.value),c.test(n.protocol))&&n.host===r.host&&l.activeElement===s&&(d.top.location.href=t.value))}},d.addEventListener(\"message\",d.wp.receiveEmbedMessage,!1),l.addEventListener(\"DOMContentLoaded\",function(){for(var e,t,s=l.querySelectorAll(\"iframe.wp-embedded-content\"),r=0;r<s.length;r++)(t=(e=s[r]).getAttribute(\"data-secret\"))||(t=Math.random().toString(36).substring(2,12),e.src+=\"#?secret=\"+t,e.setAttribute(\"data-secret\",t)),e.contentWindow.postMessage({message:\"ready\",secret:t},\"*\")},!1)))}(window,document);\n\/\/# sourceURL=https:\/\/artechsano.pro\/wp-includes\/js\/wp-embed.min.js\n\/* ]]> *\/\n<\/script>\n","thumbnail_url":"https:\/\/artechsano.pro\/wp-content\/uploads\/2025\/11\/Gemini_Generated_Image_vo2mchvo2mchvo2m.png","thumbnail_width":1024,"thumbnail_height":1024,"description":"If you&#8217;re new to Python and curious about data analysis, you&#8217;ve probably heard of Pandas. This powerful library is an essential tool for anyone working with data\u2014from spreadsheets to databases to web-scraped information. It&#8217;s the foundation for most data science workflows in Python. 
In this post, we\u2019ll explore what Pandas is, why it\u2019s so useful, and walk through the core operations you need to start analyzing data today. \ud83d\udce6 What Is Pandas? Pandas is an open-source Python library built for data manipulation and analysis. It provides high-performance, easy-to-use data structures that make working with &#8220;relational&#8221; or &#8220;labeled&#8221; data (like you&#8217;d find in a spreadsheet) simple and intuitive. It introduces two primary data structures: Series: A one-dimensional labeled array. Think of it as a single column in a spreadsheet, like a list of ages, but with custom labels (called an index). DataFrame: A two-dimensional labeled data structure with columns of potentially different types. This is the workhorse of Pandas. It&#8217;s the whole spreadsheet\u2014a collection of Series (columns) that share the same index (rows). You can easily create them from scratch to experiment: \ud83d\udc0d filename.py import pandas as pd # A Series (one column) s = pd.Series([10, 20, 30], index=[&#039;a&#039;, &#039;b&#039;, &#039;c&#039;]) # A DataFrame (multiple columns) data = {&#039;Product&#039;: [&#039;Apples&#039;, &#039;Oranges&#039;, &#039;Bananas&#039;], &#039;Price&#039;: [0.5, 0.4, 0.25]} df = pd.DataFrame(data) \ud83d\ude80 Why Use Pandas? \u00a0 Here\u2019s why Pandas is a favorite among data analysts, scientists, and Python developers: Simple &amp; Intuitive: The syntax is designed to be readable and expressive, letting you accomplish complex tasks in just a few lines. Flexible Data Handling: It can read and write data from a huge variety of formats, including CSV, Excel, JSON, SQL databases, HTML, and more. Powerful Operations: It makes complex operations simple. You can effortlessly filter, group, merge, pivot, and reshape data. It also has specialized, powerful tools for working with time series data. 
Performance: It&#8217;s built on top of NumPy, which means many of its operations are vectorized and optimized for speed. Integration: It plays perfectly with other libraries in the scientific Python ecosystem, like NumPy (for computation), Matplotlib\/Seaborn (for plotting), and Scikit-learn (for machine learning). \u00a0 \ud83d\udee0\ufe0f Getting Started: Installation &amp; Loading Data \u00a0 To begin, you&#8217;ll need to install Pandas. The most common way is with pip: \ud83d\udc0d pip install pandas Once installed, you import it into your Python script (the as pd is a standard, widely-used alias): \ud83d\udc0d import pandas as pd While you can create DataFrames from scratch (as shown above), you&#8217;ll usually load data from a file. Let\u2019s load a simple CSV file: \ud83d\udc0d # Reads a CSV file into a DataFrame df = pd.read_csv(&#039;your_data.csv&#039;) # You can also easily read from other file types # df_excel = pd.read_excel(&#039;your_data.xlsx&#039;) \ud83d\udd0d Exploring Your Data: The Basic Workflow \u00a0 Once your data is loaded into a DataFrame, the first step is always inspection. You need to understand what you&#8217;re working with. Let&#8217;s assume we loaded a df. Here are the most critical commands: \u00a0 1. Inspecting Your Data \u00a0 See the first few rows: df.head() (by default, it shows 5). See the last few rows: df.tail() Get a concise summary: df.info() This is crucial! It shows row\/column counts, column names, data types (e.g., int64, float64, object), and, most importantly, the number of non-null (non-empty) values. Get quick statistics: df.describe() For numerical columns, this shows count, mean, standard deviation, min, max, and percentiles. See the dimensions: df.shape (returns a tuple: (rows, columns)) See the column names: df.columns \u00a0 2. Selecting &amp; Filtering Data \u00a0 This is where you&#8217;ll spend a lot of your time. 
Select a single column (returns a Series): df['column_name'] Select multiple columns (returns a new DataFrame): df[['col1', 'col2']] Select rows by position (integer-based): df.iloc[0] (gets the very first row) df.iloc[0:5] (gets the first five rows) Select rows by label\/index (label-based): df.loc['index_label'] (if your index is a name, e.g., &#8216;a&#8217;, &#8216;b&#8217;, &#8216;c&#8217;) Conditional Filtering (Boolean Masking): This is the most powerful way to select data. df[df['age'] &gt; 30] (selects all rows where the &#8216;age&#8217; column is over 30) df[(df['age'] &gt; 30) &amp; (df['city'] == 'New York')] (use &amp; for &#8220;and&#8221;, | for &#8220;or&#8221;, and wrap each condition in parentheses) \u00a0 3. Manipulating &amp; Cleaning Data \u00a0 Create a new column: df['new_column'] = df['col1'] + df['col2'] Handle missing data: df.dropna() (drops all rows with any missing values) df.fillna(value=0) (fills all missing values with 0) Both return a new DataFrame by default, so assign the result back if you want to keep it. Rename columns: df = df.rename(columns={'old_name': 'new_name', 'another_old': 'another_new'}) \u00a0 4. Aggregating &amp; Grouping Data \u00a0 This is the key to summarizing your data, and it follows the &#8220;split-apply-combine&#8221; pattern: split the rows into groups, apply a function to each group, and combine the results. The groupby method: df.groupby('category').mean(numeric_only=True) (calculates the mean of the numeric columns for each unique category) Let&#8217;s look at a simple example. Imagine you have sales data: \ud83d\udc0d # 1. Create a new &#039;revenue&#039; column df[&#039;revenue&#039;] = df[&#039;price&#039;] * df[&#039;quantity&#039;] # 2. Group by product and sum the revenue revenue_by_product = df.groupby(&#039;product&#039;)[&#039;revenue&#039;].sum() print(revenue_by_product) You can also get multiple statistics at once using .agg(): \ud83d\udc0d # Get total revenue and average price per product stats = df.groupby(&#039;product&#039;).agg({ &#039;revenue&#039;: &#039;sum&#039;, &#039;price&#039;: &#039;mean&#039; }) print(stats) 5. 
Combining DataFrames \u00a0 Merging (like a SQL JOIN): Combines DataFrames based on a common column. pd.merge(df1, df2, on='id_column') Concatenating (stacking): Stacks DataFrames on top of each other (this works best when they have the same columns; columns missing from one DataFrame are filled with NaN). pd.concat([df1, df2]) \u00a0 \ud83d\udcc8 Bonus: Quick Visualization \u00a0 One of the best features of Pandas is its built-in plotting, which uses Matplotlib under the hood. This lets you get a quick visual check of your data without writing any Matplotlib code yourself, although Matplotlib still needs to be installed. After our groupby example above, you could instantly plot the results: \ud83d\udc0d # Assuming &#039;revenue_by_product&#039; is the Series from the groupby example revenue_by_product.plot(kind=&#039;bar&#039;, title=&#039;Total Revenue by Product&#039;) Or, you could plot a histogram of a single column from your original DataFrame: \ud83d\udc0d df[&#039;price&#039;].plot(kind=&#039;hist&#039;, bins=20, title=&#039;Price Distribution&#039;) \ud83d\udcda Learning Resources \u00a0 Ready to go deeper? These resources are fantastic for leveling up your Pandas skills: Official Pandas Documentation: The user guide and API reference are essential. Kaggle Learn: They have an excellent, free, hands-on micro-course on Pandas. Real Python: Features numerous"}