These are projects posted by the students of Dr. Gove Allen at Brigham Young University. These students have taken one semester-long course on VBA and generally have had no prior programming experience

Thursday, December 10, 2009

Ebay Data Extraction

Executive Summary

Many businesses use Ebay to sell surplus inventory and many individuals and small businesses use Ebay as a marketplace to make a profit. When buying or selling on Ebay, knowing the previous selling prices of the item in question can be very useful in making decisions. Fortunately, Ebay allows registered users to access data on the past 30 days of auctions. Unfortunately, the data cannot be download into an excel readable format or pasted into excel for obtaining statistics and doing trend and other analysis. My application accesses Ebay data and populates an Excel worksheet with it. It provides a user friendly interface to allow the user to easily browse or search for the results that he or she wants to download.

Implementation Documentation

The main process of the application works by extracting the title, price, shipping, whether the item sold, date/time, and link to the item. This data is extracted from the HTML of the completed items page selected by the user. Most results are contained on more than one page, so the application gets data from each page by first determining how many pages of data contain the results, then extracting the data from each page.

The following explains each step in the process.

First the user is prompted to click next. Internet explorer then pops up to Ebay’s categories page. The user can browse to any category results page or perform any search.


The user then returns to the spreadsheet and clicks OK to confirm the selection. Internet explorer disappears. At this point, the application does a few checks. The following is a list of the checks and the application’s response to each:

(1) Is the selected page a results page?
Response: If the page is not a valid results page, the user is prompted to try again and Internet Explorer again becomes visible. Otherwise the application continues.

(2) Is the selected page a completed auctions listing (as opposed to a listing of currently open auctions)? If the user browsed to a regular results page, the application follows the "Completed" link to get to the completed listings. Otherwise, the application continues.

(3) Is the user logged in?
In order to view the completed items listing, the user must be logged in. If the "Completed" link is followed and the user is not already logged in, he or she will immediately be taken to the login page. The application finds out if the current page is the login page and if so, asks the user for his or her login info and logs the user in. Otherwise the application can assume the user is already logged in and at the completed listings page and continue.

(4) Are the results set to 200 items per page?
Ebay can display 25, 50, 100, or 200 items per page. To make the extraction process as efficient as possible, the application finds out how many items this setting is currently set to and changes it to 200 if needed.

(5) How many pages of items are in the results?
The application finds out how many pages of items are in the results so it knows how many pages to get data from.


A progress bar appears and the extraction process takes place.


Challenges/Issues

The biggest challenge in creating this application was finding a way to constantly identify each piece of data in the HTML. After I thought I had done it, I discovered that sometimes the number of records the application had placed in the worksheet was far fewer than the number of results reported by the Ebay page. After much frustration, I discovered that one of the reasons for this was that I assumed that the url to every item was proceeded by a tag that contained: class=v4lnk href=". Accordingly I coded the application to simply search for every instance of that string then extract the data following it. I didn't realize that titles that are bold are instead proceeded by: class=v4lnk b-g href=" and the application would just skip over these. I ran into similar problems with other data but was able to solve this particular one by simply searching for v4lnk then finding the next instance of href=".

Another challenge I faced was related to the progress bar and cancel button. Extractions with several pages can take up to a couple minutes, and without a progress bar, users are simply presented with an hour glass (or equivalent cursor). From the user's prospective it can be difficult to tell whether the program has malfunctioned and frozen, or whether it is simply taking a long time. I wanted to add a progress bar to remedy this problem, but I didn't know how to provide a form that stayed visible while the application continued to run. Through some internet research I discovered how to display a modeless form which, coupled with the DoEvents function, does just that. I was also able to add a cancel button so the user can stop the process at anytime. The progress bar was constructed with two labels: one for the background box and the other to be the actual bar. The label’s width is changed for each page that is finished.

Another challenge was that each price contains dollar signs and other characters which aren't harmful but make doing calculations and analysis impossible. To fix this problem I wrote a function that takes a string and extracts the numerical characters and returns a Double. The function simply loops through each character in the string and uses the IsNumeric function to determine whether it is numeric and appends this character to the output string if it is.

This project was challenging but fun. I learned many new things and it has given me many ideas for other projects that I am eager to try out.

Excel File (EbayExtraction.xlsm)

1 comment:

  1. Hi,

    I tried it but it didnt work. Keep asking me to choose a category. I did exactly how it is taught above.

    ReplyDelete

Blog Archive