Question

We're accessing the API of a web system to obtain product information. We require some additional information that is not available through the API. This information is publicly available for each product in the source code of its item page. We've written an algorithm that parses this information from each item's web page, but that will not hold up in the long run, since the algorithm will simply stop working if and when they change the source code (for example, with a redesign of the front end).

I feel like there should be a supervised learning approach to this problem, but I'm not aware of any existing solutions.

What are some good approaches to this kind of problem?

Regards.

Explanation / Answer

In short, this is probably difficult.

Your question is not fully specified: you did not describe the data you are parsing, and estimating how it might change is a matter of guesswork.

Once the web site changes format, and assuming the products themselves remain the same, you are essentially looking at a supervised learning task: given a list of product web pages whose field values you already know, build an "extractor function" that outputs the desired field.

If you are lucky and all the product pages are identically structured, this problem might be quite easy: you simply need to identify the path to the HTML node in which the desired information lies. On the other hand, if the web pages have a varied structure, or no structure at all, the problem may be much more difficult.
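For the easy case, here is a minimal sketch of what such an extractor could look like, using only the Python standard library. Everything in it is an assumption made for illustration: the sample pages, the "weight" field, the function names learn_path and extract, and the choice to describe a node by its tag names and class attributes. The idea is that you keep the values you extracted before the redesign, look for them in the redesigned pages, and take the node path that holds the known value on the most pages.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathTextParser(HTMLParser):
    """Record every non-empty text node together with the path of open tags."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, label) for each currently open element
        self.pairs = []  # (path, text) for each text node seen

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Assumption: tag name plus class names is a stable enough label.
        self.stack.append((tag, ".".join([tag] + classes)))

    def handle_endtag(self, tag):
        # Close the nearest matching open element; tolerate unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            path = "/".join(label for _, label in self.stack)
            self.pairs.append((path, text))


def text_by_path(html):
    """Return a list of (path, text) pairs for one page."""
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


def learn_path(examples):
    """Pick the path that holds the known value on the most example pages.

    examples: list of (html, known_value) pairs. Returns None if no page
    contains its known value as an exact text node.
    """
    votes = Counter()
    for html, known in examples:
        votes.update({path for path, text in text_by_path(html) if text == known})
    return votes.most_common(1)[0][0] if votes else None


def extract(html, path):
    """Return the first text node found at the learned path, or None."""
    for candidate, text in text_by_path(html):
        if candidate == path:
            return text
    return None


# Hypothetical redesigned pages; the weights are known from earlier extraction.
PAGE_A = ("<html><body><div class='item'><h1>Red mug</h1>"
          "<span class='weight'>350 g</span></div></body></html>")
PAGE_B = ("<html><body><div class='item'><h1>Blue mug</h1>"
          "<span class='weight'>410 g</span></div></body></html>")
NEW_PAGE = ("<html><body><div class='item'><h1>Green mug</h1>"
            "<span class='weight'>500 g</span></div></body></html>")

LEARNED_PATH = learn_path([(PAGE_A, "350 g"), (PAGE_B, "410 g")])
print(LEARNED_PATH)                     # html/body/div.item/span.weight
print(extract(NEW_PAGE, LEARNED_PATH))  # 500 g
```

This only works when the known value appears as an exact text node and the path is the same on every page. If several nodes share a path, extract returns the first one, and if the site formats the value differently after the redesign (different units, extra whitespace inside the text) the exact match will fail and you would need fuzzier matching.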

There is some academic work in this area, but don't get your hopes up. Look for the keywords "automatic web extraction".
