[kwlug-disc] Help!

Khalid Baheyeldin kb at 2bits.com
Wed Dec 31 08:52:46 EST 2014


You should not use regexps for something like this: Java has libraries
that parse HTML properly without them.

For example, see this jsoup code snippet:
http://jsoup.org/cookbook/extracting-data/example-list-links
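
The cookbook page above relies on jsoup, which is a third-party library. As a
rough sketch of the same idea with no external dependency, the JDK's built-in
Swing HTML parser (javax.swing.text.html.parser.ParserDelegator) is swapped in
here instead of jsoup. The class name LinkLister and the method extractLinks
are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: lists href values using the JDK's HTML parser, not jsoup.
final class LinkLister {

    /** Returns the href value of every <a> start tag the parser reports. */
    static List<String> extractLinks(String html) {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        try {
            // The last argument tells the parser to ignore charset declarations,
            // which is fine here because the input is already a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body><a href=\"http://kwlug.org/\">KWLUG</a>"
                + "<A HREF='http://jsoup.org/'>jsoup</A></body></html>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

The parser takes care of tag case, quoting styles and attribute order, which
is exactly the bookkeeping a regexp tends to get wrong.
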

Also see the links in the first reply here:
http://stackoverflow.com/questions/677038/how-to-use-regular-expressions-to-parse-html-in-java
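
If the exercise really wants the simplified link definition from the quoted
constraints ('<', optional whitespace, 'a' or 'A', then whitespace), a plain
character scan also works without any regexp. The following is only a sketch:
the class name AnchorScanner and the method countLinks are hypothetical, and it
assumes that at least one whitespace character must follow the letter, which
is how the four examples in the constraints read.

```java
// Hypothetical helper: counts link openings per the simplified definition,
// '<' [whitespace] ('a' | 'A') whitespace, using a plain character scan.
final class AnchorScanner {

    static int countLinks(String html) {
        int count = 0;
        int n = html.length();
        for (int i = 0; i < n; i++) {
            if (html.charAt(i) != '<') {
                continue;
            }
            // Skip any whitespace between '<' and the tag letter.
            int j = i + 1;
            while (j < n && Character.isWhitespace(html.charAt(j))) {
                j++;
            }
            if (j >= n) {
                continue;
            }
            char c = html.charAt(j);
            if (c != 'a' && c != 'A') {
                continue;
            }
            // Assumption: whitespace must follow the letter, so "<abbr>" and
            // "<a>" are not counted.
            if (j + 1 < n && Character.isWhitespace(html.charAt(j + 1))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String html = "<a href='x'>one</a> < A >two</a> <abbr>no</abbr>";
        System.out.println(countLinks(html)); // prints 2
    }
}
```

This only counts where links start. Extracting the href or the link text is
still better left to a real HTML parser.
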


On Wed, Dec 31, 2014 at 8:42 AM, Joe Wennechuk
<youcanreachmehere at hotmail.com> wrote:
> Hello All,
>
> Slightly off topic, but I know you guys can help. I have applied for a job,
> and they have asked me to write a java class that searches html from
> websites for links. I am using this regex ...(Pattern pattern =
> Pattern.compile("<a[^>]*>(.*?)</a>", Pattern.DOTALL |
> Pattern.CASE_INSENSITIVE);) to find them but based on the constraints I
> don't think I'm doing it right, as I am not finding all of the links. Here
> are the constraints.. Can anyone help??
>
> Implementation constraints:
>    * For simplification assume that the link is defined as
> '<[whitespace]a[whitespace]' or '<[whitespace]A[whitespace]'.
>      ('<a ', '< a h', '<A >', '<a attr=' are all valid links)
>



-- 
Khalid M. Baheyeldin
2bits.com, Inc.
Fast Reliable Drupal
Drupal optimization, development, customization and consulting.
Simplicity is prerequisite for reliability. -- Edsger W. Dijkstra
Simplicity is the ultimate sophistication. -- Leonardo da Vinci
For every complex problem, there is an answer that is clear, simple,
and wrong. -- H.L. Mencken




